Gradient Flow #28: Metadata, Speech Synthesis + NLU, Data Science Tools
This edition has 446 words, which will take you about 3 minutes to read.
“The single biggest problem in communication is the illusion that it has taken place.” - George Bernard Shaw
Data Exchange podcast
Tools for building robust, state-of-the-art machine learning models Mike Mahoney is affiliated with several groups within UC Berkeley: RISELab, the Dept. of Statistics, and ICSI. In this conversation we discuss his recent research projects, including work that led to one of the Best Paper awards at NeurIPS 2020.
Creating Master Data at Scale with AI Sonal Goyal, founder of Aficx, is using machine learning to build tools for “data mastering”, an extremely important aspect of data preparation.
[Image: HapagLloyd12-Detail by Dean Wampler, used with permission.]
Machine Learning tools and infrastructure
Speech synthesis and TTS will be central to the next wave of innovative voice applications While much of the recent media coverage and attention-grabbing applications have focused on automatic speech recognition, Yishay Carmiel and I explain why you should be paying close attention to technologies for the artificial creation of human speech.
The Growing Importance of Metadata Management Systems Assaf Araki and I describe why metadata will be the foundation for data catalogs, data lineage, data governance solutions, and many other enterprise data applications.
2020 Kaggle Machine Learning & Data Science Survey Based on 2,600+ respondents who are currently employed as data scientists. The top two development environments are JupyterLab and Visual Studio Code; the most popular libraries are scikit-learn, TensorFlow, Keras, XGBoost, and PyTorch.
spaCy 3.0 The democratization of state-of-the-art natural language technologies continues with the release of the latest version of one of the most widely used libraries in the space.
Apache Arrow 3.0 Arrow has quietly become one of the most important open source projects in data. For more on Arrow, check out my 2020 conversation with Wes McKinney.
Featured Virtual Events
AI Week in Tel Aviv is free and virtual. This year’s lineup includes Fei-Fei Li, Manuela Veloso, Regina Barzilay and many other outstanding researchers.
Call for Speakers for Ray Summit closes Feb 24th Join us and speak before the community of developers, machine learning practitioners, data scientists, engineers and architects interested in building scalable data and AI applications.
Funding Updates
Iteratively raises $5.4 million in seed funding Data quality and data pipelines are super important to all organizations serious about data, so it is no wonder that several startups are rushing to build solutions.
[Image: Lau Pa Sat Hawker Center in Singapore by Lily Banse on Unsplash.]
Recommendations
We See It All: Liberty and Justice in an Age of Perpetual Surveillance
Dataset evaluation and Responsible AI Speaking of surveillance technology, a new survey paper examines over a hundred datasets used by computer vision researchers. We need data reporting standards if we want to improve transparency and accountability in technologies that are gaining widespread adoption.
Three Steps to Fight Online Disinformation and Extremism Enhancing “digital literacy” or “cyber citizenship” skills is key to building more resilience into our social media platforms.
Closing Short: One of the best things about Singapore is the street food, specifically the hawker centers. This short documentary paints a grim future: COVID-19 and an aging workforce pose serious challenges to the future of this unique food culture.
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.
Ben Lorica edits the Gradient Flow newsletter. He is co-chair of the Ray Summit, chair of the NLP Summit, and host of the Data Exchange podcast. You can follow him on Twitter @BigData. This newsletter is produced by Gradient Flow.