Data For AI
The Data Integration Market
As much as I like talking and writing about machine learning and AI, the truth is that there are probably more impressive startups in the data engineering and data infrastructure (DE) category. DE companies address fundamentals that need to be in place before companies can rely on reports and metrics. Any organization wishing to scale its use of AI and machine learning also needs DE tools. In fact, almost all the tools in the buzzy category of MLOps assume that users already have their DE act together.
To understand the data integration landscape, I draw on the following sources: job postings, LinkedIn profiles, and startup databases. Together they give a deeper understanding of the demand and supply sides of the data integration market, as well as the startups providing the next-generation solutions.
Data Exchange podcast
An open-source, end-to-end library for causal inference: Amit Sharma (Senior Researcher) and Emre Kiciman (Senior Principal Researcher) of Microsoft Research are part of the team behind DoWhy, a new library for estimating causal effects based on historical data alone. I like what the DoWhy crew have built and I’m looking forward to using it to explore applications of causal inference and causal learning in future projects.
AI Risk Management Framework: I discuss the new AI Risk Management Framework from the National Institute of Standards and Technology (NIST) with Elham Tabassi (of NIST) and Andrew Burt (Managing Partner of BNH.ai). In the cybersecurity realm, a host of businesses and cybersecurity leaders have adopted the NIST Cybersecurity Framework, and many consider it to be the gold standard in that field. Consequently, I believe that this new NIST initiative will have a significant impact on how we manage AI risks in the future.
Data Science at Shopify: Wendy Foster, Director of Engineering & Data Science, explains in detail how they use data science and machine learning for search and recommendations.
Free Report: AI in Healthcare Survey Results
AI applications in healthcare present a number of challenges and considerations, and many of these same considerations and lessons also apply to other sectors. Topping the list of priorities for 2022: Data Integration and Language Models.
State-Of-The-Art AI Systems Are Trained With Extra Data
The 2022 AI Index Report, one of my favorite annual reads, recently came out. This year's index highlights the need for additional training data in order to achieve state-of-the-art results across multiple technical benchmarks. In a short post, I discuss my favorite bits, including tools for detoxifying large language models, and I throw in bonus charts on the global talent pool for reinforcement learning and computer vision.
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.
Ben Lorica edits the Gradient Flow newsletter. He helps organize the Ray Summit, the NLP Summit, and the Data+AI Summit. He is the host of the Data Exchange podcast. You can follow him on Twitter @BigData. This newsletter is produced by Gradient Flow.