Gradient Flow #34: Modernizing Data Governance, DataOps for ML, Declarative Interfaces
This edition has 510 words, which will take you about 3 minutes to read.
“If something cannot go on forever it will stop.” - Herbert Stein
Data Exchange podcast
Injecting Software Engineering Practices and Rigor into Data Governance: As the amount and importance of data grow within organizations, so does interest in tools that help them strategically use, manage, and unlock their data resources. I speak with Steve Touw, cofounder and CTO of Immuta, a startup at the forefront of data governance, data discovery, data privacy, and security.
AI Beyond Automation: Jenn Webb and I sit down with Jerry Overton, who until recently served as a DXC Fellow and Head of AI at DXC Technology. Among other things, we discussed his leadership role in helping establish a Center of Excellence for AI within DXC.
Data & Machine Learning Tools and Infrastructure
Why you should build your AI Applications with Ray: Ion Stoica and I explain why Ray is the ideal platform for building a diverse set of compute-intensive applications.
Data Validation for Machine Learning Models and Applications: A special edition of IEEE’s Data Engineering Bulletin focused on data quality and data validation in the context of MLOps and Responsible AI.
TabNet: Deep learning has taken over computer vision (images, video), speech technologies, and most recently natural language models. But many companies still need models for structured data, and for tabular data, decision trees and XGBoost still reign supreme. I’ve been playing with TabNet, and I suspect that once such models are made available to non-experts through a declarative interface, deep learning will begin capturing its share of models for structured data.
The NLP Index: This new site houses 3,000+ [code repositories + papers], organized in a 2-level taxonomy that captures the most important topics in natural language technologies. The breadth of tools and techniques to choose from bodes well for companies that are developing tools to vastly simplify things for developers. Think AutoNLP solutions or even declarative interfaces along the lines of Ludwig for deep learning.
[Image: Japan, by SGL]
Graph Deep Learning: Slides from a recent talk by Simone Scardapane.
Geopolitical Alpha: My first job after academia was as lead quant at a hedge fund, and ever since I’ve been an avid reader of books about the industry. My favorite topic to read about (and my favorite hedge fund style) is global macro, which can be broadly described as trades that profit from political or economic events. With that said, you need not be a finance junkie to benefit from this book. The author introduces a broadly applicable and compelling “forecasting framework” that non-traders will find useful too.
New AI regulations are coming … Are you ready? Brush up on three key trends that unite current and proposed AI regulations.
Featured Virtual Conference
I helped put together the outstanding program for the upcoming Data+AI Summit, a FREE virtual conference with over a hundred sessions on data infrastructure, analytics, data science, and machine learning. Among the keynote speakers is 2014 Nobel Peace Prize laureate Malala Yousafzai. The event takes place May 24-28.
Closing short: A taxonomy discovered through Charles Martin (on LinkedIn).
Ben Lorica edits the Gradient Flow newsletter. He is co-chair of the Ray Summit, external chair of the NLP Summit, and host of the Data Exchange podcast. You can follow him on Twitter @BigData. This newsletter is produced by Gradient Flow.