Gradient Flow #41: What’s New in Data Engineering; MLOps Anti-Patterns
“Time matters most when decisions are irreversible.” - Peter Bernstein
Data Exchange podcast
Changes to the data science role and to data science tools: Sean Taylor is a Data Science Manager at Lyft and was previously a research scientist and manager at Facebook. While at Facebook, he was instrumental in the creation and release of Prophet, a very popular open-source library for time-series forecasting.
What’s new in data engineering: Jenn Webb hosts a midyear panel with Jesse Anderson and me.
[Image: Langkawi Sky Bridge from pxfuel.]
Data & Machine Learning Tools and Infrastructure
An Enterprise Software Roadmap for Sky Computing: Assaf Araki and I explain how the enterprise software market will evolve alongside a more commoditized version of cloud computing.
Facebook’s Blender Bot 2.0: A new open-source chatbot that builds long-term memory and adds to its knowledge by searching the internet. Facebook’s related open-source project, ParlAI, is a platform for dialog research implemented in Python.
gorse: An open-source recommendation system that incorporates AutoML and horizontal scaling.
Using Anti-patterns to avoid MLOps Mistakes: A taxonomy of recurring anti-patterns (defective practices and methodologies) that surfaced while deploying machine learning at BNY Mellon.
[Image: The Dahlia Garden by BL]
Recommendations
Growing open-source: from Torch to PyTorch. Soumith Chintala explains how PyTorch's focus on usability allowed the project to grow its user base very quickly.
How to retract an open dataset: Open datasets are important for the advancement of machine learning research. In this comprehensive survey and study, researchers from Princeton provide stewardship guidelines that go beyond data creation and span the full lifecycle of a dataset.
The analytical application stack: In the very early days of "data science" here in the SF Bay Area, the first "data scientists" at places like LinkedIn, Twitter, etc. were very much focused on building data products. In recent years the emphasis has been on internal data tooling and internal analytics (rightfully so). This is a welcome update on opportunities and gaps in the suite of tools for building data applications.
Closing Short: #Mesmerizing
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.
Ben Lorica edits the Gradient Flow newsletter. He is co-chair of the Ray Summit, external chair of the NLP Summit, and host of the Data Exchange podcast. You can follow him on Twitter @BigData. This newsletter is produced by Gradient Flow.