Gradient Flow #36: Model Monitoring, Hydrofoils, Data Portability

Subscribe • Previous Issues

This edition has 372 words which will take you about 2 minutes to read.

“Preferences are optional and subject to constraints, whereas constraints are neither optional nor subject to preferences.” - Marko Papic

Data Exchange podcast


Upcoming Free Virtual Event

As the external co-chair for the Ray Summit, I’m excited about the outstanding program we’ve put together for developers, machine learning practitioners, data scientists, DevOps professionals, and architects. See you online in a few weeks!

Register Now


Data & Machine Learning Tools and Infrastructure

  • Model Monitoring Enables Robust Machine Learning Applications    Paco Nathan and I detail key challenges in monitoring ML models and outline the key components of a model monitoring platform.  This is a very active area with many startups rolling out new offerings. We believe that companies will gravitate towards holistic MLOps platforms that include model monitoring, as opposed to stitching together disparate components.

  • Introducing Delta Live Tables   Through a combination of declarative pipeline development, improved data reliability and cloud-scale production operations, DLT makes the ETL lifecycle easier.  Data engineers will be able to leverage existing data pipelines by building production ETL pipelines while writing only SQL queries.

  • Greykite: LinkedIn’s new open source library for time series forecasting   I’ve been experimenting with Greykite (paper, code) and I love its speed and flexibility. This is a relatively new release and the documentation can be somewhat overwhelming, but if you invest time in learning it, I believe you’ll end up using this library in production. At the very least you should add it to your toolbox alongside more mature options like Prophet.

  • A gentle introduction to knowledge graphs, with sample use cases from search, data integration, and AI.

  • immudb → Blockchain Concepts  ∪ SQL    A new TimeTravel feature allows you to run queries across your data’s change history.

  • Ray Clusters provide users with a serverless experience  Ray Clusters can automatically scale up and down based on an application’s resource demands while maximizing utilization and minimizing costs.


Funding Updates

[Image: Berlin from pxhere.]

Recommendations


Closing Short:


If you enjoyed this newsletter please support our work by encouraging your friends and colleagues to subscribe:


Ben Lorica edits the Gradient Flow newsletter. He is co-chair of the Ray Summit, external chair of the NLP Summit, and host of the Data Exchange podcast. You can follow him on Twitter @BigData. This newsletter is produced by Gradient Flow.

Gradient Flow #35: Optimizing Inference, Workflow Tools, RL in Large Enterprises


This edition has 415 words which will take you about 2 minutes to read.

“The thing about machine learning scientists is that they never admit defeat because all of their problems can be solved with more data.” - William Tunstall-Pedoe

Data Exchange podcast

  • Why You Should Optimize Your Deep Learning Inference Platform   As companies deploy deep learning to critical products and services, the number of predictions that models have to render can easily reach millions per day (even hundreds of trillions, in the case of Facebook). I speak with Yonatan Geifman, CEO of Deci, as well as with Ran El-Yaniv, Chief Scientist of Deci and Professor of Computer Science at Technion. We discuss new tools for systematically optimizing inference platforms.

  • The Future of Machine Learning Lies in Better Abstractions   Travis Addair previously led the team at Uber that was responsible for building Uber’s deep learning infrastructure. Travis is deeply involved with two popular open source projects related to deep learning: he maintains Horovod, a distributed deep learning training framework, and he co-maintains Ludwig, a toolbox that allows users to train and test deep learning models without the need to write code.

[Image from pxhere]

Data & Machine Learning Tools and Infrastructure

[Image: Shoreditch 2015, by SGL]

Recommendations


Closing Short → The return of in-person events: “Convenience and time savings were key factors for using video for events in our survey, but respondents in every country were adamant on their preference that events like concerts and religious services be in-person going forward. Virtual options were welcomed for those who needed a distraction or when in-person was not available.”



Gradient Flow #34: Modernizing Data Governance, DataOps for ML, Declarative Interfaces


This edition has 510 words which will take you about 3 minutes to read.

“If something cannot go on forever it will stop.” - Herbert Stein

Data Exchange podcast

  • Injecting Software Engineering Practices and Rigor into Data Governance  As the amount and importance of data grow within organizations, there is growing interest in tools that enable them to strategically utilize, manage, and unlock their data resources. I speak with Steve Touw, cofounder and CTO of Immuta, a startup at the forefront of data governance, data discovery, data privacy and security.

  • AI Beyond Automation   Jenn Webb and I sit down with Jerry Overton, who up until recently served as a DXC Fellow and Head of AI at DXC Technology. One of the things we discussed was his leadership role in helping establish a Center of Excellence for AI within DXC.

Data & Machine Learning Tools and Infrastructure

[Image: Japan, by SGL]

Recommendations

  • Graph Deep Learning    Slides from a recent talk by Simone Scardapane.

  • Unpacking Tiger Global’s Venture Capital playbook

  • Geopolitical Alpha   My first job after academia was as lead quant at a hedge fund and ever since I’ve been an avid reader of books about the industry. My favorite topic to read about (and my favorite hedge fund style) is global macro, which can be broadly described as trades that profit from political or economic events. With that said, you need not be a finance junkie to benefit from this book. The author introduces a broadly applicable and compelling “forecasting framework” that non-traders would benefit from.

  • New AI regulations are coming … Are you ready?    Brush up on three key trends that unite current and proposed AI regulations.


Featured Virtual Conference

I helped put together the outstanding program for the upcoming Data+AI Summit, a FREE virtual conference with over a hundred sessions on data infrastructure, analytics, data science and machine learning. Among the keynote speakers is 2014 Nobel laureate Malala Yousafzai. This event takes place May 24-28:

Register Now


Funding Updates


Closing Short: A taxonomy discovered through Charles Martin (on LinkedIn).



Gradient Flow #33: DataOps, Natural Language Benchmarks, Multimodal ML


This edition has 548 words which will take you about 3 minutes to read.

“While you are looking, you might as well also listen, linger and think about what you see.”  - Jane Jacobs

Data Exchange podcast

  • How Technology Companies Are Using Ray  Zhe Zhang is an Engineering Manager at Anyscale, where he leads the team that works on Ray and its ecosystem of libraries and partners. We discussed the Ray ecosystem and large-scale use cases at Ant Group, Uber, Amazon, and more.

  • Building a data store for unstructured data and deep learning applications  The main bottleneck at most companies remains data, and fortunately there are many new startups in data infrastructure. I speak with Davit Buniatyan, founder and CEO of ActiveLoop, a startup building data management tools for unstructured data types commonly associated with deep learning.

[Image by Pashminu Mansukhani from Pixabay]

Data & Machine Learning Tools and Infrastructure


Funding Updates

[Image: Paris Street Art, by Ben Lorica]

Recommendations

  • Multimodal Machine Learning   Lectures from a very popular Carnegie Mellon course, on building models that utilize information and generate signals from multiple modalities (vision, speech, language, etc.).

  • Machine Learning with Graphs   Videos from an ongoing Stanford course taught by Jure Leskovec. Graphs are used to describe entities with relations or interactions, and in many cases they can provide more accurate representations of your data. It stands to reason that if one can leverage the relational structure inherent in graphs, this will translate to more accurate machine learning models. To quote Jure: “Graphs are the new frontier of deep learning.”

  • Data for Better Lives   This new World Bank report calls “for a new social contract that enables the use and reuse of data to create economic and social value, ensures equitable access to that value, and fosters trust that data will not be misused in harmful ways”.

  • Strategic Prediction: Transparency and Accuracy in Predictive Decision Making   “When a measure becomes a target it ceases to be a good measure.”  A recent ACM tutorial that describes tools for machine learning developers who increasingly need to address a phenomenon familiar to economists (Goodhart’s Law) and social scientists (Campbell’s Law).

  • Technology Radar (Thoughtworks Advisory Board)   Recommendations include tools and technologies that cut across many areas of interest to developers, managers, and CTOs.


Closing Short: This well executed video essay arrives at a time when we are close to being able to travel again!



Gradient Flow #32: Data Cascades, Demand for Data Engineers, Exploiting ML models


This edition has 428 words which will take you about 2 minutes to read.

“I would believe only in a god who could dance.” - Friedrich Nietzsche

Data Exchange podcast

  • Machine Learning in Healthcare  I speak with Parisa Rashidi, Associate Professor at the Department of Biomedical Engineering and Director of the Intelligent Health Lab at the University of Florida. 

  • Data quality is the key to great AI products and services  Abe Gong is the CEO and co-founder at Superconductive, a startup founded by the team behind the Great Expectations (GE) open source project. GE is one of a growing number of tools that aim to improve data quality through validation and testing.


Featured Virtual Event

I am co-chair of Ray Summit, a FREE conference that brings together developers, engineers, data scientists, and architects interested in machine learning, AI, and other compute-intensive applications. The Ray community and ecosystem have significantly expanded since last year’s conference and we have another outstanding series of keynotes, talks, and tutorials for you.

Register Now


Data & Machine Learning Tools and Infrastructure

[Image: Flatiron Building in Shanghai, by SGL]

Funding Updates


Recommendations


Closing Short: If you enjoy Japanese food or sushi, you’ll enjoy this illuminating documentary from NHK.


