Gradient Flow #33: DataOps, Natural Language Benchmarks, Multimodal ML


This edition has 548 words which will take you about 3 minutes to read.

“While you are looking, you might as well also listen, linger and think about what you see.”  - Jane Jacobs

Data Exchange podcast

  • How Technology Companies Are Using Ray  Zhe Zhang is an Engineering Manager at Anyscale, where he leads the team that works on Ray and its ecosystem of libraries and partners. We discussed the Ray ecosystem and large-scale use cases at Ant Group, Uber, Amazon, and more.

  • Building a data store for unstructured data and deep learning applications  The main bottleneck at most companies remains data, and fortunately there are many new startups in data infrastructure. I speak with Davit Buniatyan, founder and CEO of ActiveLoop, a startup building data management tools for the unstructured data types commonly associated with deep learning.

[Image by Pashminu Mansukhani from Pixabay]

Data & Machine Learning Tools and Infrastructure


Funding Updates

[Image: Paris Street Art, by Ben Lorica]

Recommendations

  • Multimodal Machine Learning   Lectures from a very popular Carnegie Mellon course, on building models that utilize information and generate signals from multiple modalities (vision, speech, language, etc.).

  • Machine Learning with Graphs   Videos from an ongoing Stanford course taught by Jure Leskovec. Graphs describe entities along with their relations or interactions, and in many cases they can provide more accurate representations of your data. It stands to reason that if one can leverage the relational structure inherent in graphs, this will translate to more accurate machine learning models. To quote Jure: “Graphs are the new frontier of deep learning.”

  • Data for Better Lives   This new World Bank report calls “for a new social contract that enables the use and reuse of data to create economic and social value, ensures equitable access to that value, and fosters trust that data will not be misused in harmful ways.”

  • Strategic Prediction: Transparency and Accuracy in Predictive Decision Making   “When a measure becomes a target it ceases to be a good measure.”  A recent ACM tutorial that describes tools for machine learning developers who increasingly need to address a phenomenon familiar to economists (Goodhart’s Law) and social scientists (Campbell’s Law).

  • Technology Radar (Thoughtworks Advisory Board)   Recommendations include tools and technologies that cut across many areas of interest to developers, managers, and CTOs.


Closing Short: This well-executed video essay arrives at a time when we are close to being able to travel again!


If you enjoyed this newsletter please support our work by encouraging your friends and colleagues to subscribe:


Ben Lorica edits the Gradient Flow newsletter. He is co-chair of the Ray Summit, external chair of the NLP Summit, and host of the Data Exchange podcast. You can follow him on Twitter @BigData. This newsletter is produced by Gradient Flow.