Gradient Flow #45: Top Places to Work for Data Scientists; Model Serving; Tuning Language Models


“There's no sense in being precise when you don't even know what you're talking about.” - John von Neumann

Data Exchange podcast

  • Deploying Machine Learning Models Safely and Systematically  Hamel Husain is a Staff Machine Learning Engineer at GitHub and a core developer for fastai. 

  • Machine Learning in Astronomy and Physics    Dr. Viviana Acquaviva, Associate Professor at the CUNY Graduate Center, is an Astrophysicist with a strong interest in Data Science and Machine Learning.

  • Large-scale machine learning and AI on multi-modal data    Bob Friday is VP and CTO at Mist Systems, a Juniper company. His team uses data, machine learning, and AI to “optimize user experiences and simplify operations across the wireless access, wired access, and SD-WAN domains”. They’ve deployed deep learning models for anomaly detection, as well as virtual assistants that provide insight and guidance to IT staff via a conversational interface.

[Photo by Paul Skorupskas on Unsplash.]

Data & Machine Learning Tools and Infrastructure

  • Immediate 3X serving speed up with Ray Serve   Ray Serve is quietly becoming one of the more popular open source libraries for model serving. Learn how Wildlife Studios - one of the largest mobile gaming companies in the world - successfully deployed Ray Serve to deliver in-game offers. 

  • cleanlab: machine learning with noisy labels   An open source library for confident learning, an approach that involves pruning noisy data (as opposed to fixing label errors), and ranking examples to train with confidence.

  • Designing data ingestion pipelines   ML practitioners understand that scaling data ingestion pipelines is crucial, and that inefficiencies at this stage can cripple training throughput. Through the lens of deep learning for recommendation systems, a team from Facebook and Stanford presents an architecture for end-to-end training data ingestion.

  • Zingg    We live in an age where companies have data in disparate systems. In this context, scalable entity resolution and master data management systems bring tremendous benefits to downstream analytic and machine learning applications. Zingg is a new open source library for large-scale entity resolution. It’s built on top of Apache Spark.
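For readers curious what "pruning noisy data" and "ranking examples" might look like in practice, here is a minimal sketch of the idea behind confident learning in plain Python. It is a simplified illustration, not cleanlab's API or implementation: the function names (class_thresholds, find_label_issues) and the toy data are hypothetical, and the predicted probabilities are assumed to come from some already-trained classifier.

```python
def class_thresholds(labels, probs, num_classes):
    """Per-class threshold: the mean predicted probability of class k
    over the examples whose given label is k."""
    thresholds = []
    for k in range(num_classes):
        own = [p[k] for y, p in zip(labels, probs) if y == k]
        # A class with no labeled examples gets a threshold nothing can exceed.
        thresholds.append(sum(own) / len(own) if own else float("inf"))
    return thresholds


def find_label_issues(labels, probs, num_classes):
    """Return indices of likely label errors, least trustworthy first."""
    thresholds = class_thresholds(labels, probs, num_classes)
    issues = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes the model is "confident" about for this example.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue
        likely = max(confident, key=lambda k: p[k])
        if likely != y:
            # Rank by self-confidence: probability assigned to the given label.
            issues.append((p[y], i))
    issues.sort()
    return [i for _, i in issues]


# Hypothetical toy data: given labels and out-of-sample predicted probabilities.
labels = [0, 0, 0, 1, 1, 1]
probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model strongly prefers 1
    [0.2, 0.8],
    [0.1, 0.9],
    [0.7, 0.3],  # labeled 1, but the model prefers 0
]

suspects = find_label_issues(labels, probs, num_classes=2)
# Prune the suspects instead of trying to relabel them.
kept = [i for i in range(len(labels)) if i not in set(suspects)]
print("likely label issues:", suspects)
print("examples kept for training:", kept)
```

The design choice worth noting is that suspect examples are dropped rather than relabeled, which matches the pruning emphasis in the cleanlab blurb above; the ranking step lets you review the least trustworthy examples first.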

If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.


Ben Lorica edits the Gradient Flow newsletter. He is co-chair of the Ray Summit, external chair of the NLP Summit, and host of the Data Exchange podcast. You can follow him on Twitter @BigData. This newsletter is produced by Gradient Flow.
