Gradient Flow #39: Becoming TikTok, Next-gen Workflow Orchestration and Forecasting
This edition has 450 words, which will take you about 2 minutes to read.
“There's always a way if you're not in a hurry.” - Paul Theroux
Data Exchange podcast
Towards a next-generation dataflow orchestration and automation system Chris White is the CTO of Prefect, a startup building tools to help companies build, monitor, and manage dataflows. Prefect originated from lessons Chris and his co-founder learned while they were at Capital One, where they were early users and contributors to related projects like Apache Airflow.
Building a flexible, intuitive, and fast forecasting library Reza Hosseini and Albert Chen of LinkedIn are part of the team behind one of my favorite new open source tools: Greykite, a flexible and fast library for time-series forecasting.
[Image: Books in Byeol-Madang Library, at the Starfield COEX Shopping Mall in Seoul from Wikimedia.]
Data & Machine Learning Tools and Infrastructure
The Road to Intelligent Process Automation We examine the state of process automation technologies in the Fortune 1000 and in key technology hubs in the US.
BytePlus According to the FT, this new division of ByteDance is selling the technology that powers its viral video app TikTok to websites and apps outside China. The company has several SaaS offerings including recommendation models and tools for testing new data products and services. Given the rather frosty relationship between China and the West, BytePlus faces an uphill battle in Europe, the Five Eyes, and their allies.
EdgeQL A new, strictly typed query language that aims to surpass SQL for graph applications (the parent project EdgeDB stores and describes data as strongly typed objects and relationships between them). It is functional in nature and designed to be composable and easy to learn.
IBM open sources CodeFlare Built on top of Ray, CodeFlare simplifies the integration and scaling of analytic and machine learning workflows in hybrid clouds.
The Geography of Open Source Software A team of economists measures open source software contributions from 2010 to 2020 at the national, regional, and local levels using data from GitHub and adjacent platforms. The overall share of active developers has become more evenly distributed between countries, but in a nod to the importance of technology hubs, within-country regional differences persist. They hope to include GitLab and Bitbucket in future versions.
2021 Data Engineering Survey
Tell us which data tools you are most likely to adopt in the next 12-24 months—and what criteria your DataOps team uses to evaluate them. The survey takes about 5 minutes to fill out and we'll share the report of the survey findings with you. You'll also be entered in a drawing for free copies of Jesse Anderson’s Data Teams book and other prizes.
Recommendations
Goomics Fabulous series of comics about life at Google (2010-2021) from former Google engineer Manu Cornet (book version).
The Document-based Meeting Culture of Amazon Depending on the length of the document for the meeting, attendees start by reading for anywhere from ten minutes to half an hour.
Write a time-series database engine from scratch A recent tutorial highlights why time series are ideal for learning about storage engines.
Bessemer’s Data Infrastructure roadmap This interesting guide to hot areas in data engineering showcases some of Bessemer’s portfolio companies.
How to build a data team A short story describing the path to transforming an organization to be truly data-native.
Closing Short: When the media is a few steps ahead of the story.
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe.
Ben Lorica edits the Gradient Flow newsletter. He is co-chair of the Ray Summit, external chair of the NLP Summit, and host of the Data Exchange podcast. You can follow him on Twitter @BigData. This newsletter is produced by Gradient Flow.