How Tech-Forward Organizations Build Custom AI Platforms: A Feature Breakdown
In my previous article, "Why Digital-First Companies Are Building Their Own AI Platforms", I explored why many tech-forward companies are opting to build their own AI platforms rather than relying on off-the-shelf solutions. The piece sparked considerable interest, with readers eager to explore the specifics of these custom platforms. A common reaction was: now that you've convinced us that many companies have good reasons for building custom AI platforms, what features did these companies prioritize?
To answer this question, I returned to the same sources I mined for that article and compiled a list of the key features these companies focused on. The features fall into a handful of categories: core infrastructure, data management, model development and lifecycle, model serving and deployment, evaluation and monitoring, and multi-modal capabilities. Within these categories, we see a focus on scalability, flexibility, and efficiency in handling massive datasets and complex workloads. There's also a strong emphasis on end-to-end lifecycle management, from data processing to model deployment and monitoring.
In the following sections, I'll explore a few of these features in detail, shedding light on why they're crucial for companies building custom AI platforms and how they contribute to the competitive edge these organizations seek to gain in the AI race.
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5501c7ca-6eff-4d99-8e42-eeae10051af1_5118x3248.png)
Evaluation, Monitoring, and Optimization
I'm reminded of a recent remark by Andrew Ng that perfectly encapsulates the current state of affairs in generative AI and LLM applications. He highlighted the contrast between the simplicity of creating AI applications and the complexity of evaluating their performance. This disparity poses a significant challenge for AI teams, as the time-intensive nature of thorough assessments often overshadows the relatively quick development process. Consequently, teams find themselves cutting corners on comprehensive evaluations, potentially compromising the quality and reliability of their AI solutions.
To address this gap, leading teams are building tailored evaluation and observability tooling into their custom AI platforms, often drawing on a fast-growing crop of open-source projects. For instance, Phoenix is an open-source AI observability platform that enables tracing of LLM applications, LLM-powered performance evaluation, dataset management, and experiment tracking, and supports a variety of frameworks and LLM providers. With its simple interface, DeepEval lets developers optimize RAG pipeline hyperparameters, guard against prompt drift, and transition between LLMs with confidence. By enabling step-by-step tracking and debugging, Opik allows developers to analyze each component of their LLM applications, even in complex multi-agent setups.
With TruLens, developers can objectively measure the quality and effectiveness of their LLM-based applications using customizable feedback functions applied to inputs, outputs, and intermediate results. By combining synthetic data generation with fine-tuned classifiers, ARES efficiently assesses RAG systems while minimizing the need for extensive human annotations. These examples highlight the growing ecosystem of tools empowering AI teams to navigate the complexities of evaluation and deliver more robust and reliable AI solutions.
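To make the workflow concrete, here is a minimal sketch of the kind of check these tools support, using DeepEval's documented interface. The class and parameter names follow DeepEval's public API at the time of writing and may differ across versions, and the metric relies on an LLM judge behind the scenes, so it assumes a configured model provider.

```python
# A minimal DeepEval-style check: is a RAG answer relevant to the question?
# Class and argument names follow DeepEval's documented API and may vary by version.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What does the refund policy cover?",
    actual_output="Refunds are available within 30 days of purchase.",
    retrieval_context=["Our policy allows full refunds within 30 days."],
)

# The metric scores relevancy with an LLM judge and fails below the threshold,
# which makes it usable as a CI gate when swapping models or tuning a RAG pipeline.
metric = AnswerRelevancyMetric(threshold=0.7)
evaluate(test_cases=[test_case], metrics=[metric])
```

Run as part of a test suite, a handful of checks like this turns eyeballing outputs into a repeatable regression test.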
AI teams are finding that accuracy is only one part of the equation. As we've discussed in a previous article, the real challenge lies in managing the broader array of risks these systems bring, ranging from bias to privacy concerns to compliance with evolving regulations. Effective evaluation is not just about measuring performance; it's about proactively identifying and mitigating potential harms. The tools above are promising, but they must fit into a larger risk management strategy.
Model Development and Lifecycle
As AI initiatives expand, companies face an increasing need to manage vast amounts of data, metrics, and experiments efficiently. Companies like Spotify and Uber have tackled this by embedding experiment tracking tools directly into their platforms, allowing their teams to track and visualize experiments in real-time. Essential components of this process include version control, hyperparameter tracking, and metric analysis, all of which contribute to reproducibility and team collaboration. Without these capabilities, firms risk squandering resources and missing crucial insights.
The need for scalable experiment trackers has become particularly acute with the rise of frontier models, which have pushed the boundaries of what traditional tracking tools can handle. Teams working on these models now grapple with monitoring tens of thousands of unique metrics over training runs that can last months. This scale of operation demands tools that can not only ingest and store vast amounts of data but also provide real-time visualizations to surface anomalies and system errors promptly.
Specialized solutions, such as Neptune.ai, are emerging to address the unique demands of managing these large-scale AI experiments. Neptune's ability to handle the immense scale of modern AI experiments, support long-running jobs, and provide advanced visualization addresses the pain points many organizations face. Even for teams not training frontier models, the lessons learned from these extreme use cases are proving valuable. As fine-tuning becomes more ambitious and specialized foundation models more common, the need for robust and scalable experiment tracking will only grow.
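Regardless of which tracker a team adopts, the core logging pattern is the same: record hyperparameters once, then append metrics as time series throughout the run. Here is a hedged sketch using Neptune's Python client; the project name and hyperparameter values are placeholders, and the method names follow the neptune library's documented API, which may differ between versions.

```python
# Sketch of the basic experiment-tracking pattern with Neptune's Python client.
# The project name is a placeholder; method names may vary across neptune versions.
import neptune

# Reads NEPTUNE_API_TOKEN from the environment.
run = neptune.init_run(project="my-workspace/ai-platform")

# Log hyperparameters once, as a nested namespace, for reproducibility.
run["parameters"] = {"lr": 3e-4, "batch_size": 64, "base_model": "llama-ft"}

# Append metrics as time series during training; dashboards update in real time.
for step, loss in enumerate([0.92, 0.71, 0.55]):
    run["train/loss"].append(loss, step=step)

run.stop()
```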
Core Infrastructure
At the heart of any custom AI platform is the core infrastructure, which lays the groundwork for everything from model training to deployment and monitoring. For companies looking to scale AI capabilities, flexible and efficient distributed computing has become essential. This is where Ray, the open-source distributed computing framework, shines. Serving as the cornerstone of many custom AI platforms, Ray's versatility and scalability have made it an indispensable tool for organizations building bespoke AI applications tailored to their specific requirements.
Pinterest leverages Ray's distributed computing capabilities to handle petabyte-scale datasets for their recommender systems. Similarly, Netflix has integrated Ray into its production platform to support various generative AI tasks, including large-scale data processing and model training. Airbnb integrates Ray into its infrastructure to fine-tune LLMs, aligning them with customer preferences through advanced techniques like Supervised Fine Tuning (SFT) and Direct Preference Optimization (DPO). Apple utilizes Ray to enhance its GPU resource management capabilities, improving the efficiency of its machine learning operations. By incorporating Ray into their AI ecosystem, Canva has achieved notable enhancements in model scaling capabilities, leading to measurable benefits in terms of time and resources.
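For readers who haven't used it, the abstraction underlying all of these deployments is small: Ray turns an ordinary Python function into a task that can be scheduled anywhere on a cluster. The toy example below runs locally but scales to a cluster without code changes.

```python
# The core Ray pattern: decorate a function, fan work out, gather results.
import ray

ray.init()  # connects to an existing cluster, or starts a local one

@ray.remote
def score_batch(batch):
    # Stand-in for real work: feature extraction, inference, etc.
    return sum(batch) / len(batch)

batches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
futures = [score_batch.remote(b) for b in batches]  # scheduled in parallel
print(ray.get(futures))  # [2.0, 5.0, 8.0]
```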
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff819cc27-9484-4476-af12-75e517b531de_1872x1068.png)
Ray's capabilities extend far beyond basic model training and deployment, as demonstrated by its successful implementation in various demanding AI projects. At Zoox, Ray facilitates the training and deployment of machine learning models that manage live sensor data, demonstrating its value in real-time, high-priority applications. ByteDance relies on Ray to optimize multimodal data processing pipelines, particularly for managing the massive video datasets that underpin their generative AI models. Shopify utilizes Ray's capabilities to customize and roll out sophisticated visual-linguistic AI models, supporting wide-ranging generative AI functionalities.
What makes Ray truly powerful is its flexibility. For instance, Roblox has integrated Ray to efficiently handle large-scale batch inference jobs for deep learning models and LLMs across their platform, significantly reducing costs and enhancing throughput for their machine learning operations. Reddit has leveraged Ray and KubeRay as core technologies within its internal ML Platform to scale ML training and serving workloads, achieving a 6x reduction in model training time and improving developer velocity. Uber incorporated Ray to parallelize different operations in their marketplace optimization processes, achieving up to 40x speed improvements for some parts of the optimization process and allowing more frequent iterations.
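The batch-inference pattern Roblox describes maps naturally onto Ray Data. The sketch below is illustrative only: read_parquet, map_batches, and write_parquet are real Ray Data entry points, but the S3 paths and the stand-in "model" are placeholders.

```python
# Illustrative Ray Data batch-inference pipeline; paths and model are placeholders.
import ray

ds = ray.data.read_parquet("s3://example-bucket/inputs/")  # hypothetical path

def predict(batch):
    # In a real pipeline this would apply a loaded model to the batch.
    batch["prediction"] = [len(str(t)) for t in batch["text"]]
    return batch

# Ray streams the dataset through predict() in parallel across the cluster.
predictions = ds.map_batches(predict, batch_format="pandas")
predictions.write_parquet("s3://example-bucket/outputs/")  # hypothetical path
```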
To better understand how companies are developing custom AI platforms, I’m looking forward to hearing firsthand from organizations of all sizes at Ray Summit 2024. From established tech giants to innovative startups, the summit promises a diverse range of perspectives on the evolving landscape of custom AI platforms. Register with code AnyscaleBen15 to save 15% on this can't-miss AI event.
Data Exchange Podcast
Unlocking the Power of LLMs with Data Prep Kit. IBM Research's Petros Zerfos and Hima Patel discuss Data Prep Kit, an open-source toolkit for processing unstructured data at scale for LLM applications. They explore its capabilities in handling various data types and its scalable, cloud-native architecture.
Monthly Roundup: AI Regulations, GenAI for Analysts, Inference Services, and Military Applications. This monthly discussion explores AI's growing role across sectors, from streamlining healthcare processes to shaping international conflicts, and examines recent technological advances, industry trends, and California's SB 1047.
If you enjoyed this newsletter, please support our work by encouraging your friends and colleagues to subscribe:
Ben Lorica edits the Gradient Flow newsletter. He helps organize the AI Conference, the NLP Summit, Ray Summit, and the Data+AI Summit. He is the host of the Data Exchange podcast. You can follow him on Linkedin, Twitter, Reddit, Mastodon, or TikTok. This newsletter is produced by Gradient Flow.