Baseten Takes on Hyperscalers with New AI Training Platform
Baseten, the AI infrastructure company, is making a significant move into the AI training space with the launch of Baseten Training. This platform aims to help companies fine-tune open-source AI models without the operational challenges typically associated with managing GPU clusters and cloud capacity.
The Problem with Existing Solutions
Many companies want to reduce their reliance on expensive API calls to proprietary services like OpenAI. However, the path from an off-the-shelf open-source model to a production-ready custom AI system is often complex, requiring specialized expertise in machine learning operations and infrastructure management.
Baseten’s Solution
Baseten provides the infrastructure while companies retain full control over their training code, data, and model weights. That approach reflects lessons learned from a previous product, Blueprints, which offered a more abstracted fine-tuning experience. Blueprints failed because users lacked the intuition to make the right choices about base models, data quality, and hyperparameters, and the abstraction could not make those judgment calls for them.
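To make that division of responsibility concrete, here is a minimal sketch of what a user-owned fine-tuning script could look like, written against standard Hugging Face Transformers APIs rather than any Baseten-specific interface. The base model name, data file, and hyperparameters are illustrative assumptions, not platform defaults.

```python
# Illustrative only: a standard Hugging Face fine-tuning script that the user
# writes and owns; the platform's job is to run it on managed GPUs.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "meta-llama/Llama-3.1-8B"  # assumption: any open-weights base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# The training data stays under the user's control; a local JSONL file stands in here.
dataset = load_dataset("json", data_files="train.jsonl", split="train")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True,
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./checkpoints",  # checkpoints and final weights belong to the user
        per_device_train_batch_size=4,
        num_train_epochs=1,
        save_strategy="epoch",
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("./final-model")
```

The point of the design is that nothing in a script like this needs to change to move between laptops, rented GPUs, or Baseten's clusters; the weights it produces are the customer's to keep.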
Key Features of Baseten Training
Baseten Training offers several key capabilities, including multi-node training across clusters of NVIDIA H100 or B200 GPUs, automated checkpointing to protect against node failures, and sub-minute job scheduling. The platform also integrates with Baseten’s Multi-Cloud Management (MCM) system, described in the next section.
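The failure-recovery piece is essentially the checkpoint-and-resume pattern that ML engineers otherwise wire up by hand. The sketch below shows that pattern in plain PyTorch, with an assumed shared checkpoint path and save interval; it illustrates the mechanism the platform is said to automate, not Baseten's actual API.

```python
# Illustrative checkpoint-and-resume loop; CKPT_PATH and save_every are assumptions.
import os
import torch

CKPT_PATH = "/shared/ckpts/latest.pt"  # assumption: storage that outlives any single node

def save_checkpoint(model, optimizer, step):
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict(), "step": step},
        CKPT_PATH,
    )

def load_checkpoint(model, optimizer):
    if not os.path.exists(CKPT_PATH):
        return 0                                   # no checkpoint: start from scratch
    state = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1                       # resume just past the last saved step

def train(model, optimizer, batches, save_every=100):
    start = load_checkpoint(model, optimizer)      # a replacement node picks up here
    for step, batch in enumerate(batches):
        if step < start:
            continue                               # skip work already covered by the checkpoint
        loss = model(**batch).loss                 # assumes a model that returns a loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if step % save_every == 0:
            save_checkpoint(model, optimizer, step)
```

On a managed platform, the saving, the durable storage, and the restart on a healthy node are handled by the scheduler rather than by code the user maintains.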
Multi-Cloud GPU Orchestration
Because MCM provisions GPU capacity dynamically across multiple cloud providers and regions, Baseten says it can sidestep the constraints of long-term hyperscaler contracts and pass the resulting cost savings on to customers. That flexibility in sourcing GPUs is one of the platform’s main differentiators.
Developer Experience
The platform is designed to carry models beyond experimentation into production deployments, and that developer-focused workflow is central to Baseten’s strategy for winning enterprise customers.
Customer Benefits and Early Success
Early adopters of Baseten Training have reported significant cost savings and performance improvements. For instance, AlliumAI, a startup, saw an 84% cost reduction with Baseten. Parsed, another early customer, achieved 50% lower end-to-end latency for transcription use cases.
The Strategic Rationale
Baseten’s expansion into training is driven by the belief that the lines between training and inference are blurring. By owning both, the company can optimize a model’s entire lifecycle, from fine-tuning through deployment, and offer a more complete platform. It is positioning itself to capitalize on the growing trend of enterprises fine-tuning open-source models to reduce their reliance on proprietary providers.
The Competitive Landscape
Baseten operates in a crowded market, facing competition from hyperscalers like AWS, Google Cloud, and Microsoft Azure, as well as specialized providers and vertically integrated platforms. Baseten’s differentiation lies in its multi-cloud capacity management, performance optimization expertise, and developer-focused approach.
The Future of Baseten
Baseten’s roadmap includes potential higher-level abstractions for common training patterns, expansion of fine-tuning to more data types, and deeper integration of advanced training techniques. The company’s pragmatism, demonstrated by its willingness to abandon Blueprints when it fell short, suggests a strong chance of success in a rapidly evolving market.