In an ambitious endeavor to bridge the ML (Machine Learning) compute gap, Voltage Park unveiled a formidable cloud infrastructure for AI development on October 29, 2023. The current market scenario is gripped by a severe shortfall in advanced ML compute resources, with startups, researchers, and large AI labs vying to acquire or lease the latest chips for ML training. This scarcity, favoring only the well-resourced, significantly stifles innovation across the board.
Eric Park, the CEO of Voltage Park, underscored the adverse impacts of this compute shortage on AI innovators, stating, “ML teams and AI founders have to wait months or pay exorbitant sums to access the latest hardware to train their models. We hope to redress this imbalance and accelerate cutting-edge work in AI.”
Navigating the ML Compute Quagmire
The ML compute market is beleaguered with numerous challenges:
Long-Term Contracts: Most providers enforce rigid contracts, obliging companies to lease large compute clusters for multiple years—a scenario far from ideal for smaller entities desiring more flexibility.
Availability: Extended lead times for those able to purchase keep them waiting while competitors advance.
Cost: Hefty GPU rental rates from major cloud providers often become prohibitive for startups and research labs, more so for teams engrossed in larger models where cost efficiency is crucial.
Unveiling Voltage Park’s Cloud Expanse
Touted among the largest ML compute clouds globally, Voltage Park’s infrastructure, valued at a whopping $1 billion, encompasses approximately 24,000 NVIDIA H100 GPUs. The clusters, equipped with 80GB H100 SXM5 GPUs, are fully interconnected with 3.2T InfiniBand, initially offering bare-metal access for large-scale users in need of peak performance. The roadmap includes extending support for short-term leases and hourly billing, alongside incorporating tools like Slurm, Kubernetes, and Mosaic for seamless integration into existing training frameworks.
Voltage Park, a subsidiary of the Navigation Fund founded by Ripple co-founder Jed McCaleb, has already initiated service to prominent AI firms like Imbue and is in the finalization stages for other AI leaders like Character.ai and Atomic AI. The entire compute capacity is slated to be operational by early next year.
Kanjun Qiu, CEO of Imbue, commended Voltage Park for facilitating quicker access to crucial compute resources, thereby substantially bolstering their model training performance.
As the infrastructure continues to unfold, Voltage Park is soliciting feedback from potential customers to tailor the clusters to accommodate a myriad of use cases, ranging from experimentation, training, fine-tuning, to inference.
Image source: Shutterstock