SecurityBrief US - Technology news for CISOs & cybersecurity decision-makers

Google Cloud expands AI deal with Thinking Machines Lab

Fri, 24th Apr 2026

Google Cloud has signed an agreement with Thinking Machines Lab to expand the lab's AI infrastructure on its platform. The deal extends Thinking Machines Lab's use of Google Cloud systems for model research, platform development and training.

Under the agreement, Thinking Machines Lab will use Google Cloud A4X Max virtual machines equipped with NVIDIA GB300 GPUs. The lab is among the first Google Cloud customers to run NVIDIA GB300 NVL72 systems through the service.

Early testing showed a twofold increase in training and serving speed compared with an earlier generation of GPUs. Thinking Machines Lab is also using Google Cloud's Jupiter network, designed to handle the rapid transfer of model weights used in reinforcement learning workloads.

The expanded arrangement builds on a relationship that began in 2025. It also gives Thinking Machines Lab broader access to Google Cloud services including Google Kubernetes Engine, Spanner, Cluster Director, Cloud Storage and Anywhere Cache.

Infrastructure stack

Thinking Machines Lab is using those services to support both frontier model development and Tinker, its fine-tuning product. The stack combines Cloud Storage, Spanner for transactional metadata and a custom node-level caching system, so training can continue while production workloads are served at global scale.
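The article does not detail the caching design, but the general pattern it describes, a per-node cache sitting in front of remote object storage so repeated reads of the same data skip the network, can be illustrated with a minimal sketch. All class and method names below are hypothetical stand-ins, not Thinking Machines Lab's or Google Cloud's actual APIs:

```python
# Minimal sketch of a node-level read-through cache in front of an
# object store. All names here are hypothetical illustrations, not
# Thinking Machines Lab's or Google Cloud's real interfaces.

class ObjectStore:
    """Stand-in for a remote object store such as Cloud Storage."""

    def __init__(self):
        self._blobs = {}
        self.remote_reads = 0  # counts slow, over-the-network fetches

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        self.remote_reads += 1
        return self._blobs[key]


class NodeCache:
    """Read-through cache kept on each training node, so repeated
    reads of the same shard (e.g. model weights) are served locally."""

    def __init__(self, store: ObjectStore):
        self._store = store
        self._local = {}

    def read(self, key: str) -> bytes:
        if key not in self._local:            # cache miss: fetch once
            self._local[key] = self._store.get(key)
        return self._local[key]               # cache hit: local copy


store = ObjectStore()
store.put("weights/shard-0", b"\x00" * 4)

cache = NodeCache(store)
cache.read("weights/shard-0")
cache.read("weights/shard-0")  # second read is served from the node
print(store.remote_reads)      # only one remote fetch occurred
```

In a real deployment the local tier would be node SSD or memory and the metadata (which shard lives where, which version is current) would sit in a transactional store such as Spanner; the read-through logic itself stays the same.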

The agreement reflects growing demand among AI research groups for larger clusters and tighter integration between computing, storage and networking. As models grow larger and reinforcement learning workloads become more demanding, access to infrastructure is becoming a bigger competitive factor for companies building and testing advanced systems.

Google Cloud has been expanding its AI infrastructure offering as cloud providers compete to attract model developers and AI start-ups. NVIDIA's latest Blackwell-based systems have become a focal point in that contest, with cloud providers seeking to secure supply and make the hardware available through managed services.

Thinking Machines Lab's use of Google Cloud also shows how AI developers are relying on a mix of general cloud services and specialised hardware rather than standalone computing clusters. Orchestration, metadata management, storage and automated fault handling are now central parts of model training environments.

"By leveraging A4X Max and the AI Hypercomputer integrated stack, Google Cloud got us running at record speed with the reliability we demand," said Myle Ott, founding researcher at Thinking Machines Lab.

"This seamless integration of high-performance compute, fast storage, GKE orchestration and automated remediation via Cluster Director has allowed us to focus on the unique aspects of the stack like Tinker and reinforcement learning," Ott said.

Platform push

Google Cloud said the agreement reflects a broader effort to support AI companies with infrastructure built around its AI Hypercomputer offering, which combines hardware, software and operational tools for large-scale training and inference workloads.

Mark Lohmeyer, vice president and general manager of AI and computing infrastructure at Google Cloud, said the company sees Thinking Machines Lab producing research and products that could help organisations use AI more effectively.

"The team at Thinking Machines Lab is generating very exciting research and product offerings that will help organizations more effectively utilize AI," Lohmeyer said.

"Through this new agreement, and our deep partnership with NVIDIA, we'll help Thinking Machines accelerate even further using Google Cloud's AI Hypercomputer, which brings together purpose-built hardware, open software and flexible consumption models in an optimized architecture," added Lohmeyer.

NVIDIA also framed the deal as part of a wider shift toward system-level tuning as AI workloads increase in scale and complexity.

"As model sizes grow and reinforcement learning workflows become more complex, system-level optimization becomes critical," said Ian Buck, vice president and general manager of hyperscale and HPC at NVIDIA.

"NVIDIA GB300 NVL72 provides the performance leap and interconnect bandwidth needed to reduce bottlenecks and improve goodput. Running on Google Cloud's integrated AI stack, these advancements strengthen the platform, making it faster and smarter, so TML can extend and build on what the world's researchers are creating with NVIDIA," added Buck.