By Vinay Kumar, Chief Product & Technology Officer
I’ve spent the last fifteen years building cloud services: the early days of AWS, building S3 and EBS; helping launch Oracle Cloud Infrastructure from inception; and now building the agentic cloud for AI natives at DigitalOcean. Every cloud I’ve worked on was designed for the workloads of its era, and those workloads were human-centric SaaS applications: a few users, a handful of requests per session, predictable data flows.
AI workloads break every one of those assumptions.
AI runs in loops. Agents think, then act, then think again. A single user task can span hundreds of thousands of tokens, traverse half a dozen tools, hit a knowledge base, write code, execute it, and persist state, all before returning an answer. The clouds we have weren’t built for this. Hyperscalers give you hundreds of services built for yesterday’s applications, and leave the integration to you. Inference-only providers sit on someone else’s compute and stack their margin on top. GPU rental shops (frequently referred to as “Neoclouds”) give you silicon, but not a system.
This week at Deploy 2026, we launched the DigitalOcean AI-Native Cloud, a purpose-built platform for the inference and agentic era that integrates five layers from silicon to agents into a single open stack.
We shipped fifteen products on Tuesday. Here’s what’s inside.
Our AI-Native Cloud is composed of five layers, each addressing a real workload pattern we’ve watched our customers wrestle with.

They’re independently useful and beautifully integrated:
Open source isn’t an add-on at any of these layers. It’s the foundation: PostgreSQL, MySQL, MongoDB, Valkey, OpenSearch, Kafka, Weaviate, vLLM, SGLang, OpenCode, LangGraph, CrewAI. Open all the way down. You bring your weights, your harness, your tools. We provide the runtime.
Let me walk through it, from the ground up.
Our global footprint now spans 19 data centers and 200+ network points of presence, with future capacity coming online in Kansas City and Memphis. That includes our first liquid-cooled racks, purpose-built for next-generation high-density GPU workloads.
Our Richmond data center is now generally available, with NVIDIA HGX™ B300 and AMD Instinct™ MI350X GPUs available alongside the H100, H200, and MI300/MI325 silicon already running across our fleet. We co-engineer at the kernel level with both NVIDIA and AMD. We don’t rent capacity. We own it. That’s why your unit economics improve as you scale on us, instead of getting worse.
Hundreds of thousands of customers already run on our core cloud every day: Droplets, Kubernetes (DOKS), VPC networking, and object/block/network file storage. We’ve extended it for AI workloads with a non-blocking RDMA fabric, RDMA-enabled NFS, and VPC-native inference out of the box.
At Deploy we announced Burstable CPU and MicroVM Droplets, currently in Private Preview. These are Firecracker-based instances that start in roughly 200 milliseconds, ideal for agent sandboxes and lightweight, spiky workloads. Agents need GPUs for thinking and CPUs for doing. We have both, and now they’re sized for how agents actually behave.
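To make that concrete, here’s a minimal sketch of provisioning an ephemeral sandbox Droplet for an agent task through the public DigitalOcean API. The MicroVM size slug is a placeholder, since the Private Preview slugs aren’t final; the rest follows the standard v2 Droplet create call.

```python
# Minimal sketch: provisioning a short-lived sandbox Droplet for an agent task
# via the public DigitalOcean API (v2). The MicroVM size slug below is a
# placeholder -- MicroVM Droplets are in Private Preview, so the real slug may differ.
import os
import requests

API_URL = "https://api.digitalocean.com/v2/droplets"
TOKEN = os.environ["DIGITALOCEAN_TOKEN"]

payload = {
    "name": "agent-sandbox-01",
    "region": "nyc3",
    "size": "microvm-1vcpu-512mb",  # hypothetical slug for a MicroVM Droplet
    "image": "ubuntu-24-04-x64",
    "tags": ["agent-sandbox", "ephemeral"],
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
droplet = resp.json()["droplet"]
print(f"Sandbox {droplet['id']} requested; tear it down when the agent task completes.")
```

The point of the fast-start instance is that the sandbox lives only as long as the task does: spin up, run the agent’s generated code, collect the result, destroy.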
This is the layer we’ve rebuilt from the ground up. We co-developed it with design partners like Hippocratic AI, and the result is one of the highest-performing inference engines on the market today: the fastest token throughput for Qwen 3.5 and DeepSeek V3.2 in independent Artificial Analysis benchmarks.
Here’s what’s new:
The Router deserves a closer look. It’s a preference-aware control plane that picks the best model for each request, balancing cost, latency, and quality without touching application code. Unlike static routing rules, it runs on a purpose-built small language model that resolves intent in 200 milliseconds and ranks candidates against live cost and latency data, so the right model wins at 2am and at 2pm. Most AI builders start on a single frontier model. Then product-market fit hits, the bill scales linearly with usage, and the unit economics get painful fast. Most successful AI natives we work with run three or more models in production. The leading edge is running twenty or more. The Router makes that possible without a rewrite.
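Here’s a minimal sketch of what “without touching application code” looks like from the client side, assuming an OpenAI-compatible endpoint. The base URL and the model alias are illustrative placeholders, not final product values.

```python
# Minimal sketch: sending a request through an OpenAI-compatible inference
# endpoint and letting the Router choose the model per request. The base_url
# and the "router/auto" alias are illustrative placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.do-ai.run/v1",  # placeholder endpoint
    api_key=os.environ["DO_INFERENCE_KEY"],
)

response = client.chat.completions.create(
    model="router/auto",  # hypothetical alias: let the Router pick the model
    messages=[{"role": "user", "content": "Summarize this support ticket in two lines."}],
)

print(response.choices[0].message.content)
# Changing which models serve your traffic becomes a routing-preference change,
# not an application rewrite.
```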
Take Celiums.AI: across 29.2M tokens processed through the Inference Router, 83% of its traffic now lands on open-source models, up from zero.
“Our AI Ethics Engine was built with open-source AI, so running it on closed-source models felt backwards. DigitalOcean’s Inference Router closed the loop: we swapped frontier closed-source models for open alternatives and cut per-token cost by 61% while pulling p95 latency under 400ms. Same API. Zero code changes. The Router routes to the optimal model on every request. We just build.”
— Mario Gutiérrez, CTO at Unity Financial Network and Founder of Celiums.AI
We also expanded the Model Catalog with over 25 new models, including:
Stateful agents need context, memory, and the ability to learn from what happens in production. The Data & Learning layer is built on the managed services tens of thousands of customers already trust, extended for how AI systems actually run.
What’s new:
Transactional databases remain the foundation for AI. We made them production-grade for the agentic era.
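As a sketch of the pattern, here’s agent memory persisted in a managed PostgreSQL database using the pgvector extension for similarity lookups. The connection string, table name, and toy four-dimensional embeddings are placeholders; real embeddings would come from an embedding model.

```python
# Minimal sketch: persisting and recalling agent memory in a managed PostgreSQL
# database with the pgvector extension. Connection string, table name, and the
# toy 4-dim embeddings are illustrative placeholders.
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])  # managed DB connection string
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS agent_memory (
        id BIGSERIAL PRIMARY KEY,
        session_id TEXT NOT NULL,
        content TEXT NOT NULL,
        embedding vector(4)
    );
""")

# Store an observation from a completed agent step (toy embedding).
memory_vec = [0.12, -0.30, 0.88, 0.05]
cur.execute(
    "INSERT INTO agent_memory (session_id, content, embedding) VALUES (%s, %s, %s::vector)",
    ("session-42", "User prefers invoices grouped by project.", str(memory_vec)),
)

# Recall the memories closest to a new query embedding.
query_vec = [0.10, -0.28, 0.90, 0.03]
cur.execute(
    "SELECT content FROM agent_memory ORDER BY embedding <-> %s::vector LIMIT 5",
    (str(query_vec),),
)
print([row[0] for row in cur.fetchall()])

conn.commit()
cur.close()
conn.close()
```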
This is the newest layer of the stack, and the one where we’ve spent the most time listening. We’ve watched customers deploy tens of thousands of agents on App Platform as containers. We’ve also watched them hit a wall when the agent loop, tool calls, state, observability, and code execution all live tangled together inside a single monolith.
So we asked a simple question: what would help you actually move faster? The answer became Managed Agents: five primitives that separate the plumbing from the business logic of your agent.
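To illustrate the separation (this is the pattern, not the Managed Agents API), here’s a sketch where the business logic is an ordinary function and the loop, checkpointing, and tracing are plumbing a managed runtime can own.

```python
# Illustrative sketch of the pattern, not the Managed Agents API: business
# logic is a plain function over (state, tools); the loop, persistence, and
# tracing live behind a runtime interface the platform owns.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentState:
    goal: str
    steps: list = field(default_factory=list)
    done: bool = False

def business_logic(state: AgentState, tools: dict[str, Callable]) -> AgentState:
    """The only part you should have to write: decide the next action."""
    if not state.done:
        total = tools["sum_invoices"](["INV-101", "INV-102"])
        state.steps.append(f"summed invoices -> {total}")
        state.done = True
    return state

def run_agent(logic, state: AgentState, tools: dict, max_steps: int = 10) -> AgentState:
    """Stand-in for the plumbing: loop control, checkpointing, observability."""
    for step in range(max_steps):
        state = logic(state, tools)  # hand control to the business logic
        print(f"[trace] step={step} last={state.steps[-1] if state.steps else 'start'}")
        # A managed runtime would checkpoint state and emit traces here.
        if state.done:
            break
    return state

tools = {"sum_invoices": lambda ids: 42.0 * len(ids)}  # toy tool
final = run_agent(business_logic, AgentState(goal="compute invoice total"), tools)
print(final.steps)
```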
What’s new:
Any single layer of this stack is useful on its own. The reason to run them together is that the optimizations compound.
When your agents, your inference, your data, and your compute live in the same VPC, on the same silicon, billed on the same invoice, you eliminate the egress taxes, the margin stacking, and the integration debt that come from stitching across three vendors and three bills.
We’ve seen customers like Workato run a trillion automation tasks at 67% lower cost, Character.AI handle over a billion queries a day at 2x inference throughput, LawVo cut inference costs 42% with no code changes by routing through us, and Hippocratic AI power 20M+ patient interactions with 40% lower latency. None of these are demos. They’re production workloads at scale.
If you’re an AI builder, whether you’re writing your first line of code or accelerating past product-market fit, this stack is for you. You don’t need to wait in a hyperscaler queue behind a frontier lab. You don’t need to glue together a Neocloud, an inference wrapper, and a vector database vendor. You don’t need to compromise on openness, on economics, or on developer experience.
Welcome to the AI-Native Cloud. Let’s build.
→ Get started at digitalocean.com
Read the full Deploy 2026 announcement and CEO Paddy Srinivasan’s perspective on why this moment requires a new cloud.


Vinay Kumar, DigitalOcean Chief Product & Technology Officer
