From Borg to Block: How Kubernetes Evolved to Power the AI Revolution with VAST Data AI OS

The Artificial Intelligence revolution runs on a specific and powerful foundation: Kubernetes. Once an internal Google project, it has become the cloud’s de facto operating system, orchestrating the complex, data-intensive workloads that define modern AI. This post explores the journey of Kubernetes and how it forms the core of a unified AI stack, enabled by NVIDIA’s Inference Microservices (NIM) and the VAST Data Platform. We’ll focus on how this ecosystem provides a complete solution, from compute orchestration down to a pivotal new capability: high-performance block storage over NVMe/TCP.

The Genesis of Kubernetes

Kubernetes is the direct descendant of Google’s internal container management systems, Borg and Omega. Developed over a decade, these systems ran services like Gmail and Google Search at a massive scale. However, they were proprietary and tightly coupled to Google’s infrastructure. In 2014, Google engineers distilled the lessons from Borg and Omega into a new, open-source project: Kubernetes.

The design of Kubernetes directly addressed the pain points of its predecessors. Concepts like the Pod (the atomic unit of scheduling for co-located containers), Labels (for flexible resource organization), and an IP-per-Pod networking model were not theoretical but practical solutions born from years of operational experience.

Google’s most strategic move was donating Kubernetes to the newly formed Cloud Native Computing Foundation (CNCF) in 2015. This vendor-neutral governance model fostered a massive collaborative community, leading to its rapid adoption as the industry standard for container orchestration.

Kubernetes: The Undisputed Engine for AI

The features that made Kubernetes a powerful general-purpose orchestrator also made it the ideal platform for demanding AI/ML workloads.

  • Scalability and Portability: Kubernetes excels at automatically scaling resources up and down to meet the fluctuating demands of AI training and inference. Its container-based model ensures that ML environments are portable and reproducible, eliminating the “it works on my machine” problem.
  • Hardware Abstraction for GPUs: Modern AI relies on specialized hardware like GPUs. Kubernetes abstracts this complexity through its device plugin framework. Platform teams manage the complex hardware and driver installations, while data scientists simply request a GPU in their pod configuration (e.g., nvidia.com/gpu: 1), as shown in the sketch after this list. The scheduler handles the rest, making powerful hardware a simple, consumable resource.
  • An Extensible Platform: Kubernetes’s greatest strength is its extensibility. Through Custom Resource Definitions (CRDs) and Operators, an entire ecosystem of AI/ML platforms has been built on top of Kubernetes. Tools like Kubeflow provide a complete ML lifecycle toolkit, turning Kubernetes into the foundational “operating system” for a universe of AI innovation.
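
To illustrate the GPU abstraction described above, here is a minimal Pod sketch that requests a single GPU through the device plugin’s nvidia.com/gpu resource; the container image tag is illustrative.

    apiVersion: v1
    kind: Pod
    metadata:
      name: cuda-smoke-test
    spec:
      restartPolicy: Never
      containers:
        - name: cuda
          image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04  # illustrative CUDA base image
          command: ["nvidia-smi"]                             # print the GPU the scheduler assigned
          resources:
            limits:
              nvidia.com/gpu: 1   # the device plugin advertises GPUs as a schedulable resource

The data scientist never touches drivers or node configuration; the scheduler places the pod on a node with a free GPU and the device plugin wires the device into the container.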

NVIDIA NIM: Packaging AI for a Kubernetes World

NVIDIA simplified AI deployment with NVIDIA Inference Microservices (NIM), a collection of pre-built, optimized containers that package AI models into enterprise-ready microservices. Each NIM is a self-contained Docker container that includes not just the AI model, but also a high-performance inference engine like Triton, all necessary CUDA libraries, and a standard API endpoint compatible with the OpenAI API specification. This packaging transforms complex AI models into standard, cloud-native applications.
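
As a minimal sketch of what that packaging means in practice, a NIM can be run like any other stateless workload. The image path, port, and Secret name below are illustrative assumptions rather than exact values from the NIM documentation.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: llama31-nim
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: llama31-nim
      template:
        metadata:
          labels:
            app: llama31-nim
        spec:
          containers:
            - name: nim
              image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest  # illustrative NIM image path
              ports:
                - containerPort: 8000   # OpenAI-compatible HTTP endpoint (assumed default port)
              env:
                - name: NGC_API_KEY     # credentials for pulling the model (Secret name assumed)
                  valueFrom:
                    secretKeyRef:
                      name: ngc-api
                      key: NGC_API_KEY
              resources:
                limits:
                  nvidia.com/gpu: 1     # one GPU per replica

Because the endpoint speaks the OpenAI API, existing client code can point at the in-cluster Service for this Deployment with only a base-URL change.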

NIMs are fine-tuned for specific hardware, leveraging tools like NVIDIA TensorRT to optimize the model graph, fuse layers, and reduce precision to maximize inference throughput and efficiency. They also include an embedded Prometheus endpoint for monitoring GPU utilization, latency, and other critical telemetry. NVIDIA offers a vast catalog of NIMs for various domains, including LLMs like Llama 3.1, speech AI for translation and text-to-speech, digital biology models for molecular docking, and simulation models for generating 3D worlds with OpenUSD.

Deployment is handled through familiar Kubernetes tools. While Helm charts provide an easy entry point for deploying a single NIM, the NIM Operator offers production-grade lifecycle management for complex AI pipelines. The operator, first released in 2024 and now at version 2.0, uses CRDs to automate complex tasks. Its NIMCache feature intelligently pre-caches large models to a persistent volume, dramatically reducing startup times for new pods. The NIMPipeline CRD allows operators to manage an entire graph of interdependent NIMs, such as a RAG system combining embedding, reranking, and LLM models, as a single, cohesive unit.

With version 2.0, the NIM Operator expanded its capabilities to manage the lifecycle of NVIDIA NeMo microservices. This includes tools for building complete AI data flywheels, such as the NeMo Customizer for fine-tuning models, NeMo Evaluator for performance benchmarking, and NeMo Guardrails for adding content safety and topic controls to LLM chatbots. By managing both inference (NIM) and customization (NeMo) microservices, the NIM Operator provides a unified, declarative interface for deploying and managing the entire lifecycle of enterprise-grade, production AI applications on Kubernetes.
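
As a rough illustration of this declarative model, a NIMCache resource might look like the sketch below. The API group, version, and field names are assumptions based on the operator’s published CRDs and may differ between releases; treat it as a shape, not a reference.

    apiVersion: apps.nvidia.com/v1alpha1   # assumed API group and version
    kind: NIMCache
    metadata:
      name: llama31-cache
    spec:
      source:
        ngc:
          modelPuller: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest  # illustrative model image
          authSecret: ngc-api-secret                                  # hypothetical Secret name
      storage:
        pvc:
          storageClassName: vast-filesystem   # hypothetical VAST-backed StorageClass
          size: 100Gi                         # space for the pre-fetched model artifacts

A NIMService or NIMPipeline can then reference this cache so that new pods start from the warmed persistent volume instead of re-downloading the model.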

VAST Data: A Unified Data Foundation for Kubernetes

AI requires a high-performance, scalable data foundation, a role filled by the VAST Data Platform. VAST’s relationship with Kubernetes is symbiotic; it not only serves data to Kubernetes but is built on the same cloud-native principles.

VAST’s Disaggregated, Shared-Everything (DASE) architecture decouples compute logic (CNodes) from physical storage (DNodes), allowing performance and capacity to scale independently—a philosophy that mirrors Kubernetes’s own design.

VAST Data’s CNodes (Compute Nodes) play a critical role in the DASE architecture, handling system coordination, cluster metadata services, management APIs, and cloud integration. Within these CNodes, VAST uses containers extensively for modularity, scalability, and cloud portability, and it even runs its serverless DataEngine in containers internally, underscoring its deep alignment with the cloud-native world.

For external Kubernetes clusters, VAST provides storage via its full-featured Container Storage Interface (CSI) driver. This allows developers to dynamically provision persistent volumes from VAST using standard Kubernetes objects like PersistentVolumeClaim and StorageClass, automating storage management and making VAST a frictionless data backend for any stateful application.
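
In practice this looks like any other CSI-backed storage: an administrator defines a StorageClass that points at the VAST provisioner, and developers claim capacity against it. A minimal sketch follows; the provisioner name for the file driver is an assumption here, so check the VAST CSI documentation for the exact value and parameters.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: vast-nfs
    provisioner: csi.vastdata.com     # assumed provisioner name for the VAST file CSI driver
    reclaimPolicy: Delete
    allowVolumeExpansion: true
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: training-data
    spec:
      accessModes: ["ReadWriteMany"]  # NFS-backed volumes can be mounted by many pods at once
      storageClassName: vast-nfs
      resources:
        requests:
          storage: 1Ti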

The New Frontier: Customer Choice with Block Storage over NVMe/TCP

Historically, data centers were split between Network Attached Storage (NAS) for files and Storage Area Networks (SAN) for block-based workloads like databases. VAST’s most significant recent innovation is the addition of native block storage, a strategic move that collapses this final silo and delivers on the promise of a truly unified data platform.

This unification is centered on customer choice. Architects can now select the optimal protocol—File (NFS), Object (S3), or Block (NVMe/TCP)—for any given workload, all provisioned from the same underlying VAST storage pool. This dramatically simplifies infrastructure and reduces costs.

The choice of NVMe over TCP (NVMe/TCP) for block storage is critical. It delivers SAN-like, microsecond-level latency over standard Ethernet networks, eliminating the need for expensive, specialized SAN fabrics. This is perfect for latency-sensitive AI components like vector databases and feature stores, which can become bottlenecks if they are waiting on slow storage. VAST provides a dedicated Block CSI driver (block.csi.vastdata.com), allowing Kubernetes pods to consume raw block devices with the same ease as file or object storage, ensuring GPUs are always fed with data at maximum speed.
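
A hedged sketch of what consuming a raw block volume looks like: the StorageClass references the block.csi.vastdata.com provisioner named above, while the volumeMode: Block and volumeDevices mechanics are standard Kubernetes; the database image is hypothetical.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: vast-block
    provisioner: block.csi.vastdata.com
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: vector-db-data
    spec:
      accessModes: ["ReadWriteOnce"]
      volumeMode: Block               # hand the pod a raw device instead of a mounted filesystem
      storageClassName: vast-block
      resources:
        requests:
          storage: 500Gi
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: vector-db
    spec:
      containers:
        - name: db
          image: example.com/vector-db:latest   # hypothetical latency-sensitive workload
          volumeDevices:
            - name: data
              devicePath: /dev/xvda             # device node exposed inside the container
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: vector-db-data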

The Future is Agentic: The VAST DataEngine and AgentEngine

VAST is positioning its platform not just for today’s AI workloads, but for the next frontier: agentic AI. This new paradigm involves intelligent, autonomous agents that can reason, plan, and act to achieve complex goals, often by orchestrating multiple smaller, specialized models and tools. To power this vision, VAST has developed a comprehensive AI Operating System that unifies storage, database, and compute.

The foundation of this is the VAST DataEngine, a serverless, containerized function execution environment built directly into the platform. The DataEngine enables event-driven processing: functions written in Python are automatically triggered by data events, such as a new file being written, to perform tasks like data transformation, indexing, or enrichment in real time.

Building on this foundation is the upcoming VAST AgentEngine, an AI agent deployment and orchestration system scheduled for release in the second half of 2025. The AgentEngine is designed to be the application management layer for agentic AI, providing the runtime, tooling, and observability needed to deploy and manage agents at scale. Key features include:

  • A Dedicated Runtime for Agents: An operational framework that handles not just container startup, but also loading models into GPU memory and verifying the tools an agent needs to function.
  • AI-Native Resiliency: For long-running agents that may operate for hours or days, AgentEngine provides checkpointing of the agent’s memory and reasoning state. This allows for seamless recovery from failures without having to restart the entire process from scratch.
  • The AgentEngine Studio: An integrated environment where developers can define how agents interact with tools and data, configure access rules and security, and manage their lifecycle.
  • An Agent Tool Server: A core component that allows agents to securely invoke data, functions, web searches, or even other agents using the emerging Model Context Protocol (MCP) standard for agent-tool interaction.

Together, the DataEngine and AgentEngine aim to provide the end-to-end infrastructure needed to transform experimental AI agents into scalable, recoverable, and observable production applications.

KubeVirt: Bridging the Past and Future

While the future is cloud-native, enterprises have vast investments in legacy applications running in virtual machines (VMs). KubeVirt provides the crucial bridge, extending Kubernetes to manage VMs alongside containers on the same cluster.

KubeVirt transforms Kubernetes into a universal infrastructure control plane, capable of managing any workload. A legacy application in a VM can now run on the same cluster, use the same network policies, and mount storage from the same VAST CSI driver as a modern containerized microservice. This allows organizations to modernize their infrastructure management without a disruptive “all-or-nothing” rewrite of every application, consolidating operations onto a single, future-proof platform.
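
As a hedged illustration of that consolidation, a minimal KubeVirt VirtualMachine that boots a legacy workload from a VAST-backed persistent volume might look like this; the VM name and PVC name are hypothetical.

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      name: legacy-erp-vm
    spec:
      running: true
      template:
        spec:
          domain:
            cpu:
              cores: 4
            resources:
              requests:
                memory: 8Gi
            devices:
              disks:
                - name: rootdisk
                  disk:
                    bus: virtio
          volumes:
            - name: rootdisk
              persistentVolumeClaim:
                claimName: legacy-erp-root   # hypothetical PVC provisioned by the VAST CSI driver

The VM is scheduled, networked, and backed by storage through the same Kubernetes machinery that serves the containerized NIMs alongside it.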

Conclusion: The Unified AI Stack

The modern AI infrastructure stack has converged on a powerful, cohesive, and unified architecture:

  • A Universal Control Plane: Kubernetes, augmented with KubeVirt, provides a single control plane for both modern containers and legacy VMs.
  • Cloud-Native AI Services: NVIDIA NIMs package AI models into standardized, easy-to-deploy microservices managed by the Kubernetes-native NIM Operator.
  • A Unified Data Platform: The VAST Data Platform, built on a cloud-native DASE architecture, provides the essential data foundation. Its comprehensive CSI driver offers a choice of high-performance file, object, and now ultra-low-latency block storage via NVMe/TCP. Looking forward, its DataEngine and AgentEngine are built to power the next generation of agentic AI.

This convergence eliminates infrastructure silos and operational friction, creating a future-proof stack that allows organizations to accelerate their journey from raw data to transformative, AI-driven insight.
