In the rapidly evolving world of artificial intelligence, we often hear about massive parameter counts, powerful GPUs, and breakthrough model architectures. But there’s a silent workhorse behind the scenes, a critical component that has enabled the incredible growth in LLM capabilities and is now at the center of the next great leap in AI infrastructure. That component is the KV Cache.
This is the story of KV Cache: what it is, how it became the biggest bottleneck in AI, and the revolutionary solution announced by NVIDIA and VAST Data at CES 2026 that promises to unlock the era of true “Agentic AI.”
Part 1: What is KV Cache? A Simple Explanation
Imagine you’re a chef in a busy kitchen. You get an order for a complex dish. As you prepare it, you need to keep track of all the ingredients you’ve already chopped, the spices you’ve added, and the cooking times for each component. If you had to re-chop every vegetable and re-measure every spice for every new step of the recipe, you’d never finish. Instead, you keep the prepared ingredients in bowls on your counter—a “cache” of past work—ready to be used instantly.
In the world of Large Language Models (LLMs) like GPT-4, the process is similar. When an LLM generates text, it does so one word (or token) at a time. To generate the next token, it needs to understand the context of all the tokens that came before it.
This is where the Key-Value (KV) Cache comes in. Inside the model’s “attention mechanism,” every token that is processed produces two vectors: a Key (used to judge how relevant that token is to whatever the model attends to next) and a Value (which carries the token’s actual content, to be blended into the attention output).
Without a cache, the model would have to re-compute these Key and Value vectors for every single previous token for each new word it generates. This would be incredibly slow and inefficient. The KV cache stores these vectors in the GPU’s high-speed memory, so they only need to be computed once. When generating the next token, the model simply looks up the pre-computed Keys and Values from the cache, saving a massive amount of computational work.
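To make this concrete, here is a minimal sketch in Python (NumPy only, a single attention head, all names and sizes purely illustrative) of an autoregressive decode loop: the Key and Value for each token are computed once, appended to the cache, and simply reused on every later step.

```python
# Minimal single-head attention decode loop with a KV cache (illustrative only;
# real LLMs have many layers and heads, but the caching idea is the same).
import numpy as np

d_model = 64                       # hidden size of our toy model
rng = np.random.default_rng(0)

# Toy projection matrices for queries, keys, and values.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))

k_cache, v_cache = [], []          # the KV cache: one entry per past token

def decode_step(x):
    """Attend over all cached tokens given the new token's hidden state x."""
    q = x @ W_q
    # Compute K and V for the NEW token only, then append them to the cache.
    k_cache.append(x @ W_k)
    v_cache.append(x @ W_v)
    K = np.stack(k_cache)          # (seq_len, d_model) - no recomputation needed
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V             # attention output for the new token

# Generate a few steps: each step reuses the cached K/V of all earlier tokens.
for step in range(5):
    x = rng.standard_normal(d_model)   # stand-in for the current token's embedding
    out = decode_step(x)
    print(f"step {step}: cache now holds {len(k_cache)} tokens")
```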

Part 2: The History of KV Cache: From Novelty to Bottleneck
The concept of KV caching is as old as the Transformer architecture itself, which powers nearly all modern LLMs. In the early days, models were relatively small, and the “context window”—the amount of text the model could consider at once—was limited to a few hundred or a few thousand tokens. The KV cache for a single request could easily fit within the ample memory of a data center GPU. It was a neat optimization, a solved problem.
Then, the AI race began. Models grew exponentially in size, and so did the demand for longer context windows. We went from processing paragraphs to entire books, from simple Q&A to complex, multi-step reasoning tasks.
This created a massive problem. The size of the KV cache grows linearly with the sequence length and the batch size (the number of simultaneous requests). For a model with a 100k token context window, the KV cache for a single user can be tens of gigabytes. Multiply that by dozens of concurrent users, and you quickly exhaust the memory of even the most powerful GPUs.
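A quick back-of-the-envelope calculation shows how fast this adds up. The sketch below uses roughly Llama-2-70B-shaped numbers (80 layers, 8 KV heads after grouped-query attention, head dimension 128, FP16 values); these are illustrative assumptions, not a statement about any particular deployment.

```python
# Back-of-the-envelope KV cache sizing. The default model shape is an
# illustrative assumption (roughly Llama-2-70B-like); plug in your own numbers.
def kv_cache_bytes(seq_len, batch_size, n_layers=80, n_kv_heads=8,
                   head_dim=128, bytes_per_value=2):   # 2 bytes = FP16
    # Factor of 2: one Key vector and one Value vector per token, per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len * batch_size

GIB = 1024 ** 3
print(f"1 user,   100k-token context: {kv_cache_bytes(100_000, 1) / GIB:6.1f} GiB")
print(f"32 users, 100k-token context: {kv_cache_bytes(100_000, 32) / GIB:6.1f} GiB")
```

Even with grouped-query attention already factored in, a few dozen concurrent long-context users push the cache well past the HBM of any single GPU.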
The AI workload shifted from being compute-bound (limited by how fast the GPU could do math) to being memory-bound (limited by how much data the GPU could hold). The GPU’s precious high-bandwidth memory (HBM) was no longer just for model weights; it was being swallowed whole by the KV cache.

The Era of Optimization
Faced with this bottleneck, researchers and engineers developed a series of clever optimizations to squeeze more performance out of existing hardware:
- Multi-Query Attention (MQA) & Grouped-Query Attention (GQA): These techniques modify the model architecture to use fewer Key and Value heads, significantly reducing the memory footprint of the cache at the cost of a small amount of model quality.
- FlashAttention: A groundbreaking software technique that tiles the attention computation so intermediate results stay in the GPU’s fast on-chip memory instead of repeatedly round-tripping to HBM, dramatically speeding up attention calculations.
- Quantization: Instead of storing the Keys and Values in high-precision 16-bit formats (FP16), they can be compressed into 8-bit (INT8) or even 4-bit formats. This dramatically reduces memory usage but requires careful implementation to avoid losing accuracy (see the sketch after this list).
- PagedAttention (from vLLM): Inspired by operating system virtual memory, PagedAttention manages the KV cache in non-contiguous memory blocks. This nearly eliminates memory fragmentation and allows for much more efficient use of the GPU’s available memory, enabling larger batch sizes.
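As a toy illustration of the quantization idea, the snippet below compresses a block of cached values to INT8 with a single per-tensor scale factor and measures the reconstruction error. Production systems typically use finer-grained, more careful schemes, so treat this purely as a sketch of the trade-off.

```python
# Toy per-tensor INT8 quantization of one KV cache block (illustrative only).
import numpy as np

def quantize_int8(x):
    """Compress a float tensor to INT8 plus one per-tensor scale factor."""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-8)   # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

# One "block" of cached values: 1024 tokens x head dimension 128, stored in FP16.
kv_block = np.random.default_rng(0).standard_normal((1024, 128)).astype(np.float16)
q, scale = quantize_int8(kv_block.astype(np.float32))

print(f"FP16: {kv_block.nbytes // 1024} KiB  ->  INT8: {q.nbytes // 1024} KiB")
print(f"max reconstruction error: {np.abs(dequantize_int8(q, scale) - kv_block).max():.4f}")
```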
These innovations were crucial for deploying models like Llama 2 and Mistral, but they were all fighting a losing battle against the insatiable demand for more context.
Part 3: The “Agentic AI” Problem and the CES 2026 Revolution
The next frontier of AI is Agentic AI: systems of autonomous agents that can plan, reason, use tools, and collaborate to solve complex, long-horizon problems. Think of an AI software engineer that doesn’t just write a function but architects an entire application, debugging and iterating over days or weeks.
For these agents to be effective, they need persistent, long-term memory. They need to remember what they did yesterday, what their goals are, and the context of their collaboration with other agents. The KV cache is the perfect representation of this memory. But storing terabytes of KV cache in scarce, expensive GPU memory for days is simply not feasible.
This was the problem that NVIDIA and VAST Data set out to solve, culminating in their game-changing announcements at CES 2026.
NVIDIA’s Inference Context Memory Storage Platform
At CES 2026, Jensen Huang took the stage to announce a new class of AI infrastructure: the NVIDIA Inference Context Memory Storage Platform. The core idea is simple but revolutionary: disaggregate the KV cache.
Instead of trapping the KV cache inside the GPU, this new platform allows it to be stored in a specialized, high-performance external storage system. The GPU can then fetch only the parts of the cache it needs, when it needs them, over an ultra-fast network.
The key enablers for this are:
- NVIDIA BlueField-4 DPU (Data Processing Unit): The “brains” of the operation. The BlueField-4 sits between the GPU and the storage, managing the data placement, handling security, and offloading the complex task of managing the KV cache from the GPU.
- NVIDIA Spectrum-X Ethernet: The high-speed network fabric. Using RDMA (Remote Direct Memory Access), Spectrum-X allows the GPUs to access the remote KV cache with incredibly low latency, almost as if it were in their own local memory.
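To give a feel for the general pattern (and only the pattern: the class and method names below are hypothetical, not NVIDIA’s or VAST’s actual API), here is a tiny Python sketch of a tiered KV cache that keeps a few “hot” blocks in a fast local pool and spills the rest to a remote capacity tier, fetching them back on demand.

```python
# Purely illustrative sketch of a tiered ("disaggregated") KV cache.
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier cache: a small 'local' pool (think GPU HBM) backed by a
    'remote' store (think network-attached storage). Hypothetical API."""

    def __init__(self, local_capacity_blocks):
        self.local = OrderedDict()   # block_id -> KV block, kept in LRU order
        self.remote = {}             # block_id -> KV block, effectively unlimited
        self.capacity = local_capacity_blocks

    def put(self, block_id, kv_block):
        self.local[block_id] = kv_block
        self.local.move_to_end(block_id)
        if len(self.local) > self.capacity:
            # Evict the least-recently-used block to the capacity tier
            # (in a real system this would be a write over the network).
            old_id, old_block = self.local.popitem(last=False)
            self.remote[old_id] = old_block

    def get(self, block_id):
        if block_id in self.local:               # hit in fast local memory
            self.local.move_to_end(block_id)
            return self.local[block_id]
        kv_block = self.remote.pop(block_id)     # miss: fetch from storage...
        self.put(block_id, kv_block)             # ...and promote it back
        return kv_block

cache = TieredKVCache(local_capacity_blocks=4)
for i in range(10):
    cache.put(i, f"kv-block-{i}")                # fill well past local capacity
print("local blocks:", list(cache.local), "| remote blocks:", sorted(cache.remote))
print("fetching an evicted block:", cache.get(2))
```

In the architecture described above, that remote tier is the external storage system, reached via RDMA over Spectrum-X rather than a dictionary lookup, with the BlueField-4 DPU managing placement along the way.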
This architecture provides massive benefits:
- Virtually Unlimited Context: The size of the KV cache is no longer limited by GPU memory but by the capacity of the storage system, which is far cheaper and more scalable.
- Context Sharing: Multiple GPUs and even multiple different AI agents can share the same KV cache, enabling seamless collaboration.
- Increased Throughput: By freeing up GPU memory, more batches can be processed simultaneously, boosting the number of tokens generated per second.
- Improved Power Efficiency: It’s much more power-efficient to store data in a dedicated storage system than in power-hungry GPU HBM.

VAST Data: The AI Operating System for the Agentic Era
VAST Data, a leader in AI data platforms, announced its role as a key partner in this new ecosystem. Its software, the VAST AI Operating System, is the first to run directly on the NVIDIA BlueField-4 DPUs.
By running on the DPU, VAST’s software sits right in the data path, managing the flow of KV cache between the GPUs and VAST’s scalable all-flash storage. This integration is what makes the entire system practical for enterprise use.
VAST’s contribution goes beyond just raw storage. They are providing the data services needed for a world of persistent AI agents:
- Data Management: Efficiently storing, retrieving, and managing the lifecycle of billions of KV cache objects.
- Security & Isolation: Ensuring that one agent’s context is secure and cannot be accessed by unauthorized agents.
- Auditability: Tracking who accessed what context and when, which is crucial for regulated industries.
In essence, VAST Data is transforming the KV cache from a temporary, disposable byproduct of inference into a persistent, manageable, and valuable data asset.
Conclusion: A New Foundation for AI
The journey of the KV cache from a simple optimization to a central pillar of AI infrastructure is a testament to the incredible pace of innovation in this field. The announcements from NVIDIA and VAST Data at CES 2026 are not just about faster chips or bigger drives; they represent a fundamental rethinking of how we build AI systems.
By disaggregating memory and enabling persistent, shared context, they have laid the foundation for the next generation of AI: agents that can think, plan, and collaborate over long periods to solve the world’s most complex problems. The silent workhorse has finally taken center stage.