VAST Polaris on GCP: How to deploy it

Itzik — VP Mission Alignment, VAST Data



Running VAST on premises is a known quantity. Running VAST in a cloud with the same data, protocols, and operational model as your on-prem cluster, and being able to stand it up and tear it down on demand, is a newer story. That’s what VAST Polaris is: the portal and tooling that turns a GCP project into a place where VAST clusters come and go like any other cloud resource.

This is a complete walkthrough: what Polaris is, how it fits into the rest of the VAST platform, how to prepare a GCP project for it, how to deploy a cluster (through both the portal and the CLI), and how to seed data from on premises.

What Polaris actually is

Polaris is VAST’s online deployment portal for cloud clusters. You log in at admin.gcp.polaris.vastdata.com, point it at a prepared GCP project, and it provisions a full VAST cluster inside that project. The cluster can be ephemeral (spin up for a job, tear down after) or persistent (runs as long as you need it). Either way, you get the same services you’d get on premises:

  • File (NFS, SMB)
  • Object (S3)
  • Block (NVMe/TCP)
  • Database (structured and vector, via VAST DataBase)
  • Full VAST data reduction (global dedup, similarity-based reduction)
  • VAST DataSpace connectivity to move data between cloud and on-prem clusters

Polaris sits above a concept called a tenant. A tenant is assigned entitlement resources (essentially a capacity budget), and Tenant Administrators consume that entitlement by creating cluster deployments. You, the blog reader deploying a cluster, are a Tenant Admin. The entitlement is set up by VAST and is outside the portal’s scope; your job is to consume it wisely.

Under the hood, Polaris-deployed clusters on GCP run on z3 instances with attached Local SSD. That matters because it drives the quota conversation covered in section 1.5.

Where Polaris fits in the VAST cloud story

VAST has a few overlapping terms worth untangling:

  • VAST on Cloud (VoC): the general concept of running VAST clusters in a public cloud. Two deployment paths exist: the Polaris portal (newer, managed-feeling) and a Terraform-based direct deployment (older, more hands-on, uses n2/n2d instances with CNodes/DNodes).
  • VAST Polaris: the portal + VastCloud CLI that provisions VoC clusters. This post’s focus.
  • VAST DataSpace: the global namespace fabric that lets on-premises and cloud clusters share data without migration.
  • Global Snapshot Clone: the primary mechanism for making on-prem data instantly available to a Polaris cluster.

If you’ve seen the older “VAST on Cloud for GCP” Terraform workflow, Polaris is the evolution of it. The Terraform path still exists and is documented, but Polaris is the cleaner flow for most people, especially for ephemeral use cases.

Part 1: Preparing your GCP project

Polaris doesn’t install into a raw GCP project. You prep the project first; if you skip this, deployment fails.

1.1 Enable the required APIs

Required:

  • Compute API (Compute / VM Instances)
  • Secret Manager API (Security / Secret Manager)

Recommended: enable these unless you have a specific reason not to; most real workloads touch at least some of them:

  • Artifact Registry API
  • Compute Engine API
  • Network Management API
  • Service Networking API
  • Network Security API
  • Cloud Monitoring API
  • Cloud Logging API
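If you prefer the command line to clicking through the console, the lists above can be enabled in one pass. This is a minimal sketch, not VAST tooling: the project ID is a placeholder, the service names are my mapping of the console labels to their `googleapis.com` identifiers (as far as I know, `compute.googleapis.com` covers both the "Compute API" and "Compute Engine API" entries), and the script only prints the command unless you set `DRY_RUN=0`.

```bash
#!/bin/sh
# Sketch: enable the APIs from section 1.1 in one pass.
# PROJECT_ID is a placeholder. DRY_RUN=1 (the default) only prints the command.
PROJECT_ID="${PROJECT_ID:-my-polaris-project}"
DRY_RUN="${DRY_RUN:-1}"

# Required by Polaris per the list above.
REQUIRED_APIS="compute.googleapis.com secretmanager.googleapis.com"
# Recommended; trim this list if your org has a reason to.
RECOMMENDED_APIS="artifactregistry.googleapis.com networkmanagement.googleapis.com servicenetworking.googleapis.com networksecurity.googleapis.com monitoring.googleapis.com logging.googleapis.com"

enable_apis() {
  # Word splitting of the two API lists is intentional here.
  set -- gcloud services enable $REQUIRED_APIS $RECOMMENDED_APIS --project="$PROJECT_ID"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

enable_apis
```

Run it once with the default dry run to eyeball the command, then again with `DRY_RUN=0 PROJECT_ID=<your project>`.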

1.2 Set up private networking

In the VPC Networks page, configure Private Services Access to your VPC:

  1. Allocate an IP range for services.
  2. Create a private connection to Service Networking.

Size the allocated range generously. Renumbering after the fact is painful, and Polaris clusters plus future expansion eat address space faster than you’d expect.
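The two steps map onto two gcloud commands. The sketch below assumes a VPC called `polaris-vpc`, a range name of `polaris-psa-range`, and a /20 prefix; all three are placeholders I picked for illustration, not VAST recommendations. The small helper shows how many addresses a given prefix length reserves, which is handy when deciding what "generously" means for you.

```bash
#!/bin/sh
# Sketch: Private Services Access for the cluster VPC.
# All names and the prefix length are placeholders, not VAST requirements.
PROJECT_ID="${PROJECT_ID:-my-polaris-project}"
NETWORK="${NETWORK:-polaris-vpc}"
RANGE_NAME="${RANGE_NAME:-polaris-psa-range}"
PREFIX_LENGTH="${PREFIX_LENGTH:-20}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Number of IPv4 addresses in a range with the given prefix length.
addresses_in_prefix() {
  echo $(( 1 << (32 - $1) ))
}

echo "# /$PREFIX_LENGTH reserves $(addresses_in_prefix "$PREFIX_LENGTH") addresses"

# Step 1: allocate an IP range for services.
run gcloud compute addresses create "$RANGE_NAME" \
  --global --purpose=VPC_PEERING \
  --prefix-length="$PREFIX_LENGTH" \
  --network="$NETWORK" --project="$PROJECT_ID"

# Step 2: create the private connection to Service Networking.
run gcloud services vpc-peerings connect \
  --service=servicenetworking.googleapis.com \
  --ranges="$RANGE_NAME" \
  --network="$NETWORK" --project="$PROJECT_ID"
```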

1.3 Cloud NAT — one per region per cluster

For every region that will host a Polaris cluster, create a Cloud NAT Gateway:

  • Region: the region containing the cluster
  • Router: create a new router
  • Network Tier: Premium

The cluster itself sits in private subnets; NAT is how it reaches the outside world for management and telemetry traffic.
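For more than one region, a loop saves some clicking. A rough sketch, assuming hypothetical router and gateway names (`polaris-router-<region>`, `polaris-nat-<region>`) and a placeholder region list. It does not set the network tier explicitly, so confirm in the console that the gateway ended up on Premium.

```bash
#!/bin/sh
# Sketch: one Cloud Router plus Cloud NAT gateway per cluster region.
# Project, network, region list, and resource names are placeholders.
PROJECT_ID="${PROJECT_ID:-my-polaris-project}"
NETWORK="${NETWORK:-polaris-vpc}"
REGIONS="${REGIONS:-us-central1}"   # space-separated list of cluster regions
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

create_nat_for_region() {
  region=$1
  # A new router dedicated to this region's NAT gateway.
  run gcloud compute routers create "polaris-router-$region" \
    --network="$NETWORK" --region="$region" --project="$PROJECT_ID"
  # NAT for every subnet range in the region, with auto-allocated external IPs.
  run gcloud compute routers nats create "polaris-nat-$region" \
    --router="polaris-router-$region" --region="$region" \
    --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges \
    --project="$PROJECT_ID"
}

for region in $REGIONS; do
  create_nat_for_region "$region"
done
```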

1.4 Firewall rule for cluster traffic

Polaris clusters use the network tag voc-internal for intra-cluster communication. Create an ingress firewall rule:

  • Direction: ingress
  • Action: allow
  • Target tags: voc-internal
  • Source tags: voc-internal
  • Protocols and ports:
    Protocol | Ports
    TCP | 22, 80, 111, 389, 443, 445, 636, 2049, 3128, 3268, 3269, 4000–4001, 4100–4101, 4200–4201, 4420, 4520, 5000, 5200–5201, 5551, 6000–6001, 6126, 7000–7001, 7100–7101, 8000, 9090, 9092–9093, 20048, 20106–20108, 49001–49002
    UDP | 4005, 4105, 4205, 5205–5240, 6005, 7005, 7105
    ICMP | enabled

Leave all other rule settings at their defaults.
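Typing that port list into the console by hand invites typos, so here is one way to generate the rule from the table. It is a sketch under assumptions: the rule name `allow-voc-internal`, the project, and the network are placeholders of mine, and the command is only printed unless `DRY_RUN=0`.

```bash
#!/bin/sh
# Sketch: build the --rules value from the port table, then create the rule.
# PROJECT_ID, NETWORK, and RULE_NAME are placeholders.
PROJECT_ID="${PROJECT_ID:-my-polaris-project}"
NETWORK="${NETWORK:-polaris-vpc}"
RULE_NAME="${RULE_NAME:-allow-voc-internal}"
DRY_RUN="${DRY_RUN:-1}"

# Ports copied from the table above (ranges use a plain hyphen for gcloud).
TCP_PORTS="22 80 111 389 443 445 636 2049 3128 3268 3269 4000-4001 4100-4101 4200-4201 4420 4520 5000 5200-5201 5551 6000-6001 6126 7000-7001 7100-7101 8000 9090 9092-9093 20048 20106-20108 49001-49002"
UDP_PORTS="4005 4105 4205 5205-5240 6005 7005 7105"

# Emit "tcp:22,tcp:80,...,udp:4005,...,icmp" for the --rules flag.
build_rules() {
  rules=""
  for port in $TCP_PORTS; do rules="${rules}tcp:${port},"; done
  for port in $UDP_PORTS; do rules="${rules}udp:${port},"; done
  echo "${rules}icmp"
}

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

run gcloud compute firewall-rules create "$RULE_NAME" \
  --network="$NETWORK" --project="$PROJECT_ID" \
  --direction=INGRESS --action=ALLOW \
  --target-tags=voc-internal --source-tags=voc-internal \
  --rules="$(build_rules)"
```

Keeping the ports in two variables makes it easy to diff against the table if VAST revises the list.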

1.5 Quotas

Polaris uses z3 VMs with substantial Local SSD.

You need, per region that will host a cluster:

  • Local SSD quota: at least 36 TB per ENode. Default quota is almost certainly not enough. Quota increases aren’t instant — they go through a review queue at GCP, which can take a day or more. File the request early.
  • z3 CPU quota: a single ENode requires 88 CPUs. For an 8-node cluster that’s 704 CPUs; for a 14-node cluster, 1,232. Scale the request to your peak plus headroom.
  • Static routes per VPC network: enough for every IP you’ll use to reach the cluster.
  • Static routes per peering group: if your VPCs are peered across projects, the quota has to cover every connection route in the group.
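The CPU and Local SSD numbers are simple multiplication, but it helps to have them scripted when you are filling in a quota request. A small sketch using only the per-ENode figures quoted above (88 CPUs, at least 36 TB of Local SSD); add your own headroom on top.

```bash
#!/bin/sh
# Sketch: per-region quota math from the per-ENode figures above.
CPUS_PER_ENODE=88          # z3 CPUs per ENode
LOCAL_SSD_TB_PER_ENODE=36  # minimum Local SSD per ENode, in TB

# Prints the minimum z3 CPU and Local SSD quota for a cluster of $1 ENodes.
quota_for_nodes() {
  echo "nodes=$1 z3_cpus=$(( $1 * CPUS_PER_ENODE )) local_ssd_tb=$(( $1 * LOCAL_SSD_TB_PER_ENODE ))"
}

quota_for_nodes 8    # nodes=8 z3_cpus=704 local_ssd_tb=288
quota_for_nodes 14   # nodes=14 z3_cpus=1232 local_ssd_tb=504
```

To compare against what the project currently has, `gcloud compute regions describe <region>` lists the regional quotas and their usage.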

1.6 Organization policy constraints

Any org policy that forbids z3 VM creation, restricts Local SSD attachment, mandates specific machine types, or forces specific images will block the deployment. Check constraints with your GCP org admin before you schedule the work. A common gotcha: policies that restrict VM types to a whitelist that omits z3.
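If you have read access to org policies, you can get a first look yourself before involving the org admin. A hedged sketch: the project ID is a placeholder, and `compute.trustedImageProjects` is just one example of an image-restricting constraint I know of. Machine-type restrictions are often implemented as custom constraints, which this does not cover.

```bash
#!/bin/sh
# Sketch: list org policies that apply to the project and inspect one of them.
# PROJECT_ID is a placeholder; the constraint below is only an example.
PROJECT_ID="${PROJECT_ID:-my-polaris-project}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

check_org_policies() {
  # Every policy set on the project.
  run gcloud org-policies list --project="$PROJECT_ID"
  # Effective (inherited plus local) value of an image-restricting constraint.
  run gcloud org-policies describe compute.trustedImageProjects \
    --effective --project="$PROJECT_ID"
}

check_org_policies
```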

Part 2: Getting into the Polaris portal

The first time you use Polaris, you’ll receive an invite email from a Polaris administrator (typically from VAST or your internal VAST admin). The email contains a link to sign in and set your password. You must complete this on first access.

After that, you go to admin.gcp.polaris.vastdata.com and sign in with your email and the password you set.

From the portal you can:

  • Create deployment configurations
  • Launch new clusters
  • View and manage existing clusters
  • Download the VastCloud CLI

The portal is the right tool for first-time setup and for visual exploration. For day-2 automation, the CLI is typically faster.

Part 3: The VastCloud CLI

The first time you create a deployment configuration in the portal, a dialog offers a download link for the VastCloud CLI matched to your local OS (Mac, Linux, Windows). Click it, and the CLI downloads and configures itself automatically. On subsequent deployment configs, you can skip the download step.

Command surface

VastCloud is small by design:

```bash
vastcloud cluster create       # create a new cluster
vastcloud cluster delete       # tear a cluster down
vastcloud cluster list         # list all clusters in your organization
vastcloud cluster set-default  # mark a cluster as default for later commands
vastcloud cluster import       # import an existing external cluster
vastcloud cluster webui        # open the cluster's VMS web UI in your browser
vastcloud shell                # interactive shell
vastcloud completion           # generate shell completions
```

For anyone who’s automated AWS or GCP CLIs, this will feel familiar: small verbs, focused on cluster lifecycle, scriptable.

Part 4: Deploying a cluster

The Polaris portal walks you through the deployment form. The inputs map to what you configured in GCP:

  • Target project and region — where to deploy
  • Subnetwork — the VPC subnet the cluster sits in (private)
  • Cluster size — node count within the supported range for your entitlement
  • Cluster name — human-readable identifier
  • SSH public key — your key for access to nodes
  • Network tags and labels — GCP-side metadata for cost allocation and routing
  • Tenancy and access controls — admin identities and initial view policies

A few options worth knowing about:

  • enable_similarity — turn on similarity-based data reduction. Off by default. Flip it on when your data has low-entropy repetition patterns (genomic, seismic, log archives); leave it off if data is already compressed or truly unique.
  • enable_callhome — enables telemetry to VAST Support. Off by default. Turn it on if you have a support agreement and want proactive monitoring.
  • ignore_nfs_permissions — skips NFS/S3 permission checks on the cluster. Off by default and should stay off unless you have a specific reason (e.g., lift-and-shift from a POSIX-permissions-hostile environment).

When you submit, Polaris provisions the GCP resources and installs the cluster on top. The VAST Web UI (VMS) becomes reachable at a private IP inside your VPC. Because the cluster is in private subnets, you’ll access VMS from your own address space — either through a jump host, a peered VPC, or a VPN.

Typical provisioning time is several minutes. You’ll get endpoints for NFS/S3, the DataBase, VIPs for protocol and replication traffic, and a management URL for VMS.

A note on VIPs

VIPs for the cluster are allocated in VMS (Network Access → VIP Pools). Two rules:

  1. The VIPs must be routable to GCP from wherever clients live.
  2. They must not be in any GCP subnet or any CIDR assigned to GCP subnets.
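The second rule is easy to check mechanically before you type addresses into VMS. Below is a minimal IPv4-only sketch in plain shell arithmetic; the subnet CIDRs and candidate VIPs are made-up examples, so substitute the ranges from your own VPC. It cannot verify the first rule (routability), which depends on your network.

```bash
#!/bin/sh
# Sketch: flag candidate VIPs that fall inside a GCP subnet CIDR (rule 2).
# IPv4 only; the addresses below are made-up examples.
SUBNET_CIDRS="10.10.0.0/20 10.20.0.0/20"
CANDIDATE_VIPS="10.10.1.5 172.16.50.10"

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds (exit 0) when IP $1 lies inside CIDR $2.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  prefix=${2#*/}
  size=$(( 1 << (32 - prefix) ))
  base=$(ip_to_int "${2%/*}")
  start=$(( base / size * size ))   # round down to the network address
  [ "$ip" -ge "$start" ] && [ "$ip" -lt $(( start + size )) ]
}

check_vips() {
  conflicts=0
  for vip in $CANDIDATE_VIPS; do
    for cidr in $SUBNET_CIDRS; do
      if ip_in_cidr "$vip" "$cidr"; then
        echo "CONFLICT: $vip is inside $cidr"
        conflicts=$((conflicts + 1))
      fi
    done
  done
  echo "conflicts=$conflicts"
}

check_vips
```

With the example values, `10.10.1.5` is reported as a conflict and `172.16.50.10` passes.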

Part 5: Seeding data from on premises

This is where Polaris really earns its keep. Cloud storage is only useful if it has your data in it, and Polaris integrates two complementary mechanisms for getting that data there: Global Snapshot Clone and async replication.

Global Snapshot Clone

A global snapshot clone makes a snapshot on the source cluster instantly available as a writable clone on the Polaris cluster. The mechanism is elegant: VAST doesn’t copy metadata the way legacy snapshot systems do. Instead, every object is timestamped with a “snaptime,” so a snapshot is just a consistent view of metadata at a given point. Cloning that across clusters is cheap and immediate.

The clone has a Background Sync option that controls how data actually moves:

Background Sync = ON

  • Replication of 100% of the data starts immediately on clone creation.
  • If users access the clone while sync is in flight, requested blocks are prioritized so reads are fast.
  • Best for: well-understood workloads where you know you’ll need most or all of the data (e.g., moving a project from on premises to a cloud cluster for a month).

Background Sync = OFF

  • Only metadata is copied initially. The clone is fully consistent and writable immediately, but data blocks are streamed from the source on demand as the cluster reads them.
  • Preserves bandwidth and target capacity: you only pay egress for what you actually touch.
  • Best for: ephemeral or sparse access workloads where the dataset is large but the working set is small (e.g., a texture library where a given model only needs a subset).

In both modes the clone is writable from the moment it’s created. Writes go to the clone, not the source, so cloud-side work doesn’t affect on-prem.
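A back-of-the-envelope way to pick a mode is to estimate how much data actually crosses the wire in each case. The sketch below is my own simplification, not a VAST tool: sync on moves the whole dataset, sync off moves only the fraction the job reads. The dataset size and percentages are illustrative inputs you would replace with your own estimates.

```bash
#!/bin/sh
# Sketch: compare data moved with Background Sync on vs. off.
# Inputs are your own estimates; nothing here comes from VAST tooling.

# Prints the GB transferred from the source cluster.
#   $1 = on|off, $2 = dataset size in GB, $3 = percent of the dataset the job reads
transferred_gb() {
  case "$1" in
    on)  echo "$2" ;;                    # sync on replicates everything
    off) echo $(( $2 * $3 / 100 )) ;;    # sync off streams only what is read
    *)   echo "usage: transferred_gb on|off DATASET_GB TOUCHED_PCT" >&2; return 1 ;;
  esac
}

DATASET_GB=200000   # illustrative 200 TB dataset
for pct in 10 50 100; do
  echo "touched=${pct}% on=$(transferred_gb on "$DATASET_GB" "$pct")GB off=$(transferred_gb off "$DATASET_GB" "$pct")GB"
done
```

When the touched percentage approaches 100, the two modes move the same volume and sync off loses its bandwidth advantage, which is the point at which sync on becomes the simpler choice.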

Async replication

Global snapshot clones solve the “how do I get source data to the cloud” problem. Async replication solves the opposite: “how do I get results back.” Configure a protected path on the Polaris cluster with ASYNC_REPLICATION capability pointing back to on-prem, and output written in the cloud replicates on a schedule. This is the natural pattern for any workflow where the cloud is compute and on premises is the system of record.

The typical hybrid loop

Put them together and you get a clean pattern:

  1. Snapshot the on-prem source path.
  2. Global snapshot clone to the Polaris cluster (sync on or off depending on access pattern).
  3. Run the workload: training, inference, RAG indexing, analytics.
  4. Async replicate outputs back to on-prem.
  5. Destroy the Polaris cluster (vastcloud cluster delete).

The data you worked with is preserved on-prem. The cloud cluster is disposable.

Part 6: The Terraform alternative

For completeness: if you can’t use Polaris (e.g., your org mandates fully-IaC-managed infrastructure), the Terraform-based VoC path is still supported. It uses n2 (DNodes, 16 CPUs each) and n2d (CNodes, 32 CPUs each) instances and a VAST-supplied zip of Terraform files. Key differences from Polaris:

  • Terraform v1.5.4+ and gcloud SDK required locally.
  • A voc-gcp-checker tool is available to pre-validate the GCP environment before real deployment.
  • Cluster size is 8–14 nodes, configured in voc.auto.tfvars.
  • Same firewall, NAT, and private networking requirements; different quota math (n2/n2d vs z3).

For most teams, Polaris is the right default. Reach for Terraform when you specifically need IaC-native declarative management or when you’re integrating cluster provisioning into a broader Terraform pipeline.
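If you do take the Terraform path, a quick pre-flight check of the local tooling avoids a confusing failure later. This sketch only verifies that `terraform` and `gcloud` are on the PATH and that Terraform meets the 1.5.4 minimum mentioned above; it assumes plain MAJOR.MINOR.PATCH version strings and does not replace the voc-gcp-checker tool.

```bash
#!/bin/sh
# Sketch: pre-flight check for the Terraform-based path.
MIN_TERRAFORM="1.5.4"

# Succeeds when version $1 >= version $2; both must be plain MAJOR.MINOR.PATCH.
version_ge() {
  old_ifs=$IFS
  IFS=.
  set -- $1 $2       # $1..$3 = first version, $4..$6 = second version
  IFS=$old_ifs
  if [ "$1" -gt "$4" ]; then return 0; fi
  if [ "$1" -lt "$4" ]; then return 1; fi
  if [ "$2" -gt "$5" ]; then return 0; fi
  if [ "$2" -lt "$5" ]; then return 1; fi
  [ "$3" -ge "$6" ]
}

check_tools() {
  for tool in terraform gcloud; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool not found on PATH"
      return 1
    fi
  done
  # The first line of `terraform version` looks like "Terraform v1.5.7".
  found=$(terraform version | head -n 1 | sed 's/^Terraform v//')
  found=${found%%-*}   # drop any pre-release suffix
  if version_ge "$found" "$MIN_TERRAFORM"; then
    echo "terraform $found OK"
  else
    echo "terraform $found is older than $MIN_TERRAFORM"
    return 1
  fi
}

check_tools || echo "fix the above before running terraform init, plan, or apply"
```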

Part 7: Landmines and lessons learned

A concentrated list of things that will save you pain:

  • File quota requests early. Local SSD and z3 CPU quota bumps are GCP-side approvals, not instant. Discovering this on deploy day adds days to the timeline.
  • Check org policies before touching the portal. A restrictive org policy will cause the Polaris deployment to fail after you’ve done all the prep work, which is maddening.
  • Size your Private Services Access range generously. A tight range becomes a ceiling you hit later.
  • Keep Polaris clusters in the same region as your compute. Cross-region data movement defeats most of the performance argument, especially for TPU/GPU workloads.
  • Model egress costs for background-sync-off clones. If your workload ends up reading the whole dataset, you’ll pay egress for the whole dataset — and it’s often cheaper to have used sync-on.
  • Back up the configuration. For Terraform deployments especially, the .tfvars and state files are load-bearing. For Polaris, your deployment configurations are stored in the portal but capture any local customizations.
  • Use separate folders per cluster. If you’re doing Terraform-based deployments, one folder per cluster. Polaris handles this cleanly in the portal.
  • Size with headroom, though you can also expand the cluster later if needed.

Part 8: When to use Polaris

Polaris hits a sweet spot for a few specific patterns:

  • Burst AI training or inference. Clone the dataset to a cloud cluster, run the workload on GCP GPUs or TPUs, replicate outputs back, destroy.
  • Short-term research or project work. Stand up a cluster for the duration of a project, give the team familiar NFS/S3/DB semantics, tear it down when done.
  • Cloud-based DR or staging environments. Persistent Polaris clusters receiving replication from on-prem, ready to take over if needed.
  • Hybrid data fabric. Persistent clusters on both sides of a DataSpace, with global snapshot clones and async replication stitching them into one logical namespace.

Wrap

Polaris takes what used to be “VAST in the cloud is a Terraform project” and turns it into “VAST in the cloud is a button.” That’s the whole thesis. The GCP prep work (APIs, private networking, NAT, firewall, quotas, org policies) is real and worth taking seriously, but it’s a one-time cost. After that, cluster lifecycle lives in the portal and the CLI, and your data strategy lives in snapshot clones and async replication.

Below is a demo showing how to deploy VAST Polaris on GCP.

