The Neocloud Storage Problem: Why GPU-Rich Clouds Still Need a Persistent Object Store

Moving 50TB from AWS S3 to a neocloud costs $4,300 in egress before your first GPU fires. Neoclouds solved compute—not persistent, S3-compatible storage. Akave Cloud sits as a neutral layer: $14.99/TB, zero egress, native Iceberg, and on-chain audit trails. For 100TB with 50TB monthly egress, that's $1,499/month vs $6,620 on AWS—77% savings.
Stefaan Vervaet
April 3, 2026

Over $4,000 in egress fees before your first GPU fires. That's what moving 50 TB from AWS S3 to a neocloud costs. Not storage. Not compute. Just access to your own data.

The neocloud revolution delivered cheap, abundant GPU compute. What it didn't deliver was a persistent, S3-compatible object store for data at scale. Many ML teams discover this on their first real training run.

Neoclouds Solved Compute. Storage Is a Different Problem.

The neocloud market crossed $23 billion in 2025, with Q2 2025 revenues up over 200% year-over-year. CoreWeave posted $5.1 billion in FY2025 revenue and went public in March 2025. GPU availability, once the binding constraint on AI development, is no longer the bottleneck.

But compute and storage are different problems. Neoclouds were built to solve one.

Most neocloud providers offer compute-adjacent storage — local NVMe, NFS shares, or managed parallel filesystems designed to feed GPUs during training. Lambda Labs provides persistent NFS volumes. Together AI manages Weka and VAST filesystems. Vast.ai offers host-locked persistent volumes. But none of these are durable, S3-compatible object stores where your data lives long-term. Together AI’s own documentation tells users to “schedule a pod that can download from S3” for large datasets. The data still lives somewhere else.

CoreWeave launched AI Object Storage in late 2025. It's S3-compatible, uses Local Object Transport Accelerator (LOTA) technology, and includes automatic tiering. The important takeaway is not that neocloud storage is solved. It's that even the most advanced neocloud had to add storage after the fact.

Backblaze reinforced the same point from outside the neocloud stack. Its B2 Neo launch in early 2026, anchored by a $15 million first deal, shows the market now recognizes storage as missing neocloud infrastructure. But that still doesn't solve the deeper issue: provider-owned storage ties your data layer to someone else's roadmap.

The Storage Problem: What's Actually Missing From Neoclouds?

For most neoclouds, the problem is not that there is zero storage; it's that the storage available is compute-adjacent, not durable. Your data still lives on AWS S3 or Google Cloud Storage. Every training run starts with data movement. Every transfer has a cost.

CoreWeave is one response to that gap. But even its storage has limits that matter for AI workloads. CoreWeave does not include a built-in Apache Iceberg catalog, requiring teams to configure external catalog services like Hive Metastore to make Parquet datasets queryable for analytics pipelines.


That last point is the deeper problem. Store data in CoreWeave's object storage and you've traded AWS lock-in for CoreWeave lock-in. Move to a different neocloud next quarter, because pricing changes, or capacity tightens, or a better H100 deal comes along, and you rebuild the data layer from scratch.

Compute-provider-owned storage solves the problem inside one provider's walls. It doesn't solve portability, lineage, or compliance.

Why Storage Matters More for AI Workloads Than Traditional Cloud Apps

A traditional web application tolerates storage friction. Read occasionally, write occasionally, serve requests. The economics hold.

AI workloads are different. They're storage-intensive in ways that compound.

Training datasets scale to terabytes and petabytes. Large models require checkpoints saved during training to survive hardware failures, which happen often in large GPU clusters. A trillion-parameter model generates checkpoints of approximately 15 TB, the majority of which is optimizer state rather than model weights. IBM Storage Scale demonstrated 656 GiB/s read bandwidth to restore one of those checkpoints with minimal delay. If your storage is off-cluster, that bandwidth requirement becomes a latency event and a cost event on every checkpoint write and load.
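For scale, the quoted figures imply a restore measured in seconds when bandwidth is local to the cluster. A back-of-envelope sketch of that arithmetic, treating the checkpoint size as decimal TB and the bandwidth as GiB/s, matching how each number is quoted:

```python
# Restore time implied by the figures above: a ~15 TB checkpoint
# read back at ~656 GiB/s sustained bandwidth.
CHECKPOINT_TB = 15    # trillion-parameter checkpoint, mostly optimizer state
READ_GIB_PER_S = 656  # sustained read bandwidth from the IBM Storage Scale demo

checkpoint_gib = CHECKPOINT_TB * 1e12 / 2**30  # decimal TB -> binary GiB
restore_seconds = checkpoint_gib / READ_GIB_PER_S

print(f"Restore time at full bandwidth: ~{restore_seconds:.0f} s")
```

Off-cluster storage cannot sustain anything close to that rate, which is why checkpoint traffic over an egress-metered link is both a latency and a cost problem.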

The iteration cycle makes it worse. You don't train a model once. You train, evaluate, retrain on new data, adjust hyperparameters, and retrain again. Each iteration that moves data from S3 to a neocloud pays the egress toll. A 100 TB training dataset costs roughly $7,800 in egress fees from S3 before the first experiment result.

Beyond cost, there are compliance requirements. EU AI Act Article 10 requires documented training data provenance: a verifiable record of what data was used, when it was modified, and how it was transformed. Most cloud object storage systems log access and modification events in files that administrators can change. That's compliance based on provider-controlled logs, which may not satisfy stricter audit requirements. For high-scrutiny AI systems, teams need something cryptographically verifiable.

Teams building multi-neocloud or hybrid AI pipelines need storage that doesn't belong to any compute provider. You can't build a portable data layer on a neocloud's own storage product.

Current Workarounds (and Why They're All Painful)

When ML teams hit the storage gap, three options surface. Each solves part of the problem, but leaves critical gaps.

Keep data on S3, pay egress to the neocloud. The math: egress starts at $0.09/GB, so 50 TB costs approximately $4,300 per move under AWS's tiered rates. A petabyte costs approximately $56,000. For teams running frequent training runs or continuous retraining pipelines, this becomes the largest line item on their infrastructure bill, not GPU time.
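The per-move figures used throughout this article follow from AWS's tiered internet egress rates. A sketch of that math; the tier boundaries and rates below reflect the pricing snapshot this article cites, not live pricing:

```python
# Tiered egress cost model (per GB, decimal units), approximating
# AWS S3 internet data-transfer-out pricing at the time of writing.
TIERS = [
    (10_000, 0.09),         # first 10 TB at $0.09/GB
    (40_000, 0.085),        # next 40 TB at $0.085/GB
    (100_000, 0.07),        # next 100 TB at $0.07/GB
    (float("inf"), 0.05),   # beyond 150 TB at $0.05/GB
]

def egress_cost(tb: float) -> float:
    """One-time cost of moving `tb` terabytes out of S3."""
    gb_left = tb * 1_000  # decimal TB -> GB
    cost = 0.0
    for tier_gb, rate in TIERS:
        chunk = min(gb_left, tier_gb)
        cost += chunk * rate
        gb_left -= chunk
        if gb_left <= 0:
            break
    return cost

print(f"50 TB:  ${egress_cost(50):,.0f}")
print(f"100 TB: ${egress_cost(100):,.0f}")
```

Run it and the 50 TB and 100 TB figures quoted in this article fall out directly; the fee recurs on every move, not once.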

Copy data to neocloud local storage. No egress bill, but no durability guarantees and no Iceberg support. When you switch neoclouds, you move the data again without a clean migration path. You've traded one problem for a data silo.

Use Cloudflare R2 or Backblaze B2. Zero egress, yes. But no Apache Iceberg. No cryptographic storage proofs. Backblaze B2 Neo is a white-label product for neocloud operators building B2B storage services, not a neutral layer that ML teams configure directly. These products are adjacent to the problem, not a solution to it.

The Solution: Neutral Storage Purpose-Built for AI Workloads

The fix is architectural: storage that sits outside any compute provider.

An S3-compatible API. Your training scripts, orchestration tools, and analytics queries don't change. One endpoint swap.

Zero egress fees. Ingest once, then run training jobs as many times as you need. Iterate without a recurring cost penalty. $0 per gigabyte moved.

Native Apache Iceberg. Parquet-based datasets become queryable for Snowflake, Spark, and Trino without extra tooling. Those same workflows you already run, pointed at Akave instead of S3.

Compatibility with any neocloud. CoreWeave today, Crusoe next quarter, hybrid after that. Your storage doesn't change.

Tamper-evident audit trails. On-chain via Avalanche L1, cryptographically verifiable. Modifications are detectable regardless of who has administrative access, including Akave. For teams facing EU AI Act scrutiny, that means a tamper-evident record of where training data lived, when it changed, and who accessed it. That's the proof layer compliance teams actually need at the data layer.

That is what a neutral storage layer designed for those constraints looks like. Akave Cloud provides that architecture: $14.99/TB/month, $0 egress, $0 per-request API fees, S3-compatible, Iceberg-native, with on-chain audit trails that work with any neocloud's compute.

For a team running 100 TB of training data with 50 TB monthly egress:

Component | AWS S3 approach | Akave Cloud Plus
Storage (100 TB) | ~$2,300 / month | $1,499 / month
Egress (50 TB) | ~$4,300 / month | $0
API fees | ~$20 / month | $0
Monthly total | ~$6,620 | $1,499
Annual total | ~$79,440 | $17,988

Calculated from official AWS and Akave pricing pages, March 2026.

That's a 77% reduction, achieved by fixing the storage layer, not by switching neoclouds. Even if the first 100 TB migration costs roughly $7,800 in one-time S3 egress, the recurring savings in this example pay that back in well under two months.
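The table's totals can be reproduced in a few lines. This is a sketch using the article's approximate rates (AWS S3 standard storage near $0.023/GB/month, tiered egress, Akave's flat $14.99/TB), not live pricing:

```python
# Reconstructing the comparison table's math from the article's
# March 2026 pricing snapshot (approximate rates, not live pricing).
aws_storage = 100_000 * 0.023                # 100 TB at ~$0.023/GB/month
aws_egress = 10_000 * 0.09 + 40_000 * 0.085  # 50 TB/month under tiered rates
aws_api = 20.0                               # approximate request fees
aws_monthly = aws_storage + aws_egress + aws_api

akave_monthly = 100 * 14.99                  # $14.99/TB/month, $0 egress, $0 API

savings = 1 - akave_monthly / aws_monthly
print(f"AWS:   ~${aws_monthly:,.0f}/mo  (~${aws_monthly * 12:,.0f}/yr)")
print(f"Akave:  ${akave_monthly:,.0f}/mo  (${akave_monthly * 12:,.0f}/yr)")
print(f"Savings: {savings:.0%}")
```

Swap in your own storage footprint and monthly egress volume to see where the crossover lands for your workload.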

Your Storage Shouldn't Belong to Your Compute Provider

The neocloud era didn't just change where you rent GPUs. It created an architectural question most teams answer with a workaround: keep the data on S3 and pay to move it. That workaround compounds with every training run, every retraining cycle, every experiment.

The math is simple. Take your monthly egress volume from S3. Multiply by $0.09. That's what you're paying to access your own data. Run the number yourself.

When you're ready to stop feeding the egress meter: migrate one bucket, keep your S3 code, and see the difference at akave.com.

FAQ

What is the neocloud storage problem?

The neocloud storage problem is the gap between where AI teams rent GPU compute and where their data actually lives. In many stacks, compute runs on a neocloud while data stays on AWS S3 or Google Cloud Storage. That creates a split-stack architecture: every training run starts with data movement, and every transfer adds cost, latency, or lock-in. The issue is not just missing storage. It's that the storage layer usually belongs to someone else's compute roadmap.

If we already keep training data in S3, why change anything?

If you rarely move data, S3 may be fine. The problem starts when training, retraining, checkpoint restores, and multi-engine access make data movement routine instead of occasional. In the model used in this blog, 50 TB of egress costs roughly $4,300 per move. That turns storage from a background line item into a recurring tax on experimentation.

What does the split-stack model actually cost at 50 TB or 100 TB?

Using the pricing model in this article, 50 TB of egress from S3 is roughly $4,300 per move, and a one-time 100 TB migration is roughly $7,800. In the example table, 100 TB of storage plus 50 TB of monthly egress costs about $6,620 per month on the AWS path versus $1,499 on Akave Cloud Plus. That is why the split-stack model is a recurring tax, not a one-time inconvenience. The larger the training loop, the worse the economics get.

Why does this matter for AI governance and audit requirements?

Cost is only half the problem. EU AI Act Article 10 raises the bar for documenting training data provenance: what data was used, how it changed, and who accessed it. Provider-controlled logs can help operationally, but stricter audits may require stronger evidence than logs the provider controls. That is where tamper-evident audit trails matter.

Where does Akave Cloud fit in a neocloud architecture?

Akave sits as the neutral storage layer underneath the compute choice. The point is not to replace every tool in the stack. It's to keep S3-compatible workflows, add native Apache Iceberg, remove recurring egress fees, and keep audit trails on Avalanche L1. That lets teams run compute on CoreWeave, Crusoe, or another provider without rebuilding the data layer each time.

Try Akave Cloud Risk Free

Akave Cloud is an enterprise-grade, distributed, and scalable object store designed for large-scale datasets in AI, analytics, and enterprise pipelines. It offers S3 object compatibility, cryptographic verifiability, immutable audit trails, and SDKs for AI agents, all with zero egress fees and no vendor lock-in, saving up to 80% on storage costs vs. hyperscalers.

Akave Cloud works with a wide ecosystem of partners operating hundreds of petabytes of capacity, enabling deployments across multiple countries and powering sovereign data infrastructure. The stack is also pre-qualified with key enterprise apps such as Snowflake.

Modern infrastructure. Verifiable by design.

Whether you're scaling your AI infrastructure, handling sensitive records, or modernizing your cloud stack, Akave Cloud is ready to plug in. It feels familiar, but works fundamentally better.