Iceberg v3 on Akave: How to Build a Verifiable Lakehouse on Immutable Object Storage

Iceberg v3 introduces row lineage and deletion vectors, but doesn't verify whether the underlying Parquet files were modified after write. This post examines how Akave's cryptographic eCID tagging and immutable S3-compatible storage fill that gap, and walks through the full configuration to deploy a verifiable Iceberg v3 lakehouse on Databricks or Spark.
Stefaan Vervaet
May 5, 2026

Databricks shipped Apache Iceberg v3 into Public Preview in April 2026, available in Databricks Runtime 18.0 and above. Three things changed: deletion vectors that make row-level updates up to 10x faster, row lineage that stamps every row with a permanent _row_id, and native VARIANT support for semi-structured data.

Here's what didn't change: Iceberg still stores everything (data files, metadata manifests, snapshot logs) on object storage. The format tracks which rows changed. It can't prove whether the Parquet file containing those rows was modified after it was written.

That's the gap. Pair Iceberg v3 with Akave's eCID-tagged, immutable S3-compatible storage and you get a lakehouse where every table manifest carries a cryptographic proof of state, not just a change log.

What Iceberg v3 Actually Changes

Iceberg v3 is a format-level upgrade. It doesn't touch the catalog, the query engine, or the storage layer; it changes how Iceberg represents data files and row-level metadata inside those files.

| Feature | v2 | v3 |
| --- | --- | --- |
| Row deletes | Positional delete files (one entry per deleted row) | Deletion vectors (bitset in a Puffin file, one per data file) |
| Row identity | None | _row_id, a permanent unique identifier per row |
| Change tracking | Snapshot-level | Row-level via _last_updated_sequence_number |
| Semi-structured data | Cast to string or nested struct | Native VARIANT type |
| Default column values | Not supported | Schema-level defaults |
| Multi-argument partitioning | Single-argument transforms only | Composite bucket and date functions |

Two features drive most of the interest for data platform teams:

Deletion vectors consolidate row-level deletes into a single compact bitset, stored as a Puffin file attached to each data file. Instead of rewriting entire Parquet files on every update, the engine writes the deletion vector alongside the original file. Up to 10x faster than copy-on-write, per Databricks.

Row lineage gives every row a permanent _row_id and a _last_updated_sequence_number reflecting the last commit that touched it. Any downstream engine can identify exactly which rows changed between two snapshots without a full table scan.
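
You can see both features from the engine side with a quick sanity check. A minimal sketch, assuming a SparkSession configured as in Step 1 below and a hypothetical v3 table akave_catalog.db.events with merge-on-read deletes enabled:

# Sketch: a merge-on-read DELETE writes a deletion vector rather than
# rewriting the Parquet file. The table name akave_catalog.db.events is
# hypothetical; `spark` is the session configured in Step 1.
spark.sql("DELETE FROM akave_catalog.db.events WHERE user_id = 42")

# Iceberg's `files` metadata table lists data files (content = 0) and
# delete files (content != 0); the original data file is still present.
spark.sql("""
    SELECT content, file_path, record_count
    FROM akave_catalog.db.events.files
""").show(truncate=False)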

Engine Support as of April 2026

v3 is in Public Preview on Databricks (Runtime 18.0+). AWS announced support for v3 deletion vectors and row lineage in November 2025. Most other engines (Snowflake, BigQuery, Trino, Dremio, StarRocks, RisingWave) read and write Iceberg v1/v2 today; full v3 feature support is in progress. The practical approach: write v3 tables from Databricks or Spark, and let other engines read the underlying Parquet files without needing v3 awareness.

What Iceberg Actually Needs From Object Storage

Not all S3-compatible stores behave identically. The Apache Iceberg spec requires four things from storage:

1. Immutable writes. Once a Parquet data file or metadata manifest is written, it's never moved or altered. Iceberg's snapshot isolation depends on this. A store that silently mutates objects breaks the format's consistency model.

2. Seekable reads. Query engines read specific byte ranges within Parquet files (column chunks, footer metadata). The store must support range requests (Range header semantics).

3. Delete operations. Compaction, snapshot expiry, and orphan file cleanup all issue DELETE calls. No additional configuration should be required.

4. Multipart upload. Iceberg's S3FileIO uploads large Parquet files in parallel parts as each becomes ready. Required for write performance at scale.

Akave's S3-compatible API supports all four: PUT, GET (including range reads), DELETE, LIST, multipart upload, presigned URLs, and bucket policies. One caveat: conditional write semantics (If-None-Match / If-Match for optimistic concurrency) are an edge case where Akave's behavior may differ from native AWS S3. If your catalog implementation depends heavily on conditional writes, validate against Akave's API docs before moving production write paths.
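
If you want to validate that surface before wiring up a catalog, the four requirements above can be smoke-tested directly. A minimal sketch with boto3; the bucket name and credentials are placeholders:

import boto3

# Minimal smoke test of the four storage behaviors Iceberg depends on.
# Endpoint, bucket, and credentials are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://o3.akave.xyz",
    aws_access_key_id="<YOUR_AKAVE_ACCESS_KEY>",
    aws_secret_access_key="<YOUR_AKAVE_SECRET_KEY>",
)
bucket, key = "your-bucket", "smoke-test/object.bin"

# 1. Write an object; Iceberg never mutates it in place afterward.
s3.put_object(Bucket=bucket, Key=key, Body=b"0123456789" * 1024)

# 2. Range read, as engines do for Parquet footers and column chunks.
chunk = s3.get_object(Bucket=bucket, Key=key, Range="bytes=0-99")["Body"].read()
assert len(chunk) == 100

# 3. Delete, as compaction and snapshot expiry do.
s3.delete_object(Bucket=bucket, Key=key)

# 4. Multipart upload, as S3FileIO does for large Parquet files.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
part = s3.upload_part(Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
                      PartNumber=1, Body=b"x" * (5 * 1024 * 1024))
s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": [{"ETag": part["ETag"], "PartNumber": 1}]},
)
s3.delete_object(Bucket=bucket, Key=key)  # clean up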

Why Akave Is the Right Storage Layer for Iceberg v3

Every major object store can serve Iceberg files. What Akave Cloud does differently happens at write time.

When a Parquet file is written to Akave, it gets a content identifier (eCID), a cryptographic hash of the object's exact content at that moment. The eCID is anchored to an on-chain audit trail on Avalanche's L1, independent of the application that wrote the data and independent of Akave itself.

Here's how that maps against what Iceberg v3 already gives you:

| Iceberg v3 Capability | What It Tells You | What Akave Adds |
| --- | --- | --- |
| Row lineage (_row_id) | Which rows changed and when (logical table level) | Whether the Parquet file containing those rows was modified after write |
| Snapshot history | Which commits happened and in what order | Whether the metadata manifest files are unchanged since writing |
| Deletion vectors | Which rows are logically deleted per data file | Whether the Puffin file encoding the deletion vector is unmodified |
| Table metadata | Schema evolution, partition spec changes over time | Whether metadata.json matches the version that passed governance review |

Iceberg's _row_id tells you which rows changed. Akave's eCID tells you whether the file was tampered with. Different layers, different questions.

Here's why that matters in practice. A compliance reviewer asks: "Was the customer dataset used in the March 28th batch run the same version governance approved?"

Iceberg alone answers: "The snapshot history shows no writes between approval and run."

Iceberg on Akave answers: "The snapshot history shows no writes, and the eCID of every Parquet file in that snapshot matches the eCIDs recorded at governance sign-off."

The second answer survives adversarial scrutiny. The first doesn't.

How to Configure Iceberg v3 on Akave

Step 1: Point your catalog at Akave

Akave uses the same endpoint/credential model as AWS S3. The S3FileIO config is a drop-in swap.

from pyspark.sql import SparkSession

# A REST catalog endpoint URI is required when type = "rest"; path-style
# access is the norm for S3-compatible endpoints.
spark = SparkSession.builder \
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.akave_catalog",
            "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.akave_catalog.type", "rest") \
    .config("spark.sql.catalog.akave_catalog.uri",
            "<YOUR_REST_CATALOG_URI>") \
    .config("spark.sql.catalog.akave_catalog.warehouse",
            "s3://your-bucket/warehouse/") \
    .config("spark.sql.catalog.akave_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.catalog.akave_catalog.s3.endpoint",
            "https://o3.akave.xyz") \
    .config("spark.sql.catalog.akave_catalog.s3.access-key-id",
            "<YOUR_AKAVE_ACCESS_KEY>") \
    .config("spark.sql.catalog.akave_catalog.s3.secret-access-key",
            "<YOUR_AKAVE_SECRET_KEY>") \
    .config("spark.sql.catalog.akave_catalog.s3.path-style-access",
            "true") \
    .getOrCreate()

Step 2: Create a v3 table

CREATE TABLE akave_catalog.db.transactions (
    id          BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(18,4),
    payload     VARIANT,
    created_at  TIMESTAMP
)
USING iceberg
TBLPROPERTIES (
    'format-version'    = '3',
    'write.delete.mode' = 'merge-on-read'
)

format-version=3 enables deletion vectors and row lineage. write.delete.mode=merge-on-read activates deletion vectors for all subsequent DELETE and UPDATE operations.
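
If you already have a v2 table, the format version can typically be raised in place rather than recreating the table. A minimal sketch; format-version upgrades are one-way, so confirm your Spark/Iceberg runtime supports v2-to-v3 upgrades before running this against production:

# Sketch: raise the format version on an existing table. The upgrade is
# one-way; verify v2 -> v3 support in your runtime first.
spark.sql("""
    ALTER TABLE akave_catalog.db.transactions
    SET TBLPROPERTIES (
        'format-version'    = '3',
        'write.delete.mode' = 'merge-on-read'
    )
""")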

Step 3: Verify eCID assignment

After the first write, check that Akave has assigned eCIDs to your Parquet files using the AWS CLI:

aws s3api head-object \
  --bucket <your-bucket> \
  --key <path-to-your-iceberg-data>/<your-manifest-file> \
  --endpoint-url <your-akave-endpoint-url>

Each head-object response includes the object's eCID in the custom metadata header Network-Root-Cid, alongside standard S3 metadata. For governance workflows, capture these eCIDs at each data load or approval checkpoint; they're the reference point for future comparisons.
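
To make those checkpoints repeatable, you can snapshot the eCID of every object under the table's prefix into a single JSON file. A minimal sketch with boto3; the bucket, prefix, endpoint, and checkpoint file name are placeholders:

import json
import boto3

# Sketch: record the eCID of every object under an Iceberg table prefix
# as a governance checkpoint. All names below are placeholders.
s3 = boto3.client("s3", endpoint_url="<your-akave-endpoint-url>")

checkpoint = {}
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="<your-bucket>",
                               Prefix="warehouse/db/transactions/"):
    for obj in page.get("Contents", []):
        head = s3.head_object(Bucket="<your-bucket>", Key=obj["Key"])
        # The eCID may surface as object metadata or a raw response header,
        # depending on how the gateway exposes Network-Root-Cid.
        checkpoint[obj["Key"]] = (
            head.get("Metadata", {}).get("network-root-cid")
            or head["ResponseMetadata"]["HTTPHeaders"].get("network-root-cid")
        )

with open("governance-checkpoint-2026-03-28.json", "w") as f:
    json.dump(checkpoint, f, indent=2)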

Step 4: Connect other engines

Snowflake, Trino, Athena, and DuckDB can read your Iceberg tables from Akave using standard S3-compatible storage configs. Engine-level v3 support isn't required to read the data files.

Snowflake External Volume:

CREATE EXTERNAL VOLUME akave_iceberg_vol
  STORAGE_LOCATIONS = (
    (
      NAME             = 'akave-primary'
      STORAGE_PROVIDER = 'S3COMPAT'
      STORAGE_BASE_URL = 's3compat://your-bucket/warehouse/'
      AWS_KEY_ID       = '<AKAVE_ACCESS_KEY>'
      AWS_SECRET_KEY   = '<AKAVE_SECRET_KEY>'
      STORAGE_ENDPOINT = '<AKAVE_ENDPOINT>'
    )
  );
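
DuckDB reads follow the same endpoint-swap pattern. A minimal sketch using DuckDB's httpfs and iceberg extensions from Python; credentials and the table path are placeholders, and the exact iceberg_scan argument (table root versus an explicit metadata.json path) varies by DuckDB version:

import duckdb

# Sketch: query an Iceberg table on Akave from DuckDB. Credentials and
# the table path are placeholders.
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("INSTALL iceberg; LOAD iceberg;")
con.execute("SET s3_endpoint = 'o3.akave.xyz';")
con.execute("SET s3_url_style = 'path';")
con.execute("SET s3_access_key_id = '<AKAVE_ACCESS_KEY>';")
con.execute("SET s3_secret_access_key = '<AKAVE_SECRET_KEY>';")

# iceberg_scan takes the table root or a metadata.json path, depending
# on the DuckDB iceberg extension version.
print(con.execute(
    "SELECT count(*) FROM iceberg_scan('s3://your-bucket/warehouse/db/transactions')"
).fetchone())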

Step 5: Two-layer audit check

For governed datasets, run both checks:

-- Rows changed since the last governance checkpoint
SELECT
    _row_id,
    _last_updated_sequence_number,
    col1,
    col2
FROM akave_catalog.db.transactions
WHERE _last_updated_sequence_number > <last_approved_snapshot_seq>;

# Parquet files unchanged since the governance checkpoint
aws s3api head-object \
  --bucket <your-bucket> \
  --key <path-to-your-iceberg-data-directory>/governance-checkpoint-2026-03-28.json \
  --endpoint-url <your-akave-endpoint-url>

The Iceberg query tells you what changed. The eCID check proves the files are the same files governance reviewed.
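
The file-level half of this audit automates cleanly: re-read each object's eCID and compare it against the values captured at sign-off. A minimal sketch with boto3, continuing the hypothetical checkpoint file from Step 3:

import json
import boto3

# Sketch: verify every file recorded at governance sign-off still carries
# the same eCID. Endpoint, bucket, and checkpoint file are placeholders.
s3 = boto3.client("s3", endpoint_url="<your-akave-endpoint-url>")

with open("governance-checkpoint-2026-03-28.json") as f:
    approved = json.load(f)

for key, approved_ecid in approved.items():
    head = s3.head_object(Bucket="<your-bucket>", Key=key)
    # The eCID may surface as object metadata or a raw response header,
    # depending on how the gateway exposes Network-Root-Cid.
    current = (head.get("Metadata", {}).get("network-root-cid")
               or head["ResponseMetadata"]["HTTPHeaders"].get("network-root-cid"))
    if current != approved_ecid:
        print(f"eCID mismatch: {key} (approved {approved_ecid}, got {current})")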

What It Costs

Take Bridgepoint Analytics, a mid-size data platform team running a 60TB Iceberg lakehouse (50TB data files, 10TB metadata and snapshot history), with 25TB of monthly egress across Snowflake, Trino, and Spark.

| Line Item | AWS S3 | Akave Cloud |
| --- | --- | --- |
| Storage (60 TB × monthly rate) | $1,413 / month ($23.55 / TB) | $899.40 / month ($14.99 / TB) |
| Egress (25 TB / month) | $2,125 / month ($85 / TB avg) | $0 |
| API request fees | ~$50–100 / month | $0 |
| Total monthly | ~$3,600 | ~$899 |
| Annual delta | | ~$32,400 saved |

The multi-engine read pattern makes this worse on AWS S3 specifically. Every engine reading from the same tables (Snowflake, Trino, Databricks) generates its own egress line item. Three engines each reading the same 25TB means 75TB of egress charges. On Akave, it's still zero.

What You Don't Get Elsewhere

Three things Akave gives you that no other object store does:

Tamper-evident Parquet files.

Every file has an eCID, a cryptographic fingerprint of its content at write time. A pipeline bug, a compromised credential, or deliberate modification all show up as an eCID mismatch.

An audit trail that survives your stack.

eCIDs are anchored to Avalanche's L1, not your application infrastructure. They persist regardless of what happens to your Spark version, your catalog implementation, or your cloud provider. An eCID recorded today is verifiable in five years.

Zero egress across the full query federation.

The Iceberg-anywhere promise only works economically if your storage doesn't bill per read. AWS S3 does. Akave doesn't.

Where Akave Fits, and Where It Doesn't

Use it when:

  • Your lakehouse runs multiple query engines and cross-engine egress is eating into your budget
  • You're in a regulated industry where file-level provenance matters beyond Iceberg's snapshot history
  • You're on Databricks Runtime 18.0+ or Spark and want v3 features (deletion vectors, row lineage) backed by verifiable storage
  • You're migrating off AWS S3 and want a genuine drop-in S3-compatible target

Think twice if:

  • Your catalog implementation relies heavily on conditional writes (If-None-Match / If-Match semantics); validate Akave's API behavior against your specific catalog before moving production write workloads
  • Raw query scan latency is your primary metric; there are no public Iceberg-on-Akave performance benchmarks yet, so treat this as a validation item before committing
  • You need v3-specific features (row lineage queries, deletion vector reads) natively in Dremio, StarRocks, or BigQuery; those engines are still on v1/v2 as of April 2026

FAQ

What is a verifiable lakehouse?

A verifiable lakehouse is one where data integrity can be independently proven, not just tracked. Iceberg v3 adds row-level change tracking through row lineage (_row_id) and deletion vectors, telling you which rows changed and when. Akave Cloud adds cryptographic content identifiers (eCIDs) to every Parquet file and metadata object at write time, anchored on-chain via Avalanche's L1. Iceberg tells you what changed at the logical table level; Akave proves the underlying files weren't modified at the physical storage level.

Does Akave support all the S3 operations Apache Iceberg requires?

Yes for the core operations. Akave's S3-compatible API supports PUT, GET (including range reads for Parquet column chunks), DELETE, LIST, multipart upload, presigned URLs, and bucket policies: everything Iceberg's S3FileIO needs. The one edge case: conditional write semantics (If-None-Match / If-Match headers used by some catalog implementations for optimistic concurrency). Test your catalog implementation against Akave's API docs before moving write-heavy production workloads.

Which query engines can read Iceberg tables stored on Akave?

Any engine with S3-compatible storage support: Snowflake (via External Volumes), Databricks, Spark, Trino, Athena, DuckDB, and others. The tables are standard Parquet + Iceberg metadata. Engine-level v3 feature support varies, but reading the underlying data files works across engines regardless of v3 status.

How much does an Iceberg lakehouse cost on Akave vs AWS S3?

Akave charges $14.99/TB/month with zero egress and zero API request fees. AWS S3 Standard is $23.55/TB/month (first 50TB) plus $0.09/GB egress. For a team with 60TB stored and 25TB monthly egress, that's roughly $3,600/month on AWS S3 versus $899/month on Akave, about $32,000/year. That gap grows with every engine you add to the federation.

Is Iceberg v3 production-ready on Akave today?

On Databricks Runtime 18.0+ or Spark: yes. AWS has supported v3 deletion vectors and row lineage since November 2025. The v3 spec is stable. What's still in progress is full v3 support across all query engines: Snowflake, BigQuery, Trino, Dremio, and StarRocks are primarily on v1/v2 as of April 2026. If you need v3-specific features across every engine in your federation simultaneously, check engine support status before setting format-version=3.

If Iceberg's snapshot history already tracks changes, why does Akave's eCID matter?

Iceberg's snapshot history lives inside the Iceberg metadata files; it's a self-referential record. It tells you a transaction happened, but it can't prove the data files that transaction touched haven't been modified since. Akave's eCID is generated at the storage layer and anchored on-chain, external to the application that wrote the data. An eCID comparison catches file tampering that's completely invisible to Iceberg's snapshot history.

Can I migrate an existing Iceberg v2 lakehouse on AWS S3 to Akave without downtime?

For standard S3 object-storage patterns (Parquet files and metadata via S3FileIO), migration is typically an endpoint-and-credential swap in your Spark or catalog config, plus the data copy itself. Teams using AWS-specific features (Lambda triggers on S3 events, KMS-managed encryption keys, Lake Formation policies) need separate planning for those components. The Iceberg data files themselves are format-portable; the integration surface to validate is your catalog and access control layer, not the data.
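
For the copy itself, one engine-agnostic option is to stream each object from the S3 source into the Akave target. A minimal single-threaded sketch with boto3; bucket names and prefixes are placeholders, and at tens of terabytes you'd parallelize this or use a dedicated transfer tool:

import boto3

# Sketch: copy Iceberg objects from AWS S3 to Akave. Single-threaded for
# clarity only; all bucket names and prefixes are placeholders.
src = boto3.client("s3")  # default AWS credentials and endpoint
dst = boto3.client(
    "s3",
    endpoint_url="<your-akave-endpoint-url>",
    aws_access_key_id="<AKAVE_ACCESS_KEY>",
    aws_secret_access_key="<AKAVE_SECRET_KEY>",
)

paginator = src.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="source-bucket", Prefix="warehouse/"):
    for obj in page.get("Contents", []):
        body = src.get_object(Bucket="source-bucket", Key=obj["Key"])["Body"]
        # upload_fileobj streams and handles multipart for large files.
        dst.upload_fileobj(body, "your-bucket", obj["Key"])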

Modern Infra. Verifiable By Design

Whether you're scaling your AI infrastructure, handling sensitive records, or modernizing your cloud stack, Akave Cloud is ready to plug in. It feels familiar, but works fundamentally better.