At NVIDIA GTC 2026, Fortanix demonstrated exactly what a hardware-enforced AI factory looks like. Confidential pipeline. Composite attestation. HSM-gated key release. Hopper and Blackwell GPUs running inside trusted execution environments where even the infrastructure operator can't see what's being processed. TELUS and Fortanix announced they'd extended this to Canadian data sovereignty: cryptographic proof that data never leaves Canadian jurisdiction during AI training and inference. NTT DATA brought the same architecture to India for DPDP Act compliance.
It's genuinely impressive infrastructure. And it solves exactly what it says it solves: securing the compute layer.
Now ask a different question. Ask what happens to your training dataset before it enters the enclave. Ask where the proof lives that the data your model trained on is the same data your data governance team approved, not a modified version, not a swapped version, not a version someone touched between your compliance review and the moment it crossed into the TEE.
That question doesn't have an answer in the Fortanix architecture. Not because Fortanix made a mistake. Because it's a different layer of the problem, and right now, almost nobody is building it.
What Confidential Computing Actually Guarantees, and What It Doesn't
Confidential computing is a hardware-level technique that keeps data encrypted and isolated while it is being processed, not just at rest or in transit. A trusted execution environment (TEE) creates a hardware-enforced boundary around a computation: the enclave runs on the CPU or GPU, the host operating system cannot read its memory, and remote attestation produces cryptographic evidence that the computation ran in verified hardware with verified software.
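To make the attestation idea concrete, here is a minimal sketch of the core check in Python. It is illustrative only: the function names are hypothetical, and a real verifier validates a vendor-signed quote from the hardware (Intel, AMD, or NVIDIA attestation services) rather than hashing an image locally.

```python
import hashlib
import hmac

def measure(image_path: str) -> str:
    """Hash the enclave software image; this stands in for the launch
    measurement a TEE computes over the code it loads."""
    h = hashlib.sha384()  # TEE measurement registers commonly use SHA-384
    with open(image_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_attestation(reported_measurement: str, approved_image: str) -> bool:
    """Does the measurement the enclave reported match the software stack
    we approved? A production verifier would also check the hardware
    vendor's signature over the quote; that step is elided here."""
    return hmac.compare_digest(reported_measurement, measure(approved_image))
```

Notice what the comparison covers: the code that is running, not the data it runs on.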
That attestation chain is a meaningful guarantee. It means a cloud provider, a neocloud operator, even a nation-state with physical access to the server cannot extract what the model is processing during inference. For regulated industries (healthcare, financial services, defense contractors), it addresses a real and specific problem.
What confidential computing does not guarantee: that the data entering the enclave was authoritative, unmodified, and provenance-verified at the time of ingestion. The TEE attests that the computation happened correctly. It does not attest to the integrity of its inputs before they arrived.
This is not a limitation unique to Fortanix. It's inherent to the architecture. TEEs are designed to secure in-use computation, not to verify the supply chain of the data feeding that computation. Those are different problems that require different solutions.
Why the Input Boundary Is Where AI Factories Are Actually Vulnerable
The attack surface that confidential computing can't reach is the data pipeline itself: the storage buckets, preprocessing scripts, feature stores, and dataset registries that exist between data collection and model ingestion.
Research from Columbia, NYU, and Washington University found that as few as 50,000 manipulated articles added to a public training dataset were sufficient to corrupt medical LLMs, producing systematically biased outputs that persisted even after retraining on clean data. JFrog's security research team identified approximately 100 malicious models on HuggingFace with embedded code-execution payloads that opened reverse shell connections on load; the models accumulated thousands of downloads before detection.
Neither attack vector is stopped by a TEE. Both attacks happen upstream of the enclave boundary.
A TEE running a poisoned dataset produces a clean attestation. The HSM releases keys because the hardware is legitimate and the software stack is verified. The pipeline runs exactly as designed, faithfully training on data that someone with access to a preprocessing script or a storage bucket modified days or weeks earlier.
This is the AI factory sovereignty gap: you've secured the factory floor, but you haven't secured what gets delivered to the loading dock.
How Akave's Verifiable Storage Layer Closes the Gap
What makes Akave's storage architecture different, and specifically relevant to confidential AI pipelines, is that provenance verification happens at write time, not at audit time.
When a dataset, checkpoint, or feature file is written to Akave Cloud, it receives a content identifier (CID): a cryptographic hash of the object's exact content at the moment of writing. What this means in an AI factory context: every dataset version that a compliance team or data governance function approves gets a CID at the moment of approval. When the TEE is ready to ingest that dataset, a CID comparison between the approved version and the version being loaded answers the question the TEE cannot: was this modified after approval? The CID either matches or it doesn't. The proof doesn't depend on logs, on access records, on trusting the preprocessing pipeline, or on anyone's word.
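As a sketch of what that check looks like in a pipeline, here is a minimal ingestion gate in Python. The function names are hypothetical, and a plain SHA-256 digest stands in for Akave's actual CID format; the shape of the check is the point.

```python
import hashlib

def content_id(path: str) -> str:
    """Stand-in content identifier: a SHA-256 digest over the file's exact
    bytes. Akave assigns real CIDs at write time; the plain hash here is a
    simplification for illustration."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def gate_ingestion(dataset_path: str, approved_cid: str) -> None:
    """Refuse to hand the TEE any dataset whose content no longer matches
    the version recorded at governance sign-off."""
    actual = content_id(dataset_path)
    if actual != approved_cid:
        raise RuntimeError(
            f"Dataset changed after approval: expected {approved_cid}, got {actual}"
        )
    # From here, the dataset can safely cross the enclave boundary.
```

The design point: the gate recomputes the identifier from the bytes themselves, so it holds even if every log and access record upstream has been tampered with.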
This is the data-side complement to Fortanix's compute-side confidentiality. Fortanix proves the computation was clean. Akave proves the inputs were. Together, they close the loop that neither can close alone.
AI Factory Sovereignty Is a Two-Layer Problem, and Akave Addresses the Data Side
The "sovereign AI" framing that Fortanix, NVIDIA, and NTT DATA are building around is the right frame. But sovereignty has two dimensions that the current confidential computing narrative conflates.
1. Compute sovereignty: proving that inference happened in verified hardware within a specified jurisdiction, that the operator couldn't see the data, and that the model wasn't tampered with during execution. Confidential computing with hardware attestation addresses this directly and well.
2. Data sovereignty: proving that the data the model trained on or inferred from was the authoritative version, approved by the right people, geofenced to the right jurisdiction, and unmodified from the moment of governance sign-off to the moment of ingestion. This is what verifiable storage addresses.
The gap matters most in regulated environments, the same environments driving confidential computing adoption. A healthcare organization deploying AI in a TEE needs to demonstrate to auditors not just that the inference was confidential, but that the patient dataset the model was trained on was the same dataset the IRB reviewed. A financial institution running agentic AI inside an enclave needs to prove that the market data it acted on wasn't modified between the data vendor's delivery and the model's consumption. A government agency using confidential AI for sensitive classification tasks needs a data provenance chain that survives any legal challenge to the AI's outputs.
The EU AI Act's provisions on high-risk AI systems are already pointing at this. Demonstrating compliance isn't just about how the model ran; it's about proving what it ran on. We've written in detail about how Akave's cryptographic data provenance maps directly to EU AI Act requirements; the same logic applies to every sovereignty framework being deployed alongside confidential computing.
Looking Ahead
The confidential computing market is maturing fast. Gartner has placed it among the core infrastructure technologies shaping enterprise AI over the next five years. NVIDIA's hardware attestation capabilities are in active deployment across neoclouds and on-prem AI factories. The regulatory frameworks (the EU AI Act, India's DPDP Act, Canada's PIPEDA successor, emerging US federal AI governance) are all converging on the same requirement: demonstrable trust at every layer of the AI supply chain.
Right now, that trust story has a conspicuous gap at the data layer. The narrative is "our AI runs in a verified enclave." The missing chapter is "and the data it ran on was verified before it entered."
The teams that close this loop now, building data provenance into their AI factory architecture before regulators require it, will have an auditable, demonstrable answer when the question arrives. The teams that rely on the compute layer alone will discover that "we had a TEE" is not a complete answer to "show us the data was clean."
Get Started
If you're building AI pipelines on confidential computing infrastructure and want to close the data provenance gap, start with Akave's verifiable storage architecture at akave.com/ai-ml-workloads.
Free trial and S3-compatible integration documentation at akave.com/free-trial and docs.akave.xyz.
Further Reading
- EU AI Act Compliance Made Verifiable: How Akave Cloud Delivers Cryptographic Data Provenance, how Akave's CID architecture maps directly to EU AI Act traceability requirements
- Rethinking Content Addressing: Introducing Akave's eCID, the technical foundation for Akave's encrypted, verifiable content identifiers
- Agent Memory Is Production-Grade. Agent Accountability Isn't., how the same provenance gap affects autonomous AI agents at scale
FAQ
What is the difference between confidential computing and data provenance for AI?
Confidential computing, using trusted execution environments (TEEs), protects data while it is being processed inside the AI model: inference, training computation, and activation states remain encrypted and isolated from the operator. Data provenance addresses what happened to the data before it entered the computation: whether it was the authoritative version, who approved it, when it was last modified, and whether it was tampered with between governance review and model ingestion. Both matter for AI factory sovereignty. Neither replaces the other.
Does a TEE like Fortanix's protect against training data poisoning?
No. A TEE attests that the computation inside the enclave ran correctly on verified hardware with verified software; it does not verify the integrity of the inputs before they cross the enclave boundary. If a training dataset is poisoned or modified upstream of the TEE, the enclave will faithfully process the corrupted data and produce a clean attestation. Protecting against training data poisoning requires provenance verification at the storage layer, before ingestion, which is what Akave's CID-at-write-time architecture provides.
How does Akave's CID-based provenance work in an AI pipeline?
Akave Cloud assigns a content identifier (CID), a cryptographic hash of the object's exact content, to every dataset, checkpoint, or file at the moment it is written. That CID is anchored to an on-chain audit trail on Avalanche's L1 infrastructure. When a governance team approves a dataset version, its CID captures the approved state. Before ingestion into a training or inference pipeline, a CID comparison confirms the file matches the approved version, without relying on application logs or anyone's attestation about what happened in between.
Can Akave integrate with Fortanix's Confidential AI platform?
Akave exposes a fully S3-compatible API, which means it integrates with any pipeline that reads and writes data through configurable S3 endpoints. Practically, Akave handles the data provenance layer upstream of the TEE boundary: CID verification happens before the dataset crosses into the enclave. This does not require changes to the Fortanix architecture; it's a data-layer addition at the ingestion step, as the sketch below shows. We see this as a clear partnership opportunity and are actively engaging with teams building on both platforms.
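As a concrete sketch of that integration step, here is what pointing an existing pipeline at an S3-compatible endpoint looks like with boto3. The endpoint, credentials, bucket, and object key below are placeholders, not real Akave values; see docs.akave.xyz for the actual configuration.

```python
import boto3

# Placeholder endpoint, credentials, bucket, and key: substitute the values
# provisioned for your Akave Cloud account. Any S3-compatible client works.
s3 = boto3.client(
    "s3",
    endpoint_url="https://YOUR_AKAVE_ENDPOINT",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Pull the approved dataset version to local disk, then run a CID check
# (like the gate_ingestion sketch earlier) before anything crosses the
# enclave boundary.
s3.download_file("approved-datasets", "train/v7.parquet", "/tmp/train.parquet")
```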
Isn't training data integrity already covered by data versioning tools like DVC or MLflow?
Data versioning tools provide lineage tracking within the application layer: they record what version of a dataset a run used, based on what the pipeline reported. That's genuinely useful for reproducibility. It does not provide an independent, tamper-evident proof that the dataset version used matches the version that was approved for use. A DVC record is written by the application and stored in application infrastructure. Akave's CID is written at the storage layer and anchored on-chain, independent of the application. The distinction is the same as the one between observability and accountability in AI agent systems: one records what the system said it did, the other proves it.
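A minimal sketch of that distinction, with hypothetical file names: the first function trusts a record the application wrote; the second recomputes the proof from the stored bytes themselves.

```python
import hashlib
import json

def reported_version(lineage_record: str) -> str:
    """What the pipeline said it used: a hash read from an application-
    written record (tools like DVC and MLflow keep comparable records)."""
    with open(lineage_record) as f:
        return json.load(f)["dataset_hash"]

def independent_version(dataset_path: str) -> str:
    """What the stored bytes actually are: a hash recomputed directly,
    with no dependence on anything the application logged."""
    with open(dataset_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```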
If we already have a TEE from Fortanix and HSM-based key management, why is additional storage verification needed?
Because HSM-gated key release verifies that the environment requesting the decryption key is legitimate, meaning the hardware is genuine and the software stack is attested. It does not verify that the data being decrypted is the version your governance function approved. Two things can both be true: the environment is legitimate, and the dataset was modified between governance sign-off and ingestion. Key management and data provenance solve different problems. You need both for a complete AI factory trust model.
What regulatory frameworks require data provenance in addition to confidential compute?
The EU AI Act requires providers of high-risk AI systems to maintain data governance documentation demonstrating that training data was appropriate, traceable, and subject to access controls, which requires provenance evidence beyond compute attestation. India's DPDP Act focuses on data handling and consent, not just processing security. The NIST AI Risk Management Framework emphasizes traceability of data throughout the AI lifecycle. In each case, "we used a TEE" addresses the compute side; proving the data was authoritative and unmodified is a separate, explicitly required demonstration.

