Stefaan Vervaet

European governments have spent eighteen months announcing sovereign compute: AMD–France, the UK's £500M Sovereign AI Unit, BT/Nscale on 14MW of UK capacity. Every announcement focuses on where the compute lives.

The EU AI Act will eventually ask a different question: not where the compute lives, but what evidence the data layer can produce when an auditor asks where the training data came from, who held it, and whether anyone (including the cloud vendor) could have modified it without the operator knowing. Compute without verifiable data is theatre.

Provenance can't be retrofitted. That's the architectural fact under the deadline politics.

The storage substrate has to land before the training data does. Start a free trial of Akave's sovereign cloud storage, no migration required.

Where the deadline actually sits today

Annex III enforcement is currently scheduled for August 2, 2026. It covers biometric ID, critical infrastructure, education, employment, essential services, law enforcement, migration, and justice. On May 7, 2026, the European Parliament and Council reached a provisional trilogue agreement on Digital Omnibus VII proposing to defer the Annex III standalone deadline to December 2, 2027 and embedded-product obligations to August 2, 2028. Both co-legislators have stated their intent to formalize adoption before August 2026.

Until the Official Journal publishes adoption, August 2026 legally binds. But the planning horizon most operators are sequencing against is December 2, 2027, sixteen months later. That makes the cost of the architecture mistake larger, not smaller: more training data accumulates on storage that can't produce ingestion-time evidence.

Fines stack: up to €15M or 3% of worldwide annual turnover (whichever is higher) on high-risk obligations; up to €35M or 7% on prohibited practices; plus GDPR Article 83(5) (up to €20M or 4%, whichever is higher) and sectoral regulators.
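As a rough illustration of how those ceilings combine, the sketch below applies the "whichever is higher" rule per regime and sums the results. The turnover figure is hypothetical, the `FineCap` helper is invented for this example, and SME carve-outs and regulator discretion are ignored. It shows only the arithmetic and is not legal guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineCap:
    """One statutory ceiling: a fixed amount or a share of turnover."""
    label: str
    fixed_eur: int     # fixed ceiling in euros
    turnover_bps: int  # share of worldwide annual turnover, in basis points

    def ceiling(self, turnover_eur: int) -> int:
        # "Whichever is higher" rule; integer maths avoids float rounding.
        return max(self.fixed_eur, turnover_eur * self.turnover_bps // 10_000)


AI_ACT_HIGH_RISK = FineCap("AI Act, high-risk obligations", 15_000_000, 300)
AI_ACT_PROHIBITED = FineCap("AI Act, prohibited practices", 35_000_000, 700)
GDPR_83_5 = FineCap("GDPR Article 83(5)", 20_000_000, 400)


def stacked_ceiling(turnover_eur: int, caps) -> int:
    """Sum the per-regime ceilings that could apply to one incident."""
    return sum(cap.ceiling(turnover_eur) for cap in caps)


if __name__ == "__main__":
    turnover = 2_000_000_000  # hypothetical operator with EUR 2B turnover
    for cap in (AI_ACT_HIGH_RISK, GDPR_83_5):
        print(f"{cap.label}: up to EUR {cap.ceiling(turnover):,}")
    total = stacked_ceiling(turnover, (AI_ACT_HIGH_RISK, GDPR_83_5))
    print(f"Combined ceiling: EUR {total:,}")
```

For smaller operators the fixed amount dominates: below €500M turnover, 3% is less than €15M, so the ceiling stays at €15M.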

The audit lands on the data layer, not the GPU

Article 12 requires automatic event recording over the system's lifetime. Article 17 covers the quality management system, including data management. Article 19 covers automatic logs (Article 26(6) sets a six-month deployer retention floor). Article 72 covers post-market monitoring. None of these is answered by GPU location. They're answered by what the storage layer can produce.

Consider a French AI factory on AMD GPUs whose models are still trained on data assembled in AWS S3 in us-east-1. The compute moved. The provenance chain didn't. The audit question has no EU-anchored answer when the upstream provenance is hyperscaler-managed.
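To make "automatic event recording" with a retention floor concrete, here is a minimal sketch of a hash-chained event log. Nothing here is prescribed by the Act or taken from any vendor's product: the field names, the `append_event`, `verify_chain`, and `may_purge` helpers, and the 183-day stand-in for "six months" are all assumptions made for illustration.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

GENESIS = "0" * 64               # placeholder hash before the first entry
RETENTION = timedelta(days=183)  # assumption: roughly six months


def _digest(body: dict) -> str:
    # Canonical JSON so the same body always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_event(log: list, actor: str, action: str, obj: str, when: datetime) -> dict:
    """Append an event whose hash also covers the previous entry's hash."""
    prev = log[-1]["entry_hash"] if log else GENESIS
    body = {"ts": when.isoformat(), "actor": actor, "action": action,
            "object": obj, "prev_hash": prev}
    entry = {**body, "entry_hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True


def may_purge(entry: dict, now: datetime) -> bool:
    """True only once the entry is older than the retention floor."""
    return now - datetime.fromisoformat(entry["ts"]) >= RETENTION


if __name__ == "__main__":
    t0 = datetime(2026, 3, 1, tzinfo=timezone.utc)
    events = []
    append_event(events, "pipeline-a", "read", "dataset/v1", t0)
    append_event(events, "pipeline-a", "write", "model/ckpt-1", t0 + timedelta(hours=1))
    print("chain intact:", verify_chain(events))
    print("purge allowed after 30 days:", may_purge(events[0], t0 + timedelta(days=30)))
```

A chain like this only detects tampering if the latest hash is held somewhere the log's operator cannot rewrite, which is the point the rest of this article makes about the storage layer.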

Three properties the data layer has to have

  • Jurisdictional independence at the architecture layer.
    • “EU-headquartered vendor, so CLOUD Act doesn't apply” is a corporate-diagram argument that collapses on cross-border data flow. The architecture-layer answer: data sharded across independent storage operators via erasure coding. No single operator, including Akave, holds reconstructible plaintext.
  • Cryptographic provenance at ingestion.
    • Every object lands with a Proof-of-Data-Possession (PDP) attestation, a cryptographic proof generated at write time, anchored on a dedicated immutable storage ledger separate from the object store. The auditor verifies, without trusting Akave, that the dataset used in March is bit-for-bit identical to what's on storage in November.
  • Protocol-level immutability with detection guarantees.
    • Not “no administrator can modify the data” (an absolute claim that breaks under scrutiny), but “any modification is independently detectable” through hash mismatch against the ledger.
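The second and third properties can be sketched together: record a digest in a separate ledger at write time, then let an auditor recompute and compare it later. This is a deliberately simplified stand-in. It uses a plain SHA-256 digest rather than a real Proof-of-Data-Possession scheme, and the `Ledger` class, `ingest`, and `audit` functions are hypothetical names, not Akave's API.

```python
import hashlib


class Ledger:
    """Write-once stand-in for a ledger kept separate from the object store."""

    def __init__(self):
        self._entries = {}

    def anchor(self, object_id: str, digest: str) -> None:
        # Refuse to overwrite: an anchored digest is never replaced.
        if object_id in self._entries:
            raise ValueError(f"{object_id} is already anchored")
        self._entries[object_id] = digest

    def lookup(self, object_id: str) -> str:
        return self._entries[object_id]


def ingest(store: dict, ledger: Ledger, object_id: str, data: bytes) -> str:
    """Anchor the digest at write time, then store the object."""
    digest = hashlib.sha256(data).hexdigest()
    ledger.anchor(object_id, digest)
    store[object_id] = data
    return digest


def audit(store: dict, ledger: Ledger, object_id: str) -> bool:
    """Recompute the digest from storage and compare with the ledger entry."""
    return hashlib.sha256(store[object_id]).hexdigest() == ledger.lookup(object_id)


if __name__ == "__main__":
    store, ledger = {}, Ledger()
    ingest(store, ledger, "train/shard-0001", b"example training record")
    print("unchanged object passes:", audit(store, ledger, "train/shard-0001"))

    store["train/shard-0001"] = b"example training record (edited)"
    print("modified object passes:", audit(store, ledger, "train/shard-0001"))
```

The design choice that matters is the order of operations: the digest exists before anyone has had a chance to alter the object, and it lives somewhere the storage operator does not control.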

Why hyperscaler cloud storage can't bolt this on

The hyperscaler answer (KMS, EU region, SOC 2 report) is compliance through trust assertion. CloudTrail logs are produced by the same vendor whose actions they document. Article 12 increasingly asks for evidence that can be verified independently. You can't bolt that onto a trust-based architecture.

Mapping Annex III exposure against the trilogue trajectory? Reach Akave via the sales form for a workload-specific architecture review.

Provenance can't be retrofitted: that's the real deadline

The ingestion-time proof has to be generated when the data lands. There's no retroactive path. The training data assembled today is the data the audit will inspect in 2027 and 2028. A proof created after the fact cannot attest to what was actually ingested.

The sequence that works: stand up the verifiable storage substrate first. Evidence accumulates from day one. The architecture decision is the sovereignty decision; the sovereignty decision is the audit decision.

Two ways to start: free trial at console.akave.com, or contact Akave sales for an Article 12 evidence walkthrough.

FAQ

What changes on August 2, 2026?

As of today, Annex III enforcement is scheduled to begin: Articles 12, 17, 19 (with Article 26(6) six-month deployer retention), 20, and 72 become operative. The Parliament and Council reached a provisional trilogue agreement on May 7, 2026 to defer the Annex III standalone deadline to December 2, 2027 (embedded products to August 2, 2028), with intent to formalize before August 2026. Until the Official Journal publishes adoption, August 2026 legally binds.

Does Akave's US incorporation create CLOUD Act exposure?

The jurisdictional independence argument rests on architecture, not legal entity. Data is sharded across independent storage operators via erasure coding, so no single operator, including Akave, holds reconstructible plaintext. CLOUD Act exposure is determined by data possession and operator behavior, not parent-company location.
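To illustrate the sharding idea, here is a toy erasure code: k data shards plus one XOR parity shard, where any k of the k+1 shards rebuild the object. This is not Akave's scheme, which the text above does not specify. Production systems typically use Reed-Solomon codes, and in this toy version each data shard is still a readable fragment, so a real design would presumably encrypt before sharding (an assumption, not something stated here).

```python
def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int = 2):
    """Split data into k equal-length shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)              # ceiling division
    padded = data.ljust(shard_len * k, b"\0")   # pad so shards are equal length
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytes(shard_len)
    for shard in shards:
        parity = _xor(parity, shard)
    return shards + [parity], len(data)


def reconstruct(shards, original_len: int) -> bytes:
    """Rebuild the object from k+1 shard slots, at most one of which is None."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity code tolerates only one missing shard")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        rebuilt = bytes(len(present[0]))
        for shard in present:
            rebuilt = _xor(rebuilt, shard)      # all shards XOR to zero
        shards[missing[0]] = rebuilt
    return b"".join(shards[:-1])[:original_len]  # drop parity and padding


if __name__ == "__main__":
    record = b"training-record-0001"
    shards, n = split_with_parity(record, k=2)
    # Pretend the operator holding shard 0 is unavailable.
    survivors = [None] + shards[1:]
    print(reconstruct(survivors, n) == record)
```

Each shard would sit with a different operator; losing or withholding one does not prevent recovery, and holding one does not yield the whole object.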

What does PDP mean and why does it matter for Article 12?

Proof-of-Data-Possession (PDP) is a cryptographic proof generated at write time that binds an object to a ledger entry the storage operator can't forge or revoke. An external auditor verifies the proof without trusting the storage vendor. That is the property that makes Article 12 “automatic recording of events” independently verifiable rather than vendor-asserted.
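One simplified way to picture a possession check is a Merkle-tree challenge: the root is anchored at write time, the auditor later asks for a randomly chosen chunk, and the operator must return that chunk plus a path that hashes back to the root. The sketch below is a generic construction used as a stand-in. Akave's actual PDP protocol is not described here, and every function name is hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(chunks):
    """Return all tree levels, leaves first; an odd node is paired with itself."""
    level = [_h(b"\x00" + c) for c in chunks]   # 0x00 prefix marks leaves
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [_h(b"\x01" + level[i] + level[i + 1])  # 0x01 marks inner nodes
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index: int):
    """Operator side: collect sibling hashes from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(root: bytes, chunk: bytes, index: int, path) -> bool:
    """Auditor side: rebuild the root from the returned chunk and path."""
    node = _h(b"\x00" + chunk)
    for sibling in path:
        if index % 2 == 0:
            node = _h(b"\x01" + node + sibling)
        else:
            node = _h(b"\x01" + sibling + node)
        index //= 2
    return node == root


if __name__ == "__main__":
    chunks = [f"chunk-{i}".encode() for i in range(5)]
    levels = build_tree(chunks)
    root = levels[-1][0]                        # anchored on the ledger at write time

    challenge = secrets.randbelow(len(chunks))  # auditor picks a random chunk
    response = (chunks[challenge], prove(levels, challenge))
    print("possession check passed:", verify(root, response[0], challenge, response[1]))
```

The auditor needs only the anchored root and the response, so the check does not depend on trusting whoever holds the data.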

Sources
  1. EU AI Act (Regulation (EU) 2024/1689), eur-lex.europa.eu. Annex III enforcement currently August 2, 2026; Articles 12, 17, 19, 20, 26(6), 72; fines €35M/7% and €15M/3%.
  2. European Parliament and Council provisional trilogue agreement on Digital Omnibus VII, May 7, 2026. Proposed Annex III deferral to December 2, 2027 and embedded-product deferral to August 2, 2028; not yet adopted in the Official Journal as of publication.
  3. GDPR Article 83(5). Up to €20M or 4% of global annual turnover, whichever is higher.
  4. Sovereign compute announcements. AMD–France (April 16, 2026); UK Sovereign AI Unit (April 16, 2026); BT/Nscale UK 14MW (April 2026).
