
Event Sourcing × GDPR — Strategy ADR

Status: Decided (spike-W1)
Date: 2026-04-16
Author: Marc Frost
Context: Kumiko commits to ES (event sourcing). Events are immutable, yet GDPR Article 17 requires the ability to erase personal data on request. The two look irreconcilable at first glance and must be reconciled before launch.


Problem

Event sourcing’s core guarantee is that events are an append-only log. GDPR’s “right to be forgotten” requires that personal data (PII) be deletable on user request. A naive ES system stores PII inside events forever, which violates Article 17.

Scale of impact: any EU-facing SaaS hitting this architecture without a plan is legally exposed from day one.


Options Considered

Option A — Crypto-Shredding (selected)

PII fields inside event payloads are encrypted with a per-subject key before storage. Keys live in a dedicated subject_keys table, indexed by subject id. “Forget” = delete the key. Encrypted payload remains but becomes unreadable.
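A minimal sketch of the mechanism follows. It assumes Node.js’s built-in `node:crypto` module with AES-256-GCM. The in-memory `subjectKeys` map and the function names are hypothetical stand-ins for the key table and framework code, not Kumiko’s API. The point it shows is that the stored ciphertext is never touched: destroying the key is what makes it unreadable.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical in-memory stand-in for the subject_keys table: subjectId -> 32-byte AES-256 key.
const subjectKeys = new Map<string, Buffer>();

// Encrypts a PII value under the subject's key, creating the key on first use.
// Serialized form: 12-byte IV | 16-byte GCM auth tag | ciphertext, base64-encoded.
function encryptForSubject(subjectId: string, plaintext: string): string {
  let key = subjectKeys.get(subjectId);
  if (!key) {
    key = randomBytes(32);
    subjectKeys.set(subjectId, key);
  }
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString("base64");
}

// Returns the plaintext, or null once the subject's key has been shredded.
function decryptForSubject(subjectId: string, encoded: string): string | null {
  const key = subjectKeys.get(subjectId);
  if (!key) return null;
  const buf = Buffer.from(encoded, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, buf.subarray(0, 12));
  decipher.setAuthTag(buf.subarray(12, 28));
  return Buffer.concat([decipher.update(buf.subarray(28)), decipher.final()]).toString("utf8");
}

// "Forget" destroys the key only; the stored ciphertext stays exactly as it was.
function forgetSubject(subjectId: string): void {
  subjectKeys.delete(subjectId);
}

const stored = encryptForSubject("user-1", "alice@example.com");
console.log(decryptForSubject("user-1", stored)); // alice@example.com
forgetSubject("user-1");
console.log(decryptForSubject("user-1", stored)); // null
```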

Pros:

  • Events stay truly immutable — no event mutation at delete time
  • Efficient: destroying one subject key (a single-row write to the key table) makes that subject’s PII unreadable across every event that carries it, even millions of events
  • Preserves aggregate history for non-PII replay (state machines, counts, audit)
  • Well-understood pattern (used by accounting systems; NIST SP 800-88 lists cryptographic erase as a media-sanitization technique)

Cons:

  • Projections that materialize decrypted PII must be rebuildable, and must actually be rebuilt after a forget
  • Key management becomes a critical operational concern (backup, rotation, HSM)
  • Reads must decrypt on the fly, a slight performance cost per event read
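To make the rotation part of the key-management concern concrete: with envelope encryption, each subject’s data key (DEK) is stored wrapped by a key-encryption key (KEK). Rotating the KEK therefore re-wraps 32-byte keys instead of re-encrypting events. The sketch below assumes AES-256-GCM for wrapping and a locally generated KEK as a hypothetical stand-in for a KMS/HSM-held key. A real KMS performs wrap/unwrap itself and never releases the KEK.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Wrapped key layout: 12-byte IV | 16-byte GCM auth tag | 32-byte encrypted DEK = 60 bytes.
function wrapDek(kek: Buffer, dek: Buffer): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", kek, iv);
  const encrypted = Buffer.concat([cipher.update(dek), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), encrypted]);
}

// Throws if the wrong KEK is used or the wrapped key was tampered with (GCM tag check).
function unwrapDek(kek: Buffer, wrapped: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", kek, wrapped.subarray(0, 12));
  decipher.setAuthTag(wrapped.subarray(12, 28));
  return Buffer.concat([decipher.update(wrapped.subarray(28)), decipher.final()]);
}

// KEK rotation: unwrap under the old KEK, re-wrap under the new one.
// Events encrypted with the DEK are not touched.
function rewrapDek(oldKek: Buffer, newKek: Buffer, wrapped: Buffer): Buffer {
  return wrapDek(newKek, unwrapDek(oldKek, wrapped));
}

const kekV1 = randomBytes(32); // hypothetical stand-ins for KMS-held KEKs
const kekV2 = randomBytes(32);
const dek = randomBytes(32);   // one subject's AES-256 data key
const wrappedV1 = wrapDek(kekV1, dek);
const wrappedV2 = rewrapDek(kekV1, kekV2, wrappedV1);
console.log(wrappedV2.length, unwrapDek(kekV2, wrappedV2).equals(dek)); // 60 true
```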

Option B — Tombstone Events

Add a redacted_at column on events. Writers mark PII-containing events as redacted on forget-request. All readers must skip redacted PII fields.

Pros: simpler data model, no cryptography

Cons:

  • Events become mutable (breaks ES’s core promise)
  • Every reader must implement “redacted” semantics (large surface area)
  • Projections keep stale PII — must rebuild anyway
  • Regulators have been skeptical of “soft-delete” arguments for personal data

Rejected — erodes the ES contract without compensating benefit.

Option C — Pure Deletion

Physically DELETE events that contain PII.

Pros: simplest mental model for GDPR

Cons:

  • Breaks every aggregate reducer that touches those events
  • Version numbers get gaps → optimistic concurrency breaks
  • Not ES any more

Rejected — not compatible with the architecture.

Option D — Hybrid (Non-PII events clear, PII events encrypted)

The framework distinguishes PII fields from non-PII fields. Non-PII fields land in the event payload in cleartext; PII fields go through crypto-shredding.

This is what we’re actually selecting: Option A applied only to PII fields. Cleartext fields stay queryable, indexable, and fast. The distinction is made at the entity-definition layer.
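A sketch of how the entity-definition annotations could drive the split. It assumes a hypothetical `FieldSpec` shape mirroring `textField({ pii })`. The function separates the cleartext part of a payload from the part that goes through crypto-shredding. It rejects undeclared fields, a fail-closed choice so that nothing unannotated is stored in clear by accident.

```typescript
// Hypothetical shape of a field annotation, mirroring textField({ pii }) in an entity definition.
type FieldSpec = { pii: boolean };
type EntitySpec = Record<string, FieldSpec>;

// Splits an event payload into a clear part (stored as-is) and a PII part (to be encrypted).
function partitionPayload(spec: EntitySpec, payload: Record<string, unknown>) {
  const clear: Record<string, unknown> = {};
  const pii: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(payload)) {
    const field = spec[name];
    // Fail closed: an undeclared field could otherwise slip into the payload unencrypted.
    if (!field) throw new Error(`field "${name}" is not declared on the entity`);
    (field.pii ? pii : clear)[name] = value;
  }
  return { clear, pii };
}

const userSpec: EntitySpec = {
  email: { pii: true },
  displayName: { pii: true },
  status: { pii: false },
};
const { clear, pii } = partitionPayload(userSpec, { email: "a@example.com", displayName: "Ann", status: "active" });
// clear: { status: "active" }
// pii:   { email: "a@example.com", displayName: "Ann" }
```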


Decision

Crypto-Shredding with per-subject keys, applied to fields marked as PII in the entity definition.

Key structural elements:

  1. subject_keys table:

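    CREATE TABLE kumiko_subject_keys (
      subject_id  UUID PRIMARY KEY,
      tenant_id   UUID NOT NULL,
      cipher_key  BYTEA,            -- NULL once the subject has been forgotten
      created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
      erased_at   TIMESTAMPTZ,
      -- a key is present exactly when the subject has not been erased
      CHECK ((cipher_key IS NULL) = (erased_at IS NOT NULL))
    );
    • cipher_key holds the subject’s AES-256 data key (32 bytes before wrapping), stored wrapped by a key-encryption key (KEK) held in a KMS/HSM (envelope encryption)
    • On erasure, cipher_key is set to NULL and erased_at is set; the row stays as an auditable tombstone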
  2. Field annotation in entity definitions:

    r.entity("user", {
      fields: {
        email: textField({ pii: true }),        // crypto-shredded
        displayName: textField({ pii: true }),  // crypto-shredded
        status: textField({ pii: false }),      // clear
      },
    });
  3. Event-store envelope:

    • Before storing, framework encrypts PII fields with the subject’s key
    • On read, framework decrypts (or surfaces a sentinel [[erased]] marker for forgotten subjects)
    • Key lookup is cached per request for perf
  4. Forget-API:

    r.command("privacy.forget", { access: { roles: ["Admin", "DataProtectionOfficer"] } });
    // Implementation: UPDATE subject_keys SET cipher_key = NULL, erased_at = now() WHERE subject_id = $1
    // Emits a 'privacy.subject_forgotten' event in the same TX for auditability.
  5. Projection rebuild after forget:

    • Projections that materialized PII in clear must rebuild
    • Framework tracks “PII-touching” projections via entity metadata
    • privacy.forget triggers async projection-rebuild for those entities
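The envelope (item 3) and the effect of the forget command on the key row (item 4) can be sketched together. This assumes AES-256-GCM via `node:crypto` and an in-memory `keyTable` standing in for kumiko_subject_keys, with keys shown unwrapped for brevity. All function names are hypothetical. The sketch also shows one way to keep the two DR cases from the Operational Implications apart: an erased key yields the `[[erased]]` sentinel, while a missing key row raises an error instead of being silently read as “forgotten”.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// In-memory stand-in for kumiko_subject_keys (the real cipher_key is stored wrapped by a KEK).
type KeyRow = { cipherKey: Buffer | null; erasedAt: Date | null };
const keyTable = new Map<string, KeyRow>();
const ERASED = "[[erased]]";

// Ciphertext layout: 12-byte IV | 16-byte GCM tag | ciphertext, base64-encoded.
function encrypt(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12);
  const c = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([c.update(plaintext, "utf8"), c.final()]);
  return Buffer.concat([iv, c.getAuthTag(), ct]).toString("base64");
}

function decrypt(key: Buffer, encoded: string): string {
  const buf = Buffer.from(encoded, "base64");
  const d = createDecipheriv("aes-256-gcm", key, buf.subarray(0, 12));
  d.setAuthTag(buf.subarray(12, 28));
  return Buffer.concat([d.update(buf.subarray(28)), d.final()]).toString("utf8");
}

// Write path: encrypt only the PII fields, creating the subject's key on first use.
function encryptPiiFields(subjectId: string, piiFields: string[], payload: Record<string, string>): Record<string, string> {
  let row = keyTable.get(subjectId);
  if (!row) {
    row = { cipherKey: randomBytes(32), erasedAt: null };
    keyTable.set(subjectId, row);
  }
  const key = row.cipherKey;
  if (!key) throw new Error(`subject ${subjectId} has been forgotten`);
  const out = { ...payload };
  for (const f of piiFields) if (f in out) out[f] = encrypt(key, out[f]);
  return out;
}

// Per-request key cache: one lookup per subject, however many events the request reads.
class RequestKeyCache {
  private readonly keys = new Map<string, Buffer | null>();
  get(subjectId: string): Buffer | null {
    const cached = this.keys.get(subjectId);
    if (cached !== undefined) return cached;
    const row = keyTable.get(subjectId);
    // A missing row means a lost key (a DR incident), not a forgotten subject.
    if (!row) throw new Error(`no key row for subject ${subjectId}`);
    this.keys.set(subjectId, row.cipherKey);
    return row.cipherKey;
  }
}

// Read path: decrypt PII fields, or surface the sentinel for forgotten subjects.
function decryptPiiFields(cache: RequestKeyCache, subjectId: string, piiFields: string[], stored: Record<string, string>): Record<string, string> {
  const key = cache.get(subjectId);
  const out = { ...stored };
  for (const f of piiFields) if (f in out) out[f] = key ? decrypt(key, out[f]) : ERASED;
  return out;
}

// Effect of privacy.forget on the key row: drop the key, keep the row as a tombstone.
function forget(subjectId: string): void {
  const row = keyTable.get(subjectId);
  if (row && !row.erasedAt) {
    row.cipherKey = null;
    row.erasedAt = new Date();
  }
}

const piiFields = ["email", "displayName"];
const stored = encryptPiiFields("user-1", piiFields, { email: "a@example.com", displayName: "Ann", status: "active" });
console.log(decryptPiiFields(new RequestKeyCache(), "user-1", piiFields, stored).email); // a@example.com
forget("user-1");
const after = decryptPiiFields(new RequestKeyCache(), "user-1", piiFields, stored);
console.log(after.email, after.status); // [[erased]] active
```

Because the cache lives for a single request, a forget takes effect from the next request on. The audit event and the projection-rebuild trigger (items 4 and 5) are left out of this sketch.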

Operational Implications

  • Key backup: erased keys must never be recovered from backups. DR procedures must distinguish “accidentally lost keys” (catastrophic) from “forgotten subjects” (intentional).
  • Performance: encryption on write plus decryption on read adds ~50–100µs per PII field with AES-GCM on modern CPUs, which is not observable against database round-trip latency.
  • Right of access (Article 15): still works. An admin can decrypt and export the subject’s data before erasure, but cannot undo an erasure.
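The per-field performance figure can be checked on the target runtime with a micro-benchmark along these lines. This is a rough sketch, assuming Node.js `node:crypto` with AES-256-GCM and a short, email-sized value. It measures crypto cost only, with no key lookup and no database, and single-process timings like this are indicative rather than precise.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";
import { performance } from "node:perf_hooks";

// Mean microseconds for one encrypt + decrypt round trip of a single PII field.
function benchmarkFieldCrypto(iterations: number, value = "alice@example.com"): number {
  const key = randomBytes(32);
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    const iv = randomBytes(12); // fresh IV per encryption, as GCM requires
    const c = createCipheriv("aes-256-gcm", key, iv);
    const ct = Buffer.concat([c.update(value, "utf8"), c.final()]);
    const d = createDecipheriv("aes-256-gcm", key, iv);
    d.setAuthTag(c.getAuthTag());
    const pt = Buffer.concat([d.update(ct), d.final()]).toString("utf8");
    if (pt !== value) throw new Error("round-trip mismatch");
  }
  return ((performance.now() - start) * 1000) / iterations; // ms -> µs, per field
}

benchmarkFieldCrypto(1_000); // warm-up run, result discarded
console.log(`${benchmarkFieldCrypto(20_000).toFixed(2)} µs per PII field (encrypt + decrypt)`);
```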

What This Does NOT Solve

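  • Backups containing old encrypted events: once the keys are gone, those backups cannot restore PII, which is actually desirable. Documented for regulators as “backups older than key-deletion cannot expose PII”. This holds only if no restorable copy of the erased key survives; a backup of kumiko_subject_keys taken before the erasure still contains the wrapped key.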
  • Analytics on forgotten subjects — aggregations that include counts/durations still work; identity-linked analytics do not. This is the correct behavior under GDPR.
  • Data in external systems: search indexes (Meilisearch), externally sent notifications, etc. need their own forget path, to be documented per integration in Phase 2.

Not a Launch Blocker

Crypto-shredding is a well-understood pattern. Implementation complexity is real but manageable:

  • 1 new table + key-lifecycle command
  • Per-field annotation on entity definitions (additive, opt-in)
  • Framework-level encrypt/decrypt at write/read boundary
  • Forget-triggered projection-rebuild hook

Estimated effort for Phase 2: ~1 week dedicated, or ~2 weeks alongside other Phase-2 work.

Implication for Spike Go/No-Go

GREEN. GDPR has a concrete, industry-standard solution. Not a blocker for the ES pivot decision.