Event Sourcing × GDPR — Strategy ADR

Status: Decided (spike-W1)
Date: 2026-04-16
Author: Marc Frost
Context: Kumiko commits to event sourcing (ES). Events are immutable. GDPR Article 17 requires the ability to erase personal data on request. The two look irreconcilable at first glance, so the conflict must be resolved before launch.

Problem

Event sourcing’s core guarantee is that events are an append-only log. GDPR’s “right to be forgotten” requires that personal data (PII) be deletable on user request. A naive ES system stores PII inside events forever, which violates Article 17.

Scale of impact: any EU-facing SaaS hitting this architecture without a plan is legally exposed from day one.

Options Considered

Option A — Crypto-Shredding (selected)

PII fields inside event payloads are encrypted with a per-subject key before storage. Keys live in a dedicated subject_keys table, indexed by subject id. “Forget” = delete the key. Encrypted payload remains but becomes unreadable.

Pros:

Events stay truly immutable — no event mutation at delete time
Efficient: a single key erasure in the key table makes PII unreadable across millions of events
Preserves aggregate history for non-PII replay (state machines, counts, audit)
Well-understood pattern (used by accounting systems, NIST-compliant approaches exist)

Cons:

Projections containing decrypted PII must be rebuildable, and must actually be rebuilt after a forget
Key-management becomes a critical operational concern (backup, rotation, HSM)
Reads must decrypt on-the-fly — slight perf cost per event read

Option B — Tombstone Events

Add a redacted_at column on events. Writers mark PII-containing events as redacted on forget-request. All readers must skip redacted PII fields.

Pros: simpler data model, no cryptography

Cons:

Events become mutable (breaks ES’s core promise)
Every reader must implement “redacted” semantics (large surface area)
Projections keep stale PII — must rebuild anyway
Regulators have been skeptical of “soft-delete” arguments for personal data

Rejected — erodes the ES contract without compensating benefit.

Option C — Pure Deletion

Actually DELETE events that contain PII.

Pros: simplest mental model for GDPR

Cons:

Breaks every aggregate reducer that touches those events
Version numbers get gaps → optimistic concurrency breaks
Not ES any more

Rejected — not compatible with the architecture.

Option D — Hybrid (Non-PII events clear, PII events encrypted)

Framework distinguishes PII and non-PII fields. Non-PII lands in event payload clear. PII goes through crypto-shredding.

This is actually what we’re selecting — Option A applied only to PII fields. Unencrypted fields stay queryable, indexable, and fast. The distinction is made at the entity-definition layer.

Decision

Crypto-Shredding with per-subject keys, applied to fields marked as PII in the entity definition.

Key structural elements:

subject_keys table:

CREATE TABLE kumiko_subject_keys (
  subject_id   UUID        PRIMARY KEY,
  tenant_id    UUID        NOT NULL,
  cipher_key   BYTEA,                  -- nullable: privacy.forget sets it to NULL
  created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
  erased_at    TIMESTAMPTZ
);

cipher_key holds an AES-256 data key (32 bytes of key material), stored wrapped via envelope encryption (KEK from KMS/HSM)
erased_at is set when the key is destroyed; the row stays as an auditable tombstone
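
The envelope-encryption step can be sketched as follows. This is a minimal illustration, not the Kumiko implementation: `wrapKey` and `unwrapKey` are hypothetical names, and a local 32-byte KEK stands in for a key that would normally never leave the KMS/HSM (a real KMS performs the wrap remotely).

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumption: a local KEK stands in for the KMS/HSM-held key-encryption key.
// Layout of the wrapped value: 12-byte IV || 16-byte GCM tag || ciphertext.
function wrapKey(kek: Buffer, dataKey: Buffer): Buffer {
  const iv = randomBytes(12); // fresh nonce for every wrap
  const cipher = createCipheriv("aes-256-gcm", kek, iv);
  const ct = Buffer.concat([cipher.update(dataKey), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]);
}

// Reverses wrapKey; throws if the KEK is wrong or the bytes were tampered with.
function unwrapKey(kek: Buffer, wrapped: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", kek, wrapped.subarray(0, 12));
  decipher.setAuthTag(wrapped.subarray(12, 28));
  return Buffer.concat([decipher.update(wrapped.subarray(28)), decipher.final()]);
}
```

Under this layout the value stored in cipher_key is longer than the raw 32-byte key (IV and tag are prepended), which is one reason the column is BYTEA rather than a fixed-width type.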

Field annotation in entity definitions:

r.entity("user", {
  fields: {
    email: textField({ pii: true }),           // crypto-shredded
    displayName: textField({ pii: true }),     // crypto-shredded
    status: textField({ pii: false }),         // clear
  },
});

Event-store envelope:
- Before storing, framework encrypts PII fields with the subject’s key
- On read, framework decrypts (or surfaces a sentinel [[erased]] marker for forgotten subjects)
- Key lookup is cached per request for perf
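
A minimal sketch of the write/read boundary described above, assuming string-valued payload fields and AES-256-GCM. The function names (`encryptPii`, `decryptPii`) and the base64 `iv || tag || ciphertext` encoding are assumptions made for illustration; the ADR does not fix a wire format.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical names throughout; not Kumiko APIs.
const ERASED = "[[erased]]";
type Payload = Record<string, string>;

// Encrypts one value; output is base64(iv || tag || ciphertext).
function encryptValue(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12); // fresh 96-bit nonce per value
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ct]).toString("base64");
}

function decryptValue(key: Buffer, encoded: string): string {
  const raw = Buffer.from(encoded, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString("utf8");
}

// Write path: only fields listed as PII are encrypted; the rest stay clear.
function encryptPii(payload: Payload, piiFields: string[], key: Buffer): Payload {
  const out: Payload = { ...payload };
  for (const field of piiFields) {
    const value = out[field];
    if (value !== undefined) out[field] = encryptValue(key, value);
  }
  return out;
}

// Read path: a null key means the subject was forgotten, so surface the sentinel.
function decryptPii(stored: Payload, piiFields: string[], key: Buffer | null): Payload {
  const out: Payload = { ...stored };
  for (const field of piiFields) {
    const value = out[field];
    if (value !== undefined) out[field] = key === null ? ERASED : decryptValue(key, value);
  }
  return out;
}
```

Because the stored event is never touched again, the same payload yields the original values while the key exists and the sentinel once it is gone.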

Forget-API:

r.command("privacy.forget", { access: { roles: ["Admin", "DataProtectionOfficer"] } });
// Implementation: UPDATE kumiko_subject_keys SET cipher_key = NULL, erased_at = now() WHERE subject_id = $1
// Emits a 'privacy.subject_forgotten' event in the same TX for auditability.
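
The forget semantics can be sketched against an in-memory stand-in for the key table and the event log. `InMemoryPrivacyStore` and its shape are hypothetical; a real implementation would run the key erasure and the audit event in one database transaction, which this sketch only imitates by doing both in a single synchronous call.

```typescript
// Hypothetical in-memory stand-ins for kumiko_subject_keys and the event log.
interface SubjectKeyRow {
  subjectId: string;
  tenantId: string;
  cipherKey: Uint8Array | null;
  createdAt: Date;
  erasedAt: Date | null;
}

interface AuditEvent {
  type: "privacy.subject_forgotten";
  subjectId: string;
  at: Date;
}

class InMemoryPrivacyStore {
  readonly keys = new Map<string, SubjectKeyRow>();
  readonly events: AuditEvent[] = [];

  // Erases the key and records the audit event. Returns false when the
  // subject was already forgotten, so repeated requests are idempotent.
  forget(subjectId: string, now: Date = new Date()): boolean {
    const row = this.keys.get(subjectId);
    if (!row) throw new Error(`unknown subject ${subjectId}`);
    if (row.erasedAt !== null) return false;
    row.cipherKey?.fill(0); // best-effort scrub of the in-memory copy
    row.cipherKey = null;
    row.erasedAt = now; // the row stays behind as a tombstone
    this.events.push({ type: "privacy.subject_forgotten", subjectId, at: now });
    return true;
  }
}
```

Making the command idempotent is a design choice of this sketch: a retried request does not emit a second audit event or move the recorded erasure time.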

Projection rebuild after forget:
- Projections that materialized PII in clear must rebuild
- Framework tracks “PII-touching” projections via entity metadata
- privacy.forget triggers async projection-rebuild for those entities
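
One way the “PII-touching” lookup could work is sketched below. The metadata shapes (`EntityDef`, `ProjectionDef`, a `reads` list per projection) are assumptions for illustration; the ADR does not define how projections declare the fields they consume.

```typescript
// Hypothetical metadata shapes; not the Kumiko entity-definition format.
interface FieldDef { pii: boolean }
interface EntityDef { fields: Record<string, FieldDef> }
interface ProjectionDef {
  name: string;
  reads: Array<{ entity: string; field: string }>;
}

// Returns the names of projections that read at least one field marked pii: true.
// These are the ones privacy.forget would schedule for an async rebuild.
function piiTouchingProjections(
  entities: Record<string, EntityDef>,
  projections: ProjectionDef[],
): string[] {
  return projections
    .filter((p) => p.reads.some((r) => entities[r.entity]?.fields[r.field]?.pii === true))
    .map((p) => p.name);
}

// Usage with the "user" entity from the annotation example above.
const entities: Record<string, EntityDef> = {
  user: {
    fields: {
      email: { pii: true },
      displayName: { pii: true },
      status: { pii: false },
    },
  },
};
const projections: ProjectionDef[] = [
  { name: "userDirectory", reads: [{ entity: "user", field: "email" }] },
  { name: "statusCounts", reads: [{ entity: "user", field: "status" }] },
];
const toRebuild = piiTouchingProjections(entities, projections);
```

Here only userDirectory is selected; statusCounts reads no PII field and keeps working without a rebuild.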

Operational Implications

Key backup: erased keys are not recovered. DR procedures must distinguish “accidentally lost keys” (catastrophic) from “forgotten subjects” (intentional).
Performance: encryption on write plus decryption on read adds an estimated ~50–100µs per PII field with AES-GCM on modern CPUs, which is negligible next to a DB round trip.
Access-control: “right of access” (Article 15) still works. An admin can decrypt and export the subject’s data before erasure, but erasure cannot be undone.
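
The DR distinction between lost and forgotten keys can be made mechanical by relying on the tombstone row. The sketch below is an assumption about how such a check might look; `classifyKey` and `KeyState` are hypothetical names, and it presumes the tombstone row is present in whatever data set is being inspected.

```typescript
type KeyState = "active" | "forgotten" | "lost";

interface KeyRow {
  cipherKey: Uint8Array | null;
  erasedAt: Date | null;
}

// Classifies the result of a key lookup for a subject that has events.
function classifyKey(row: KeyRow | undefined): KeyState {
  if (row === undefined) return "lost"; // no row at all: never an intentional erasure
  if (row.erasedAt !== null) return "forgotten"; // tombstone: intentional, do not restore
  if (row.cipherKey === null) return "lost"; // key missing without a tombstone
  return "active";
}
```

A restore procedure would treat "lost" as an incident and "forgotten" as the expected end state that must not be reversed.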

What This Does NOT Solve

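Backups containing old encrypted events: once the key is gone, restoring such a backup cannot expose PII, which is the desired outcome. This holds only if backups of the key table that still contain the erased key have themselves expired or been destroyed. Documented for regulators as “backups older than key-deletion cannot expose PII”.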
Analytics on forgotten subjects — aggregations that include counts/durations still work; identity-linked analytics do not. This is the correct behavior under GDPR.
Data in external systems — search indexes (Meilisearch), notifications sent externally, etc. must have their own forget-path. Documented per-integration in Phase 2.

Not a Launch Blocker

Crypto-shredding is a well-understood pattern. Implementation complexity is real but manageable:

1 new table + key-lifecycle command
Per-field annotation on entity definitions (additive, opt-in)
Framework-level encrypt/decrypt at write/read boundary
Projection-rebuild hook triggered by key erasure

Estimated effort for Phase 2: ~1 week dedicated, or ~2 weeks alongside other Phase-2 work.

Implication for Spike Go/No-Go

GREEN. GDPR has a concrete, industry-standard solution. Not a blocker for the ES pivot decision.