Event Sourcing × GDPR — Strategy ADR
Status: Decided (spike-W1)
Date: 2026-04-16
Author: Marc Frost
Context: Kumiko commits to event sourcing (ES). Events are immutable. GDPR Article 17 requires the ability to erase personal data on request. The two look irreconcilable at first glance and must be reconciled before launch.
Problem
Event sourcing’s core guarantee is that events form an append-only log. GDPR’s “right to be forgotten” requires that personal data (PII) be deletable on user request. A naive ES system stores PII inside events forever, which violates Article 17.
Scale of impact: any EU-facing SaaS that adopts this architecture without a plan is legally exposed from day one.
Options Considered
Option A — Crypto-Shredding (selected)
PII fields inside event payloads are encrypted with a per-subject key before storage. Keys live in a dedicated subject_keys table, indexed by subject id. “Forget” = delete the key. Encrypted payload remains but becomes unreadable.
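Below is a minimal sketch of the mechanism. It assumes Node’s built-in crypto module with AES-256-GCM, plus an in-memory Map standing in for the subject_keys table, where a null entry models a tombstoned key row. The function names are hypothetical, not framework API.

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

// Assumption: an in-memory Map stands in for the subject_keys table.
// subject id -> 32-byte AES-256 key, or null once the key has been destroyed.
const subjectKeys = new Map<string, Buffer | null>();

type Sealed = { iv: string; tag: string; data: string }; // base64-encoded parts

function keyFor(subjectId: string): Buffer {
  const existing = subjectKeys.get(subjectId);
  if (existing === null) throw new Error(`subject ${subjectId} has been forgotten`);
  if (existing) return existing;
  const key = randomBytes(32); // new subject: generate its key on first write
  subjectKeys.set(subjectId, key);
  return key;
}

function encryptForSubject(subjectId: string, plaintext: string): Sealed {
  const iv = randomBytes(12); // 96-bit nonce, the standard size for GCM
  const cipher = createCipheriv("aes-256-gcm", keyFor(subjectId), iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

function decryptForSubject(subjectId: string, sealed: Sealed): string | null {
  const key = subjectKeys.get(subjectId);
  if (!key) return null; // key destroyed (or never existed): ciphertext is unreadable
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(sealed.iv, "base64"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  const plain = Buffer.concat([decipher.update(Buffer.from(sealed.data, "base64")), decipher.final()]);
  return plain.toString("utf8");
}

// "Forget" = destroy the key; the stored events themselves are never touched.
function forget(subjectId: string): void {
  subjectKeys.set(subjectId, null);
}

const email = encryptForSubject("u-1", "ada@example.com");
console.log(decryptForSubject("u-1", email)); // ada@example.com
forget("u-1");
console.log(decryptForSubject("u-1", email)); // null
```

Keeping the tombstone, rather than deleting the map entry, stops a later write from silently minting a fresh key for a forgotten subject.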
Pros:
- Events stay truly immutable — no event mutation at delete time
- Efficient: one write to the key table erases a subject’s PII across millions of events
- Preserves aggregate history for non-PII replay (state machines, counts, audit)
- Well-understood pattern (used in accounting systems; NIST SP 800-88 recognizes cryptographic erase as a sanitization method)
Cons:
- Projections that hold decrypted PII must be rebuildable, and must actually be rebuilt after a forget
- Key-management becomes a critical operational concern (backup, rotation, HSM)
- Reads must decrypt on-the-fly — slight perf cost per event read
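Envelope encryption keeps the key-management cost contained. This is the scheme the Decision below adopts for cipher_key. Each per-subject data key (DEK) is stored wrapped under a key-encryption key (KEK), so rotating the KEK rewraps 32-byte keys and never touches events. The sketch assumes Node’s crypto module, with a locally generated KEK standing in for one held in a KMS/HSM. wrapKey, unwrapKey and rotateKek are hypothetical names.

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

// Wrapped layout: 12-byte IV | 16-byte GCM tag | 32-byte encrypted DEK = 60 bytes.
function wrapKey(kek: Buffer, dek: Buffer): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", kek, iv);
  const encrypted = Buffer.concat([cipher.update(dek), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), encrypted]);
}

function unwrapKey(kek: Buffer, wrapped: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", kek, wrapped.subarray(0, 12));
  decipher.setAuthTag(wrapped.subarray(12, 28));
  return Buffer.concat([decipher.update(wrapped.subarray(28)), decipher.final()]);
}

// KEK rotation: unwrap with the old KEK, rewrap with the new one.
// Event payloads are encrypted under the DEK, so they are untouched.
function rotateKek(oldKek: Buffer, newKek: Buffer, wrapped: Buffer): Buffer {
  return wrapKey(newKek, unwrapKey(oldKek, wrapped));
}

// Assumption: local random KEKs stand in for keys held in a KMS/HSM.
const kekV1 = randomBytes(32);
const kekV2 = randomBytes(32);
const dek = randomBytes(32); // per-subject AES-256 data key
const storedV1 = wrapKey(kekV1, dek);
const storedV2 = rotateKek(kekV1, kekV2, storedV1);
console.log(storedV1.length);                        // 60
console.log(unwrapKey(kekV2, storedV2).equals(dek)); // true
```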
Option B — Tombstone Events
Add a redacted_at column on events. Writers mark PII-containing events as redacted on forget-request. All readers must skip redacted PII fields.
Pros: simpler data model, no cryptography
Cons:
- Events become mutable (breaks ES’s core promise)
- Every reader must implement “redacted” semantics (large surface area)
- Projections keep stale PII — must rebuild anyway
- Regulators have been skeptical of “soft-delete” arguments for personal data
Rejected — erodes the ES contract without compensating benefit.
Option C — Pure Deletion
Actually DELETE events that contain PII.
Pros: simplest mental model for GDPR
Cons:
- Breaks every aggregate reducer that touches those events
- Version numbers get gaps → optimistic concurrency breaks
- Not ES any more
Rejected — not compatible with the architecture.
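To see why gaps break optimistic concurrency, consider a store that treats the stream version as the number of events, a common shortcut. This is a hypothetical sketch; none of these names come from Kumiko.

```typescript
type StoredEvent = { version: number; type: string };

// Assumption: the stream's version is the number of events in it.
function currentVersion(stream: StoredEvent[]): number {
  return stream.length;
}

// Optimistic append: the writer states the version it last saw.
function append(stream: StoredEvent[], expectedVersion: number, type: string): void {
  if (currentVersion(stream) !== expectedVersion) throw new Error("concurrency conflict");
  const next = expectedVersion + 1;
  // Stands in for a UNIQUE (stream_id, version) constraint.
  if (stream.some((e) => e.version === next)) throw new Error(`duplicate version ${next}`);
  stream.push({ version: next, type });
}

// Replay check used by aggregate reducers: versions must run 1, 2, 3, ...
function assertContiguous(stream: StoredEvent[]): void {
  stream.forEach((e, i) => {
    if (e.version !== i + 1) throw new Error(`gap: expected v${i + 1}, found v${e.version}`);
  });
}

const stream: StoredEvent[] = [];
append(stream, 0, "user.registered");
append(stream, 1, "user.email_changed"); // carries PII
append(stream, 2, "user.activated");

stream.splice(1, 1); // Option C: physically DELETE the PII event

for (const attempt of [
  () => assertContiguous(stream),
  () => append(stream, currentVersion(stream), "user.deactivated"),
]) {
  try { attempt(); } catch (e) { console.log((e as Error).message); }
}
// gap: expected v2, found v3
// duplicate version 3
```

Deleting v2 leaves v1 and v3, so replay sees a gap. The next append then computes version 3 again and collides with the surviving event.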
Option D — Hybrid (Non-PII events clear, PII events encrypted)
The framework distinguishes PII from non-PII fields. Non-PII fields land in the event payload in the clear; PII fields go through crypto-shredding.
This is actually what we’re selecting — Option A applied only to PII fields. Unencrypted fields stay queryable, indexable, and fast. The distinction is made at the entity-definition layer.
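Here is a sketch of the split. It assumes a field map shaped like the pii annotation in the Decision section. FieldDef and splitByPii are hypothetical names, not framework API.

```typescript
// Assumption: a simplified field definition carrying only the pii flag.
type FieldDef = { pii: boolean };
type EntityFields = Record<string, FieldDef>;
type Payload = Record<string, unknown>;

// Partition an event payload: non-PII stays in the clear (queryable, indexable),
// PII fields are handed on to crypto-shredding.
function splitByPii(fields: EntityFields, payload: Payload): { clear: Payload; pii: Payload } {
  const clear: Payload = {};
  const pii: Payload = {};
  for (const [name, value] of Object.entries(payload)) {
    const def = fields[name];
    if (!def) throw new Error(`field "${name}" is not declared on the entity`);
    (def.pii ? pii : clear)[name] = value;
  }
  return { clear, pii };
}

const userFields: EntityFields = {
  email: { pii: true },
  displayName: { pii: true },
  status: { pii: false },
};

const { clear, pii } = splitByPii(userFields, {
  email: "ada@example.com",
  displayName: "Ada",
  status: "active",
});
console.log(clear); // { status: 'active' }
console.log(pii);   // { email: 'ada@example.com', displayName: 'Ada' }
```

Rejecting undeclared fields is a deliberate fail-closed choice. A field nobody annotated cannot slip into the clear part of the payload by default.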
Decision
Crypto-Shredding with per-subject keys, applied to fields marked as PII in the entity definition.
Key structural elements:
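subject_keys table:
```sql
CREATE TABLE kumiko_subject_keys (
  subject_id UUID PRIMARY KEY,
  tenant_id  UUID NOT NULL,
  cipher_key BYTEA,          -- NULL once the subject is forgotten
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  erased_at  TIMESTAMPTZ,
  -- key material exists exactly until erasure
  CHECK ((cipher_key IS NULL) = (erased_at IS NOT NULL))
);
```
- cipher_key holds a 32-byte AES-256 key, encrypted at rest via envelope encryption (KEK from KMS/HSM). It is nullable because privacy.forget destroys the key in place.
- erased_at is set when the key is destroyed; the row stays as an auditable tombstone.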
Field annotation in entity definitions:
```typescript
r.entity("user", {
  fields: {
    email: textField({ pii: true }),       // crypto-shredded
    displayName: textField({ pii: true }), // crypto-shredded
    status: textField({ pii: false }),     // stored in clear
  },
});
```
Event-store envelope:
- Before storing, the framework encrypts PII fields with the subject’s key
- On read, the framework decrypts them, or surfaces a sentinel [[erased]] marker for forgotten subjects
- Key lookup is cached per request for performance
Forget-API:
```typescript
r.command("privacy.forget", { access: { roles: ["Admin", "DataProtectionOfficer"] } });
// Implementation: UPDATE kumiko_subject_keys SET cipher_key = NULL, erased_at = now() WHERE subject_id = $1
// Emits a 'privacy.subject_forgotten' event in the same TX for auditability.
```
Projection rebuild after forget:
- Projections that materialized PII in clear must rebuild
- Framework tracks “PII-touching” projections via entity metadata
- privacy.forget triggers an async projection rebuild for those entities
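The envelope’s read and write paths might look roughly like this sketch. It assumes AES-256-GCM from Node’s crypto module and an in-memory Map standing in for kumiko_subject_keys, where a null key means erased. RequestKeyCache, encryptPii and decryptPii are hypothetical names, not the framework’s API.

```typescript
import { randomBytes, createCipheriv, createDecipheriv } from "node:crypto";

const ERASED = "[[erased]]";

// Assumption: a Map stands in for kumiko_subject_keys; a null key marks a tombstone row.
const keyTable = new Map<string, { key: Buffer | null }>();
let keyTableReads = 0; // counts table lookups, to show the cache at work

function loadKey(subjectId: string): Buffer | null {
  keyTableReads++;
  const row = keyTable.get(subjectId);
  if (!row) throw new Error(`no key row for subject ${subjectId}`);
  return row.key;
}

// One instance per request: each subject's key is fetched at most once.
class RequestKeyCache {
  private readonly keys = new Map<string, Buffer | null>();
  get(subjectId: string): Buffer | null {
    if (!this.keys.has(subjectId)) this.keys.set(subjectId, loadKey(subjectId));
    return this.keys.get(subjectId) ?? null;
  }
}

type Sealed = { iv: string; tag: string; data: string };
type StoredPayload = Record<string, string | Sealed>;

function seal(key: Buffer, value: string): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(value, "utf8"), cipher.final()]);
  return { iv: iv.toString("base64"), tag: cipher.getAuthTag().toString("base64"), data: data.toString("base64") };
}

function unseal(key: Buffer, s: Sealed): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(s.iv, "base64"));
  decipher.setAuthTag(Buffer.from(s.tag, "base64"));
  return Buffer.concat([decipher.update(Buffer.from(s.data, "base64")), decipher.final()]).toString("utf8");
}

// Write path: encrypt only the fields annotated as PII; everything else stays clear.
function encryptPii(keys: RequestKeyCache, subjectId: string, piiFields: string[], payload: Record<string, string>): StoredPayload {
  const key = keys.get(subjectId);
  if (!key) throw new Error(`subject ${subjectId} has been forgotten`);
  const out: StoredPayload = {};
  for (const [name, value] of Object.entries(payload)) {
    out[name] = piiFields.includes(name) ? seal(key, value) : value;
  }
  return out;
}

// Read path: decrypt PII fields, or substitute the sentinel once the key is gone.
function decryptPii(keys: RequestKeyCache, subjectId: string, piiFields: string[], stored: StoredPayload): Record<string, string> {
  const key = keys.get(subjectId);
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(stored)) {
    if (!piiFields.includes(name)) out[name] = value as string;
    else out[name] = key ? unseal(key, value as Sealed) : ERASED;
  }
  return out;
}

keyTable.set("u-1", { key: randomBytes(32) });
const piiFields = ["email"];
const stored = encryptPii(new RequestKeyCache(), "u-1", piiFields, { email: "ada@example.com", status: "active" });

const req = new RequestKeyCache();
console.log(decryptPii(req, "u-1", piiFields, stored)); // { email: 'ada@example.com', status: 'active' }
decryptPii(req, "u-1", piiFields, stored);              // same request: served from the cache

keyTable.set("u-1", { key: null }); // privacy.forget: key destroyed, row kept as tombstone
console.log(decryptPii(new RequestKeyCache(), "u-1", piiFields, stored)); // { email: '[[erased]]', status: 'active' }
console.log(keyTableReads); // 3
```

Scoping the cache to a single request bounds staleness. A forget that lands mid-request becomes visible from the next request on.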
Operational Implications
- Key backup: erased keys are not recovered. DR procedures must distinguish “accidentally lost keys” (catastrophic) from “forgotten subjects” (intentional).
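- Performance: encryption on write plus decryption on read adds an estimated ~50–100µs per PII field with AES-GCM on modern CPUs, negligible next to a database round trip. The figure is conservative: with hardware AES support (AES-NI) the cipher itself runs at gigabytes per second, so per-call overhead dominates.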
- Access control: the “right of access” (Article 15) still works. An admin can decrypt and export the subject’s data before erasure, but cannot undo an erasure.
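The DR distinction in the key-backup item can be made explicit where keys are loaded. A tombstoned row (erased_at set) is an intentional erasure. A missing row, or a NULL key with no erasure record, means key loss and should raise an alert rather than quietly render as [[erased]]. This sketch uses a hypothetical row shape and function name.

```typescript
// Assumption: a simplified row shape mirroring kumiko_subject_keys.
type SubjectKeyRow = { cipherKey: Uint8Array | null; erasedAt: Date | null };

type KeyState =
  | { kind: "active"; cipherKey: Uint8Array }
  | { kind: "forgotten"; erasedAt: Date } // intentional: render [[erased]]
  | { kind: "lost"; reason: string };     // catastrophic: alert, never render as erased

function classifyKey(row: SubjectKeyRow | undefined): KeyState {
  if (!row) return { kind: "lost", reason: "no key row for this subject" };
  if (row.erasedAt !== null) return { kind: "forgotten", erasedAt: row.erasedAt };
  if (row.cipherKey === null) return { kind: "lost", reason: "key is NULL but no erasure was recorded" };
  return { kind: "active", cipherKey: row.cipherKey };
}

const rows = new Map<string, SubjectKeyRow>([
  ["u-1", { cipherKey: new Uint8Array(32), erasedAt: null }],
  ["u-2", { cipherKey: null, erasedAt: new Date("2026-05-01T00:00:00Z") }],
  ["u-3", { cipherKey: null, erasedAt: null }],
]);
for (const id of ["u-1", "u-2", "u-3", "u-4"]) {
  console.log(id, classifyKey(rows.get(id)).kind);
}
// u-1 active
// u-2 forgotten
// u-3 lost
// u-4 lost
```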
What This Does NOT Solve
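- Backups containing old encrypted events: once a key is destroyed, those events cannot be decrypted, which is actually desirable. Documented as “backups older than key-deletion cannot expose PII” for regulators. This holds only if the key itself cannot come back. Restoring an older backup of kumiko_subject_keys would make the events readable again, so key-table backups need their own erasure handling.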
- Analytics on forgotten subjects — aggregations that include counts/durations still work; identity-linked analytics do not. This is the correct behavior under GDPR.
- Data in external systems — search indexes (Meilisearch), notifications sent externally, etc. must have their own forget-path. Documented per-integration in Phase 2.
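One way to keep these per-integration paths from being forgotten themselves is a registry that the forget command fans out to after the key is destroyed. This is a sketch. ForgetHandler, registerForgetHandler and forgetEverywhere are hypothetical names, and the search-index handler is a stub, not a real Meilisearch client call.

```typescript
// Assumption: each external integration exposes an async forget path.
interface ForgetHandler {
  name: string;
  forget(subjectId: string): Promise<void>;
}

const handlers: ForgetHandler[] = [];

function registerForgetHandler(h: ForgetHandler): void {
  handlers.push(h);
}

// Run every integration's forget path. Failures are collected rather than
// aborting the rest, so one broken integration can be retried on its own.
async function forgetEverywhere(subjectId: string): Promise<{ ok: string[]; failed: string[] }> {
  const results = await Promise.allSettled(handlers.map((h) => h.forget(subjectId)));
  const ok: string[] = [];
  const failed: string[] = [];
  results.forEach((r, i) => (r.status === "fulfilled" ? ok : failed).push(handlers[i].name));
  return { ok, failed };
}

// Stub handlers standing in for real integrations.
const searchIndexDeletes: string[] = [];
registerForgetHandler({
  name: "search-index",
  forget: async (id) => { searchIndexDeletes.push(id); },
});
registerForgetHandler({
  name: "outbound-notifications",
  forget: async () => { throw new Error("provider unreachable"); },
});

forgetEverywhere("u-1").then((r) => console.log(r.ok, r.failed));
// [ 'search-index' ] [ 'outbound-notifications' ]
```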
Not a Launch Blocker
Crypto-shredding is a well-understood pattern. Implementation complexity is real but manageable:
- 1 new table + key-lifecycle command
- Per-field annotation on entity definitions (additive, opt-in)
- Framework-level encrypt/decrypt at write/read boundary
- Projection-rebuild hook triggered by key erasure
Estimated effort for Phase 2: ~1 week dedicated, or ~2 weeks alongside other Phase-2 work.
Implication for Spike Go/No-Go
GREEN. GDPR has a concrete, industry-standard solution. Not a blocker for the ES pivot decision.