
Scrapbook: MVP Architecture Design



Date: 2026-02-16T16:00:00Z

Goal



Design a 60-day MVP architecture for an on-prem CDC-based audit overlay, targeting MySQL first. Must be easy to install, minimal dependencies, secure, and deliver value quickly.

Components from Prompt



A) CDC ingestion (chosen: Debezium Server HTTP or Maxwell HTTP)
B) Event normalization (raw → canonical)
C) Semantic event inference (canonical → business events)
D) Storage (append-only event log + query index)
E) AI layer (summaries, anomaly explanations, semantic search)
F) Interfaces (CLI, optional web UI, MCP server)


Constraints



Initial Thoughts



Ingestion: Run Debezium Server (Java) configured with the MySQL connector and an HTTP sink. It POSTs JSON events to our Go service endpoint `/ingest`. Maxwell's HTTP producer could do the same, but Debezium gives richer metadata (transaction ID, source timestamp) and handles schema changes gracefully.
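A minimal sketch of what the `/ingest` endpoint could look like on the Go side, assuming the HTTP sink POSTs one JSON event per request; `normalizeAndStore` is a hypothetical hook into the normalizer, not an existing function.

```go
package main

import (
	"encoding/json"
	"io"
	"log"
	"net/http"
)

// normalizeAndStore is a hypothetical hook: parse the raw change payload into
// the canonical event shape (defined below) and append it to the event log.
func normalizeAndStore(raw []byte) error { return nil }

func handleIngest(w http.ResponseWriter, r *http.Request) {
	defer r.Body.Close()
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "read error", http.StatusBadRequest)
		return
	}
	if !json.Valid(body) {
		http.Error(w, "invalid JSON", http.StatusBadRequest)
		return
	}
	if err := normalizeAndStore(body); err != nil {
		// Non-2xx signals a delivery failure back to the sender.
		http.Error(w, "ingest failed", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	http.HandleFunc("/ingest", handleIngest)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```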

Normalization: Define `CanonicalChangeEvent` that abstracts DB specifics. Includes: `event_id` (UUID derived from source transaction ID + seq), `source` (db name, table, pk), `operation` (INSERT/UPDATE/DELETE), `before`, `after`, `timestamp`, `transaction_id`. Preserve raw payload as JSON for debugging.
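A sketch of that canonical shape as a Go struct; the field names, JSON tags, and the `EventSource` sub-struct are assumptions to be firmed up later, not a settled schema.

```go
package overlay

import (
	"encoding/json"
	"time"
)

// CanonicalChangeEvent abstracts DB-specific change records into one shape.
type CanonicalChangeEvent struct {
	EventID       string          `json:"event_id"`       // UUID derived from txn ID + sequence
	Source        EventSource     `json:"source"`
	Operation     string          `json:"operation"`      // INSERT | UPDATE | DELETE
	Before        json.RawMessage `json:"before,omitempty"`
	After         json.RawMessage `json:"after,omitempty"`
	Timestamp     time.Time       `json:"timestamp"`      // source timestamp from the connector
	TransactionID string          `json:"transaction_id"`
	Raw           json.RawMessage `json:"raw"`            // original payload preserved for debugging
}

// EventSource identifies where the change came from.
type EventSource struct {
	Database   string            `json:"db"`
	Table      string            `json:"table"`
	PrimaryKey map[string]string `json:"pk"`
}
```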

Inference: A simple rule engine (likely Go-based) that watches sequences of changes to infer higher-level business events. E.g., if an `inventory_adjustments` row changes `quantity` with `reason_code='SALE'`, combine it with a matching `sales` insert to mark a completed sale. We could start coarse-grained: treat any adjustment as an `InventoryAdjustment` business event. For the MVP, inference can stay minimal; we mainly want an audit trail and anomaly detection on raw changes. But the requirement calls for semantic event inference, so we need at least a taxonomy.
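A coarse-grained sketch of the rule-engine idea, building on the `CanonicalChangeEvent` sketch above; `BusinessEvent`, the `Rule` signature, and the table/column names are illustrative assumptions.

```go
package overlay

// BusinessEvent is an inferred, higher-level event in the MVP taxonomy.
type BusinessEvent struct {
	Type     string
	EntityID string
	Source   CanonicalChangeEvent
}

// Rule maps one canonical change to zero or one business event.
type Rule func(ev CanonicalChangeEvent) (BusinessEvent, bool)

// inventoryAdjustmentRule is the coarse-grained starting point: any change to
// inventory_adjustments becomes an InventoryAdjustment business event.
func inventoryAdjustmentRule(ev CanonicalChangeEvent) (BusinessEvent, bool) {
	if ev.Source.Table != "inventory_adjustments" {
		return BusinessEvent{}, false
	}
	return BusinessEvent{
		Type:     "InventoryAdjustment",
		EntityID: ev.Source.PrimaryKey["id"], // assumes a single-column pk named "id"
		Source:   ev,
	}, true
}

// Infer runs every rule against a canonical event and collects the matches.
func Infer(rules []Rule, ev CanonicalChangeEvent) []BusinessEvent {
	var out []BusinessEvent
	for _, r := range rules {
		if be, ok := r(ev); ok {
			out = append(out, be)
		}
	}
	return out
}
```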

Storage: Append-only log: store canonical events in a simple table with indexes: `(event_id)` primary key, plus `(timestamp)`, `(entity_type, entity_id)`, and possibly `(event_type)`. SQLite would be simplest (single file, no server); Postgres is better for query performance and concurrency. For a single-user on-prem pilot, SQLite should suffice: our overlay service is the only writer, there are zero ops, and indexing a few million rows is fine in SQLite. If volume grows we can swap in Postgres, so we'll design the repository interface to be swappable.
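A sketch of that schema and the store bootstrap; the column names, types, and the `mattn/go-sqlite3` driver choice are assumptions (any SQLite driver would do).

```go
package overlay

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3" // assumed driver choice
)

// schema is the append-only event log with the indexes listed above.
const schema = `
CREATE TABLE IF NOT EXISTS events (
	event_id       TEXT PRIMARY KEY,
	timestamp      TEXT NOT NULL,
	entity_type    TEXT,
	entity_id      TEXT,
	event_type     TEXT,
	operation      TEXT,
	before_json    TEXT,
	after_json     TEXT,
	transaction_id TEXT,
	raw_json       TEXT
);
CREATE INDEX IF NOT EXISTS idx_events_ts     ON events(timestamp);
CREATE INDEX IF NOT EXISTS idx_events_entity ON events(entity_type, entity_id);
CREATE INDEX IF NOT EXISTS idx_events_type   ON events(event_type);
`

// OpenStore opens (or creates) the SQLite file and applies the schema.
// A repository interface would wrap *sql.DB so Postgres can be swapped in later.
func OpenStore(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite3", path)
	if err != nil {
		return nil, err
	}
	if _, err := db.Exec(schema); err != nil {
		db.Close()
		return nil, err
	}
	return db, nil
}
```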

AI Layer: We need daily audit summaries, anomaly explanations, semantic search. Options:

Implementation: Precompute a daily summary at end-of-day: query the day's events and feed them to the LLM with a prompt like "Summarize audit events for today in 3–4 paragraphs, highlighting anomalies and significant changes." Anomaly detection can be rule-based (sudden inventory drops, adjustments outside business hours), with the LLM explaining why a flagged event is suspicious by referencing specific values. Semantic search: we could use a local embedding model (BGE) plus a vector store (SQLite with sqlite-vss), or simple keyword search for the MVP. Simpler still: have the LLM answer natural language queries by retrieving relevant events via keyword search and summarizing them. For the MVP, skip full semantic search; provide filters and a natural language query interface that the LLM answers with retrieved context.
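A minimal sketch of one rule-based anomaly check and the end-of-day prompt assembly; the business-hours window, prompt wording, and function names are assumptions, and the actual LLM call is left abstract.

```go
package overlay

import (
	"fmt"
	"strings"
	"time"
)

// OutsideBusinessHours flags events outside an assumed 08:00-18:00 window.
func OutsideBusinessHours(t time.Time) bool {
	h := t.Hour()
	return h < 8 || h >= 18
}

// BuildDailySummaryPrompt assembles the end-of-day prompt from pre-rendered
// one-line event descriptions; the result goes to whatever LLM client we use.
func BuildDailySummaryPrompt(day time.Time, eventLines []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Summarize audit events for %s in 3-4 paragraphs, ", day.Format("2006-01-02"))
	b.WriteString("highlighting anomalies and significant changes.\n\nEvents:\n")
	for _, line := range eventLines {
		b.WriteString("- " + line + "\n")
	}
	return b.String()
}
```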


Interfaces:


Deployment Topology:



+----------------+      +------------------+      +-------------------+
|     MySQL      | ---> | Debezium Server  | ---> |   Audit Overlay   |
|  (primary or   |      |      (Java)      |      |   Service (Go)    |
|    replica)    |      +------------------+      | - Normalizer      |
+----------------+                                | - Inference       |
                                                  | - Storage (SQLite)|
                                                  | - AI (LLM client) |
                                                  | - CLI/MCP         |
                                                  +-------------------+


All components run on-prem inside Docker Compose or as systemd services. Single Docker network. The Go service is the core; Debezium Server is an auxiliary process.

Security:


Install Experience:

60-day Milestones:

Build vs Buy (Deliverable 8):
1. MVP: Debezium Server HTTP → overlay.
2. If customers demand a lighter footprint, explore Maxwell or Go binlog libs (e.g., `go-mysql`); a rough sketch follows below.
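For reference, a rough sketch of what the Go binlog route might look like, assuming the `go-mysql` replication package (github.com/go-mysql-org/go-mysql); the config fields, start position, and credentials shown here are placeholders and should be checked against the library's docs.

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/go-mysql-org/go-mysql/mysql"
	"github.com/go-mysql-org/go-mysql/replication"
)

func main() {
	// Placeholder connection settings; a real deployment would read these from config.
	cfg := replication.BinlogSyncerConfig{
		ServerID: 100,
		Flavor:   "mysql",
		Host:     "127.0.0.1",
		Port:     3306,
		User:     "repl",
		Password: "secret",
	}
	syncer := replication.NewBinlogSyncer(cfg)

	// Start from an explicit binlog position; GTID-based start is also possible.
	streamer, err := syncer.StartSync(mysql.Position{Name: "mysql-bin.000001", Pos: 4})
	if err != nil {
		log.Fatal(err)
	}
	for {
		ev, err := streamer.GetEvent(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		ev.Dump(os.Stdout) // in the overlay, this would feed the normalizer instead
	}
}
```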

---


This scrapbook entry will be refined into Report 2.