Scrapbook: Implementation Plan & Backlog

Date: 2026-02-16T18:00:00Z

Goal: Break the MVP architecture into 25–40 engineering tasks, each with a clear goal, acceptance criteria, and a difficulty estimate (S = 1–2 days, M = 3–5 days, L = 1+ week). Group them by milestone and produce a 60‑day schedule.

We’ll aim for ~30 tasks.

Milestone 1: CDC Ingestion (Days 1–10)

1. Task: Set up a test MySQL 8 instance with binlog enabled and a sample pharmacy schema.

Goal: Have a running MySQL instance with the necessary tables.

AC: `mysql -u root -e "SHOW BINLOG EVENTS"` works; `SHOW VARIABLES LIKE 'binlog_format'` returns `ROW` (Debezium requires row-based binlogs); schema created.

Difficulty: S

2. Task: Install and configure Debezium Server (HTTP sink).

Goal: Debezium reads MySQL binlog and POSTs JSON to a configurable endpoint.

AC: POSTs arrive at our listener in the expected format; offsets are persisted to a file.

Difficulty: M

3. Task: Implement `/ingest` HTTP endpoint in Go service.

Goal: Accept a POSTed batch of Debezium events, decode it into raw structs, and forward it to the normalizer channel.

AC: Valid batches return 200 OK and log the event count; malformed payloads return 400 instead of hanging.

Difficulty: S

4. Task: End‑to‑end verification: Debezium → overlay, logging receipt only (no storage yet).

Goal: Confirm connectivity.

AC: Logs show events; manual curl test works.

Difficulty: S

Milestone 2: Normalization & Persistent Storage (Days 5–12)

5. Task: Define Go structs for `CanonicalChangeEvent`; implement conversion from raw Debezium events.

Goal: Clean transformation; generate a deterministic `EventID` (UUID v5) so a replayed event maps to the same ID.

AC: Unit tests passing for sample events.

Difficulty: M
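UUID v5 is name-based (SHA-1 over a namespace UUID plus a name), so identical source coordinates always produce the same `EventID`, which is what makes the `INSERT OR IGNORE` retry path safe. A stdlib-only sketch follows; the namespace bytes are an arbitrary placeholder, and using the binlog `file`, `pos` and `row` fields of Debezium's MySQL `source` block as the name is an assumption (a multi-server setup would likely also need `server_id`).

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// overlayNamespace is a hypothetical project namespace. Any fixed UUID works,
// but it must never change once events have been stored.
var overlayNamespace = [16]byte{
	0x6f, 0x1c, 0x2a, 0x4e, 0x8b, 0x3d, 0x4f, 0x10,
	0x9a, 0x55, 0x0e, 0x7c, 0x21, 0xd4, 0x63, 0xb8,
}

// uuidV5 builds an RFC 4122 name-based UUID (version 5, SHA-1).
func uuidV5(ns [16]byte, name string) string {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	sum := h.Sum(nil) // 20 bytes; only the first 16 are used

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant

	// Canonical 8-4-4-4-12 lowercase hex form.
	buf := make([]byte, 36)
	hex.Encode(buf[0:8], u[0:4])
	buf[8] = '-'
	hex.Encode(buf[9:13], u[4:6])
	buf[13] = '-'
	hex.Encode(buf[14:18], u[6:8])
	buf[18] = '-'
	hex.Encode(buf[19:23], u[8:10])
	buf[23] = '-'
	hex.Encode(buf[24:36], u[10:16])
	return string(buf)
}

// eventID derives a stable ID from the binlog coordinates of one row change
// (assumed to come from Debezium's source.file, source.pos and source.row).
func eventID(binlogFile string, pos int64, row int) string {
	return uuidV5(overlayNamespace, fmt.Sprintf("%s:%d:%d", binlogFile, pos, row))
}

func main() {
	a := eventID("mysql-bin.000003", 154, 0)
	b := eventID("mysql-bin.000003", 154, 0)
	fmt.Println(a, a == b) // same coordinates, same ID
}
```

If a dependency is acceptable, `github.com/google/uuid` offers `NewSHA1` for the same construction.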

6. Task: Design and implement SQLite storage schema (table `events` + indexes).

Goal: Persistent append‑only log.

AC: `INSERT` succeeds; indexes are created; `SELECT` by time range works.

Difficulty: S

7. Task: Implement idempotent ingestion: store each canonical event with `INSERT OR IGNORE` keyed on `event_id`.

Goal: Safe retries.

AC: Duplicate insert ignored; existing row remains.

Difficulty: S

8. Task: Add transaction grouping: store `transaction_id` and ensure atomic batch ingest.

Goal: All events in a transaction are persisted together or not at all.

AC: A simulated partial failure rolls back the whole transaction.

Difficulty: M

Milestone 3: Inference Engine (Days 10–20)

9. Task: Implement rule engine that consumes canonical events grouped by transaction.

Goal: Apply domain rules to emit BusinessEvents.

AC: Given a sample transaction with a sale and its line items, produces exactly one `sale.completed`.

Difficulty: M
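A minimal sketch of a transaction-grouped rule engine. The table names (`sales`, `sale_items`), the `ChangeEvent` and `BusinessEvent` shapes and the `Rule` signature are assumptions; the transaction ID is assumed to come from Debezium's transaction metadata (the `provide.transaction.metadata` option) or an equivalent GTID. Because each rule sees a whole transaction at once, a sale header and its line items can collapse into a single `sale.completed`.

```go
package main

import "fmt"

// ChangeEvent is a trimmed-down canonical event (hypothetical fields).
type ChangeEvent struct {
	TransactionID string
	Table         string
	Op            string // "c" insert, "u" update, "d" delete
	After         map[string]any
}

type BusinessEvent struct {
	Type   string
	Fields map[string]any
}

// Rule inspects all changes of one transaction and emits zero or more events.
type Rule func(tx []ChangeEvent) []BusinessEvent

// groupByTransaction groups events by transaction ID, keeping first-seen order.
func groupByTransaction(events []ChangeEvent) [][]ChangeEvent {
	index := map[string]int{}
	var groups [][]ChangeEvent
	for _, e := range events {
		i, ok := index[e.TransactionID]
		if !ok {
			i = len(groups)
			index[e.TransactionID] = i
			groups = append(groups, nil)
		}
		groups[i] = append(groups[i], e)
	}
	return groups
}

// saleCompleted fires once per transaction that inserts a sale header
// plus at least one line item.
func saleCompleted(tx []ChangeEvent) []BusinessEvent {
	var sale map[string]any
	items := 0
	for _, e := range tx {
		if e.Op != "c" {
			continue
		}
		switch e.Table {
		case "sales":
			sale = e.After
		case "sale_items":
			items++
		}
	}
	if sale == nil || items == 0 {
		return nil
	}
	return []BusinessEvent{{
		Type:   "sale.completed",
		Fields: map[string]any{"sale_id": sale["id"], "item_count": items},
	}}
}

// runRules applies every rule to every transaction group.
func runRules(events []ChangeEvent, rules []Rule) []BusinessEvent {
	var out []BusinessEvent
	for _, tx := range groupByTransaction(events) {
		for _, r := range rules {
			out = append(out, r(tx)...)
		}
	}
	return out
}

func main() {
	events := []ChangeEvent{
		{"tx1", "sales", "c", map[string]any{"id": 42}},
		{"tx1", "sale_items", "c", map[string]any{"sale_id": 42, "sku": "A"}},
		{"tx1", "sale_items", "c", map[string]any{"sale_id": 42, "sku": "B"}},
		{"tx2", "inventory", "u", map[string]any{"qty": 7}},
	}
	for _, be := range runRules(events, []Rule{saleCompleted}) {
		fmt.Println(be.Type, be.Fields["sale_id"], be.Fields["item_count"])
	}
}
```

Events with an empty transaction ID would all land in one group, so the normalizer would need to fill it in (or fall back to per-row grouping) before rules run.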

10. Task: Implement inventory adjustment inference (single row update).

Goal: Generate `inventory.adjustment` with reason from payload.

AC: An inventory update with reason `SALE` emits an `inventory.adjustment` carrying that reason.

Difficulty: S

11. Task: Implement user role change detection.

Goal: Detect updates to `users.role_id`.

AC: BusinessEvent emitted with old/new role IDs.

Difficulty: S
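Role-change detection only needs a row's before and after images. The sketch assumes a `RowChange` shape mirroring Debezium's `before`/`after` fields and a `users` table with `id` and `role_id` columns; the event name `user.role_changed` is hypothetical. It relies on `before` actually containing `role_id`, which for MySQL typically means `binlog_row_image=FULL`.

```go
package main

import "fmt"

// RowChange carries the before/after images of one row (hypothetical shape).
type RowChange struct {
	Table  string
	Op     string // "u" for update
	Before map[string]any
	After  map[string]any
}

type RoleChanged struct {
	UserID, OldRoleID, NewRoleID any
}

// detectRoleChange reports a change only for updates to users where
// role_id differs between the before and after images.
func detectRoleChange(c RowChange) (RoleChanged, bool) {
	if c.Table != "users" || c.Op != "u" || c.Before == nil || c.After == nil {
		return RoleChanged{}, false
	}
	oldRole, newRole := c.Before["role_id"], c.After["role_id"]
	if oldRole == newRole {
		return RoleChanged{}, false // some other column changed
	}
	return RoleChanged{UserID: c.After["id"], OldRoleID: oldRole, NewRoleID: newRole}, true
}

func main() {
	c := RowChange{
		Table:  "users",
		Op:     "u",
		Before: map[string]any{"id": 7, "role_id": 2},
		After:  map[string]any{"id": 7, "role_id": 1},
	}
	if ev, ok := detectRoleChange(c); ok {
		fmt.Printf("user.role_changed user=%v %v -> %v\n", ev.UserID, ev.OldRoleID, ev.NewRoleID)
	}
}
```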

12. Task: Write unit tests for inference rules (sample transactions).

Goal: ≥ 80% coverage.

Difficulty: M

Milestone 4: AI Layer (Days 15–30)

13. Task: Integrate Ollama client (HTTP to localhost:11434).

Goal: Call the `/api/generate` endpoint with a prompt; parse the response.

AC: Simple “Hello” prompt returns completion.

Difficulty: S

14. Task: Implement daily summary batch job (runs at 23:30).

Goal: Gather today’s BusinessEvents, feed them to Ollama with a summary prompt, and store the result.

AC: Summary appears in DB; can be retrieved via API.

Difficulty: M

15. Task: Design and implement heuristic anomaly detection (inventory, user, branch).

Goal: Compute z‑scores over rolling windows; flag events whose score exceeds a threshold.

AC: Anomalies stored with severity and explanation template.

Difficulty: M
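A minimal sketch of the rolling z-score heuristic for one metric series (for example inventory adjustments per branch per day). Each new value is scored against the previous `size` values using the sample standard deviation, then added to the window. The window size and the severity bands (|z| ≥ 3 medium, |z| ≥ 4 high) are placeholder assumptions to be tuned in task 29; `RollingZ` and `severity` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// RollingZ keeps the last `size` observations of one metric.
type RollingZ struct {
	size   int
	values []float64
}

func NewRollingZ(size int) *RollingZ { return &RollingZ{size: size} }

// Score returns the z-score of x against the current window, then slides x
// into the window. ok is false while the window has fewer than 2 points or
// zero variance, since no meaningful score exists then.
func (r *RollingZ) Score(x float64) (z float64, ok bool) {
	if n := len(r.values); n >= 2 {
		var sum float64
		for _, v := range r.values {
			sum += v
		}
		mean := sum / float64(n)
		var ss float64
		for _, v := range r.values {
			ss += (v - mean) * (v - mean)
		}
		std := math.Sqrt(ss / float64(n-1)) // sample standard deviation
		if std > 0 {
			z, ok = (x-mean)/std, true
		}
	}
	r.values = append(r.values, x)
	if len(r.values) > r.size {
		r.values = r.values[1:] // drop the oldest observation
	}
	return z, ok
}

// severity maps |z| to a label; "" means not anomalous.
func severity(z float64) string {
	switch a := math.Abs(z); {
	case a >= 4:
		return "high"
	case a >= 3:
		return "medium"
	default:
		return ""
	}
}

func main() {
	w := NewRollingZ(30)
	for _, qty := range []float64{10, 12, 11, 9, 10, 11, 12, 10, 60} {
		if z, ok := w.Score(qty); ok && severity(z) != "" {
			fmt.Printf("qty=%.0f z=%.1f severity=%s\n", qty, z, severity(z))
		}
	}
}
```

Scoring before inserting keeps a spike from inflating the statistics it is judged against.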

16. Task: AI‑enhanced explanations: for each anomaly, call Ollama to produce a natural‑language explanation.

Goal: Store explanation text alongside anomaly record.

AC: Explanations are plausible and reference relevant data.

Difficulty: M

17. Task: Caching layer for summaries and anomaly explanations to avoid repeated LLM calls.

Goal: Redis or in‑memory LRU; TTL 24h.

AC: Repeated calls within TTL hit cache.

Difficulty: S

Milestone 5: Interfaces (Days 25–35)

18. Task: Build CLI `auditctl` (subcommands: summary, anomalies, trace, export).

Goal: Binary installed; commands work.

AC: `auditctl summary --date today` prints summary.

Difficulty: M

19. Task: Implement MCP server (stdio) with all five tools.

Goal: `audit-overlay mcp` runs; ChatGPT can connect.

AC: Tool calls succeed; responses are well-formed JSON‑RPC 2.0.

Difficulty: M

20. Task: (Stretch) Build minimal web UI: dashboard showing recent events, summary, anomalies.

Goal: SPA served from the Go binary via the `embed` package.

AC: UI loads; fetches data from internal API.

Difficulty: L

Milestone 6: Security & Polish (Days 30–40)

21. Task: Add configurable data masking (column blacklist).

Goal: Before LLM calls, specified fields are replaced with `[REDACTED]`.

AC: Masked fields never appear in prompts or logs.

Difficulty: S

22. Task: Harden secrets handling: support Docker secrets or a `.env` file with strict permissions (e.g., 0600).

Goal: No secrets in logs.

AC: Run with secrets configured; confirm no secret values appear in the logs.

Difficulty: S

23. Task: Implement the overlay’s own operational logging (file rotation, error capture).

Goal: `audit-overlay.log` rotates at 10 MB, keeps 5 backups.

Difficulty: S

24. Task: Create Docker Compose file bundling Debezium, overlay, and optional MySQL.

Goal: `docker compose up -d` starts everything.

AC: All services healthy; end‑to‑end flow produces summary next day.

Difficulty: M

25. Task: Write installation guide (markdown) for on‑prem deploy.

Goal: Clear steps, prerequisites, troubleshooting.

Difficulty: M

Milestone 7: Synthetic Data & Demo (Days 35–50)

26. Task: Build synthetic pharmacy data generator (Go or Python).

Goal: Populate the DB with 20 branches, 500 products, and 90 days of activity, including injected anomalies.

AC: Data volume approximates a real deployment (e.g., ~10k sales/day, i.e. ~900k sales over 90 days).

Difficulty: M

27. Task: Integrate generator into Docker Compose (init container or script).

Goal: One‑command demo with data ready.

Difficulty: S

28. Task: Create demo script: show key features (summary, anomaly, trace).

Goal: Slide deck or live demo notes.

Difficulty: S

Milestone 8: Pilot Prep & Documentation (Days 45–60)

29. Task: End‑to‑end testing with synthetic data; tune heuristics.

Goal: Reduce false positives to acceptable level.

Difficulty: M

30. Task: Write operator’s manual (config reference, troubleshooting).

Goal: Comprehensive doc.

Difficulty: M

31. Task: Prepare pilot proposal template (based on Report 5).

Goal: Editable PDF/Markdown for sales.

Difficulty: S

32. Task: Final code cleanup: remove debug, add metrics, ensure graceful shutdown.

Goal: Production‑quality.

Difficulty: M

That’s 32 tasks. Good.

Schedule (High‑Level)

Days 1–10: Milestone 1 and early Milestone 2 (ingestion + storage). End‑to‑end raw capture into SQLite.

Days 11–20: Milestones 2–3 (normalization + inference). BusinessEvents stored.

Days 21–30: Milestone 4 (AI layer) and part of Milestone 5 (CLI basics). Daily summary works.

Days 31–40: Milestone 5 (MCP) and Milestone 6 (security). Full feature set.

Days 41–50: Milestone 7 (synthetic data, demo).

Days 51–60: Milestone 8 (polish, docs, pilot kit).

Dependencies: Ingestion must precede normalization; normalization before inference; storage before AI; AI before summary/CLI. MCP depends on all read paths.

---

Will be refined into Report 6.