Scrapbook: Implementation Plan & Backlog
Date: 2026-02-16T18:00:00Z
Goal: Break MVP architecture into 25–40 engineering tasks, each with clear goal, acceptance criteria, and difficulty estimate (S=1-2d, M=3-5d, L=1+wk). Group by milestone and produce a 60‑day schedule.
We’ll aim for ~30 tasks.
Milestone 1: CDC Ingestion (Days 1–10)
1. Task: Set up test MySQL 8 instance with binlog enabled, sample pharmacy schema.
- Goal: Have a running MySQL with necessary tables.
- AC: `mysql -u root -e "SHOW BINLOG EVENTS"` works; schema created.
- Difficulty: S
2. Task: Install and configure Debezium Server (HTTP sink).
- Goal: Debezium reads MySQL binlog and POSTs JSON to a configurable endpoint.
- AC: POSTs arrive at our listener with correct format; offsets stored in file.
- Difficulty: M
3. Task: Implement `/ingest` HTTP endpoint in Go service.
- Goal: Accept POST of batch of Debezium events; decode into raw struct; forward to normalizer channel.
- AC: Returns 200 OK and logs the event count; a malformed payload returns 400 rather than hanging.
- Difficulty: S
4. Task: End‑to‑end verification: Debezium → overlay. Storage is not in place yet, so just log receipt.
- Goal: Confirm connectivity.
- AC: Logs show events; manual curl test works.
- Difficulty: S
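A minimal sketch of the `/ingest` endpoint from task 3, to make the 200/400 acceptance criteria concrete. Assumptions not in the plan above: the request body is a JSON array of events, the `RawEvent` type and `decodeBatch` helper are hypothetical names, and the 10 MB body cap and channel size are placeholder values. `main` exercises the handler in-process with `net/http/httptest` rather than opening a port.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// RawEvent is a hypothetical holder for one Debezium change event.
// The payload stays as raw JSON so the normalizer decides how to read it.
type RawEvent struct {
	Payload json.RawMessage
}

// decodeBatch parses a body that is assumed to be a JSON array of events.
// Anything else (invalid JSON, a non-array, an empty array) is an error.
func decodeBatch(body []byte) ([]RawEvent, error) {
	var items []json.RawMessage
	if err := json.Unmarshal(body, &items); err != nil {
		return nil, err
	}
	if len(items) == 0 {
		return nil, errors.New("empty batch")
	}
	events := make([]RawEvent, len(items))
	for i, it := range items {
		events[i] = RawEvent{Payload: it}
	}
	return events, nil
}

// ingestHandler forwards each decoded event to the normalizer channel.
// Malformed input gets an immediate 400 so the sender never waits on us.
func ingestHandler(out chan<- RawEvent) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		// Cap the body size (placeholder limit) so a bad client cannot exhaust memory.
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 10<<20))
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		events, err := decodeBatch(body)
		if err != nil {
			http.Error(w, "malformed batch: "+err.Error(), http.StatusBadRequest)
			return
		}
		for _, ev := range events {
			out <- ev
		}
		log.Printf("ingest: accepted %d events", len(events))
		w.WriteHeader(http.StatusOK)
	}
}

func main() {
	normalizer := make(chan RawEvent, 16) // stand-in for the normalizer input
	h := ingestHandler(normalizer)

	// One well-formed batch and one malformed body, sent in-process.
	for _, body := range []string{`[{"op":"c"},{"op":"u"}]`, `{not json`} {
		rec := httptest.NewRecorder()
		req := httptest.NewRequest(http.MethodPost, "/ingest", strings.NewReader(body))
		h(rec, req)
		fmt.Println(rec.Code)
	}
}
```

In the real service the handler would be registered with `http.HandleFunc("/ingest", ...)` and a blocking send to a full channel would need a timeout or back-pressure policy; that choice is left open here.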
Milestone 2: Normalization & Persistent Storage (Days 5–12)
5. Task: Define Go structs for CanonicalChangeEvent; implement conversion from Debezium raw.
- Goal: Clean transformation; generate `EventID` (UUID v5).
- AC: Unit tests passing for sample events.
- Difficulty: M
6. Task: Design and implement SQLite storage schema (table `events` + indexes).
- Goal: Persistent append‑only log.
- AC: `INSERT` successful; indexes created; `SELECT` by time works.
- Difficulty: S
7. Task: Implement idempotent ingestion: store canonical event with `INSERT OR IGNORE` by `event_id`.
- Goal: Safe retries.
- AC: Duplicate insert ignored; existing row remains.
- Difficulty: S
8. Task: Add transaction grouping: store `transaction_id` and ensure atomic batch ingest.
- Goal: All events in a transaction are persisted together or not at all.
- AC: Simulate partial failure; transaction rolled back.
- Difficulty: M
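Tasks 5 and 7 fit together: if `EventID` is a deterministic UUID v5, a retried batch produces the same IDs and `INSERT OR IGNORE` drops the duplicates. The sketch below derives a v5 UUID with only the standard library. The choice of name string (binlog file, position, row index, table) and the use of the RFC 4122 DNS namespace are assumptions for illustration; the plan does not say which fields feed the ID, and a real build would probably mint its own namespace UUID.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// namespaceDNS is the RFC 4122 DNS namespace (6ba7b810-9dad-11d1-80b4-00c04fd430c8).
// It is used here only as a placeholder namespace.
var namespaceDNS = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuidV5 implements the name-based SHA-1 UUID: hash namespace+name,
// keep the first 16 bytes, then stamp the version and variant bits.
func uuidV5(ns [16]byte, name string) string {
	h := sha1.New()
	h.Write(ns[:])
	h.Write([]byte(name))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // RFC 4122 variant

	s := hex.EncodeToString(u[:])
	return s[0:8] + "-" + s[8:12] + "-" + s[12:16] + "-" + s[16:20] + "-" + s[20:32]
}

// eventID is a hypothetical helper: it assumes the binlog coordinates plus
// the row index within the event uniquely identify a change.
func eventID(binlogFile string, pos int64, rowIndex int, table string) string {
	return uuidV5(namespaceDNS, fmt.Sprintf("%s:%d:%d:%s", binlogFile, pos, rowIndex, table))
}

func main() {
	a := eventID("mysql-bin.000003", 1542, 0, "pharmacy.sales")
	b := eventID("mysql-bin.000003", 1542, 0, "pharmacy.sales")
	c := eventID("mysql-bin.000003", 1542, 1, "pharmacy.sales")
	fmt.Println(a == b) // same coordinates, same ID
	fmt.Println(a == c) // different row index, different ID
}
```

With IDs built this way, the `event_id` column can carry a `UNIQUE` or `PRIMARY KEY` constraint and the ingest path does not need a separate existence check before inserting.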
Milestone 3: Inference Engine (Days 10–20)
9. Task: Implement rule engine that consumes canonical events grouped by transaction.
- Goal: Apply domain rules to emit BusinessEvents.
- AC: Given sample transaction with sale + items, produces one `sale.completed`.
- Difficulty: M
10. Task: Implement inventory adjustment inference (single row update).
- Goal: Generate `inventory.adjustment` with reason from payload.
- AC: Adjustment with reason `SALE` from inventory update is emitted.
- Difficulty: S
11. Task: Implement user role change detection.
- Goal: Detect updates to `users.role_id`.
- AC: BusinessEvent emitted with old/new role IDs.
- Difficulty: S
12. Task: Write unit tests for inference rules (sample transactions).
- Goal: ≥ 80% coverage.
- Difficulty: M
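One possible shape for the rule engine in tasks 9 and 11, sketched as plain functions over a transaction's events. Everything named here is an assumption: the struct fields, the table names `sales`, `sale_items`, and `users`, the Debezium-style op codes `c`/`u`, and the `user.role_changed` event type. Only `sale.completed` and `users.role_id` come from the backlog itself.

```go
package main

import (
	"fmt"
	"reflect"
)

// CanonicalChangeEvent is a trimmed, hypothetical version of the canonical struct.
type CanonicalChangeEvent struct {
	TransactionID string
	Table         string
	Op            string // "c" = insert, "u" = update (assumed op codes)
	Before        map[string]any
	After         map[string]any
}

// BusinessEvent is the inferred, domain-level event.
type BusinessEvent struct {
	Type          string
	TransactionID string
	Details       map[string]any
}

// Rule inspects all events of one transaction and emits zero or more BusinessEvents.
type Rule func(tx []CanonicalChangeEvent) []BusinessEvent

// saleCompleted emits one event when a transaction inserts a sale plus at least one item.
func saleCompleted(tx []CanonicalChangeEvent) []BusinessEvent {
	var sale *CanonicalChangeEvent
	items := 0
	for i := range tx {
		switch {
		case tx[i].Table == "sales" && tx[i].Op == "c":
			sale = &tx[i]
		case tx[i].Table == "sale_items" && tx[i].Op == "c":
			items++
		}
	}
	if sale == nil || items == 0 {
		return nil
	}
	return []BusinessEvent{{
		Type:          "sale.completed",
		TransactionID: sale.TransactionID,
		Details:       map[string]any{"sale_id": sale.After["id"], "item_count": items},
	}}
}

// roleChanged emits an event for each users update whose role_id actually changed.
func roleChanged(tx []CanonicalChangeEvent) []BusinessEvent {
	var out []BusinessEvent
	for _, ev := range tx {
		if ev.Table != "users" || ev.Op != "u" {
			continue
		}
		oldRole, newRole := ev.Before["role_id"], ev.After["role_id"]
		if reflect.DeepEqual(oldRole, newRole) {
			continue // some other column changed
		}
		out = append(out, BusinessEvent{
			Type:          "user.role_changed",
			TransactionID: ev.TransactionID,
			Details:       map[string]any{"old_role_id": oldRole, "new_role_id": newRole},
		})
	}
	return out
}

// infer runs every rule over the transaction and concatenates the results.
func infer(tx []CanonicalChangeEvent, rules ...Rule) []BusinessEvent {
	var out []BusinessEvent
	for _, r := range rules {
		out = append(out, r(tx)...)
	}
	return out
}

func main() {
	tx := []CanonicalChangeEvent{
		{TransactionID: "t1", Table: "sales", Op: "c", After: map[string]any{"id": 7}},
		{TransactionID: "t1", Table: "sale_items", Op: "c"},
		{TransactionID: "t1", Table: "sale_items", Op: "c"},
	}
	for _, be := range infer(tx, saleCompleted, roleChanged) {
		fmt.Println(be.Type, be.Details["item_count"])
	}
}
```

Keeping each rule a pure function of the transaction makes task 12 straightforward: every sample transaction becomes a table-driven test case with an expected list of event types.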
Milestone 4: AI Layer (Days 15–30)
13. Task: Integrate Ollama client (HTTP to localhost:11434).
- Goal: Call `generate` endpoint with prompt; parse response.
- AC: Simple “Hello” prompt returns completion.
- Difficulty: S
14. Task: Implement daily summary batch job (runs at 23:30).
- Goal: Gather today’s BusinessEvents, feed to Ollama with summary prompt, store result.
- AC: Summary appears in DB; can be retrieved via API.
- Difficulty: M
15. Task: Design and implement heuristic anomaly detection (inventory, user, branch).
- Goal: Compute z‑scores over rolling windows; flag events.
- AC: Anomalies stored with severity and explanation template.
- Difficulty: M
16. Task: AI‑enhanced explanations: for each anomaly, call Ollama to produce natural language reason.
- Goal: Store explanation text alongside anomaly record.
- AC: Explanations are plausible and reference relevant data.
- Difficulty: M
17. Task: Caching layer for summaries and anomaly explanations to avoid repeated LLM calls.
- Goal: Redis or in‑memory LRU; TTL 24h.
- AC: Repeated calls within TTL hit cache.
- Difficulty: S
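The z‑score heuristic in task 15 can be sketched as follows. Each new value is compared against the mean and sample standard deviation of the preceding window. The window length, the thresholds of 3 and 4, and the severity labels are placeholder assumptions to be tuned later (see task 29), not values the plan fixes.

```go
package main

import (
	"fmt"
	"math"
)

// zScore reports how many sample standard deviations x lies from the mean
// of window. ok is false when the window is too short or has zero spread.
func zScore(window []float64, x float64) (z float64, ok bool) {
	n := float64(len(window))
	if n < 2 {
		return 0, false
	}
	var sum float64
	for _, v := range window {
		sum += v
	}
	mean := sum / n

	var ss float64
	for _, v := range window {
		d := v - mean
		ss += d * d
	}
	std := math.Sqrt(ss / (n - 1)) // sample standard deviation
	if std == 0 {
		return 0, false
	}
	return (x - mean) / std, true
}

// severity maps |z| to a label; the cut-offs are assumed, not from the plan.
func severity(z float64) string {
	a := math.Abs(z)
	switch {
	case a >= 4:
		return "high"
	case a >= 3:
		return "medium"
	default:
		return ""
	}
}

// detect returns the indexes whose value deviates from the previous
// `window` values by at least `threshold` standard deviations.
func detect(series []float64, window int, threshold float64) []int {
	if window < 2 {
		return nil
	}
	var flagged []int
	for i := window; i < len(series); i++ {
		z, ok := zScore(series[i-window:i], series[i])
		if ok && math.Abs(z) >= threshold {
			flagged = append(flagged, i)
		}
	}
	return flagged
}

func main() {
	// Hypothetical daily adjustment counts for one product at one branch.
	series := []float64{10, 12, 11, 9, 13, 30, 11}
	for _, i := range detect(series, 5, 3) {
		z, _ := zScore(series[i-5:i], series[i])
		fmt.Printf("index %d value %.0f z=%.2f severity=%s\n", i, series[i], z, severity(z))
	}
}
```

Note that a spike stays in the window for the following points and inflates their standard deviation, which hides smaller anomalies right after a large one. Whether to exclude flagged points from later windows is a tuning decision.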
Milestone 5: Interfaces (Days 25–35)
18. Task: Build CLI `auditctl` (subcommands: summary, anomalies, trace, export).
- Goal: Binary installed; commands work.
- AC: `auditctl summary --date today` prints summary.
- Difficulty: M
19. Task: Implement MCP server (stdio) with all five tools.
- Goal: `audit-overlay mcp` runs; ChatGPT can connect.
- AC: Tool calls succeed; proper JSON‑RPC responses.
- Difficulty: M
20. Task: (Stretch) Build minimal web UI: dashboard showing recent events, summary, anomalies.
- Goal: SPA served from Go binary using embed.
- AC: UI loads; fetches data from internal API.
- Difficulty: L
Milestone 6: Security & Polish (Days 30–40)
21. Task: Add configurable data masking (column blacklist).
- Goal: Before LLM calls, specified fields are replaced with `[REDACTED]`.
- AC: Masked fields never appear in prompts or logs.
- Difficulty: S
22. Task: Harden secrets handling: support Docker secrets or `.env` with strict file perms.
- Goal: No secrets in logs.
- AC: Run with secrets configured; inspect the logs and confirm no secret values appear.
- Difficulty: S
23. Task: Implement overlay‑self audit logging (rotate files, capture errors).
- Goal: `audit-overlay.log` rotates at 10 MB, keeps 5 backups.
- Difficulty: S
24. Task: Create Docker Compose file bundling Debezium, overlay, and optional MySQL.
- Goal: `docker compose up -d` starts everything.
- AC: All services healthy; end‑to‑end flow produces summary next day.
- Difficulty: M
25. Task: Write installation guide (markdown) for on‑prem deploy.
- Goal: Clear steps, prerequisites, troubleshooting.
- Difficulty: M
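For the column blacklist in task 21, a small sketch of the masking step that would run before any prompt is built or any row is logged. The function name `maskFields`, the case-insensitive match on bare column names, and the sample columns are assumptions; the plan only fixes the `[REDACTED]` replacement value.

```go
package main

import (
	"fmt"
	"strings"
)

const redacted = "[REDACTED]"

// maskFields returns a copy of row with every blacklisted column replaced by
// the redaction marker. The input map is left untouched so the stored event
// keeps its original values; only the copy should reach prompts and logs.
func maskFields(row map[string]any, blacklist []string) map[string]any {
	blocked := make(map[string]bool, len(blacklist))
	for _, col := range blacklist {
		blocked[strings.ToLower(col)] = true
	}
	out := make(map[string]any, len(row))
	for k, v := range row {
		if blocked[strings.ToLower(k)] {
			out[k] = redacted
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Hypothetical row and blacklist; real column names come from config.
	row := map[string]any{"patient_name": "Jane Doe", "product_id": 42, "qty": 2}
	masked := maskFields(row, []string{"patient_name", "prescriber_license"})
	fmt.Println(masked["patient_name"], masked["product_id"], masked["qty"])
}
```

Matching on bare column names is the simplest option; if two tables share a column name with different sensitivity, the blacklist would need `table.column` entries instead.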
Milestone 7: Synthetic Data & Demo (Days 35–50)
26. Task: Build synthetic pharmacy data generator (Go or Python).
- Goal: Populate DB with 20 branches, 500 products, 90 days of activity + anomalies.
- AC: Data volume approximates a real deployment (e.g., 10k sales/day).
- Difficulty: M
27. Task: Integrate generator into Docker Compose (init container or script).
- Goal: One‑command demo with data ready.
- Difficulty: S
28. Task: Create demo script: show key features (summary, anomaly, trace).
- Goal: Slide deck or live demo notes.
- Difficulty: S
Milestone 8: Pilot Prep & Documentation (Days 45–60)
29. Task: End‑to‑end testing with synthetic data; tune heuristics.
- Goal: Reduce false positives to an acceptable level.
- Difficulty: M
30. Task: Write operator’s manual (config reference, troubleshooting).
- Goal: Comprehensive doc.
- Difficulty: M
31. Task: Prepare pilot proposal template (based on Report 5).
- Goal: Editable PDF/Markdown for sales.
- Difficulty: S
32. Task: Final code cleanup: remove debug, add metrics, ensure graceful shutdown.
- Goal: Production‑quality.
- Difficulty: M
That’s 32 tasks. Good.
Schedule (High‑Level)
- Days 1–10: Milestones 1 + early 2 (ingestion + storage). End‑to‑end raw capture into SQLite.
- Days 11–20: Milestones 2–3 (normalization + inference). BusinessEvents stored.
- Days 21–30: Milestone 4 (AI layer) and part of Milestone 5 (CLI basics). Daily summary works.
- Days 31–40: Milestones 5 (MCP) and 6 (security). Full feature set.
- Days 41–50: Milestone 7 (synthetic data, demo).
- Days 51–60: Milestone 8 (polish, docs, pilot kit).
Dependencies: Ingestion must precede normalization; normalization before inference; storage before AI; AI before summary/CLI. MCP depends on all read paths.
---
Will be refined into Report 6.