Research Roadmap: AI Audit & Operational Intelligence Overlay MVP
Project: On-prem AI overlay, based on change data capture (CDC), for auditing legacy enterprise applications (Peru/LATAM mid-market)
Goal: Design a sellable, near-term MVP that integrates via database CDC (MySQL first) without touching application code.
Phase 1: Discovery & Scoping (Week 1)
- [ ] Define precise research questions per deliverable (1–9)
- [ ] Identify primary sources: official docs (Debezium, MySQL binary log and replication), engineering blogs, enterprise deployment guides
- [ ] Build initial mental model of CDC pipelines, event normalization, semantic inference layers
Phase 2: CDC Technology Landscape (Deliverable 1)
- [ ] Research MySQL OSS CDC options (Debezium, Maxwell, others)
- [ ] Research SQL Server's built-in CDC and the Debezium SQL Server connector
- [ ] Survey PostgreSQL logical replication (post-MVP)
- [ ] Brief Oracle assessment (GoldenGate, Debezium)
- [ ] Document required privileges, risks, performance impact, and offset management for each option
- [ ] Write Report 1: CDC Technology Landscape (~1000 words)
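
The privilege item above could be backed by a small pre-flight check. A minimal sketch, assuming the capture user needs the privilege set commonly listed for the Debezium MySQL connector; the `REQUIRED` set should be verified against the documentation for the connector version actually chosen, and both function names are hypothetical.

```python
import re

# Assumption: privilege set commonly documented for the Debezium MySQL connector.
# Verify against the docs for the connector version in use before relying on it.
REQUIRED = {
    "SELECT",
    "RELOAD",
    "SHOW DATABASES",
    "REPLICATION SLAVE",
    "REPLICATION CLIENT",
}


def granted_privileges(show_grants_rows):
    """Extract privilege names from the text rows returned by SHOW GRANTS."""
    granted = set()
    for row in show_grants_rows:
        match = re.match(r"GRANT (.+?) ON ", row)
        if match:
            # Note: column-level grants such as "SELECT (a, b)" are not handled.
            granted.update(p.strip().upper() for p in match.group(1).split(","))
    return granted


def missing_privileges(show_grants_rows):
    """Return the required privileges the capture user does not hold."""
    granted = granted_privileges(show_grants_rows)
    if "ALL PRIVILEGES" in granted:
        return set()
    return REQUIRED - granted


rows = ["GRANT SELECT, REPLICATION CLIENT ON *.* TO `cdc`@`%`"]
print(sorted(missing_privileges(rows)))
```

Running a check like this during pilot onboarding would let the privilege findings in Report 1 double as an installation checklist.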
Phase 3: MVP Architecture Specification (Deliverables 2, 8)
- [ ] Propose 60-day MVP architecture (MySQL-only)
- [ ] Define components: ingestion, normalization, semantic inference, storage, AI layer, interfaces
- [ ] Choose the recommended stack (Debezium or custom ingestion) and document the rationale
- [ ] Draw deployment topology (text-based diagram)
- [ ] Create Build vs Buy decision analysis (Deliverable 8)
- [ ] Write Report 2: MVP Architecture (~1000 words)
Phase 4: Event Model & Demo Domain (Deliverables 3, 4)
- [ ] Define raw, canonical, business event schemas
- [ ] Address schema evolution, transaction boundaries, idempotency, ordering, correlation
- [ ] Choose demo domain (Pharmacy chain) and design schema (branches, products, inventory, sales, adjustments, suppliers, users, roles)
- [ ] Plan synthetic data generator (realistic patterns + anomaly injection)
- [ ] Write Report 3: Event Model & Demo Domain (~1000 words)
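
The raw-to-canonical step, together with idempotency and ordering, might look roughly like the sketch below. It assumes a Debezium-style change envelope (`op`, `before`, `after`, and a `source` block with `db`, `table`, `file`, `pos`, `row`); the `CanonicalEvent` shape and both function names are illustrative assumptions, not a decided schema.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

# Assumed operation codes of a Debezium-style envelope.
OP_NAMES = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


@dataclass(frozen=True)
class CanonicalEvent:
    event_id: str            # deterministic, so replays produce the same id
    database: str
    table: str
    operation: str
    position: tuple          # (binlog file, byte offset, row index)
    before: Optional[dict]
    after: Optional[dict]
    changed_columns: tuple


def normalize(raw: dict) -> CanonicalEvent:
    """Map one raw change record to the canonical shape."""
    src = raw["source"]
    position = (src["file"], src["pos"], src.get("row", 0))
    # Idempotency key: the same log position always hashes to the same id.
    key = json.dumps([src["db"], src["table"], *position])
    before, after = raw.get("before"), raw.get("after")
    changed = ()
    if before and after:
        changed = tuple(sorted(k for k in after if before.get(k) != after[k]))
    return CanonicalEvent(
        event_id=hashlib.sha256(key.encode()).hexdigest(),
        database=src["db"],
        table=src["table"],
        operation=OP_NAMES[raw["op"]],
        position=position,
        before=before,
        after=after,
        changed_columns=changed,
    )


def dedupe_in_order(events):
    """Order by log position and drop redelivered duplicates."""
    seen, ordered = set(), []
    for event in sorted(events, key=lambda e: e.position):
        if event.event_id not in seen:
            seen.add(event.event_id)
            ordered.append(event)
    return ordered
```

Deriving `event_id` from the log position rather than from a random UUID is one way to make at-least-once delivery safe to replay; business events would then be inferred from runs of canonical events that share a transaction.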
Phase 5: “Audit Intelligence” Features (Deliverable 5)
- [ ] Propose 5–10 concrete MVP features (daily summary, anomaly detection, suspicious activity, variance, traceability)
- [ ] For each feature, specify: required signals, heuristic baseline, AI improvement, pilot metrics
- [ ] Write Report 4: MVP Features (~1000 words)
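
As an illustration of what a "heuristic baseline" could mean for the anomaly-detection feature, the sketch below flags outlying inventory adjustments with a robust z-score based on the median absolute deviation (MAD). The technique, the 3.5 threshold, and the function name are assumptions chosen for the example, not decisions recorded in this roadmap.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indexes of values whose robust z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the typical values: anything off the median stands out.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical daily stock adjustments (units) for one product at one branch.
adjustments = [-2, -1, -3, -2, -1, -2, -40]
print(flag_outliers(adjustments))  # -> [6]
```

A median-based score is less distorted by the very anomalies it is looking for than a mean and standard deviation, which makes it a reasonable baseline against which to measure any AI improvement.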
Phase 6: MCP Tooling & Security (Deliverables 6, 7)
- [ ] Design MCP server interface (tools: search_events, get_daily_summary, get_anomalies, trace_entity, export_report)
- [ ] Create security/compliance checklist (privileges, network, encryption, audit logging, data minimization, on-prem vs cloud LLM)
- [ ] Draft pilot proposal outline (low-risk)
- [ ] Write Report 5: MCP & Security (~1000 words)
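
To make the tool interface easier to discuss, here is a dependency-free sketch of two of the tools named above (`search_events`, `trace_entity`) behind a simple registry. It does not use an MCP SDK; the parameters, the `EVENTS` sample data, and the `tool` and `call_tool` helpers are all hypothetical, and a real server would expose the same functions through whichever MCP library is selected.

```python
from typing import Callable, Optional

# Hypothetical registry standing in for an MCP server's tool table.
TOOLS: dict[str, Callable] = {}


def tool(fn: Callable) -> Callable:
    """Register a function under its own name so it can be called by name."""
    TOOLS[fn.__name__] = fn
    return fn


# Hypothetical in-memory sample; the real store would be the event database.
EVENTS = [
    {"id": 1, "table": "sales", "entity_id": "SKU-17", "operation": "insert"},
    {"id": 2, "table": "inventory", "entity_id": "SKU-17", "operation": "update"},
    {"id": 3, "table": "inventory", "entity_id": "SKU-42", "operation": "update"},
]


@tool
def search_events(table: Optional[str] = None,
                  operation: Optional[str] = None,
                  limit: int = 50) -> list:
    """Filter events by table and operation, capped at `limit` results."""
    hits = [
        e for e in EVENTS
        if (table is None or e["table"] == table)
        and (operation is None or e["operation"] == operation)
    ]
    return hits[:limit]


@tool
def trace_entity(entity_id: str) -> list:
    """Return every event touching one entity, oldest first."""
    return sorted(
        (e for e in EVENTS if e["entity_id"] == entity_id),
        key=lambda e: e["id"],
    )


def call_tool(name: str, **arguments):
    """Dispatch a tool call by name, rejecting unknown tools."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)
```

Keeping every tool read-only and capping result sizes, as `limit` does here, also supports the data-minimization item on the security checklist.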
Phase 7: Implementation Plan (Deliverable 9)
- [ ] Break architecture into 25–40 engineering tasks
- [ ] Group by milestone; define acceptance criteria; estimate difficulty (S/M/L)
- [ ] Draft Day 1–10 / Day 11–20 schedule
- [ ] Write Report 6: Implementation Plan (~1000 words)
Phase 8: Synthesis & Finalization
- [ ] Compile all reports
- [ ] Update index.md with summaries and status
- [ ] Review decisions.md for completeness
- [ ] Prepare final package for NotebookLM / handoff
This roadmap will be updated weekly as we learn.