
Research Roadmap: AI Audit & Operational Intelligence Overlay MVP

Project: On-prem CDC-based AI overlay for legacy enterprise app auditing (Peru/LATAM mid-market)

Goal: Design a sellable, near-term MVP that integrates via database CDC (MySQL first) without touching application code.

Phase 1: Discovery & Scoping (Week 1)

[ ] Define precise research questions per deliverable (1–9)

[ ] Identify primary sources: official docs (Debezium, MySQL binary log and replication), engineering blogs, enterprise deployment guides

[ ] Build initial mental model of CDC pipelines, event normalization, semantic inference layers

Phase 2: CDC Technology Landscape (Deliverable 1)

[ ] Research MySQL OSS CDC options (Debezium, Maxwell, others)

[ ] Research SQL Server's built-in CDC and the Debezium SQL Server connector

[ ] Survey PostgreSQL logical replication (post-MVP)

[ ] Brief Oracle assessment (GoldenGate, Debezium)

[ ] Document privileges, risks, performance, offset management for each

[ ] Write Report 1: CDC Technology Landscape (~1000 words)
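
As a starting point for the privileges item above, the sketch below checks a CDC user's global grants against the set the Debezium MySQL connector documentation lists (SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT). The function name `missing_privileges` and the input shape (lines as returned by `SHOW GRANTS`) are assumptions for illustration, and the required set should be re-verified against the connector version chosen in Phase 3.

```python
import re

# Privileges listed in the Debezium MySQL connector documentation for the
# connector's database user. Assumption: verify against the deployed version.
REQUIRED = {
    "SELECT",
    "RELOAD",
    "SHOW DATABASES",
    "REPLICATION SLAVE",
    "REPLICATION CLIENT",
}

# Matches only global grants ("ON *.*"); scoped grants are ignored on purpose,
# which makes this check stricter than strictly necessary for SELECT.
_GLOBAL_GRANT = re.compile(r"^GRANT\s+(.*?)\s+ON\s+\*\.\*\s+TO\s", re.IGNORECASE)


def missing_privileges(grant_lines):
    """Return the required privileges not found in SHOW GRANTS output lines."""
    granted = set()
    for line in grant_lines:
        match = _GLOBAL_GRANT.match(line.strip())
        if not match:
            continue
        privileges = {p.strip().upper() for p in match.group(1).split(",")}
        if "ALL PRIVILEGES" in privileges:
            return set()
        granted |= privileges
    return REQUIRED - granted
```

A pre-flight check like this could run during pilot onboarding, before the connector is started, so that a missing grant surfaces as a clear message rather than a connector failure.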

Phase 3: MVP Architecture Specification (Deliverables 2, 8)

[ ] Propose 60-day MVP architecture (MySQL-only)

[ ] Define components: ingestion, normalization, semantic inference, storage, AI layer, interfaces

[ ] Choose the recommended stack (Debezium vs. custom ingestion) and record the rationale

[ ] Draw deployment topology (text-based diagram)

[ ] Create Build vs Buy decision analysis (Deliverable 8)

[ ] Write Report 2: MVP Architecture (~1000 words)

Phase 4: Event Model & Demo Domain (Deliverables 3, 4)

[ ] Define raw, canonical, business event schemas

[ ] Address schema evolution, transaction boundaries, idempotency, ordering, correlation

[ ] Choose demo domain (Pharmacy chain) and design schema (branches, products, inventory, sales, adjustments, suppliers, users, roles)

[ ] Plan synthetic data generator (realistic patterns + anomaly injection)

[ ] Write Report 3: Event Model & Demo Domain (~1000 words)
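
To make the raw-to-canonical step and the idempotency item concrete, here is a minimal sketch. It assumes raw events arrive in a Debezium-style envelope (`op`, `before`, `after`, and a `source` block carrying `db`, `table`, `file`, `pos`, `row`, `ts_ms`); those field names should be confirmed against real connector output. `CanonicalEvent`, `normalize`, and `deduplicate` are hypothetical names, and deriving the event ID from the binlog position is one possible idempotency strategy, not a decided design.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Debezium-style single-letter operation codes mapped to readable names.
_OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


@dataclass(frozen=True)
class CanonicalEvent:
    event_id: str            # deterministic, so redelivery yields the same ID
    table: str               # "database.table"
    operation: str
    before: Optional[dict]   # row image before the change, if any
    after: Optional[dict]    # row image after the change, if any
    source_ts_ms: int        # commit timestamp reported by the source


def normalize(raw: dict) -> CanonicalEvent:
    """Convert one raw change event into the canonical shape."""
    src = raw["source"]
    # Assumption: (binlog file, position, row index) uniquely identifies a change.
    position = f"{src['file']}:{src['pos']}:{src.get('row', 0)}"
    return CanonicalEvent(
        event_id=hashlib.sha256(position.encode("utf-8")).hexdigest()[:16],
        table=f"{src['db']}.{src['table']}",
        operation=_OPERATIONS[raw["op"]],
        before=raw.get("before"),
        after=raw.get("after"),
        source_ts_ms=src["ts_ms"],
    )


def deduplicate(events: Iterable[CanonicalEvent]) -> Iterator[CanonicalEvent]:
    """Drop events whose ID was already seen (at-least-once delivery)."""
    seen = set()
    for event in events:
        if event.event_id not in seen:
            seen.add(event.event_id)
            yield event
```

Because the ID depends only on the source position, replaying the same change after a connector restart produces an identical `event_id`, which lets downstream storage treat writes as upserts. The in-memory `seen` set is for illustration only; a real pipeline would need a persistent or bounded store.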

Phase 5: “Audit Intelligence” Features (Deliverable 5)

[ ] Propose 5–10 concrete MVP features (daily summary, anomaly detection, suspicious activity, variance, traceability)

[ ] For each: required signals, heuristic baseline, AI improvement, pilot metrics

[ ] Write Report 4: MVP Features (~1000 words)
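
The "heuristic baseline" for anomaly detection could be as simple as a robust outlier rule. The sketch below uses the modified z-score based on the median and the median absolute deviation (MAD), with the commonly cited cutoff of 3.5. The function name `flag_outliers` and the idea of applying it to per-branch adjustment quantities are assumptions for illustration, not a chosen feature design.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    Modified z-score: 0.6745 * (x - median) / MAD, where MAD is the median
    of absolute deviations from the median.
    """
    if not values:
        return []
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread in the bulk of the data: flag anything off the median.
        return [i for i, v in enumerate(values) if v != center]
    return [
        i
        for i, v in enumerate(values)
        if abs(0.6745 * (v - center) / mad) > threshold
    ]
```

For example, `flag_outliers([10, 12, 11, 9, 10, 11, 95])` flags only the last value. A rule like this gives the pilot a measurable baseline (precision and recall against injected anomalies) before any AI-based improvement is evaluated.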

Phase 6: MCP Tooling & Security (Deliverables 6, 7)

[ ] Design MCP server interface (tools: search_events, get_daily_summary, get_anomalies, trace_entity, export_report)

[ ] Create security/compliance checklist (privileges, network, encryption, audit logging, data minimization, on-prem vs cloud LLM)

[ ] Draft pilot proposal outline (low-risk)

[ ] Write Report 5: MCP & Security (~1000 words)
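
As a rough illustration of the MCP interface item, the sketch below describes two of the proposed tools in the shape MCP uses for tool definitions (a name, a description, and a JSON Schema `inputSchema`) and dispatches calls with basic argument validation. It deliberately does not use any MCP SDK. The parameter names (`table`, `limit`, `date`), the in-memory `_EVENTS` list, and `call_tool` are hypothetical stand-ins.

```python
# Hypothetical tool descriptors; only the description/inputSchema shape follows
# the MCP tool definition. Parameters are illustrative.
TOOLS = {
    "search_events": {
        "description": "Return canonical events for a table, newest first.",
        "inputSchema": {
            "type": "object",
            "properties": {"table": {"type": "string"}, "limit": {"type": "integer"}},
            "required": ["table"],
        },
    },
    "get_anomalies": {
        "description": "Return events flagged as anomalous on a given date.",
        "inputSchema": {
            "type": "object",
            "properties": {"date": {"type": "string"}},
            "required": ["date"],
        },
    },
}

# In-memory stand-in for the event store (made-up sample rows).
_EVENTS = [
    {"table": "sales", "ts_ms": 2, "date": "2024-01-02", "anomaly": False},
    {"table": "inventory", "ts_ms": 3, "date": "2024-01-02", "anomaly": True},
    {"table": "sales", "ts_ms": 5, "date": "2024-01-03", "anomaly": False},
]


def _search_events(table, limit=50):
    rows = sorted(
        (e for e in _EVENTS if e["table"] == table),
        key=lambda e: e["ts_ms"],
        reverse=True,
    )
    return rows[:limit]


def _get_anomalies(date):
    return [e for e in _EVENTS if e["anomaly"] and e["date"] == date]


_HANDLERS = {"search_events": _search_events, "get_anomalies": _get_anomalies}


def call_tool(name, arguments):
    """Validate arguments against the tool's schema, then run its handler."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    schema = TOOLS[name]["inputSchema"]
    missing = [k for k in schema["required"] if k not in arguments]
    unknown = [k for k in arguments if k not in schema["properties"]]
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    return _HANDLERS[name](**arguments)
```

Rejecting unknown arguments and keeping every tool read-only fits the data-minimization and audit-logging items in the security checklist, since each call can be logged with a fully validated argument set.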

Phase 7: Implementation Plan (Deliverable 9)

[ ] Break architecture into 25–40 engineering tasks

[ ] Group by milestone; define acceptance criteria; estimate difficulty (S/M/L)

[ ] Draft Day 1–10 and Day 11–20 schedule

[ ] Write Report 6: Implementation Plan (~1000 words)

Phase 8: Synthesis & Finalization

[ ] Compile all reports

[ ] Update index.md with summaries and status

[ ] Review decisions.md for completeness

[ ] Prepare final package for NotebookLM / handoff

This roadmap will be updated weekly as we learn.