Research Tasks
Granular task list with status tags (TODO / DOING / DONE).
Phase 1: Discovery & Scoping
- [ ] Compile list of primary sources (Debezium docs, MySQL CDC guides, enterprise blogs)
- [ ] Draft initial mental model of CDC-to-AI pipeline
Phase 2: CDC Landscape
- [X] Research Debezium MySQL connector: architecture, setup, connectors, limitations
- [X] Research Maxwell daemon for MySQL: pros/cons vs Debezium
- [X] Research other OSS MySQL CDC options (if any)
- [X] Research SQL Server CDC (built-in) + Debezium SQL Server connector feasibility
- [X] Survey Postgres logical replication (wal2json, pgoutput) for future phase
- [X] Brief Oracle options: GoldenGate, Debezium Oracle connector status, cost
- [X] For each DB: required DB privileges, operational risks, performance impact, offset storage, failure recovery
- [X] Write Report 1: CDC Technology Landscape
- [X] Record decisions arising from these tasks in .meta/decisions.md
Phase 3: MVP Architecture
- [X] Define MVP scope boundaries (60 days, MySQL-only)
- [X] Choose CDC ingestion component (Debezium vs custom) — record decision
- [X] Design event normalization layer (raw → canonical schema)
- [X] Design semantic inference layer (canonical → business events)
- [X] Choose storage layer (append-only log + index: Postgres? SQLite? embedded?)
- [X] Choose AI layer (LLM integration: local vs cloud, summarization, search)
- [X] Design interfaces (CLI, optional web UI, MCP server later)
- [X] Draft deployment topology (containers vs binaries, network layout)
- [X] Write Report 2: MVP Architecture
- [X] Argue Build vs Buy (Debezium vs custom) — Deliverable 8
Phase 4: Event Model & Demo Domain
- [X] Define RawChangeEvent schema (fields from CDC)
- [X] Define CanonicalChangeEvent (normalized, DB-agnostic)
- [X] Define BusinessEvent (semantic, domain-specific)
- [X] Document handling of schema evolution, transaction boundaries, idempotency, ordering, multi-table correlation
- [X] Choose demo domain: Pharmacy chain (preferred) or Transportation
- [X] Design synthetic schema (branches, products, inventory, sales, adjustments, suppliers, users, roles, audit fields)
- [X] Plan synthetic data generator: realistic daily patterns + anomaly injection (fraud, stockouts, unusual adjustments)
- [X] Output SQL DDL (tables + indexes)
- [X] Write generator pseudocode (language-agnostic)
- [X] Write Report 3: Event Model & Demo Domain
Phase 5: Audit Intelligence Features
- [X] List 5–10 MVP features with immediate business value
- [X] For each feature: required data signals, simple heuristic baseline, AI improvement, pilot metrics
- [X] Write Report 4: MVP Features
Phase 6: MCP & Security
- [X] Design MCP server interface (tool definitions: search_events, get_daily_summary, get_anomalies, trace_entity, export_report)
- [X] Create security/compliance checklist (minimum privileges, network layout, encryption, audit logging, data minimization, LLM deployment choice)
- [X] Draft pilot proposal outline (emphasize low risk, quick value)
- [X] Write Report 5: MCP & Security
Phase 7: Implementation Plan
- [ ] Break MVP architecture into 25–40 engineering tasks
- [ ] Group tasks by milestone (e.g., Milestone 1: CDC ingestion; Milestone 2: Normalization; etc.)
- [ ] Define goal and acceptance criteria for each task
- [ ] Estimate difficulty: S (1–2 days), M (3–5 days), L (1+ weeks)
- [ ] Create Day 1–10 and Day 11–20 schedules (order tasks realistically)
- [ ] Write Report 6: Implementation Plan
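The S/M/L sizing above can be rolled up into a rough effort range once tasks are sized. The following is a minimal sketch, not part of the plan itself: the function name `estimate` is hypothetical, and it assumes one week means 5 working days. Because L is open-ended (1+ weeks), the upper bound is left unknown whenever an L task is present.

```python
# Day ranges per size tag, taken from the task list.
# Assumption: "1+ weeks" is treated as at least 5 working days with no upper bound.
SIZE_DAYS = {"S": (1, 2), "M": (3, 5), "L": (5, None)}


def estimate(sizes):
    """Return (low, high) total days for a list of size tags.

    high is None when any task has an open-ended upper bound.
    """
    low = 0
    high = 0
    for size in sizes:
        lo_days, hi_days = SIZE_DAYS[size]
        low += lo_days
        if high is not None:
            # Once one task is unbounded, the total stays unbounded.
            high = None if hi_days is None else high + hi_days
    return low, high
```

For example, `estimate(["S", "S", "M"])` gives a range of 5 to 9 days, while adding a single L task makes the upper bound unknown.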
Phase 8: Synthesis
- [X] Compile all reports into final package
- [X] Update index.md with summaries and status
- [X] Review decisions.md; ensure all major choices are logged
- [X] Prepare export for NotebookLM (optional)
- [X] Write FINAL_SUMMARY.md for handoff
Phase 9: Additional Reports (as needed)
- [X] Write Report 7: SQL Server & Oracle CDC Tooling Summary (standalone recap with detailed pros/cons, privileges, risks)
Status tags: TODO, DOING, DONE. Use checkboxes to track.
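Checkbox tracking can be summarized per phase with a small script. This is a minimal sketch under stated assumptions: the function name `tally` is hypothetical, phase headings are assumed to start with "Phase ", and `[ ]` is read as not done while `[X]` or `[x]` is read as DONE. The checkbox syntax has no separate marker for DOING, so in-progress items are counted as not done.

```python
import re

# Matches a checklist item such as "- [ ] ..." or "- [X] ...".
CHECKBOX = re.compile(r"^- \[( |x|X)\] ")


def tally(text):
    """Return {phase heading: (done, total)} for a checklist in this layout."""
    progress = {}
    phase = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Phase "):
            phase = line
            # Repeated headings accumulate into the same entry.
            progress.setdefault(phase, [0, 0])
        elif phase is not None:
            match = CHECKBOX.match(line)
            if match:
                progress[phase][1] += 1
                if match.group(1) in ("x", "X"):
                    progress[phase][0] += 1
    return {name: tuple(counts) for name, counts in progress.items()}


if __name__ == "__main__":
    sample = (
        "Phase 1: Discovery & Scoping\n"
        "- [ ] Compile list of primary sources\n"
        "- [ ] Draft initial mental model\n"
        "Phase 2: CDC Landscape\n"
        "- [X] Research Debezium MySQL connector\n"
    )
    for name, (done, total) in tally(sample).items():
        print(f"{name}: {done}/{total} done")
```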