Key Findings
Finding 1: MySQL MVP Ingestion — Debezium Server HTTP preferred
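- Evidence: Compared Debezium, Maxwell, and alternatives, weighing Kafka dependency against robustness. Debezium Server (Debezium's standalone runtime) offers an HTTP sink, so no Kafka cluster is required, and it can keep its offsets in a local file. Maxwell is simpler to run but stores its binlog position and schema state in a database on the source server, so it needs write access there.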
- Implication: We can ship a single Java process (Debezium Server) that pushes JSON events to our Go service over HTTP. This is on-prem friendly and avoids Kafka ops.
- Confidence: High
Finding 2: SQL Server Ingestion — Native CDC with custom poller
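- Evidence: SQL Server's built-in CDC is mature. A SQL Server Agent capture job copies committed changes from the transaction log into change tables. Polling those tables avoids Kafka and is straightforward in Go. End-to-end latency is roughly the capture job's polling interval (5 seconds by default) plus our own poll interval.
- Implication: We'll write a lightweight poller that reads the change tables through the `cdc.fn_cdc_get_all_changes_<capture_instance>` functions, persists the last processed LSN, and pushes canonical events.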
- Confidence: High
Finding 3: PostgreSQL later via wal2json
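- Evidence: wal2json is a widely used, lightweight logical decoding output plugin that emits changes as JSON. It needs `wal_level = logical`, the plugin installed on the database server, and a role with the `REPLICATION` attribute.
- Implication: When expanding to Postgres, we'll use wal2json + a custom Go consumer to keep the no-Kafka architecture.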
- Confidence: Medium (needs testing)
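Here is one possible shape for the Go consumer. It assumes a slot created once with `SELECT pg_create_logical_replication_slot('cdc_slot', 'wal2json');` (the slot name is hypothetical), PostgreSQL 11 or later for `pg_replication_slot_advance`, and a `database/sql` Postgres driver that uses `$n` placeholders. The sketch polls through SQL functions rather than the streaming replication protocol, which keeps it short; a streaming consumer would likely cut latency. `parseWal2JSON`, `pollSlot`, and `CanonicalEvent` are hypothetical names.

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
)

// Shapes of wal2json format-version 2 output (one JSON object per change).
type w2jColumn struct {
	Name  string      `json:"name"`
	Type  string      `json:"type"`
	Value interface{} `json:"value"`
}

type w2jChange struct {
	Action   string      `json:"action"` // B, C, I, U, D, T, M
	Schema   string      `json:"schema"`
	Table    string      `json:"table"`
	Columns  []w2jColumn `json:"columns"`
	Identity []w2jColumn `json:"identity"`
}

// CanonicalEvent is a hypothetical internal shape.
type CanonicalEvent struct {
	Table string
	Op    string
	Row   map[string]interface{}
}

// parseWal2JSON converts one change; ok is false for records this sketch skips
// (begin/commit, truncate, logical messages).
func parseWal2JSON(data []byte) (ev CanonicalEvent, ok bool, err error) {
	var ch w2jChange
	if err = json.Unmarshal(data, &ch); err != nil {
		return ev, false, err
	}
	cols := ch.Columns
	switch ch.Action {
	case "I":
		ev.Op = "insert"
	case "U":
		ev.Op = "update"
	case "D":
		ev.Op = "delete"
		cols = ch.Identity // deletes carry only replica-identity (key) columns
	default:
		return ev, false, nil
	}
	ev.Table = ch.Schema + "." + ch.Table
	ev.Row = make(map[string]interface{}, len(cols))
	for _, c := range cols {
		ev.Row[c.Name] = c.Value
	}
	return ev, true, nil
}

// pollSlot peeks at pending changes, emits them, then advances the slot.
// A crash between emit and advance re-delivers, so delivery is at-least-once.
func pollSlot(ctx context.Context, db *sql.DB, slot string, emit func(CanonicalEvent) error) error {
	rows, err := db.QueryContext(ctx, "SELECT lsn::text, data FROM "+
		"pg_logical_slot_peek_changes($1, NULL, NULL, 'format-version', '2')", slot)
	if err != nil {
		return err
	}
	defer rows.Close()
	last := ""
	for rows.Next() {
		var lsn, data string
		if err := rows.Scan(&lsn, &data); err != nil {
			return err
		}
		ev, ok, err := parseWal2JSON([]byte(data))
		if err != nil {
			return err
		}
		if ok {
			if err := emit(ev); err != nil {
				return err
			}
		}
		last = lsn
	}
	if err := rows.Err(); err != nil || last == "" {
		return err
	}
	_, err = db.ExecContext(ctx, "SELECT pg_replication_slot_advance($1, $2::pg_lsn)", slot, last)
	return err
}

func main() {
	ev, ok, err := parseWal2JSON([]byte(`{"action":"I","schema":"public","table":"orders",` +
		`"columns":[{"name":"id","type":"integer","value":1},{"name":"status","type":"text","value":"new"}]}`))
	fmt.Println(ev, ok, err)
}
```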
Finding 4: Oracle not MVP
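- Evidence: GoldenGate is expensive. Debezium's Oracle connector avoids GoldenGate only through its default LogMiner adapter, which still needs ARCHIVELOG mode, supplemental logging, and a broadly privileged mining user. Its XStream adapter requires a GoldenGate license. Either path means high setup friction.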
- Implication: Defer Oracle; focus on MySQL/SQL Server; can revisit for enterprise deals if budget permits.
- Confidence: High
Finding 5: Privileges & On‑prem Fit
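- Evidence: Every approach needs read access to the database's change stream (MySQL binlog, SQL Server change tables, or a Postgres replication slot). For MySQL, Debezium needs `REPLICATION SLAVE`, `REPLICATION CLIENT`, and `SELECT` (its documented grant adds `RELOAD` and `SHOW DATABASES` for snapshots), with row-based binlogging (`binlog_format=ROW`, `binlog_row_image=FULL`). For SQL Server, enabling CDC needs `sysadmin` (database level), `db_owner` (table level), and a running SQL Server Agent. After that, the poller only needs `SELECT` on the captured columns of the source tables, plus membership in the gating role if one was defined. For Postgres, a role with the `REPLICATION` attribute.
- Implication: For MySQL we can point Debezium at a read‑only replica, provided the replica writes its own binlog (`log_bin` and `log_replica_updates`/`log_slave_updates` enabled). For SQL Server we need a one‑time DBA enablement step; after that the poller runs read‑only.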
- Confidence: High
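A misconfigured replica can fail quietly: without `log_replica_updates`, replicated writes never reach the replica's binlog. A preflight check before onboarding a customer database may therefore be worth having. The sketch below assumes the Go service has already run `SHOW GLOBAL VARIABLES` and `SHOW GRANTS` and passes the results in. `checkBinlogSettings` and `checkGrants` are hypothetical helpers, and the string matching on grants is deliberately simple.

```go
package main

import (
	"fmt"
	"strings"
)

// checkBinlogSettings inspects name/value pairs from SHOW GLOBAL VARIABLES
// and returns human-readable problems (empty means OK).
func checkBinlogSettings(vars map[string]string) []string {
	var problems []string
	expect := func(name, want string) {
		if got := vars[name]; !strings.EqualFold(got, want) {
			problems = append(problems, fmt.Sprintf("%s is %q, want %s", name, got, want))
		}
	}
	expect("log_bin", "ON")
	expect("binlog_format", "ROW")
	expect("binlog_row_image", "FULL")
	// A replica writes replicated changes to its own binlog only when this is ON.
	// MySQL 8.0.26+ names it log_replica_updates; older servers use log_slave_updates.
	name := "log_replica_updates"
	if _, ok := vars[name]; !ok {
		name = "log_slave_updates"
	}
	expect(name, "ON")
	return problems
}

// checkGrants scans SHOW GRANTS output for the global replication privileges.
// SELECT, RELOAD, and SHOW DATABASES are not checked in this sketch.
func checkGrants(grants []string) []string {
	var hasSlave, hasClient bool
	for _, g := range grants {
		u := strings.ToUpper(g)
		if !strings.Contains(u, " ON *.* ") {
			continue // replication privileges can only be granted globally
		}
		if strings.Contains(u, "ALL PRIVILEGES") {
			return nil
		}
		// Newer MySQL versions accept REPLICATION REPLICA as an alias.
		hasSlave = hasSlave || strings.Contains(u, "REPLICATION SLAVE") || strings.Contains(u, "REPLICATION REPLICA")
		hasClient = hasClient || strings.Contains(u, "REPLICATION CLIENT")
	}
	var missing []string
	if !hasSlave {
		missing = append(missing, "REPLICATION SLAVE")
	}
	if !hasClient {
		missing = append(missing, "REPLICATION CLIENT")
	}
	return missing
}

func main() {
	vars := map[string]string{"log_bin": "ON", "binlog_format": "MIXED",
		"binlog_row_image": "FULL", "log_replica_updates": "OFF"}
	for _, p := range checkBinlogSettings(vars) {
		fmt.Println("binlog:", p)
	}
	fmt.Println("missing grants:", checkGrants([]string{"GRANT SELECT, REPLICATION SLAVE ON *.* TO 'dbz'@'%'"}))
}
```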
---
Top 5 Open Questions (Resolved or Pending)
1. Which CDC connector? → Debezium Server HTTP (MySQL). Resolved.
2. Minimal privileges? → `REPLICATION SLAVE`, `REPLICATION CLIENT`, and `SELECT` (MySQL); `SELECT` on captured source columns after one‑time CDC enablement by a DBA (SQL Server). Resolved.
3. Canonical schema stability? → Use `map[string]interface{}` with optional `schema_version`. Resolved.
4. On‑prem LLM quality? → Ollama default; quality may be marginal; allow cloud fallback. Resolved.
5. Demo domain? → Pharmacy chain. Resolved.
Future open questions will be tracked as findings evolve.