Autonomous Choices Log
This file records significant decisions that Clawy made during the research process beyond explicit user instructions, along with the rationale for each.
---
2026-02-16
Choice: Debezium Server (HTTP sink) over Kafka for MySQL MVP
- Context: User asked for "sellable, near-term MVP" with minimal dependencies. Kafka adds operational overhead.
- Decision: Use Debezium Server (HTTP sink) as primary recommendation; Maxwell HTTP as fallback.
- Rationale: Debezium Server retains robust binlog handling while eliminating Kafka; aligns with "easy install, on-prem friendly".
- Trade-offs: Requires a Java runtime; Debezium Server is less commonly deployed than Debezium on Kafka Connect.
- See: `decisions.md` (Decision 1)
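A minimal Go sketch of what the receiving side of the HTTP sink could look like, assuming Debezium Server is configured to POST one JSON change event per request. The `ChangeEvent` type, the `handleEvents` handler, and the `/events` path are hypothetical names for illustration; only the envelope fields (`op`, `before`, `after`, `source`, `ts_ms`) come from Debezium's event format.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// ChangeEvent keeps only the Debezium envelope fields this sketch uses.
type ChangeEvent struct {
	Op     string         `json:"op"` // c=create, u=update, d=delete, r=snapshot read
	Before map[string]any `json:"before"`
	After  map[string]any `json:"after"`
	Source struct {
		DB    string `json:"db"`
		Table string `json:"table"`
	} `json:"source"`
	TsMs int64 `json:"ts_ms"`
}

// parseChangeEvent accepts the envelope either bare or wrapped in "payload"
// (the wrapped form appears when the JSON converter embeds schemas).
func parseChangeEvent(body []byte) (ChangeEvent, error) {
	var wrapped struct {
		Payload *ChangeEvent `json:"payload"`
	}
	if err := json.Unmarshal(body, &wrapped); err != nil {
		return ChangeEvent{}, err
	}
	var ev ChangeEvent
	if wrapped.Payload != nil {
		ev = *wrapped.Payload
	} else if err := json.Unmarshal(body, &ev); err != nil {
		return ChangeEvent{}, err
	}
	if ev.Op == "" {
		return ChangeEvent{}, errors.New("not a change event: missing op")
	}
	return ev, nil
}

// handleEvents is a hypothetical endpoint for the sink to POST to.
// Which status codes the sink treats as success is an assumption to verify.
func handleEvents(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	ev, err := parseChangeEvent(body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Printf("%s %s.%s at %d\n", ev.Op, ev.Source.DB, ev.Source.Table, ev.TsMs)
	w.WriteHeader(http.StatusOK)
}

func main() {
	// A real deployment would register handleEvents with http.HandleFunc
	// and call http.ListenAndServe; here we only parse a sample body.
	sample := []byte(`{"op":"u","before":{"qty":10},"after":{"qty":4},` +
		`"source":{"db":"pharmacy","table":"inventory"},"ts_ms":1700000000000}`)
	ev, err := parseChangeEvent(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(ev.Op, ev.Source.Table, ev.After["qty"])
}
```

Keeping the parser separate from the handler makes it possible to reuse it for the Maxwell fallback later, although Maxwell's JSON layout differs and would need its own struct.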
Choice: SQL Server via Native CDC + Go poller
- Context: Avoid Kafka to keep stack simple.
- Decision: Implement custom Go poller reading `cdc.fn_cdc_get_all_changes_*`.
- Rationale: No Kafka needed; polling interval can be tuned; lightweight.
- Trade-offs: Latency up to poll interval; requires enabling CDC (one-time DBA task).
- See: `decisions.md` (Decision 2)
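The polling loop described above can be sketched in Go as follows. This is an illustration under stated assumptions, not the actual poller: `cdcSource`, `pollOnce`, `changesQuery`, and `fakeSource` are hypothetical names, the database is replaced by an in-memory stand-in so the sketch runs without SQL Server, and the `@after_lsn` and `@to_lsn` parameters are assumed to be bound by whatever driver is used. The SQL functions named in the query string (`sys.fn_cdc_increment_lsn`, `cdc.fn_cdc_get_all_changes_<capture_instance>`) are the standard SQL Server CDC functions.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// LSN is a SQL Server log sequence number (binary(10)).
// Byte-wise comparison of equal-length LSNs follows log order.
type LSN []byte

// Change is a trimmed-down row from the CDC change function.
type Change struct {
	LSN       LSN
	Operation int // __$operation: 1=delete, 2=insert, 4=update (after image)
}

// cdcSource is a hypothetical seam between the poll logic and the database.
type cdcSource interface {
	MaxLSN() (LSN, error)                           // would wrap sys.fn_cdc_get_max_lsn()
	ChangesAfter(after, upTo LSN) ([]Change, error) // would run changesQuery
}

var captureInstanceRe = regexp.MustCompile(`^[A-Za-z0-9_]+$`)

// changesQuery builds the T-SQL for one capture instance. The name is
// validated because it is spliced into the function name, not bound.
func changesQuery(captureInstance string) (string, error) {
	if !captureInstanceRe.MatchString(captureInstance) {
		return "", fmt.Errorf("invalid capture instance %q", captureInstance)
	}
	return "DECLARE @from_lsn binary(10) = sys.fn_cdc_increment_lsn(@after_lsn); " +
		"SELECT * FROM cdc.fn_cdc_get_all_changes_" + captureInstance +
		"(@from_lsn, @to_lsn, N'all');", nil
}

// pollOnce fetches changes in (last, max] and returns the new checkpoint.
// On error the checkpoint is left unchanged so the window is retried.
func pollOnce(src cdcSource, last LSN) ([]Change, LSN, error) {
	maxLSN, err := src.MaxLSN()
	if err != nil {
		return nil, last, err
	}
	if bytes.Compare(maxLSN, last) <= 0 {
		return nil, last, nil // nothing new since the last poll
	}
	changes, err := src.ChangesAfter(last, maxLSN)
	if err != nil {
		return nil, last, err
	}
	return changes, maxLSN, nil
}

// fakeSource is an in-memory stand-in for SQL Server, for demonstration only.
type fakeSource struct{ log []Change }

func (f *fakeSource) MaxLSN() (LSN, error) {
	if len(f.log) == 0 {
		return nil, nil
	}
	return f.log[len(f.log)-1].LSN, nil
}

func (f *fakeSource) ChangesAfter(after, upTo LSN) ([]Change, error) {
	var out []Change
	for _, c := range f.log {
		if bytes.Compare(c.LSN, after) > 0 && bytes.Compare(c.LSN, upTo) <= 0 {
			out = append(out, c)
		}
	}
	return out, nil
}

func main() {
	src := &fakeSource{log: []Change{
		{LSN: LSN{1}, Operation: 2},
		{LSN: LSN{2}, Operation: 4},
		{LSN: LSN{3}, Operation: 1},
	}}
	changes, last, err := pollOnce(src, LSN{1})
	fmt.Println(len(changes), last, err)
	q, _ := changesQuery("dbo_inventory")
	fmt.Println(q)
}
```

A real poller would call `pollOnce` on a ticker, persist the returned checkpoint after each successful batch, and start from `sys.fn_cdc_get_min_lsn` for the capture instance on first run. The ticker period is the latency bound mentioned in the trade-offs.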
Choice: Demo Domain = Pharmacy Chain
- Context: Two options: Pharmacy or Transportation.
- Decision: Pharmacy chain.
- Rationale: Pharmacy chains are a common type of mid-size business across LATAM; inventory-centric audit signals are clear; synthetic data is easier to generate.
- Trade-offs: None major; transportation could be equally valid but pharmacy is more straightforward.
- See: `decisions.md` (Decision 3)
Choice: Storage = SQLite for MVP
- Context: Need append-only log; options: SQLite vs Postgres.
- Decision: SQLite.
- Rationale: Embedded (no server process), single file, zero ops; fits "easy install" and the expected data volume (fewer than one million rows). The repository pattern allows a future swap to Postgres.
- Trade-offs: Single writer; limited concurrency. Not a problem for MVP.
- See: `decisions.md` (Decision 4)
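To illustrate how the repository pattern keeps the Postgres swap open, here is a minimal Go sketch. The `EventStore` interface, the `AuditEvent` fields, and the in-memory `memStore` are all hypothetical; a SQLite-backed implementation would satisfy the same interface, and it is left out here so the example has no third-party driver dependency.

```go
package main

import (
	"fmt"
	"sync"
)

// AuditEvent is a hypothetical row in the append-only log.
type AuditEvent struct {
	ID      int64
	Entity  string
	Op      string
	Payload string
}

// EventStore exposes only append and read: no update, no delete.
// Callers depend on this interface, not on SQLite or Postgres.
type EventStore interface {
	Append(ev AuditEvent) (int64, error)
	ListAfter(afterID int64, limit int) ([]AuditEvent, error)
}

// memStore is an in-memory stand-in. The mutex serializes writers,
// mirroring the single-writer behavior noted for SQLite.
type memStore struct {
	mu     sync.Mutex
	events []AuditEvent
}

// Append assigns the next sequential ID and stores the event.
func (s *memStore) Append(ev AuditEvent) (int64, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	ev.ID = int64(len(s.events)) + 1
	s.events = append(s.events, ev)
	return ev.ID, nil
}

// ListAfter returns up to limit events with ID greater than afterID,
// in insertion order.
func (s *memStore) ListAfter(afterID int64, limit int) ([]AuditEvent, error) {
	if limit <= 0 {
		return nil, fmt.Errorf("limit must be positive, got %d", limit)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []AuditEvent
	for _, ev := range s.events {
		if ev.ID > afterID && len(out) < limit {
			out = append(out, ev)
		}
	}
	return out, nil
}

func main() {
	var store EventStore = &memStore{}
	store.Append(AuditEvent{Entity: "inventory", Op: "u", Payload: `{"qty":4}`})
	store.Append(AuditEvent{Entity: "sales", Op: "c", Payload: `{"total":120}`})
	events, err := store.ListAfter(0, 10)
	fmt.Println(len(events), err)
}
```

Because the interface offers no mutation beyond `Append`, the append-only property is enforced at the API boundary regardless of which backend sits behind it.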
Choice: MCP Transport = Stdio only
- Context: Expose tools to external AI agents. Options: stdio vs TCP+TLS.
- Decision: Stdio.
- Rationale: Simplest, no network exposure, fits trusted-agent use case. Can add TCP later if needed.
- Trade-offs: One client at a time; requires agent to spawn process.
- See: `decisions.md` (Decision 5)
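For context on what the stdio transport involves, here is a minimal Go sketch of a newline-delimited JSON-RPC 2.0 loop, which is the framing MCP uses over stdio. It is not a complete MCP server: the `initialize` handshake and tool methods are omitted, only `ping` is answered, and the `handleLine` and `serve` names are hypothetical.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

type request struct {
	ID     json.RawMessage `json:"id"`
	Method string          `json:"method"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type response struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  any             `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// handleLine processes one JSON-RPC message and returns the encoded reply,
// or nil when the message is a notification (no id) and needs no reply.
func handleLine(line []byte) []byte {
	var req request
	resp := response{JSONRPC: "2.0"}
	if err := json.Unmarshal(line, &req); err != nil {
		resp.Error = &rpcError{Code: -32700, Message: "Parse error"}
	} else if len(req.ID) == 0 {
		return nil
	} else {
		resp.ID = req.ID
		switch req.Method {
		case "ping":
			resp.Result = struct{}{} // empty result object
		default:
			resp.Error = &rpcError{Code: -32601, Message: "Method not found"}
		}
	}
	out, err := json.Marshal(resp)
	if err != nil {
		return nil
	}
	return out
}

// serve reads one message per line and writes one reply per line.
// bufio.Scanner's default line limit (64 KiB) applies.
func serve(r io.Reader, w io.Writer) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		if len(sc.Bytes()) == 0 {
			continue // skip blank lines
		}
		if out := handleLine(sc.Bytes()); out != nil {
			if _, err := fmt.Fprintf(w, "%s\n", out); err != nil {
				return err
			}
		}
	}
	return sc.Err()
}

func main() {
	// A real server would call serve(os.Stdin, os.Stdout); a fixed input
	// keeps this demo self-terminating.
	in := strings.NewReader(`{"jsonrpc":"2.0","id":1,"method":"ping"}` + "\n")
	if err := serve(in, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Since the agent spawns the process and owns both pipes, there is no listener to secure, which is the "no network exposure" point in the rationale. The same `handleLine` function could later sit behind a TCP listener if that transport is added.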
Choice: AI Layer = On-prem Ollama default, cloud optional
- Context: Need LLM for summaries/explanations. Balance quality vs compliance.
- Decision: Default to on-prem Ollama; allow cloud LLM as opt-in.
- Rationale: Keeps data in-house for pilot; reduces compliance friction. Cloud fallback if quality insufficient.
- Trade-offs: On-prem model may be weaker; may need prompt tuning.
- See: `decisions.md` (Decision 6)
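The "on-prem by default, cloud by opt-in" rule can be sketched in Go as below. The `LLMConfig` fields, `pickBackend`, `ollamaBody`, and `summarizeOnPrem` are hypothetical names, and the model name in `main` is only a placeholder. The request shape (`model`, `prompt`, `stream`) and the `/api/generate` path follow Ollama's HTTP API; the cloud path is deliberately left unimplemented.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// LLMConfig is a hypothetical configuration block for the AI layer.
type LLMConfig struct {
	OllamaURL  string // e.g. http://localhost:11434, Ollama's default port
	Model      string
	AllowCloud bool   // explicit opt-in; false by default
	CloudURL   string // only consulted when AllowCloud is true
}

// pickBackend returns "cloud" only when the operator has opted in AND
// configured an endpoint; every other combination stays on-prem.
func pickBackend(cfg LLMConfig) string {
	if cfg.AllowCloud && cfg.CloudURL != "" {
		return "cloud"
	}
	return "ollama"
}

type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

// ollamaBody encodes a non-streaming generate request.
func ollamaBody(model, prompt string) ([]byte, error) {
	return json.Marshal(generateRequest{Model: model, Prompt: prompt, Stream: false})
}

// summarizeOnPrem sends the prompt to the local Ollama instance and
// returns the generated text from the "response" field.
func summarizeOnPrem(cfg LLMConfig, prompt string) (string, error) {
	body, err := ollamaBody(cfg.Model, prompt)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(cfg.OllamaURL+"/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ollama returned status %d", resp.StatusCode)
	}
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Response, nil
}

func main() {
	cfg := LLMConfig{OllamaURL: "http://localhost:11434", Model: "llama3"}
	fmt.Println("backend:", pickBackend(cfg))
	body, _ := ollamaBody(cfg.Model, "Summarize today's inventory changes.")
	fmt.Println(string(body))
	// summarizeOnPrem(cfg, ...) is not called here so the demo needs no running Ollama.
}
```

Requiring both the flag and the endpoint means a stray cloud URL in a config file cannot send data off-site by itself, which matches the compliance motivation in the rationale.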
Choice: MVP Feature Set Prioritization
- Context: 8 features identified; Day 60 timeline tight.
- Decision: Prioritize core five: daily summary, inventory anomalies, user suspicion, entity traceability, branch variance. Defer sales anomalies, PO fraud, role change alerts.
- Rationale: These cover the most common audit scenarios and demonstrate ROI. Stretch features can be added post-pilot.
- Trade-offs: Sales/PO/role alerts may be expected; need to manage expectations in sales process.
- See: `decisions.md` (Decision 7)
Choice: Report Structure and Order
- Context: User provided 9 deliverables. I organized the reports as: 1) CDC landscape, 2) Architecture, 3) Event model, 4) Features, 5) MCP/Security, 6) Implementation. Together these cover everything the prompt asked for.
- Decision: Followed that order; merged some deliverables (e.g., Build vs Buy included in Architecture; Security and MCP together; Implementation Plan separate).
- Rationale: Logical flow from technology choices to design to concrete features to security to execution.
- Trade-offs: None; user can reorganize if desired.
Choice: Word Count Targets
- Context: Prompt said "~1000 words" per report.
- Decision: Aimed for 950–1200 words each to be thorough yet concise. A few reports, such as the event model, run slightly over (about 1300 words) where the topic needed it.
- Rationale: Ensure completeness without fluff; some topics need more space.
- Trade-offs: Slightly longer reports but still readable.
Choice: Scrapbook-first Process
- Context: Structured Research Mode demands scrapbook entries before reports.
- Decision: Created a dedicated scrapbook file for each report's thinking trace, timestamped. Logged early hypotheses, open questions, and reasoning.
- Rationale: Maintains traceability and shows evolution of thought.
- Trade-offs: Extra overhead, but valuable for documentation.
Choice: Implementation Backlog Size
- Context: 25–40 tasks required.
- Decision: Produced 36 tasks across 8 milestones.
- Rationale: Enough granularity to be actionable, but not overwhelming. Difficulty mix (S/M/L) reflects realistic effort.
- Trade-offs: Could be refined further with engineering team input, but sufficient for MVP planning.
Choice: Synthetic Data Generator Language
- Context: Demo domain requires generator. Language not specified.
- Decision: Implement in Go (same language as overlay) for consistency and single binary distribution.
- Rationale: Keeps tech stack simple; Go has a good MySQL driver and fake-data libraries.
- Trade-offs: Python might be faster to prototype; Go ensures generator is packaged with overlay.
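As a rough idea of what a Go generator for the pharmacy demo could look like, here is a minimal seeded sketch using only the standard library. The `Movement` type, the branch, SKU, and user names, and the anomaly rule (a large negative stock change every N rows) are all invented for illustration; the real generator's schema and anomaly types are not specified in this log.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Movement is a hypothetical inventory movement row.
type Movement struct {
	Branch  string
	SKU     string
	User    string
	Delta   int  // stock change; negative means units leaving inventory
	Anomaly bool // ground-truth label for evaluating detection features
}

// generate returns n movements. The same seed always yields the same
// rows, so demos and tests are reproducible. When anomalyEvery > 0,
// every anomalyEvery-th row is replaced with a large stock loss.
func generate(seed int64, n int, anomalyEvery int) []Movement {
	rng := rand.New(rand.NewSource(seed))
	branches := []string{"branch-01", "branch-02", "branch-03"}
	skus := []string{"SKU-1001", "SKU-1002", "SKU-1003", "SKU-1004"}
	users := []string{"ana", "luis", "marta"}

	out := make([]Movement, 0, n)
	for i := 0; i < n; i++ {
		m := Movement{
			Branch: branches[rng.Intn(len(branches))],
			SKU:    skus[rng.Intn(len(skus))],
			User:   users[rng.Intn(len(users))],
			Delta:  rng.Intn(41) - 20, // normal range: -20..20
		}
		if anomalyEvery > 0 && (i+1)%anomalyEvery == 0 {
			m.Delta = -500 - rng.Intn(500) // injected loss: -999..-500
			m.Anomaly = true
		}
		out = append(out, m)
	}
	return out
}

func main() {
	for _, m := range generate(42, 10, 5) {
		fmt.Printf("%s %s %s %+d anomaly=%v\n", m.Branch, m.SKU, m.User, m.Delta, m.Anomaly)
	}
}
```

Carrying the `Anomaly` label alongside each row gives the inventory-anomaly feature a known answer key during the demo, and a fixed seed lets the same dataset be regenerated on any machine from the single binary.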
---
This log will continue as research progresses. Choices made autonomously based on project goals and constraints are recorded here for transparency.