Autonomous Choices Log

This file records significant decisions that Clawy made during the research process beyond the user's explicit instructions, along with the rationale for each.

---

2026-02-16

Choice: Debezium Server (HTTP sink) over Kafka for MySQL MVP

Context: User asked for "sellable, near-term MVP" with minimal dependencies. Kafka adds operational overhead.

Decision: Use Debezium Server (HTTP sink) as primary recommendation; Maxwell HTTP as fallback.

Rationale: Debezium Server retains robust binlog handling while eliminating Kafka; aligns with "easy install, on-prem friendly".

Trade-offs: Requires a Java runtime; Debezium Server is a less common deployment than Kafka Connect.

See: `decisions.md` (Decision 1)
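
To make the receiving side of this choice concrete, here is a minimal sketch of a Go endpoint that an HTTP sink could POST change events to. It assumes the events arrive as JSON in Debezium's usual envelope (`before`, `after`, `op`, `ts_ms`), optionally wrapped in a `payload` field when schemas are enabled. The names `ChangeEvent`, `parseEvent`, and `handler` are hypothetical and not taken from `decisions.md`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// ChangeEvent holds the envelope fields this sketch cares about.
// Op is "c" (create), "u" (update), "d" (delete) or "r" (snapshot read).
type ChangeEvent struct {
	Before map[string]any `json:"before"`
	After  map[string]any `json:"after"`
	Op     string         `json:"op"`
	TsMs   int64          `json:"ts_ms"`
}

// parseEvent accepts either a bare envelope or one wrapped in "payload".
func parseEvent(body []byte) (ChangeEvent, error) {
	var w struct {
		Payload *ChangeEvent `json:"payload"`
		ChangeEvent
	}
	if err := json.Unmarshal(body, &w); err != nil {
		return ChangeEvent{}, err
	}
	ev := w.ChangeEvent
	if w.Payload != nil {
		ev = *w.Payload
	}
	if ev.Op == "" {
		return ChangeEvent{}, errors.New("change event has no op field")
	}
	return ev, nil
}

// handler passes each valid event to sink and answers 204 on success.
func handler(sink func(ChangeEvent)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		// Cap the body at 1 MiB (an arbitrary limit chosen for this sketch).
		body, err := io.ReadAll(io.LimitReader(r.Body, 1<<20))
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		ev, err := parseEvent(body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		sink(ev)
		w.WriteHeader(http.StatusNoContent)
	})
}

func main() {
	sample := []byte(`{"payload":{"op":"c","ts_ms":1700000000000,"after":{"id":7,"qty":12}}}`)
	ev, err := parseEvent(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("op:", ev.Op, "qty:", ev.After["qty"])
	// To serve for real (port is an assumption):
	// http.ListenAndServe(":8080", handler(func(e ChangeEvent) { /* store it */ }))
}
```

Returning a non-2xx status for malformed input lets the sender notice the failure; how the sink retries is a matter of its own configuration and is not covered here.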

Choice: SQL Server via Native CDC + Go poller

Context: Avoid Kafka to keep stack simple.

Decision: Implement custom Go poller reading `cdc.fn_cdc_get_all_changes_*`.

Rationale: No Kafka needed; polling interval can be tuned; lightweight.

Trade-offs: Latency up to poll interval; requires enabling CDC (one-time DBA task).

See: `decisions.md` (Decision 2)
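
The polling approach can be sketched as a loop that remembers the last log sequence number (LSN) it processed and asks for everything after it. The sketch below is an illustration under assumptions, not the planned implementation: `Source`, `pollOnce`, `changesQuery`, and `increment` are hypothetical names, the database is replaced by an in-memory fake so the example runs without a driver, and `increment` is a stand-in for calling `sys.fn_cdc_increment_lsn` on the server. A real implementation would also thread a `context.Context` through these calls.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// LSN is a SQL Server log sequence number (binary(10) on the server).
type LSN []byte

// Change is a simplified row from the CDC table-valued function.
// With the 'all' row filter, __$operation is 1=delete, 2=insert, 4=update (after image).
type Change struct {
	StartLSN  LSN
	Operation int
}

// Source abstracts the two database calls the poller needs.
type Source interface {
	MaxLSN() (LSN, error)                  // e.g. SELECT sys.fn_cdc_get_max_lsn()
	Changes(from, to LSN) ([]Change, error) // runs changesQuery with both bounds
}

var identRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// changesQuery builds the SELECT for one capture instance. The instance name
// is spliced into the function name, so it is validated first.
func changesQuery(captureInstance string) (string, error) {
	if !identRe.MatchString(captureInstance) {
		return "", fmt.Errorf("invalid capture instance %q", captureInstance)
	}
	return "SELECT * FROM cdc.fn_cdc_get_all_changes_" + captureInstance +
		"(@from_lsn, @to_lsn, N'all')", nil
}

// increment returns the next LSN, treating the value as a big-endian integer.
func increment(l LSN) LSN {
	out := append(LSN(nil), l...)
	for i := len(out) - 1; i >= 0; i-- {
		out[i]++
		if out[i] != 0 {
			break
		}
	}
	return out
}

// pollOnce fetches changes after last and returns the new high-water mark.
func pollOnce(src Source, last LSN) ([]Change, LSN, error) {
	to, err := src.MaxLSN()
	if err != nil {
		return nil, last, err
	}
	from := increment(last)
	if bytes.Compare(from, to) > 0 {
		return nil, last, nil // nothing new since the previous poll
	}
	changes, err := src.Changes(from, to)
	if err != nil {
		return nil, last, err
	}
	return changes, to, nil
}

// fakeSource stands in for SQL Server so the sketch runs without a driver.
type fakeSource struct {
	max  LSN
	rows []Change
}

func (f fakeSource) MaxLSN() (LSN, error) { return f.max, nil }

func (f fakeSource) Changes(from, to LSN) ([]Change, error) {
	var out []Change
	for _, c := range f.rows {
		if bytes.Compare(c.StartLSN, from) >= 0 && bytes.Compare(c.StartLSN, to) <= 0 {
			out = append(out, c)
		}
	}
	return out, nil
}

// lsn builds a 10-byte LSN whose last byte is n (demo helper only).
func lsn(n byte) LSN { return LSN{0, 0, 0, 0, 0, 0, 0, 0, 0, n} }

func main() {
	q, _ := changesQuery("dbo_inventory")
	fmt.Println(q)
	src := fakeSource{max: lsn(5), rows: []Change{{lsn(2), 2}, {lsn(4), 4}}}
	changes, next, err := pollOnce(src, lsn(2))
	fmt.Println(len(changes), next, err)
}
```

Calling `pollOnce` on a timer and persisting the returned LSN after each successful batch gives the "latency up to poll interval" behaviour noted in the trade-offs.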

Choice: Demo Domain = Pharmacy Chain

Context: Two options: Pharmacy or Transportation.

Decision: Pharmacy chain.

Rationale: Universal mid-size business in LATAM; inventory-centric audit signals are clear; easier to generate synthetic data.

Trade-offs: None major; transportation could be equally valid but pharmacy is more straightforward.

See: `decisions.md` (Decision 3)

Choice: Storage = SQLite for MVP

Context: Need append-only log; options: SQLite vs Postgres.

Decision: SQLite.

Rationale: Embedded (no server process), single file, zero ops; fits "easy install" and the expected data volume (under one million rows). The repository pattern allows a future swap to Postgres.

Trade-offs: Single writer; limited concurrency. Not a problem for MVP.

See: `decisions.md` (Decision 4)
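
The repository pattern mentioned in the rationale can be illustrated with a small interface that offers only append and read operations, so the append-only property is enforced by the API rather than by convention. This is a sketch under assumptions: `AuditEvent`, `EventStore`, and `memStore` are hypothetical names, and the in-memory implementation stands in for a SQLite-backed one (which would use `database/sql` plus a driver and is not shown).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// AuditEvent is a hypothetical, simplified log record.
type AuditEvent struct {
	ID      int64
	Table   string
	Op      string
	At      time.Time
	Payload string
}

// EventStore deliberately has no Update or Delete: the log is append-only.
// A SQLite or Postgres implementation would satisfy the same interface.
type EventStore interface {
	Append(e AuditEvent) (int64, error)
	Since(afterID int64, limit int) ([]AuditEvent, error)
}

// memStore is an in-memory stand-in used here so the sketch runs as-is.
type memStore struct {
	mu     sync.Mutex
	events []AuditEvent
}

func newMemStore() *memStore { return &memStore{} }

// Append assigns the next sequential ID and stores the event.
func (s *memStore) Append(e AuditEvent) (int64, error) {
	if e.Table == "" || e.Op == "" {
		return 0, errors.New("event needs a table and an op")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	e.ID = int64(len(s.events)) + 1
	s.events = append(s.events, e)
	return e.ID, nil
}

// Since returns up to limit events with ID greater than afterID, oldest first.
func (s *memStore) Since(afterID int64, limit int) ([]AuditEvent, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if afterID < 0 {
		afterID = 0
	}
	n := int64(len(s.events))
	if afterID >= n || limit <= 0 {
		return nil, nil
	}
	end := afterID + int64(limit)
	if end > n {
		end = n
	}
	// Copy so callers cannot mutate the stored log.
	return append([]AuditEvent(nil), s.events[afterID:end]...), nil
}

func main() {
	var store EventStore = newMemStore()
	id, err := store.Append(AuditEvent{Table: "inventory", Op: "u", At: time.Now(), Payload: `{"qty":3}`})
	fmt.Println(id, err)
	events, _ := store.Since(0, 10)
	fmt.Println(len(events))
}
```

Because callers depend only on `EventStore`, moving from SQLite to Postgres later would mean adding a second implementation rather than touching the calling code.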

Choice: MCP Transport = Stdio only

Context: Expose tools to external AI agents. Options: stdio vs TCP+TLS.

Decision: Stdio.

Rationale: Simplest, no network exposure, fits trusted-agent use case. Can add TCP later if needed.

Trade-offs: One client at a time; requires agent to spawn process.

See: `decisions.md` (Decision 5)
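
For orientation, the stdio transport amounts to reading newline-delimited JSON-RPC 2.0 messages from standard input and writing replies to standard output. The sketch below shows only that framing and answers a single method, `ping`; it is not a full MCP server (no initialization handshake, no tools), and `handleLine` and `serve` are hypothetical names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

type request struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Method  string          `json:"method"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type response struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id"`
	Result  any             `json:"result,omitempty"`
	Error   *rpcError       `json:"error,omitempty"`
}

// handleLine processes one message. It returns ok=false when no reply is
// due (a notification, i.e. a message without an id).
func handleLine(line []byte) (reply []byte, ok bool) {
	var req request
	if err := json.Unmarshal(line, &req); err != nil {
		reply, _ = json.Marshal(response{
			JSONRPC: "2.0",
			ID:      json.RawMessage("null"),
			Error:   &rpcError{Code: -32700, Message: "parse error"},
		})
		return reply, true
	}
	if len(req.ID) == 0 {
		return nil, false
	}
	resp := response{JSONRPC: "2.0", ID: req.ID}
	switch req.Method {
	case "ping":
		resp.Result = struct{}{} // empty result object
	default:
		resp.Error = &rpcError{Code: -32601, Message: "method not found"}
	}
	reply, _ = json.Marshal(resp)
	return reply, true
}

// serve reads one JSON message per line from in and writes replies to out.
func serve(in io.Reader, out io.Writer) error {
	sc := bufio.NewScanner(in)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // allow lines up to 1 MiB
	for sc.Scan() {
		if len(sc.Bytes()) == 0 {
			continue
		}
		if reply, ok := handleLine(sc.Bytes()); ok {
			if _, err := fmt.Fprintf(out, "%s\n", reply); err != nil {
				return err
			}
		}
	}
	return sc.Err()
}

func main() {
	// Diagnostics go to stderr so stdout carries protocol messages only.
	if err := serve(os.Stdin, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, "serve:", err)
		os.Exit(1)
	}
}
```

Since the agent spawns the process and owns both pipes, there is no listener or port to secure, which matches the "no network exposure" rationale; it is also why only one client can be attached at a time.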

Choice: AI Layer = On-prem Ollama default, cloud optional

Context: Need LLM for summaries/explanations. Balance quality vs compliance.

Decision: Default to on-prem Ollama; allow cloud LLM as opt-in.

Rationale: Keeps data in-house for pilot; reduces compliance friction. Cloud fallback if quality insufficient.

Trade-offs: On-prem model may be weaker; may need prompt tuning.

See: `decisions.md` (Decision 6)
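
The "on-prem by default, cloud only on opt-in" rule could be captured in a small configuration resolver like the one below. This is a sketch with hypothetical names (`Config`, `Backend`, `resolveBackend`); the only external fact it relies on is that Ollama listens on `http://localhost:11434` by default.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical settings struct for the AI layer.
type Config struct {
	OllamaURL     string // empty means use Ollama's default local address
	CloudOptIn    bool   // must be set explicitly to send data off-prem
	CloudEndpoint string // required when CloudOptIn is true
}

// Backend describes where summary prompts will be sent.
type Backend struct {
	Kind    string // "ollama" or "cloud"
	BaseURL string
}

const defaultOllamaURL = "http://localhost:11434"

// resolveBackend picks the cloud backend only on explicit opt-in;
// in every other case it stays on-prem.
func resolveBackend(c Config) (Backend, error) {
	if c.CloudOptIn {
		if c.CloudEndpoint == "" {
			return Backend{}, errors.New("cloud opt-in requires CloudEndpoint")
		}
		return Backend{Kind: "cloud", BaseURL: c.CloudEndpoint}, nil
	}
	url := c.OllamaURL
	if url == "" {
		url = defaultOllamaURL
	}
	return Backend{Kind: "ollama", BaseURL: url}, nil
}

func main() {
	b, err := resolveBackend(Config{})
	fmt.Println(b.Kind, b.BaseURL, err)
}
```

Note that a configured `CloudEndpoint` without `CloudOptIn` is ignored, so setting a URL by accident cannot move data off-prem.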

Choice: MVP Feature Set Prioritization

Context: 8 features identified; Day 60 timeline tight.

Decision: Prioritize core five: daily summary, inventory anomalies, user suspicion, entity traceability, branch variance. Defer sales anomalies, PO fraud, role change alerts.

Rationale: These cover the most common audit scenarios and demonstrate ROI. Stretch features can be added post-pilot.

Trade-offs: Sales/PO/role alerts may be expected; need to manage expectations in sales process.

See: `decisions.md` (Decision 7)

Choice: Report Structure and Order

Context: User provided 9 deliverables. I organized the reports as: 1) CDC landscape, 2) Architecture, 3) Event model, 4) Features, 5) MCP/Security, 6) Implementation. Together these cover every deliverable in the prompt.

Decision: Followed that order; merged some deliverables (e.g., Build vs Buy included in Architecture; Security and MCP together; Implementation Plan separate).

Rationale: Logical flow from technology choices to design to concrete features to security to execution.

Trade-offs: None; user can reorganize if desired.

Choice: Word Count Targets

Context: Prompt said "~1000 words" per report.

Decision: Aimed for 950–1200 words each to be thorough yet concise. A few ran over (about 1,300 words) where the topic needed it, notably the event model.

Rationale: Ensure completeness without fluff; some topics need more space.

Trade-offs: Slightly longer reports but still readable.

Choice: Scrapbook-first Process

Context: Structured Research Mode demands scrapbook entries before reports.

Decision: Created a dedicated scrapbook file for each report's thinking trace, timestamped. Logged early hypotheses, open questions, and reasoning.

Rationale: Maintains traceability and shows evolution of thought.

Trade-offs: Extra overhead, but valuable for documentation.

Choice: Implementation Backlog Size

Context: 25–40 tasks required.

Decision: Produced 36 tasks across 8 milestones.

Rationale: Enough granularity to be actionable, but not overwhelming. Difficulty mix (S/M/L) reflects realistic effort.

Trade-offs: Could be refined further with engineering team input, but sufficient for MVP planning.

Choice: Synthetic Data Generator Language

Context: Demo domain requires generator. Language not specified.

Decision: Implement in Go (same language as overlay) for consistency and single binary distribution.

Rationale: Keeps tech stack simple; Go has good MySQL driver and fake data libraries.

Trade-offs: Python might be faster to prototype; Go ensures generator is packaged with overlay.
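
As an illustration of what such a generator might look like, the sketch below produces a seeded, reproducible stream of pharmacy inventory movements and injects occasional large write-offs as labeled anomalies. Everything here is an assumption made for the example: the `Movement` fields, the branch and user names, and the quantity ranges are hypothetical, and only the standard library is used.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Movement is a hypothetical inventory change in the demo pharmacy chain.
type Movement struct {
	Branch    string
	SKU       string
	User      string
	Minute    int  // minutes since the start of the simulated period
	Delta     int  // negative values are stock leaving the branch
	Anomalous bool // ground-truth label for evaluating detection
}

var (
	branches = []string{"centro", "norte", "sur"}
	users    = []string{"ana", "luis", "maria", "jorge"}
)

// generate returns n movements. The same seed always yields the same data,
// which keeps demos and tests reproducible.
func generate(seed int64, n int, anomalyRate float64) []Movement {
	r := rand.New(rand.NewSource(seed))
	out := make([]Movement, 0, n)
	minute := 0
	for i := 0; i < n; i++ {
		minute += 1 + r.Intn(15) // 1 to 15 minutes between movements
		m := Movement{
			Branch: branches[r.Intn(len(branches))],
			SKU:    fmt.Sprintf("SKU-%04d", r.Intn(500)),
			User:   users[r.Intn(len(users))],
			Minute: minute,
			Delta:  -(1 + r.Intn(5)), // ordinary sale of 1 to 5 units
		}
		if r.Float64() < anomalyRate {
			m.Delta = -(200 + r.Intn(300)) // implausibly large write-off
			m.Anomalous = true
		}
		out = append(out, m)
	}
	return out
}

func main() {
	for _, m := range generate(42, 5, 0.2) {
		fmt.Printf("%+v\n", m)
	}
}
```

Keeping the ground-truth `Anomalous` flag alongside each row makes it possible to check later whether the audit features actually flag the injected cases.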

---

This log will continue as research progresses. Choices made autonomously based on project goals and constraints are recorded here for transparency.