Scrapbook: MCP Tooling & Security Design
Date: 2026-02-16T17:30:00Z
MCP Server Interface
Goal: Expose our overlay’s capabilities to external AI agents (like ChatGPT, Claude) via the Model Context Protocol (MCP). This lets Antonio query his audit data through a natural‑language interface.
MCP tools (functions) to expose:
1. `search_events(query: string) -> list<EventSummary>`
- Free‑text search over event payloads (simple LIKE or future vector search). Returns matching events with id, type, timestamp, entity, snippet.
2. `get_daily_summary(date: string) -> string`
- Returns the pre‑computed natural‑language summary for that date, or generates one on the fly if missing.
3. `get_anomalies(start_date: string, end_date: string, severity?: string) -> list<Anomaly>`
- Returns flagged anomalies in the date range, with explanations.
4. `trace_entity(entity_type: string, entity_id: string) -> list<Event>`
- Full timeline for the given entity, sorted by timestamp.
5. `export_report(format: string, type: string, start: string, end: string) -> file (base64 or stream)`
- Generates ad‑hoc reports; e.g., export all sales for a date range.
Transport: stdio (JSON‑RPC 2.0) per the MCP spec. Our Go binary will have an `mcp` subcommand that runs the server loop. Tools must be fast; heavy LLM calls happen only where needed (e.g., summary generation is already cached). For `search_events`, we initially use SQLite FTS if available, otherwise fall back to a simple LIKE.
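A minimal sketch of that server loop, assuming newline‑delimited JSON‑RPC framing over stdio; the names (`runMCP`, `searchEvents`, the handler wiring) are placeholders, not a final API:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

type rpcRequest struct {
	JSONRPC string          `json:"jsonrpc"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params"`
	ID      *int64          `json:"id"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type rpcResponse struct {
	JSONRPC string    `json:"jsonrpc"`
	Result  any       `json:"result,omitempty"`
	Error   *rpcError `json:"error,omitempty"`
	ID      *int64    `json:"id"`
}

// searchEvents is a stub; the real implementation queries SQLite
// (FTS where the build supports it, otherwise a LIKE scan).
func searchEvents(query string) []map[string]string {
	return nil
}

func runMCP() error {
	// One JSON-RPC message per line. NOTE: bufio.Scanner defaults to a
	// 64 KiB buffer; raise it via scanner.Buffer for large payloads.
	scanner := bufio.NewScanner(os.Stdin)
	enc := json.NewEncoder(os.Stdout)
	for scanner.Scan() {
		var req rpcRequest
		if err := json.Unmarshal(scanner.Bytes(), &req); err != nil {
			continue // skip malformed frames; a real server replies with -32700
		}
		resp := rpcResponse{JSONRPC: "2.0", ID: req.ID}
		switch req.Method {
		case "audit.search_events":
			var p struct {
				Query string `json:"query"`
			}
			if err := json.Unmarshal(req.Params, &p); err != nil {
				resp.Error = &rpcError{Code: -32602, Message: "invalid params"}
			} else {
				resp.Result = map[string]any{"events": searchEvents(p.Query)}
			}
		default:
			resp.Error = &rpcError{Code: -32601, Message: fmt.Sprintf("method not found: %s", req.Method)}
		}
		if err := enc.Encode(&resp); err != nil {
			return err
		}
	}
	return scanner.Err()
}

func main() {
	if err := runMCP(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```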
Security: Only local invocation (stdio) by a trusted agent (Antonio’s ChatGPT). No network exposure. If needed, we could add token‑based auth over the pipes, but trust is assumed.
---
Security, Compliance, Politics Checklist
IT approval concerns for on‑prem software:
- Minimum privileges:
- DB: read‑only replica account for MySQL; `REPLICATION SLAVE` + `SELECT` on captured tables. If using native CDC for SQL Server, we need `db_owner` to enable CDC initially, then a read‑only account for the poller. We’ll document that enabling CDC is a one‑time DBA task; afterwards the poller uses `SELECT` on CDC tables.
- Network layout:
- All components communicate over localhost or a dedicated Docker bridge network. No inbound ports from outside the host, except optionally a UI port (HTTPS) behind corporate firewall.
- Encryption:
- In‑transit: DB connections use TLS where available (MySQL accounts can be created with `REQUIRE SSL`). Overlay services can use HTTPS internally if desired, but plain HTTP on localhost is acceptable.
- At rest: SQLite database file stored with restricted permissions (0600). Optional disk encryption at host level (customer’s responsibility).
- Audit logging of the overlay itself:
- The overlay’s own operations (ingest requests, errors, admin actions) are logged to a local rotating file with size limit. Logs are not sent to external parties.
- Data minimization:
- Before any data leaves the host (optional cloud LLM), we strip columns marked as PII (e.g., `users.username`, `customers.*`). This is configurable via a `mask_columns` list per table; see the masking sketch after this checklist.
- If using the on‑prem LLM (Ollama), no data leaves the host, but we still mask to avoid accidental exposure in logs.
- On‑prem vs cloud LLM:
- Default: on‑prem Ollama (e.g., `llama3.2:3b` or Mistral 7B). Runs on CPU or a modest GPU.
- Optional: cloud LLM (OpenRouter, Anthropic) for higher quality summaries if customer allowlists outbound traffic and signs data processing agreement.
- Recommendation for pilot: start on‑prem to simplify compliance; switch to cloud only if quality is unacceptable and the customer agrees.
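The masking sketch referenced above, in minimal form. The config shape (table to column list, with `"*"` meaning every column) and the names `maskRow` / `"[REDACTED]"` are assumptions, not a settled API:

```go
package main

// maskRow redacts configured columns from a row payload before it is sent to
// a cloud LLM or written to a log.
func maskRow(table string, row map[string]any, maskColumns map[string][]string) map[string]any {
	cols, ok := maskColumns[table]
	if !ok {
		return row // table not configured: pass through unchanged
	}
	out := make(map[string]any, len(row)) // copy so the caller's row is untouched
	for k, v := range row {
		out[k] = v
	}
	for _, c := range cols {
		if c == "*" { // e.g. customers.*: redact the whole row
			for k := range out {
				out[k] = "[REDACTED]"
			}
			return out
		}
		if _, present := out[c]; present {
			out[c] = "[REDACTED]"
		}
	}
	return out
}
```

With `maskColumns = map[string][]string{"users": {"username"}, "customers": {"*"}}`, an `orders` row passes through untouched while every `customers` column is redacted.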
Politics:
- The overlay is read‑only from the DB’s perspective (the one exception would be Maxwell storing its offsets in the source DB; avoid that). IT likes that it doesn’t modify production data.
- Installation should be a single command (Docker Compose) and should not require changes to the application. That’s the main selling point.
---
Pilot Proposal Outline (Low‑Risk)
Objective: Demonstrate value within 2 weeks of deployment on a non‑production replica of the customer’s database (or synthetic data if a replica is not available).
Scope:
- MySQL instance with a copy of the customer’s schema (minus PII) and synthetic realistic data.
- Overlay runs in a VM or Docker host inside the customer’s network.
- No changes to existing applications.
Success criteria:
- Daily summary generated without errors.
- At least 3 anomaly alerts per week that are verified as true positives by customer staff.
- Entity traceability queries return results in under 2 seconds.
- Auditor feedback: >7/10 on usefulness.
Exit clause: If installation proves too disruptive, we uninstall with one command; no residual changes to their databases.
---
MCP Tool Definitions (Detailed)
Method names follow JSON‑RPC 2.0 conventions, namespaced under `audit.`.
search_events
{
  "jsonrpc": "2.0",
  "method": "audit.search_events",
  "params": { "query": "inventory theft" },
  "id": 1
}
Response:
{
  "jsonrpc": "2.0",
  "result": {
    "events": [
      {
        "event_id": "...",
        "type": "inventory.adjustment",
        "timestamp": "2026-02-15T22:30:00Z",
        "entity_type": "product",
        "entity_id": "123",
        "snippet": "adjustment reason=THEFT, qty -50"
      }
    ]
  },
  "id": 1
}
get_daily_summary
Params: `{ "date": "2026-02-16" }` → `{ "summary": "Today ..."}"
get_anomalies
Params: `{ "start_date": "...", "end_date": "...", "severity": "high" }` (severity optional)
trace_entity
Params: `{ "entity_type": "product", "entity_id": "123" }` → array of events sorted by timestamp.
export_report
Params: `{ "format": "csv", "type": "sales", "start": "...", "end": "..." }` → returns file content as base64 or stream.
All tools are read‑only; no risk of modifying source data.
---
This will be organized into Report 5.