Scrapbook: MCP Tooling & Security Design
Date: 2026-02-16T17:30:00Z
MCP Server Interface
Goal: Expose our overlay’s capabilities to external AI agents (like ChatGPT, Claude) via the Model Context Protocol (MCP). This lets Antonio query his audit data through a natural‑language interface.
MCP tools (functions) to expose:
1. `search_events(query: string) -> list<EventSummary>`
- Free‑text search over event payloads (simple LIKE or future vector search). Returns matching events with id, type, timestamp, entity, snippet.
2. `get_daily_summary(date: string) -> string`
- Returns the pre‑computed natural‑language summary for that date, or generates one on the fly if missing.
3. `get_anomalies(start_date: string, end_date: string, severity?: string) -> list<Anomaly>`
- Returns flagged anomalies in the date range, with explanations.
4. `trace_entity(entity_type: string, entity_id: string) -> list<Event>`
- Full timeline for the given entity, sorted by timestamp.
5. `export_report(format: string, type: string, start: string, end: string) -> file (base64 or stream)`
- Generates ad‑hoc reports; e.g., export all sales for a date range.
Transport: stdio (JSON‑RPC 2.0) per the MCP spec. Our Go binary will have an `mcp` subcommand that runs the server loop. Tools must be fast; heavy LLM calls happen only where needed (e.g., summary generation is already cached). For `search_events`, we initially use SQLite FTS if available, otherwise fall back to a simple LIKE.
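A minimal sketch of that server loop, assuming newline‑delimited JSON‑RPC framing over stdio; the names (`runMCP`, `searchEvents`, the handler wiring) are placeholders, not a final API:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

type rpcRequest struct {
	JSONRPC string          `json:"jsonrpc"`
	Method  string          `json:"method"`
	Params  json.RawMessage `json:"params"`
	ID      *int64          `json:"id"`
}

type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type rpcResponse struct {
	JSONRPC string    `json:"jsonrpc"`
	Result  any       `json:"result,omitempty"`
	Error   *rpcError `json:"error,omitempty"`
	ID      *int64    `json:"id"`
}

// searchEvents is a stub; the real implementation queries SQLite
// (FTS where the build supports it, otherwise a LIKE scan).
func searchEvents(query string) []map[string]string {
	return nil
}

func runMCP() error {
	// One JSON-RPC message per line. NOTE: bufio.Scanner defaults to a
	// 64 KiB buffer; raise it via scanner.Buffer for large payloads.
	scanner := bufio.NewScanner(os.Stdin)
	enc := json.NewEncoder(os.Stdout)
	for scanner.Scan() {
		var req rpcRequest
		if err := json.Unmarshal(scanner.Bytes(), &req); err != nil {
			continue // skip malformed frames; a real server replies with -32700
		}
		resp := rpcResponse{JSONRPC: "2.0", ID: req.ID}
		switch req.Method {
		case "audit.search_events":
			var p struct {
				Query string `json:"query"`
			}
			if err := json.Unmarshal(req.Params, &p); err != nil {
				resp.Error = &rpcError{Code: -32602, Message: "invalid params"}
			} else {
				resp.Result = map[string]any{"events": searchEvents(p.Query)}
			}
		default:
			resp.Error = &rpcError{Code: -32601, Message: fmt.Sprintf("method not found: %s", req.Method)}
		}
		if err := enc.Encode(&resp); err != nil {
			return err
		}
	}
	return scanner.Err()
}

func main() {
	if err := runMCP(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```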
Security: Only local invocation (stdio) by a trusted agent (Antonio’s ChatGPT). No network exposure. If needed, we could add token‑based auth over the pipes, but trust is assumed.
---
Security, Compliance, Politics Checklist
IT approval concerns for on‑prem software:
- Minimum privileges:
- DB: read‑only replica account for MySQL; `REPLICATION SLAVE` + `SELECT` on captured tables. If using native CDC for SQL Server, we need `db_owner` to enable CDC initially, then a read‑only account for the poller. We’ll document that enabling CDC is a one‑time DBA task; afterwards the poller uses `SELECT` on CDC tables.
- Network layout:
- All components communicate over localhost or a dedicated Docker bridge network. No inbound ports from outside the host, except optionally a UI port (HTTPS) behind corporate firewall.
- Encryption:
- In‑transit: DB connections use TLS where available (MySQL accounts can be created with `REQUIRE SSL`). Overlay services can use HTTPS internally if desired, but plain HTTP on localhost is acceptable.
- At rest: SQLite database file stored with restricted permissions (0600). Optional disk encryption at host level (customer’s responsibility).
- Audit logging of the overlay itself:
- The overlay’s own operations (ingest requests, errors, admin actions) are logged to a local rotating file with size limit. Logs are not sent to external parties.
- Data minimization:
- Before any data leaves the host (optional cloud LLM), we strip columns marked as PII (e.g., `users.username`, `customers.*`). This is configurable via a `mask_columns` list per table; see the masking sketch after this checklist.
- If using the on‑prem LLM (Ollama), no data leaves the host, but we still mask to avoid accidental exposure in logs.
- On‑prem vs cloud LLM:
- Default: on‑prem Ollama (e.g., `llama3.2:3b` or Mistral 7B). Runs on CPU or a modest GPU.
- Optional: cloud LLM (OpenRouter, Anthropic) for higher quality summaries if customer allowlists outbound traffic and signs data processing agreement.
- Recommendation for pilot: start on‑prem to simplify compliance; switch to cloud only if quality is unacceptable and the customer agrees.
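The masking sketch referenced above, in minimal form. The config shape (table to column list, with `"*"` meaning every column) and the names `maskRow` / `"[REDACTED]"` are assumptions, not a settled API:

```go
package main

// maskRow redacts configured columns from a row payload before it is sent to
// a cloud LLM or written to a log.
func maskRow(table string, row map[string]any, maskColumns map[string][]string) map[string]any {
	cols, ok := maskColumns[table]
	if !ok {
		return row // table not configured: pass through unchanged
	}
	out := make(map[string]any, len(row)) // copy so the caller's row is untouched
	for k, v := range row {
		out[k] = v
	}
	for _, c := range cols {
		if c == "*" { // e.g. customers.*: redact the whole row
			for k := range out {
				out[k] = "[REDACTED]"
			}
			return out
		}
		if _, present := out[c]; present {
			out[c] = "[REDACTED]"
		}
	}
	return out
}
```

With `maskColumns = map[string][]string{"users": {"username"}, "customers": {"*"}}`, an `orders` row passes through untouched while every `customers` column is redacted.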
Politics:
- The overlay is read‑only from the DB’s perspective (the one exception would be Maxwell storing its offsets in the source DB; avoid that). IT likes that it doesn’t modify production data.
- Installation should be a single command (Docker Compose) and should not require changes to the application. That’s the main selling point.
---
Pilot Proposal Outline (Low‑Risk)
Objective: Demonstrate value within 2 weeks of deployment on a non‑production replica of the customer’s database (or synthetic data if a replica is not available).
Scope:
- MySQL instance with a copy of the customer’s schema (minus PII) and synthetic realistic data.
- Overlay runs in a VM or Docker host inside the customer’s network.
- No changes to existing applications.
Success criteria:
- Daily summary generated without errors.
- At least 3 anomaly alerts per week that are verified as true positives by customer staff.
- Entity traceability queries return results in under 2 seconds.
- Auditor feedback: >7/10 on usefulness.
Exit clause: If installation proves too disruptive, we uninstall with one command; no residual changes to their databases.
---
MCP Tool Definitions (Detailed)
Method names follow JSON‑RPC 2.0 conventions, namespaced under `audit.`.
search_events
{
  "jsonrpc": "2.0",
  "method": "audit.search_events",
  "params": { "query": "inventory theft" },
  "id": 1
}
Response:
{
  "jsonrpc": "2.0",
  "result": {
    "events": [
      {
        "event_id": "...",
        "type": "inventory.adjustment",
        "timestamp": "2026-02-15T22:30:00Z",
        "entity_type": "product",
        "entity_id": "123",
        "snippet": "adjustment reason=THEFT, qty -50"
      }
    ]
  },
  "id": 1
}
get_daily_summary
Params: `{ "date": "2026-02-16" }` → `{ "summary": "Today ..."}"
get_anomalies
Params: `{ "start_date": "...", "end_date": "...", "severity": "high" }` (severity optional)
trace_entity
Params: `{ "entity_type": "product", "entity_id": "123" }` → array of events sorted by timestamp.
export_report
Params: `{ "format": "csv", "type": "sales", "start": "...", "end": "..." }` → returns file content as base64 or stream.
All tools are read‑only; no risk of modifying source data.
---
This will be organized into Report 5.