Scrapbook: Event Model & Pharmacy Domain Design
Date: 2026-02-16T16:30:00Z
Event Schemas
RawChangeEvent: as emitted by Debezium MySQL connector. Fields:
- `source` (db, table, pk)
- `ts_ms` (timestamp), `transaction_id` (GTID or file:pos)
- `op` (c/u/d/r)
- `before`, `after` (full row images)
- `schema` (optional, for DDL events)
We'll accept this as ingress; we don't need to model it in our code as a struct? Actually we will parse into struct for Go. But canonical strips Debezium specifics.
CanonicalChangeEvent:
- event_id (UUID v5: namespace=source+txid, name=table+pk+seq)
- source.db, source.table, source.pk (JSON)
- operation (string)
- before (map), after (map)
- timestamp (time.Time)
- transaction_id (string, from source)
- schema_version (optional string, e.g., table:version)
- metadata.raw (original payload for debug if needed)
BusinessEvent:
- type (string, e.g., `inventory.adjustment`, `sale.completed`, `user.role_changed`)
- entity_type, entity_id (e.g., product, 12345)
- timestamp
- payload (fields relevant)
- confidence (float)
- source_canonical_event_id (link to underlying change, could be list)
Handling Concerns
- Schema evolution: If table structure changes, `before`/`after` fields will have new/removed columns. Our canonical schema keeps them as maps; downstream code must handle missing keys. For BusinessEvent inference, we should reference stable keys (e.g., `product_id` exists across versions). We'll also include `schema_version` to track version.
- Transaction boundaries: Debezium groups events within a transaction using same `transaction_id` and provides a `transaction` wrapper if we enable it. Our ingester should receive whole transactions (array) and atomically store them to avoid partial visibility. The overlay's storage can have a `transaction_id` column; queries can filter. Guarantees: events from same transaction are ingested together or not at all.
- Idempotency & deduplication: Use `event_id` as primary key. Our storage insert uses `INSERT OR IGNORE` (SQLite) or `ON CONFLICT DO NOTHING` (Postgres) to handle retries safely.
- Ordering: Debezium preserves order within a transaction. Across transactions, order is roughly chronological by `ts_ms`. We'll index by timestamp for queries. If out-of-order delivery occurs, we still maintain insert order via `ingested_at` but business logic should not depend on strict global ordering; use transaction boundaries for grouping.
- Multi-table correlation: Business events often span tables (e.g., sale involves `sales` header + `sales_items` lines). Debezium emits separate row events for each table. Our inference engine can detect a common `transaction_id` and group events from that transaction to construct a higher-level event. Example: if within a transaction we see inserts into `sales` and multiple `sales_items`, we can aggregate into a single `SaleCompleted` business event with line items.
Pharmacy Domain Schema (simplified)
Tables:
- `branches` (branch_id, name, address, city, region, created_at, updated_at, created_by, updated_by)
- `products` (product_id, sku, name, category, unit_price, controlled (bool), created_at, updated_at, created_by, updated_by)
- `inventory` (branch_id, product_id, quantity_on_hand, last_counted_date, ... primary key (branch_id, product_id))
- `sales` (sale_id, branch_id, sale_time, total_amount, payment_method, cashier_user_id, ...)
- `sales_items` (sale_id, product_id, quantity, unit_price, line_total)
- `inventory_adjustments` (adjust_id, branch_id, product_id, old_qty, new_qty, reason_code (e.g., 'SALE', 'DAMAGE', 'THEFT', 'COUNT'), adjusted_by_user_id, adjusted_at, notes)
- `suppliers` (supplier_id, name, contact, ...)
- `purchase_orders` (po_id, supplier_id, order_date, status, ...)
- `po_items` (po_id, product_id, quantity, unit_cost)
- `users` (user_id, username, role, branch_id (if applicable), ...)
- `roles` (role_id, name, permissions)
We'll include standard audit fields: `created_at`, `updated_at`, `created_by`, `updated_by` on most tables. These are crucial for traceability.
DDL will be standard MySQL (InnoDB) with foreign keys.
Synthetic Data Generator Plan
We'll write a generator in Go (or Python) that populates the pharmacy schema with realistic data and then generates day-by-day changes.
Initial load:
- Create ~20 branches (mix of big/small cities)
- ~500 products (categories: OTC, prescription, cosmetics, etc.)
- Inventory for each branch: initial quantities based on product type.
- Users: admins, managers, cashiers (10–30 per branch)
- Purchase orders: weekly patterns
Daily patterns:
- Sales: Poisson distribution by hour (peak 10–13 and 17–20). Cashiers ring up sales; each sale has 1–10 items.
- Inventory adjustments: triggered by sales (quantity decrement) automatically; plus manual adjustments for damages, theft, miscounts (rarer).
- Purchase orders: deliveries arrive daily; PO items increase inventory.
- User actions: logins, profile changes.
Anomaly injection (to test detection):
- Fraudulent employee discount: cashier applies unauthorized discount (field `discount_pct` unusual) — we'd need a field, maybe add `override_reason`.
- Inventory theft: batch of adjustments with reason='THEFT' or negative adjustments without sale linkage, after hours.
- Sudden stockout: inventory of a product drops to zero unexpectedly (maybe due to system error or theft); could be correlated by a spike in sales that day.
- Unusual access: user role change outside business hours.
- Duplicate payments: same payment method used for many high‑value sales in short time (possible refund fraud).
The generator will have flags to enable/disable anomaly types and vary intensity.
Generator pseudocode (language‑agnostic):
initialize_schema()
populate_static_data(branches, products, suppliers, roles, users)
set_current_date(start_date)
while current_date <= end_date:
simulate_sales(current_date)
simulate_inventory_adjustments(current_date)
simulate_purchase_orders(current_date)
simulate_user_activity(current_date)
if anomaly_week:
inject_anomalies(current_date)
commit_transaction_for_day()
advance_date()
We'll implement in Go using `github.com/brianvoe/gofakeit` for realistic names and numbers, or custom.
---
This thinking will be refined into Report 3.