← Back to project

SQL Server & Oracle CDC Tooling Summary



This report provides a focused, practical assessment of Change Data Capture (CDC) options for Microsoft SQL Server and Oracle databases, specifically for an on-prem MVP overlay. It expands on the broader landscape from Report 1 with deeper implementation guidance and clear recommendations.

---

SQL Server CDC



SQL Server includes a built-in CDC feature (since SQL Server 2008) that captures row-level changes and stores them in system change tables. This is the most straightforward path for a SQL Server‑backed overlay.

How It Works



Enabling CDC



One‑time DBA steps:

sql
USE YourDatabase;
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'YourTable',
@role_name = NULL; -- optional: restrict access to a role


This creates the capture instance and associated change table. A cleanup job (SQL Server Agent) automatically removes old data based on retention (default 3 days, configurable).

Required Privileges



Polling Implementation



Our overlay will contain a SQL Server poller component (written in Go using the `database/sql` driver). It:

1. Stores the last processed LSN (Log Sequence Number) in the overlay’s own database (or a small file).
2. Periodically (e.g., every 10 seconds) calls `sys.fn_cdc_get_min_lsn('all')` to get the current minimum LSN available.
3. Calls `cdc.fn_cdc_get_all_changes_<capture_instance>(@from_lsn, @to_lsn)` to fetch changes since last run.
4. Converts each row into a CanonicalChangeEvent (including `before`/`after` images derived from the change table columns).
5. Updates the stored LSN to the highest `__$start_lsn` seen, after successful processing.
6. Handles idempotency by generating a UUID for each change based on LSN + sequence.


Idempotency: The combination of `__$start_lsn`, `__$seqval`, and table name uniquely identifies a change. Our `EventID` can be a hash of these. If the poller crashes after storing events but before updating the LSN, we may re‑process some changes; `INSERT OR IGNORE` by `EventID` ensures no duplicates.

Ordering: Within a transaction, changes to multiple tables share the same LSN range but may have different sequence values. For business event inference, we will group events that share the same LSN (or a transaction identifier derived from LSN and commit time). SQL Server CDC does not provide a single transaction ID; we can approximate by grouping events whose LSN range overlaps exactly and whose commit time is within a small window. For MVP, we assume each transaction’s changes appear together in the same poll batch (they usually do).

Pros and Cons



| Pros | Cons |
|------|------|
| Built‑in, no external software | Requires SQL Server Agent for cleanup (usually present) |
| Supports all editions (Developer, Enterprise; Standard from 2016 SP1) | Polling introduces latency (up to poll interval) |
| Change tables are queryable; easy to test and debug | Not a push mechanism; we must poll frequently |
| Privileges for poller are read‑only on change tables | Enabling CDC requires one‑time DBA effort (may be a hurdle) |
| Works over standard TDS protocol | Some overhead on primary (change tables grow, but cleanup mitigates) |


Comparison with Debezium SQL Server Connector



Debezium also supports SQL Server, but it relies on the same CDC feature and streams changes to Kafka. It does not read the transaction log directly. Using Debezium would mean:


Given our goal to minimize dependencies, the native CDC + custom Go poller is the better MVP choice. It avoids Kafka and integrates cleanly with our overlay.


Recommended MVP Approach



---


Oracle CDC



Oracle’s primary CDC technology is GoldenGate, a separate, licensed product that captures redo logs and streams changes to diverse targets. Oracle also offers XStream, an API for accessing redo data, which the Debezium Oracle connector can use. However, both options present significant challenges for an on‑prem MVP targeting mid‑size enterprises.

Oracle GoldenGate



For an MVP overlay, GoldenGate is overkill and costly. We would only consider it if a customer already had GoldenGate licensed and wanted us to consume its output (e.g., to Kafka). That’s a later‑phase enterprise deal, not MVP.


Debezium Oracle Connector



For a small team building an MVP, the Debezium Oracle connector is not recommended due to:

Custom Redox Log Parser?



Theoretically we could build a custom Oracle CDC reader using the Oracle LogMiner API directly in Go. This would be a massive effort, requiring deep Oracle expertise and careful performance tuning. Not viable for MVP.

Verdict: Oracle Out of Scope



Given the target market (Peru/LATAM mid‑size enterprises), Oracle deployments are less common and usually come with existing replication tools (like GoldenGate). Our MVP should exclude Oracle initially. We can document a future path:


For now, mark Oracle as not supported.


---

Comparative Summary



| DBMS | Recommended MVP Approach | Requires Kafka? | Key Privileges | Maturity | On‑Prem Fit | Notes |
|-----------|---------------------------------------|-----------------|---------------------------------------------------|----------|-------------|-------|
| MySQL | Debezium Server (HTTP sink) | No | REPLICATION SLAVE + SELECT on tables | High | Excellent | Avoid Maxwell offset writes; use Debezium |
| SQL Server| Native CDC + custom Go poller | No | db_owner to enable; poller needs SELECT on change tables | High | Very Good | One‑time DBA enable; poller is lightweight |
| Postgres | wal2json + Go replication consumer | No | REPLICATION role (or superuser) | Medium | Good | Requires superuser for replication slot; later phase |
| Oracle | None (defer) or GoldenGate (costly) | Maybe (GoldenGate) | GoldenGate/XStream privileges (complex) | Medium | Poor | GoldenGate expensive; Debezium Oracle connector complex |


---

Final Recommendations



These choices align with the overall goal of minimal dependencies, on‑prem friendliness, and quick time‑to‑value.


---

Word count: ~1,00