If you've been duct-taping Oracle CDC into Flink pipelines using the DataStream API and custom Debezium wrappers, version 3.6.0 just eliminated a few hundred lines of your glue code. The release dropped on March 30, and the headline feature — a proper Oracle pipeline connector — is the kind of boring infrastructure improvement that actually changes how teams build production CDC pipelines.

What "pipeline connector" actually means

The Oracle CDC source connector has existed in Flink CDC for a while. You could read change events from Oracle's LogMiner, deserialize them, and route them wherever. But it was a DataStream source only. You wrote Java, you handled schema mapping, you built your own routing logic.

Pipeline connectors are different. They plug into Flink CDC's YAML-based pipeline framework — the same declarative system that MySQL and PostgreSQL sources already use. A full Oracle-to-downstream sync now looks like this:

source:
  type: oracle
  url: jdbc:oracle:thin:@//dbhost:1521/ORCL
  username: flinkuser
  password: ${ORACLE_PASSWORD}
  schema-list: HR
  table-list: HR.EMPLOYEES, HR.DEPARTMENTS
  scan.startup.mode: initial

sink:
  type: hudi
  catalog:
    type: hive
    uri: thrift://metastore:9083

pipeline:
  schema.change.behavior: lenient
  parallelism: 4

That's it. No Java. No custom SerializationSchema. No hand-rolled table routing. The framework handles schema discovery, snapshot splitting, and change event routing.

The Oracle setup tax hasn't changed

Don't let the YAML simplicity fool you. Oracle's side of this is still Oracle. You need archive logging enabled, supplemental logging turned on for every captured table, and a user with a terrifyingly long list of grants — EXECUTE ON DBMS_LOGMNR, SELECT on half a dozen V$ views, FLASHBACK ANY TABLE for the snapshot phase.
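As a rough sketch of that DBA-side prep (illustrative only: the exact grant list and syntax vary by Oracle version and by whether you're in a pluggable database, so check the Debezium Oracle documentation against your setup):

```sql
-- Run as a DBA. Archive logging requires the database in MOUNT state.
-- ALTER DATABASE ARCHIVELOG;

-- Supplemental logging, per captured table:
ALTER TABLE HR.EMPLOYEES ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- A capture user with the LogMiner-flavored grant list:
CREATE USER flinkuser IDENTIFIED BY "change_me";
GRANT CREATE SESSION TO flinkuser;
GRANT SELECT ANY TRANSACTION TO flinkuser;
GRANT FLASHBACK ANY TABLE TO flinkuser;      -- snapshot phase
GRANT EXECUTE ON DBMS_LOGMNR TO flinkuser;
GRANT SELECT ON V_$DATABASE TO flinkuser;
GRANT SELECT ON V_$LOG TO flinkuser;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO flinkuser;
```

The `V_$` spellings are deliberate: grants go against the underlying views, not the `V$` synonyms.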

The connector uses LogMiner under the hood (via Debezium), which means single-threaded change stream reading. One task receives all change events. You can parallelize the initial snapshot with scan.incremental.snapshot.enabled: true and tune chunk.size, but the ongoing CDC stream is inherently serial. For most OLTP workloads this is fine. For a table doing 50k updates per second, you'll hit a ceiling.
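Concretely, the snapshot-parallelism knobs look something like this; the option names follow the incremental snapshot framework used by the other pipeline sources, and the assumption here is that the Oracle connector exposes the same keys:

```yaml
source:
  type: oracle
  # ... connection options as in the example above ...
  scan.incremental.snapshot.enabled: true
  # Rows per snapshot chunk: smaller chunks mean more parallelism
  # but more query round trips. 8096 is the framework's usual default.
  scan.incremental.snapshot.chunk.size: 8096

pipeline:
  parallelism: 8   # snapshot chunks fan out across these tasks
```

Note that `parallelism` only helps the snapshot phase; once the job switches to streaming, a single task reads the LogMiner stream regardless.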

Two gotchas worth flagging:

  • Checkpointing during snapshot is disabled. If your initial snapshot takes 4 hours, you're running without checkpoints for 4 hours. Set execution.checkpointing.tolerable-failed-checkpoints high and pray nothing else goes wrong in that window.

  • Oracle JDBC driver licensing. The ojdbc8 driver ships under Oracle's FUTC license, not Apache. You need to manually add it to your Flink lib directory. Your legal team may have opinions about this.
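The checkpoint-tolerance workaround from the first bullet translates into Flink configuration roughly like this (a sketch assuming a Flink 1.x-style flink-conf.yaml; newer Flink versions move these into config.yaml):

```yaml
execution.checkpointing.interval: 3min
# Let checkpoints time out or fail while the snapshot blocks them,
# instead of failing the whole job.
execution.checkpointing.tolerable-failed-checkpoints: 2147483647
execution.checkpointing.timeout: 10min
```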

Also, only Oracle 9i through 12c have been formally tested. Running against 19c or 21c? It probably works — Debezium's Oracle connector supports them — but you're outside the tested matrix for the pipeline connector specifically.

Schema evolution that doesn't blow up your pipeline

This is where 3.6.0 gets genuinely interesting beyond the Oracle connector. The release continues pushing lenient mode as the production-safe default for schema evolution, and there's now a JIRA ticket (FLINK-36128) to make it the official default behavior.

Here's why lenient mode matters. The old default, evolve, applies every upstream DDL change to your sink. Someone adds a column? Propagated. Someone drops a column? Also propagated. Someone runs TRUNCATE TABLE on the source during a migration rehearsal? Your sink table gets truncated too.

Lenient mode is more careful:

  • Add column: applied in both modes
  • Rename column: evolve applies it; lenient keeps the old column and adds the new one
  • Alter column type: evolve applies it (and may fail); lenient converts it to rename + add
  • Drop column: evolve applies it; lenient ignores it
  • Truncate table: evolve applies it; lenient ignores it
  • Drop table: evolve applies it; lenient ignores it

The key insight: lenient mode never destroys data downstream. A type change becomes a rename-and-add — you keep the old column with existing data and get a new column with the new type. Drops and truncates are silently swallowed. Your downstream consumers keep working until they're explicitly updated.
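Those semantics can be sketched as a toy model. This is not Flink CDC's implementation: the event dicts, field names, and the `_new` suffix for the added column are all invented here for illustration.

```python
# Toy model of lenient schema-evolution semantics.
# Illustrative only; not Flink CDC's actual code.

def apply_lenient(schema: dict, event: dict) -> dict:
    """schema maps column name -> type; event is an upstream DDL change."""
    schema = dict(schema)  # never mutate the caller's view
    kind = event["kind"]
    if kind == "add.column":
        schema[event["name"]] = event["type"]
    elif kind == "rename.column":
        # Keep the old column (with its data) and add the new name.
        schema[event["new_name"]] = schema[event["old_name"]]
    elif kind == "alter.column.type":
        # Type change becomes rename + add: the old column survives,
        # a new column carries the new type. Suffix is illustrative.
        schema[event["name"] + "_new"] = event["type"]
    elif kind in ("drop.column", "truncate.table", "drop.table"):
        pass  # destructive events are silently swallowed
    return schema

schema = {"id": "INT", "name": "VARCHAR"}
schema = apply_lenient(schema, {"kind": "drop.column", "name": "name"})
assert "name" in schema  # the drop was ignored; nothing destroyed downstream
```

The invariant the real connector enforces is the one the asserts check: no event path ever removes a column or a table from the sink.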

For teams running CDC into a lakehouse where analysts query the sink tables directly, this is the correct default. An unexpected DROP COLUMN from a developer on the source database shouldn't cascade into a broken Tableau dashboard.

You can also get granular with per-event filtering:

sink:
  include.schema.changes: [create.table, add.column]
  exclude.schema.changes: [drop]

That drop shorthand catches both drop.column and drop.table — partial matching is supported.
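The matching rule can be sketched like this. The prefix-on-dot behavior is inferred from the description above, and exclude-beats-include is an assumption; treat both as a model, not the connector's code:

```python
# Sketch of partial matching for schema-change filters.
# "drop" matches "drop.column" and "drop.table" via dotted prefix.

def matches(configured: str, event: str) -> bool:
    return event == configured or event.startswith(configured + ".")

def allowed(event: str, include: list, exclude: list) -> bool:
    if any(matches(c, event) for c in exclude):
        return False  # assumed: exclusions win over inclusions
    if include:       # an empty include list means "include everything"
        return any(matches(c, event) for c in include)
    return True

assert matches("drop", "drop.column")
assert matches("drop", "drop.table")
assert not matches("drop", "add.column")
```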

The Oracle-to-Hudi path

The other headline addition is the Apache Hudi sink pipeline connector. Combined with the Oracle source, you now have a declarative, zero-Java path from Oracle to a Hudi lakehouse.

Why does this combination matter? A lot of enterprises are trying to get off Oracle — or at least reduce their dependency — by replicating transactional data into a lakehouse layer where Spark, Trino, or Presto can handle analytics. Previously, this meant either Debezium → Kafka → Spark Structured Streaming → Hudi (four moving pieces) or a commercial CDC tool like Striim or Qlik Replicate.

Flink CDC 3.6.0 collapses that to one job definition. The pipeline framework handles writing Hudi's MOR or COW tables, managing metadata, and integrating with your Hive metastore. It's not magic — you still need to understand Hudi's compaction, cleaning policies, and read-optimized vs real-time query trade-offs — but the plumbing layer just got dramatically simpler.

The Paimon sink also gained VARIANT type support in this release, so if you're using Paimon instead of Hudi for your lakehouse layer, semi-structured Oracle data (JSON columns, CLOBs) can flow through without schema gymnastics.

A quick but important note — 3.6.0 bumps the minimum JDK requirement to 11. If you're still running Flink on JDK 8 (and plenty of production clusters are), this upgrade is a prerequisite. DataStream-based CDC jobs also need the JDK bump plus updated connector dependencies.

Not a big deal technically, but it's the kind of thing that blocks a Tuesday afternoon upgrade when someone discovers the mismatch at deployment time.

Who should care

If you're running Oracle as a source system and already have Flink infrastructure, 3.6.0 removes the biggest gap in the pipeline connector ecosystem. The YAML pipeline approach means your CDC configuration lives in version-controlled config files instead of compiled Java artifacts — easier to review, easier to deploy, easier for the on-call engineer who didn't write it.

If you're evaluating CDC tools and Oracle is in the mix, this puts Flink CDC in the same conversation as Debezium-on-Kafka and the commercial vendors. The single-threaded LogMiner limitation is real, but for the vast majority of Oracle CDC use cases — dozens of tables, moderate change rates — the pipeline connector is more than adequate.

The schema evolution improvements alone are worth the upgrade even if you're not touching Oracle. Lenient mode with per-event filtering gives you production-safe defaults without sacrificing flexibility. That's a rare combination in this space.