Everyone spent the last three years arguing about Iceberg versus Delta Lake versus Hudi. Meanwhile, the layer that actually determines whether you can swap engines, enforce access control, and avoid vendor lock-in quietly became the real battleground. That layer is the catalog — and picking the wrong one will cost you more than any table format decision ever could.

The Format War Is Basically Over

Databricks shipped UniForm, which mirrors Delta tables as Iceberg. Snowflake can now read those Delta-derived Iceberg tables, deletion vectors included (as of March 2026). Hudi can export Iceberg-compatible metadata. The three formats are converging fast enough that betting your architecture on format purity is a waste of strategic energy.

The question that actually matters: who controls the metadata, and how do they hand out credentials?

Five Catalogs, Five Different Bets

Here's what the field looks like in early 2026.

Apache Polaris is Iceberg-only, implementing the Iceberg REST Catalog API. That narrowness is its superpower — any engine speaking Iceberg REST (north of 20 at last count) works out of the box. Snowflake open-sourced it in mid-2024 and donated it to Apache. Still incubating, but AWS, Google Cloud, Stripe, and IBM all contribute code. If you want pure multi-engine freedom and don't need Delta support, Polaris is the cleanest bet.
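
To make "works out of the box" concrete, here is a minimal sketch of the Spark session properties that point an engine at an Iceberg REST catalog such as Polaris. The catalog name, endpoint URL, warehouse name, and credential are placeholders, not values from any real deployment.

```python
# Sketch: Spark properties wiring a session to an Iceberg REST catalog.
# All values below (URI, warehouse, credential) are hypothetical placeholders.

def rest_catalog_conf(catalog: str, uri: str, warehouse: str, credential: str) -> dict:
    """Build the Spark conf entries for one Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",              # speak the Iceberg REST protocol
        f"{prefix}.uri": uri,                  # catalog endpoint
        f"{prefix}.warehouse": warehouse,      # warehouse name defined in the catalog
        f"{prefix}.credential": credential,    # OAuth2 client_id:client_secret
        # Ask the catalog to vend temporary, scoped storage credentials:
        f"{prefix}.header.X-Iceberg-Access-Delegation": "vended-credentials",
    }

conf = rest_catalog_conf(
    "polaris",
    "https://polaris.example.com/api/catalog",
    "analytics",
    "CLIENT_ID:CLIENT_SECRET",
)
```

Because every property is keyed by the catalog name rather than a vendor SDK, swapping Polaris for any other REST-compliant catalog is a one-line URI change, which is the whole point.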

Databricks Unity Catalog goes wider. It governs Delta tables natively, bolts on Iceberg through UniForm, and stretches beyond tabular data to cover ML models, functions, and unstructured files. The open-source edition supports five engines — Spark, Daft, DuckDB, PuppyGraph, SpiceAI. The managed version on Databricks supports more, but that's precisely where the lock-in lives. Choose Unity when your org already runs on Databricks and you need governance over AI assets, not just tables.

Project Nessie takes a wildly different approach: Git-style branching for your data. Create a branch, run experimental transforms on an isolated snapshot, merge when satisfied, roll back a bad write with a reset. For teams that want version control over their lake — not just their dbt repo — Nessie is the most mature option. It pairs naturally with Polaris or Dremio as the query layer.

Apache Gravitino is about federation. Data scattered across three clouds, two regions, and four catalogs? Gravitino wraps them in a single namespace. It handles Iceberg, Hudi, and Apache Paimon (no Delta — the project's roots in the Chinese open-source ecosystem skew toward Paimon instead). Along with Unity, it's one of only two catalogs with dedicated support for unstructured objects, which matters if you're building RAG pipelines that mix Parquet tables with PDF corpora.

AWS Glue is the pragmatist's default. It covers all three formats, integrates Deequ-based data quality assertions that no competitor matches, and has freshness monitoring baked in. You're trading multi-cloud portability for the lowest-friction path on AWS. For teams that aren't leaving Amazon anytime soon, Glue is hard to argue against on pure operational simplicity.

Credential Vending: The Detail Everyone Skips

This is the part most comparison posts gloss over, and it's the one that will wake you up at 2 AM during a compliance audit.

When an external engine — say, a new Trino cluster your platform team just spun up — queries your lakehouse, it needs credentials to read the underlying Parquet files in S3 or GCS. Two catalogs handle this correctly: Polaris and Unity vend temporary, scoped storage credentials directly to query engines. The engine never sees your root storage keys. The catalog becomes a real access control boundary, not just a metadata index.
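
To show what "vending" actually means on the wire: when an engine calls the catalog's loadTable endpoint, a vending catalog returns temporary storage credentials alongside the table metadata, in the response's `config` map. The response below is a hand-built sketch following the Iceberg REST spec's shape; the paths, keys, and token values are fabricated.

```python
# Sketch: a credential-vending loadTable response (shape per the Iceberg
# REST spec's LoadTableResult; all concrete values here are fabricated).
sample_load_table_response = {
    "metadata-location": "s3://lake/sales/orders/metadata/00007.metadata.json",
    "config": {
        # Temporary, table-scoped credentials vended by the catalog:
        "s3.access-key-id": "ASIA-TEMPORARY",
        "s3.secret-access-key": "REDACTED",
        "s3.session-token": "short-lived-token",
    },
}

def vended_s3_credentials(response: dict) -> dict:
    """Extract the scoped credentials the engine uses instead of root keys."""
    cfg = response["config"]
    keys = ("s3.access-key-id", "s3.secret-access-key", "s3.session-token")
    return {k: cfg[k] for k in keys}
```

The engine reads Parquet with these short-lived, table-scoped credentials and nothing else, so revoking access in the catalog actually revokes access to the bytes.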

Glue, DataHub, Atlan? They enforce governance at the metadata layer only. If someone configures a query engine with a broad IAM role — and believe me, someone always does — that engine can bypass the catalog and hit storage directly. I've watched this happen at two different companies where a freshly deployed Trino instance sailed past all the carefully configured access policies because the underlying S3 permissions were too generous.

For regulated industries — fintech, healthcare, anything touching PII — credential vending isn't a nice-to-have. It's the difference between "we govern our data" and "we govern our metadata and hope nobody looks too closely at the IAM policy."

Picking Your Catalog

No silver bullet, but here's the decision framework I'd use:

Your situation                         | Choose        | Reason
---------------------------------------|---------------|-------------------------------------------
Pure Iceberg, multi-engine flexibility | Polaris       | REST API compatible with 20+ engines
Databricks-heavy, AI asset governance  | Unity Catalog | Tables + models + files, one control plane
Need branch-and-merge for data         | Nessie        | Git semantics for your warehouse
Multi-cloud, multi-region federation   | Gravitino     | Single namespace across clouds
AWS-native, built-in quality checks    | Glue          | Deequ assertions, lowest AWS friction

The mistake I see teams make: choosing their catalog based on their current engine rather than their future optionality. You run Spark today. Next quarter someone wants Flink for streaming. The quarter after that, a data scientist needs DuckDB for local prototyping. The catalog is what makes those transitions painless — or a three-sprint migration project.

Watch the REST Protocol

The Iceberg REST Catalog protocol is becoming the TCP/IP of lakehouse metadata. Polaris implements it natively. Unity added REST support. Gravitino can serve as an Iceberg REST catalog server. Nessie speaks it too.
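
The protocol's surface area is small, which is why so many catalogs can implement it. The sketch below writes out the core routes an engine hits; the paths follow the OpenAPI spec bundled with Apache Iceberg, while the prefix, namespace, and table names are placeholders.

```python
# Sketch: core Iceberg REST Catalog routes (paths per the spec's OpenAPI
# definition; prefix/namespace/table values are hypothetical).
def core_routes(prefix: str, namespace: str, table: str) -> dict:
    return {
        "config":     "GET /v1/config",  # discover the deployment's prefix
        "namespaces": f"GET /v1/{prefix}/namespaces",
        "tables":     f"GET /v1/{prefix}/namespaces/{namespace}/tables",
        "load_table": f"GET /v1/{prefix}/namespaces/{namespace}/tables/{table}",
    }

routes = core_routes("prod", "sales", "orders")
```

Any catalog that answers these routes correctly is interchangeable from the engine's point of view; everything above this layer is where vendors now differentiate.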

Once every catalog speaks the same wire protocol, competition shifts to governance features, operational tooling, and ecosystem integrations. The table format wars taught us that standards eventually win. The catalog wars are headed down the same path — the question is whether you pick a catalog on the right side of that convergence, or end up maintaining the next Hive Metastore: everywhere, but painful.

The Iceberg Summit in San Francisco (April 8–9) will accelerate this. Polaris 1.4.0 planning is solidifying, and the core maintainers will be in the same room for two days. If you're making a catalog decision in Q2 2026, that's the week to pay attention to.