Why We Built Laketap
Where acceleration has to live in a lakehouse
In this series: Acceleration in Modern Lakehouse Platforms
- 1. Why Acceleration Should Be Invisible
- 2. What Lakehouse Enables — and What It Breaks
- 3. Why Lakehouse Performance Is a Coordination Problem
- 4. Why We Built Laketap
The question the lakehouse leaves unanswered
By decoupling storage from execution, the lakehouse fundamentally changed how data platforms are built.
Data became shared by default. Engines became interchangeable. Workloads gained flexibility.
What did not change is how performance is reasoned about.
Acceleration still tends to be introduced at the level of individual engines or individual workloads. Performance improvements remain tied to execution contexts, even as data access becomes shared.
By this point in the series, the implication should be clear:
Acceleration cannot be workload-specific. It cannot be engine-local. And it cannot require migration.
If performance is a coordination problem, the real question is no longer how to optimize execution.
It is where coordination can actually live.
Why engines are the wrong boundary
Execution engines are where performance is most visible, so they are often the first place optimization is attempted.
Engines own execution. They schedule work, manage memory, and run operators efficiently. Many performance improvements can only happen there.
But engines are also intentionally scoped.
Each engine sees a single plan. A bounded execution context. A moment in time.
They do not see access patterns across engines. They do not retain memory across workloads. They cannot coordinate decisions over time.
This is not a failure of engine design. It is a consequence of abstraction boundaries.
Engines are built to optimize execution. Coordination requires a vantage point that engines deliberately do not have.
Why storage becomes the wrong boundary
If engines are too local, storage is often proposed as the alternative.
Storage is shared. It is long-lived. It sits beneath all engines.
At first glance, it appears to be the natural place for coordination.
And in principle, storage can become semantic-aware.
A storage system can learn about tables, versions, access patterns, and even query intent. Recent systems have shown that it is possible to push increasingly rich semantics into the storage layer.
But this path comes with a fundamental trade-off.
To coordinate access effectively, storage must absorb far more than bytes and blocks. It must internalize table semantics, snapshot lifecycles, planning decisions, and engine-specific behavior. Over time, it begins to take on responsibilities traditionally owned by catalogs, planners, and execution engines.
At that point, storage stops being a general abstraction.
It becomes a platform.
This is not inherently wrong. But it is a deliberate architectural choice with clear consequences. Once coordination logic is embedded into storage, data access, semantics, and execution assumptions become tightly coupled. The flexibility that made the lakehouse attractive in the first place starts to erode.
Coordination can live in storage. But making storage the coordination layer requires turning it into something else entirely.
The missing layer in the lakehouse stack
What the lakehouse lacks is not capability, but placement.
Coordination needs to satisfy three conditions at once:
- Cross-engine visibility: it must observe access across multiple runtimes.
- Memory across workloads and time: decisions learned once must be reusable later.
- Alignment with data semantics: coordination must understand tables, snapshots, files, and access patterns, not just execution plans.
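To make the three conditions concrete, here is a minimal, entirely hypothetical sketch of what a coordination layer's core interface might look like. None of these names or types come from Laketap or any real system; the point is only to show how the three conditions fit together in one component.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessEvent:
    """One observation of an engine reading shared data (hypothetical)."""
    engine: str        # e.g. "spark", "trino" -- cross-engine visibility
    table: str         # semantic identity, not a raw file path
    snapshot_id: int   # table version being read -- data semantics
    files_read: tuple  # which data files the scan actually touched


@dataclass
class CoordinationLayer:
    """Hypothetical sketch: observe access, retain memory, reuse decisions."""
    history: list = field(default_factory=list)

    def observe(self, event: AccessEvent) -> None:
        # Condition 1: visibility across runtimes -- every engine
        # reports through the same interface.
        self.history.append(event)

    def hot_files(self, table: str, min_hits: int = 2) -> set:
        # Condition 2: memory across workloads -- the answer is derived
        # from accumulated history, not from a single plan.
        counts = {}
        for ev in self.history:
            if ev.table == table:
                for f in ev.files_read:
                    counts[f] = counts.get(f, 0) + 1
        # Condition 3: the result is expressed in data-semantic terms
        # (tables, files), so any engine can consume it.
        return {f for f, n in counts.items() if n >= min_hits}


layer = CoordinationLayer()
layer.observe(AccessEvent("spark", "sales", 42, ("a.parquet", "b.parquet")))
layer.observe(AccessEvent("trino", "sales", 42, ("a.parquet",)))
print(layer.hot_files("sales"))  # files touched by more than one workload
```

Note that no single engine could compute `hot_files` on its own: the first event came from one runtime, the second from another, and the decision only exists because something outside both retained the history.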
No existing layer in the lakehouse stack satisfies all three without breaking its abstraction.
Engines are too scoped. Storage becomes too opinionated. Catalogs are descriptive, not operational.
The gap is structural.
Where acceleration actually belongs
Acceleration must live above execution engines, but below user workloads.
It must observe how shared data is accessed across engines. It must retain memory across queries and over time. And it must operate without forcing migration or architectural change.
In other words, acceleration must exist as a platform-level coordination layer.
Not a new engine. Not a new storage system. But an infrastructure layer that coordinates access rather than owning execution.
This placement preserves the lakehouse’s core strengths while addressing what it leaves unresolved.
Placing coordination — precisely
At this point, the shape of the solution should be clear.
Coordination cannot live inside execution engines, because they are intentionally scoped to single workloads and moments in time. It cannot fully live inside storage, because doing so would collapse storage into a platform.
Coordination therefore has to live between these layers.
This is where Laketap sits.
Laketap does not replace engines or storage. It does not introduce a new execution model or a new data format. Instead, it occupies a narrow but critical position in the lakehouse stack: a platform-level layer that observes access, retains memory, and reuses decisions across engines and workloads.
Once coordination is placed there, acceleration stops fragmenting.
Performance improvements become reusable rather than rediscovered. They persist across engines rather than resetting at boundaries. And they accrue to the platform, not to individual workloads.
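What "reusable rather than rediscovered" can mean in practice: a decision learned once is stored under a semantic key and handed back to any later consumer. The toy sketch below (an assumption for illustration, not Laketap's actual mechanism) keys decisions by table and snapshot, so a decision survives process and engine boundaries, while a new table version simply misses the key instead of serving stale results.

```python
import json
import os
import tempfile


class DecisionStore:
    """Hypothetical persistent store for learned access decisions."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        # Decisions live outside any engine process, so they survive
        # restarts and are visible to every runtime.
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return {}

    def save(self, table, snapshot_id, decision):
        data = self._load()
        data[f"{table}@{snapshot_id}"] = decision
        with open(self.path, "w") as f:
            json.dump(data, f)

    def lookup(self, table, snapshot_id):
        # A hit means the work of discovery is skipped entirely.
        return self._load().get(f"{table}@{snapshot_id}")


path = os.path.join(tempfile.mkdtemp(), "decisions.json")
store = DecisionStore(path)

# One workload (say, a Spark job) learns which files matter and records it.
store.save("sales", 42, {"prefetch": ["a.parquet", "b.parquet"]})

# A different workload on a different engine later reuses it as-is.
assert store.lookup("sales", 42) == {"prefetch": ["a.parquet", "b.parquet"]}

# A new snapshot needs no explicit invalidation: the key simply misses.
assert store.lookup("sales", 43) is None
```

The snapshot-qualified key is doing the coordination work here: it ties the lifetime of a decision to the lifetime of the data version it was learned from, which is exactly the kind of semantic alignment no individual engine or raw storage layer holds on its own.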
This is not a new way to optimize execution. It is a different place to coordinate access.
That placement is the reason Laketap exists.