Database management systems are crucial for handling complex queries in industries ranging from finance to healthcare, but they often struggle with inaccurate predictions of intermediate data sizes, leading to slow performance. A new analysis of adaptive query processing (AQP) strategies shows that the effectiveness of these techniques depends heavily on whether data is stored on disk or in memory, offering clear guidance for database developers on where to focus optimization efforts. The research, conducted by a team from the University of Edinburgh and Huawei, systematically evaluates plan-based AQP, a strategy that refines query execution by splitting plans into sub-plans and updating statistics at runtime, and reveals that the sources of its performance gains diverge across storage architectures.
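In outline, plan-based AQP interleaves optimization with execution: the plan is split into sub-plans, each sub-plan is run, and the true result size it produces replaces the optimizer's estimate before the next sub-plan is chosen. The Python sketch below is purely illustrative; the function names, the greedy smallest-estimated-result rule, and the toy statistics are assumptions for exposition, not the paper's implementation:

```python
# Minimal sketch of a plan-based AQP loop (illustrative only).
# Relations are modeled by row counts; execute() stands in for
# actually running a sub-plan and observing its true output size.

def adaptive_join_order(card, estimate, execute):
    """card: dict relation -> row count; estimate(x, y) guesses a join's
    output size; execute(x, y) returns the actual output size."""
    remaining = dict(card)
    # start from the smallest base relation
    current = min(remaining, key=remaining.get)
    size = remaining.pop(current)
    order = [current]
    while remaining:
        # sub-plan selector: pick the join with the smallest *estimated*
        # result, using the freshest statistics available
        nxt = min(remaining, key=lambda r: estimate(size, remaining[r]))
        # monitor: run the sub-plan and record its *true* cardinality,
        # which the next iteration's choice is then based on
        size = execute(size, remaining.pop(nxt))
        order.append(nxt)
    return order, size

# Toy run: the "optimizer" estimate and the "true" join sizes differ.
cards = {"title": 1000, "cast_info": 5000, "name": 200}
est = lambda a, b: a * b // 100        # crude size estimate
act = lambda a, b: max(a, b) // 2      # "true" join result size
order, final = adaptive_join_order(cards, est, act)
# order -> ["name", "title", "cast_info"], final -> 2500
```

Feeding the observed size back through `size` before the next greedy choice is what distinguishes this adaptive loop from a plan fixed entirely at optimization time.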
The key finding from the study is that plan-based AQP delivers overall speedups in both on-disk and in-memory database systems, but for different reasons. In the on-disk DBMS, PostgreSQL, performance gains come primarily from query plan reorderings rather than from updated cardinalities; in fact, updating cardinalities introduces measurable overhead. Conversely, in the in-memory DBMS, DuckDB, cardinality refinement drives significant performance improvements for most queries. On the JOB benchmark, for example, AQP-DuckDB achieved a 1.36x speedup over vanilla DuckDB, while AQP-PostgreSQL showed a 1.33x improvement, with different underlying causes. The research also found that plan-based AQP outperforms Polar, a state-of-the-art relation-based AQP approach, especially on queries with complex joins, highlighting its advantage in multi-join scenarios.
The methodology involved implementing and evaluating plan-based AQP on both PostgreSQL and DuckDB, using a modular design that allowed the key components to be isolated: the plan splitter, the sub-plan selector, and the monitor. The researchers adopted QuerySplit as a representative state-of-the-art plan-based AQP system for PostgreSQL and extended it to support module-level isolation. For DuckDB, they proposed a new design that replaces the DAG generator with a join reorder module, avoiding limitations such as the reliance on foreign key constraints and adding support for non-SPJ queries. Experiments were conducted on the Join Order Benchmark (JOB) and the Decision Support Benchmark (DSB) at scale factors 10 and 100, measuring end-to-end and execution times with tools such as hyperfine and high-resolution clocks, and ensuring reproducibility across DuckDB versions (v0.6.1, v0.10.1, v1.3.2).
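A hyperfine-style timing harness can be approximated in a few lines. The sketch below is illustrative only, not the paper's measurement code: it uses Python's high-resolution `time.perf_counter`, discards warmup runs so caches are hot, reports the median of repeated samples, and computes a speedup ratio in the form the study reports (baseline time divided by adaptive time, so 1.36x means the adaptive run took 1/1.36 of the baseline):

```python
import statistics
import time

def measure(fn, runs=5, warmup=1):
    """Repeat fn() and report the median wall-clock time, hyperfine-style:
    warmup runs are executed first and excluded from the samples."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()      # high-resolution monotonic clock
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

def speedup(baseline_s, adaptive_s):
    """Speedup factor: >1.0 means the adaptive run was faster."""
    return baseline_s / adaptive_s
```

Taking the median rather than the mean keeps one slow outlier run (a cold cache, a background process) from distorting the reported time.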
The analysis, detailed in figures such as Figure 7a and Figure 8a, showed that in PostgreSQL the monitor component held performance to 0.97x-0.98x of baseline because of the overhead of creating temporary tables, while in DuckDB updating cardinalities sped up most queries, with improvements of up to 25.68x on DSB at scale factor 100. The sub-plan selector's effect varied: PostgreSQL gained speedups from join reordering, but DuckDB showed no clear benefit except in extreme cases such as queries 1_050.sql and 2_050.sql, which saw dramatic improvements. Plan splitting introduced overhead in both systems, more severely in PostgreSQL (a 0.51x slowdown on JOB) due to I/O costs, as shown in Figure 10a. The comparison with Polar in Figure 11 revealed that plan-based AQP achieved a better speedup (1.36x vs. 1.09x) on the JOB benchmark, with more queries benefiting from the approach.
The implications of this research are significant for database system developers: it provides a unified module-level representation to guide trade-offs between implementation effort and performance gains. For on-disk DBMSs, developers might prioritize heuristic join reordering over cardinality updates to avoid overhead, while for in-memory DBMSs, focusing on cardinality refinement can yield substantial benefits. The study also suggests that plan-based AQP is more effective than relation-based approaches for complex queries, offering a path for future optimizations in query processing. By understanding these differences, practitioners can make informed decisions about integrating AQP strategies into existing engines, potentially improving efficiency in data-intensive applications without extensive modifications.
Limitations of the study include the inherent design constraints of QuerySplit, such as its reliance on foreign key constraints and its inability to handle star-like schemas or non-SPJ operators effectively. The plan splitting strategy disrupts pipelined execution, causing performance loss, especially in on-disk systems where temporary table creation adds I/O overhead. Additionally, the research focused on specific benchmarks and DBMS versions, so results may vary with other workloads or systems. Future work could explore intelligent split-point selection or co-design with executors to mitigate overhead, as noted in the paper's conclusion. These directions not only open avenues for enhancing AQP strategies but also highlight the need to consider storage architecture carefully when applying these techniques.