AIResearch

AI Optimizes Data Queries by Skipping Unnecessary Steps

A new technique reduces network traffic in distributed databases by pushing only part of an aggregation through joins, cutting shuffle operations without sacrificing accuracy.

AI Research
April 01, 2026
4 min read

In the world of big data, every network shuffle—the movement of data between servers—adds time and cost to processing queries. A new optimization technique called partial partial aggregates (PPA) addresses this by streamlining how databases handle aggregations after joins, potentially speeding up distributed query engines without requiring hardware upgrades. This approach is particularly relevant as data volumes grow and organizations seek more efficient ways to analyze information across distributed systems, where reducing unnecessary data movement can lead to significant performance gains and lower operational expenses.

The researchers found that in distributed query engines, a common scenario involves aggregating data after a join operation, which typically requires two network shuffles: one for the join and another for the aggregation. When pushing a full aggregate below the join to reduce input data, a third shuffle is introduced unless specific conditions allow the top aggregate to be eliminated. PPA avoids this extra shuffle by pushing only the local compute phase of the aggregation through the join, keeping the total shuffle count at two while still achieving data reduction. This is applicable when the join key is not fully included in the grouping key or when the join is not a foreign key-primary key equijoin, cases where a full pushdown would otherwise add wasteful overhead.
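The shuffle accounting above can be sketched as a small helper. This is an illustrative model, not code from the paper; the strategy names and the `top_agg_eliminated` flag are assumptions used to make the counting explicit:

```python
# Illustrative sketch: network-shuffle counts for a distributed plan with
# one join followed by one GROUP BY, under three pushdown strategies.

def shuffle_count(strategy: str, top_agg_eliminated: bool = False) -> int:
    """Return the number of network shuffles for the join+aggregate pipeline.

    - "none": one shuffle for the join, one for the aggregation.
    - "full_pushdown": pre-aggregating below the join adds a third shuffle
      unless the top aggregate can be eliminated (FK-PK equijoin with the
      join key included in the grouping key).
    - "ppa": only the local COMPUTE phase is pushed below the join, so no
      extra shuffle is introduced and the count stays at two.
    """
    if strategy == "none":
        return 2
    if strategy == "full_pushdown":
        return 2 if top_agg_eliminated else 3
    if strategy == "ppa":
        return 2
    raise ValueError(f"unknown strategy: {strategy}")
```

The key point the sketch makes concrete: PPA matches the shuffle count of no pushdown while still shrinking the join's input.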

The method relies on the distributive property of aggregate functions like SUM, COUNT, MIN, and MAX, which allows intermediate computations to be combined without affecting the final result. In PPA, the COMPUTE operator performs local accumulation on data before the join, outputting key-accumulator pairs without finalizing aggregation across nodes. After the join, which may introduce duplicates, a post-join COMPUTE re-aggregates the data, followed by a DISTRIBUTE and MERGE to produce the final result. The technique was implemented using a cost-based decision framework in an optimizer like Calcite's VolcanoPlanner, which evaluates strategies such as no pushdown, full pushdown (PA), and PPA based on factors like reduction ratio and network shuffle costs.

The analysis shows that PPA is beneficial when the top aggregate cannot be eliminated, as it maintains the same two-shuffle structure as no pushdown while reducing the data volume processed by the join. For example, in a query grouping by category after a join, where the join key product_id is not included in the grouping key, PPA avoids the extra shuffle introduced by full pushdown. The decision tree visualization in the paper illustrates this with cost annotations: in cases where j ⊆ g (join key included in grouping key) and the join is FK-PK, PA is preferred as it eliminates the top aggregate entirely, but in other configurations, PPA offers a lower-cost alternative. The reduction ratio, calculated as ndv(grouping keys) divided by input rows, determines when COMPUTE effectively reduces data, with accurate NDV estimation being critical for making optimal choices.
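A hedged sketch of that decision logic follows. The branch structure mirrors the description above, but the 0.5 reduction-ratio threshold and the simplified cost model are placeholders; the paper's implementation uses Calcite's VolcanoPlanner with real shuffle-cost estimates:

```python
# Illustrative decision sketch: choose among no pushdown, full pushdown
# (PA), and PPA. Threshold and cost model are assumptions, not the paper's.

def choose_strategy(join_key_in_grouping: bool, fk_pk_join: bool,
                    ndv: int, input_rows: int) -> str:
    reduction_ratio = ndv / input_rows  # ndv(grouping keys) / input rows
    if join_key_in_grouping and fk_pk_join:
        # Top aggregate can be eliminated: PA keeps two shuffles and
        # removes the top aggregate entirely.
        return "PA"
    if reduction_ratio < 0.5:
        # COMPUTE meaningfully shrinks the data: PPA keeps the two-shuffle
        # structure while feeding the join fewer rows.
        return "PPA"
    # Pre-aggregation would barely reduce rows; skip the pushdown.
    return "none"

print(choose_strategy(True, True, 100, 1_000_000))        # PA
print(choose_strategy(False, False, 100, 1_000_000))      # PPA
print(choose_strategy(False, False, 900_000, 1_000_000))  # none
```

The sketch also shows why NDV estimation matters: an overestimated NDV pushes the ratio past the threshold and silently forfeits PPA's benefit.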

The implications of this research extend to real-world applications in distributed databases and data processing platforms, where reducing network traffic can lower latency and resource usage. For instance, in GPU-accelerated query engines like Theseus, PPA helps minimize data movement, directly reducing GPU memory pressure and improving performance for large-scale analytics. By enabling more efficient query execution, this technique can benefit industries relying on fast data analysis, such as e-commerce for sales aggregations or logistics for inventory tracking, without requiring changes to existing infrastructure or data schemas.

Limitations of the approach include its dependence on accurate NDV estimation for grouping keys, as poor estimates can lead to suboptimal decisions where COMPUTE does not significantly reduce data volume. The paper notes that for sorted or pseudo-sorted columns, each batch may contain mostly unique values, making COMPUTE's reduction negligible and PPA less beneficial. Additionally, the technique assumes aggregate functions are distributive; non-distributive functions like AVG must first be rewritten (for example, in terms of SUM and COUNT) before they qualify. It may also not be worthwhile when the build side of a join is small enough for a broadcast join regardless of pushdown. Future work could explore integrating PPA with adaptive execution systems to dynamically adjust strategies based on runtime data characteristics.
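The sorted-column caveat is easy to demonstrate numerically. This sketch (with invented batch data) measures how many rows survive a per-batch COMPUTE:

```python
# Sketch: per-batch reduction from a local COMPUTE. A ratio near 1.0 means
# almost every grouping value in the batch is unique, so pre-aggregation
# barely shrinks the data and PPA offers little benefit.

def batch_reduction(batch):
    """Rows out of COMPUTE divided by rows in (1.0 = no reduction)."""
    return len(set(batch)) / len(batch)

low_card = ["toys", "books"] * 500  # few distinct grouping keys per batch
sorted_ids = list(range(1000))      # sorted column: mostly unique values

print(batch_reduction(low_card))    # 0.002 -> strong reduction, PPA pays off
print(batch_reduction(sorted_ids))  # 1.0   -> no reduction, skip PPA
```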

Original Source

Read the complete research paper on arXiv.

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.
