[Query Engine] Improve DistinctCountSmartHLL for dictionary-encoded columns #17411

praveenc7 · 2025-12-22T21:25:24Z

Summary

For dictionary-encoded columns, DISTINCT_COUNT_SMART_HLL currently uses a RoaringBitmap to deduplicate dictionary IDs before feeding values into HLL. While efficient for low cardinality, this approach becomes CPU-intensive for high cardinality (hundreds of thousands to millions of distinct values), where RoaringBitmap insertions dominate query execution time and negate the benefits of HLL.

Proposal

Introduce a cardinality-aware execution path for DISTINCT_COUNT_SMART_HLL:

Low cardinality → continue using RoaringBitmap (exact deduplication, memory-efficient)
High cardinality → bypass RoaringBitmap and update HLL directly
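
The two paths above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `java.util.BitSet` stands in for RoaringBitmap, `TinyHll` is a toy HyperLogLog written for this example, and the class name, the `threshold` parameter, and the choice to pick the path from the dictionary size up front are all assumptions.

```java
import java.util.BitSet;

/** Toy HyperLogLog with 2^12 registers; a stand-in for the real HLL implementation. */
final class TinyHll {
  private static final int P = 12;
  private static final int M = 1 << P;
  private final byte[] registers = new byte[M];

  void offer(long value) {
    long h = mix64(value);
    int idx = (int) (h >>> (64 - P));            // top P bits select the register
    long rest = h << P;                          // remaining bits determine the rank
    int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - P) + 1;
    if (rank > registers[idx]) {
      registers[idx] = (byte) rank;
    }
  }

  long cardinality() {
    double sum = 0;
    int zeros = 0;
    for (byte r : registers) {
      sum += Math.scalb(1.0, -r);                // 2^-r
      if (r == 0) {
        zeros++;
      }
    }
    double alpha = 0.7213 / (1 + 1.079 / M);
    double estimate = alpha * M * M / sum;
    if (estimate <= 2.5 * M && zeros > 0) {      // small-range (linear counting) correction
      estimate = M * Math.log((double) M / zeros);
    }
    return Math.round(estimate);
  }

  private static long mix64(long z) {            // SplitMix64 finalizer used as the hash
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }
}

/** Hypothetical aggregator: picks the bitmap or direct-HLL path from the dictionary size. */
final class SmartDistinctCountSketch {
  private final long[] dictionary;               // dictId -> value (longs for simplicity)
  private final BitSet seenDictIds;              // null when the direct-HLL path is active
  private final TinyHll hll = new TinyHll();

  SmartDistinctCountSketch(long[] dictionary, int threshold) {
    this.dictionary = dictionary;
    this.seenDictIds = dictionary.length < threshold ? new BitSet(dictionary.length) : null;
  }

  boolean usesBitmap() {
    return seenDictIds != null;
  }

  void aggregate(int dictId) {
    if (seenDictIds != null) {
      seenDictIds.set(dictId);                   // low cardinality: deduplicate dictIds first
    } else {
      hll.offer(dictionary[dictId]);             // high cardinality: update HLL directly
    }
  }

  TinyHll result() {
    if (seenDictIds != null) {                   // feed each distinct value into HLL once
      for (int id = seenDictIds.nextSetBit(0); id >= 0; id = seenDictIds.nextSetBit(id + 1)) {
        hll.offer(dictionary[id]);
      }
    }
    return hll;
  }
}
```

Because offering the same value to an HLL twice leaves its registers unchanged, both paths in this sketch produce the same estimate for the same input. The paths differ only in where the CPU time is spent.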

Observed improvements

Reduces server-side CPU time by roughly 4× to 10× for high-cardinality queries (in prod benchmarks, query time dropped from ~8s to ~700ms).

Testing Done

Added a JMH benchmark that isolates server-side aggregation cost for DistinctCountHLLAggregationFunction under controlled parameters. Each variation was run for 10 minutes. Parameters covered:

recordCount: {100K, 500K, 1M, 5M, 10M, 25M}
cardinalityRatioPercent: {1, 10, 30, 50, 80, 100} → number of distinct values as a percentage of recordCount (e.g. 10 with 1M records gives 100K distinct values)
useRoaringBitMap/HLL → controls whether the test runs the RoaringBitmap path or the direct HLL path

DictIds are pre-generated so benchmark timing includes only aggregation, not data generation.
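
A fixture like the one described could pre-generate dictIds along these lines. This is a sketch under assumptions, not the benchmark code in this PR: the class and method names are hypothetical, and the guarantee that every dictId appears at least once is a choice made for this example.

```java
import java.util.Random;

/** Hypothetical helper: builds the dictId input once, outside the timed section. */
final class DictIdFixture {
  static int[] generateDictIds(int recordCount, int cardinalityRatioPercent, long seed) {
    int cardinality = Math.max(1, (int) ((long) recordCount * cardinalityRatioPercent / 100));
    int[] dictIds = new int[recordCount];
    Random random = new Random(seed);

    // Emit every dictId once so the realized cardinality matches the configured one.
    for (int i = 0; i < cardinality; i++) {
      dictIds[i] = i;
    }
    // Fill the remaining records with uniformly random repeats.
    for (int i = cardinality; i < recordCount; i++) {
      dictIds[i] = random.nextInt(cardinality);
    }
    // Fisher-Yates shuffle so the distinct ids are not front-loaded.
    for (int i = recordCount - 1; i > 0; i--) {
      int j = random.nextInt(i + 1);
      int tmp = dictIds[i];
      dictIds[i] = dictIds[j];
      dictIds[j] = tmp;
    }
    return dictIds;
  }
}
```

In a JMH benchmark, a helper like this would typically be called from a setup method so that only the aggregation loop is measured.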

Sample plots:

Flame graph after optimization: aggregation no longer dominates CPU

Benchmark Results (Average Latency, ms/op)

Record Count = 100,000

Cardinality	RoaringBitmap	Direct HLL
1,000	0.6	0.80
10,000	0.71	0.87
30,000	0.89	0.90
50,000	1.05	0.96
80,000	1.79	1.00
100,000	1.91	1.05

Record Count = 500,000

Cardinality	RoaringBitmap	Direct HLL
5,000	1.45	2.85
50,000	2.36	2.92
150,000	5.53	3.16
250,000	7.26	3.18
400,000	9.59	3.18
500,000	10.69	3.17

Record Count = 1,000,000

Cardinality	RoaringBitmap	Direct HLL
10,000	2.53	5.36
100,000	6.69	5.44
300,000	13.12	5.80
500,000	15.92	5.78
800,000	19.84	5.78
1,000,000	22.11	5.71

Record Count = 5,000,000

Cardinality	RoaringBitmap	Direct HLL
50,000	11.51	25.12
500,000	53.62	25.29
1,500,000	75.60	26.13
2,500,000	92.91	25.53
4,000,000	113.21	25.24
5,000,000	129.34	25.79

Record Count = 10,000,000

Cardinality	RoaringBitmap	Direct HLL
100,000	52.98	50.64
1,000,000	117.68	50.61
3,000,000	161.56	50.08
5,000,000	206.71	51.14
8,000,000	248.77	50.01
10,000,000	278.78	50.37

Record Count = 25,000,000

Cardinality	RoaringBitmap	Direct HLL
250,000	199.06	125.82
2,500,000	348.39	126.40
7,500,000	466.14	124.74
12,500,000	555.77	124.35
20,000,000	679.43	124.99

Recommendation:

Based on the micro-benchmark results across record counts and cardinalities, 100K distinct values is a good default threshold for switching away from the RoaringBitmap path. Below that point, RoaringBitmap is faster in most measured configurations. At 100K distinct values and above, direct HLL updates are faster in every measured configuration. The threshold therefore balances preserving deduplication benefits for low cardinality against avoiding excessive bitmap maintenance cost for high-cardinality workloads.
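
As a quick check of the threshold against the tables above, the snippet below computes the RoaringBitmap to Direct HLL latency ratio for the three rows at exactly 100,000 distinct values, plus one row below the threshold. The latencies are copied from the benchmark tables. The class and method names are illustrative only.

```java
/** Illustrative arithmetic on the benchmark tables; a ratio above 1 means Direct HLL is faster. */
final class ThresholdCheck {
  static double speedup(double roaringMs, double directHllMs) {
    return roaringMs / directHllMs;
  }

  public static void main(String[] args) {
    // Rows with cardinality = 100,000 (record counts 100K, 1M, 10M).
    System.out.printf("100K records: %.2f%n", speedup(1.91, 1.05));
    System.out.printf("1M records:   %.2f%n", speedup(6.69, 5.44));
    System.out.printf("10M records:  %.2f%n", speedup(52.98, 50.64));
    // A row below the threshold: 1M records, cardinality = 10,000.
    System.out.printf("1M records, 10K distinct: %.2f%n", speedup(2.53, 5.36));
  }
}
```

The ratios come out to about 1.82, 1.23, and 1.05 at 100K distinct values, and about 0.47 for the 10K row, where the bitmap path is still roughly twice as fast.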

codecov-commenter · 2025-12-22T22:11:00Z

❌ 2 Tests Failed:

Tests completed	Failed	Passed	Skipped
19068	2	19066	56

View the full list of 2 ❄️ flaky test(s)

org.apache.pinot.core.operator.transform.function.NotEqualsTransformFunctionTest::testBinaryOperatorTransformFunction
Flake rate in main: 89.74% (Passed 8 times, Failed 70 times)
Stack Traces | 0.723s run time
expected [false] but found [true]

org.apache.pinot.core.operator.transform.function.NotEqualsTransformFunctionTest::testBinaryOperatorTransformFunctionNoDict
Flake rate in main: 89.74% (Passed 8 times, Failed 70 times)
Stack Traces | 0.034s run time
expected [false] but found [true]


praveenc7 added 2 commits December 22, 2025 11:54

[Query Engine] Improve DistinctCountHLL

a06722c

style check

b0d3f88

praveenc7 force-pushed the distinct_hll branch from 547f4b8 to 713700b Compare December 23, 2025 21:02

smart hll

bd97325

praveenc7 force-pushed the distinct_hll branch from 713700b to bd97325 Compare December 23, 2025 22:40

praveenc7 marked this pull request as ready for review December 23, 2025 22:45
