
@praveenc7
Contributor

Summary

ISSUE=#17336

For dictionary-encoded columns, DISTINCT_COUNT_SMART_HLL currently uses a RoaringBitmap to deduplicate dictionary IDs before feeding values into HLL. While efficient for low cardinality, this approach becomes CPU-intensive for high cardinality (hundreds of thousands to millions of distinct values), where RoaringBitmap insertions dominate query execution time and negate the benefits of HLL.


Proposal

Introduce a cardinality-aware execution path for DISTINCT_COUNT_HLL:

  • Low cardinality → continue using RoaringBitmap (exact deduplication, memory-efficient)
  • High cardinality → bypass RoaringBitmap and update HLL directly
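The two paths can be sketched as follows. This is a minimal illustration, not the PR's code. Several pieces are assumptions: `java.util.BitSet` stands in for RoaringBitmap, `SimpleHll` is a small textbook HyperLogLog rather than the HLL class Pinot uses, and dictionary size serves as the cardinality signal. All class and method names are hypothetical. The sketch relies on one property: offering the same value to an HLL twice leaves its registers unchanged. Because of that, skipping deduplication alters only the cost of aggregation, never the result.

```java
import java.util.BitSet;

public class CardinalityAwareHllSketch {

  // Hypothetical default, mirroring the 100K threshold recommended in this PR.
  static final int DEFAULT_THRESHOLD = 100_000;

  /** Minimal HyperLogLog for illustration only; not Pinot's HLL implementation. */
  static final class SimpleHll {
    private final int p;            // number of index bits
    private final int m;            // number of registers, 2^p
    private final byte[] registers;

    SimpleHll(int p) {
      this.p = p;
      this.m = 1 << p;
      this.registers = new byte[m];
    }

    void offer(long value) {
      long h = mix64(value);
      int idx = (int) (h >>> (64 - p));   // top p bits select the register
      long rest = h << p;                 // remaining 64 - p bits, left-aligned
      int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
      if (rank > registers[idx]) {
        registers[idx] = (byte) rank;     // keep the max rank; duplicate values are no-ops
      }
    }

    long cardinality() {
      double sum = 0;
      int zeros = 0;
      for (byte r : registers) {
        sum += Math.scalb(1.0, -r);       // 2^-r
        if (r == 0) zeros++;
      }
      double estimate = (0.7213 / (1 + 1.079 / m)) * m * m / sum;
      if (estimate <= 2.5 * m && zeros > 0) {
        estimate = m * Math.log((double) m / zeros);  // linear counting for small ranges
      }
      return Math.round(estimate);
    }

    // SplitMix64 finalizer used as a 64-bit hash.
    private static long mix64(long z) {
      z += 0x9E3779B97F4A7C15L;
      z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
      z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
      return z ^ (z >>> 31);
    }
  }

  /**
   * dictIds index into dictionary. Below the threshold, dictIds are deduplicated first
   * (BitSet stands in for RoaringBitmap) and each distinct value is offered once.
   * At or above it, every value is offered straight to the HLL.
   */
  static SimpleHll aggregate(int[] dictIds, long[] dictionary, int threshold, int log2m) {
    SimpleHll hll = new SimpleHll(log2m);
    if (dictionary.length < threshold) {
      BitSet seen = new BitSet(dictionary.length);
      for (int dictId : dictIds) {
        seen.set(dictId);
      }
      for (int id = seen.nextSetBit(0); id >= 0; id = seen.nextSetBit(id + 1)) {
        hll.offer(dictionary[id]);
      }
    } else {
      for (int dictId : dictIds) {
        hll.offer(dictionary[dictId]);
      }
    }
    return hll;
  }

  public static void main(String[] args) {
    long[] dictionary = new long[50_000];
    for (int i = 0; i < dictionary.length; i++) dictionary[i] = i * 31L + 7;
    int[] dictIds = new int[500_000];
    for (int i = 0; i < dictIds.length; i++) dictIds[i] = i % dictionary.length;

    long viaBitmap = aggregate(dictIds, dictionary, DEFAULT_THRESHOLD, 12).cardinality();
    long viaDirect = aggregate(dictIds, dictionary, 0, 12).cardinality();
    // Both paths leave identical register state, so the two estimates are equal.
    System.out.println(viaBitmap + " " + viaDirect);
  }
}
```

The design trade-off is the same one the benchmark measures. The bitmap path performs one HLL update per distinct value but pays a bitmap insertion per row. The direct path skips the bitmap and instead pays one hash plus one register update per row.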

Observed improvements

  • Reduces server-side CPU time by roughly 4× to 10× for high-cardinality queries (e.g., ~8s → ~700ms in production benchmarks).

Testing Done

Added a JMH benchmark that isolates the server-side aggregation cost of DistinctCountHLLAggregationFunction under controlled parameters. Each variation was run for 10 minutes. Parameters:

  • recordCount: {100K, 500K, 1M, 5M, 10M, 25M}
  • cardinalityRatioPercent: {1, 10, 30, 50, 80, 100}, the number of distinct values as a percentage of recordCount
  • useRoaringBitMap/HLL: selects whether the test runs the RoaringBitmap path or the direct HLL path

DictIds are pre-generated so benchmark timing includes only aggregation, not data generation.
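One possible way to pre-generate dictIds for a single (recordCount, cardinalityRatioPercent) pair is sketched below. It makes two assumptions: every ID in [0, cardinality) appears at least once, and the array is shuffled with a fixed seed so runs are reproducible. `generateDictIds` is a hypothetical helper, and the actual benchmark's value distribution may differ.

```java
import java.util.Arrays;
import java.util.Random;

public class DictIdGenerator {

  /**
   * Hypothetical helper: produces recordCount dictIds with exactly
   * recordCount * cardinalityRatioPercent / 100 distinct values.
   */
  static int[] generateDictIds(int recordCount, int cardinalityRatioPercent, long seed) {
    int cardinality = Math.max(1, (int) ((long) recordCount * cardinalityRatioPercent / 100));
    int[] dictIds = new int[recordCount];
    for (int i = 0; i < recordCount; i++) {
      dictIds[i] = i % cardinality;  // every ID in [0, cardinality) appears at least once
    }
    // Fisher-Yates shuffle so duplicates are not laid out in a regular pattern.
    Random random = new Random(seed);
    for (int i = recordCount - 1; i > 0; i--) {
      int j = random.nextInt(i + 1);
      int tmp = dictIds[i];
      dictIds[i] = dictIds[j];
      dictIds[j] = tmp;
    }
    return dictIds;
  }

  public static void main(String[] args) {
    int[] ids = generateDictIds(100_000, 10, 42L);
    long distinct = Arrays.stream(ids).distinct().count();
    System.out.println(ids.length + " records, " + distinct + " distinct dictIds");
    // prints "100000 records, 10000 distinct dictIds"
  }
}
```

Generating the array in a setup phase, outside the timed method, keeps the measurement limited to aggregation.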

Sample plots:

Flame graph after optimization: aggregation no longer dominates CPU time.

Benchmark Results (Average Latency, ms/op)

Record Count = 100,000

| Cardinality | RoaringBitmap (ms/op) | Direct HLL (ms/op) |
|---|---|---|
| 1,000 | 0.6 | 0.80 |
| 10,000 | 0.71 | 0.87 |
| 30,000 | 0.89 | 0.90 |
| 50,000 | 1.05 | 0.96 |
| 80,000 | 1.79 | 1.00 |
| 100,000 | 1.91 | 1.05 |

Record Count = 500,000

| Cardinality | RoaringBitmap (ms/op) | Direct HLL (ms/op) |
|---|---|---|
| 5,000 | 1.45 | 2.85 |
| 50,000 | 2.36 | 2.92 |
| 150,000 | 5.53 | 3.16 |
| 250,000 | 7.26 | 3.18 |
| 400,000 | 9.59 | 3.18 |
| 500,000 | 10.69 | 3.17 |

Record Count = 1,000,000

| Cardinality | RoaringBitmap (ms/op) | Direct HLL (ms/op) |
|---|---|---|
| 10,000 | 2.53 | 5.36 |
| 100,000 | 6.69 | 5.44 |
| 300,000 | 13.12 | 5.80 |
| 500,000 | 15.92 | 5.78 |
| 800,000 | 19.84 | 5.78 |
| 1,000,000 | 22.11 | 5.71 |

Record Count = 5,000,000

| Cardinality | RoaringBitmap (ms/op) | Direct HLL (ms/op) |
|---|---|---|
| 50,000 | 11.51 | 25.12 |
| 500,000 | 53.62 | 25.29 |
| 1,500,000 | 75.60 | 26.13 |
| 2,500,000 | 92.91 | 25.53 |
| 4,000,000 | 113.21 | 25.24 |
| 5,000,000 | 129.34 | 25.79 |

Record Count = 10,000,000

| Cardinality | RoaringBitmap (ms/op) | Direct HLL (ms/op) |
|---|---|---|
| 100,000 | 52.98 | 50.64 |
| 1,000,000 | 117.68 | 50.61 |
| 3,000,000 | 161.56 | 50.08 |
| 5,000,000 | 206.71 | 51.14 |
| 8,000,000 | 248.77 | 50.01 |
| 10,000,000 | 278.78 | 50.37 |

Record Count = 25,000,000

| Cardinality | RoaringBitmap (ms/op) | Direct HLL (ms/op) |
|---|---|---|
| 250,000 | 199.06 | 125.82 |
| 2,500,000 | 348.39 | 126.40 |
| 7,500,000 | 466.14 | 124.74 |
| 12,500,000 | 555.77 | 124.35 |
| 20,000,000 | 679.43 | 124.99 |

Recommendation:

The micro-benchmark results span a range of record counts and cardinalities. Across them, 100K distinct values is a reasonable default threshold for switching away from the RoaringBitmap path. Below this threshold, RoaringBitmap is generally as fast as or faster than direct HLL updates (e.g., 2.36 vs 2.92 ms/op at 500K records with 50K distinct values). At and above it, direct HLL updates are consistently faster (e.g., 6.69 vs 5.44 ms/op at 1M records with 100K distinct values), and the gap widens as cardinality grows. The threshold therefore preserves the deduplication benefit for low-cardinality workloads while avoiding excessive bitmap maintenance cost for high-cardinality ones.
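The recommendation can be checked roughly against the benchmark numbers. The sketch below applies a hypothetical rule, "bypass RoaringBitmap when cardinality >= 100,000", to every row of the benchmark tables above. It then counts how often that rule picks the faster of the two measured paths. The exact boundary (>= vs >) and all names in the code are assumptions, not the PR's implementation.

```java
public class ThresholdCheck {

  // {cardinality, roaringBitmapMs, directHllMs}, copied from the benchmark tables above.
  static final double[][] ROWS = {
      // 100K records
      {1_000, 0.6, 0.80}, {10_000, 0.71, 0.87}, {30_000, 0.89, 0.90},
      {50_000, 1.05, 0.96}, {80_000, 1.79, 1.00}, {100_000, 1.91, 1.05},
      // 500K records
      {5_000, 1.45, 2.85}, {50_000, 2.36, 2.92}, {150_000, 5.53, 3.16},
      {250_000, 7.26, 3.18}, {400_000, 9.59, 3.18}, {500_000, 10.69, 3.17},
      // 1M records
      {10_000, 2.53, 5.36}, {100_000, 6.69, 5.44}, {300_000, 13.12, 5.80},
      {500_000, 15.92, 5.78}, {800_000, 19.84, 5.78}, {1_000_000, 22.11, 5.71},
      // 5M records
      {50_000, 11.51, 25.12}, {500_000, 53.62, 25.29}, {1_500_000, 75.60, 26.13},
      {2_500_000, 92.91, 25.53}, {4_000_000, 113.21, 25.24}, {5_000_000, 129.34, 25.79},
      // 10M records
      {100_000, 52.98, 50.64}, {1_000_000, 117.68, 50.61}, {3_000_000, 161.56, 50.08},
      {5_000_000, 206.71, 51.14}, {8_000_000, 248.77, 50.01}, {10_000_000, 278.78, 50.37},
      // 25M records
      {250_000, 199.06, 125.82}, {2_500_000, 348.39, 126.40}, {7_500_000, 466.14, 124.74},
      {12_500_000, 555.77, 124.35}, {20_000_000, 679.43, 124.99},
  };

  /** Hypothetical decision rule: bypass RoaringBitmap once cardinality reaches the threshold. */
  static boolean useDirectHll(double cardinality, double threshold) {
    return cardinality >= threshold;
  }

  /** Counts rows where the rule picks the path with the lower measured latency. */
  static int countCorrectChoices(double threshold) {
    int correct = 0;
    for (double[] row : ROWS) {
      boolean hllFaster = row[2] < row[1];
      if (useDirectHll(row[0], threshold) == hllFaster) {
        correct++;
      }
    }
    return correct;
  }

  public static void main(String[] args) {
    System.out.println(countCorrectChoices(100_000) + " of " + ROWS.length);
    // prints "33 of 35"
  }
}
```

The two misses both occur in the 100K-record runs, at 50,000 and 80,000 distinct values. In those runs direct HLL was faster, but the rule keeps RoaringBitmap.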

@codecov-commenter

codecov-commenter commented Dec 22, 2025

❌ 2 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 19068 | 2 | 19066 | 56 |
Flaky tests (2):

  • org.apache.pinot.core.operator.transform.function.NotEqualsTransformFunctionTest::testBinaryOperatorTransformFunction
    Flake rate in main: 89.74% (passed 8 times, failed 70 times)
    Stack trace (0.723s run time): expected [false] but found [true]
  • org.apache.pinot.core.operator.transform.function.NotEqualsTransformFunctionTest::testBinaryOperatorTransformFunctionNoDict
    Flake rate in main: 89.74% (passed 8 times, failed 70 times)
    Stack trace (0.034s run time): expected [false] but found [true]


praveenc7 marked this pull request as ready for review on December 23, 2025, 22:45.