
Conversation

@JiriCtvrtka (Contributor) commented Dec 12, 2025

PMM-14566

FB: Percona-Lab/pmm-submodules#4151

The selected time interval was 5 years, and the test instances had been running for around 12 hours. Version 2 had a different (simpler) load because of different scripts on our instances. Locally, both endpoints took around 0.3 seconds, even after running for days under load. All timings below are in seconds.

Version            | getFilters AVG (MIN - MAX) | getReport AVG (MIN - MAX)
-------------------|----------------------------|--------------------------
2.44.2             | 0.8 (0.5 - 1.9)            | 1.3 (0.8 - 2.2)
3.4.0              | 1.4 (0.7 - 1.9)            | 2.5 (1 - 5)
3.5.0              | 1.5 (0.7 - 2)              | 3 (1 - 6)
3.6.0 with indexes | 1 (0.6 - 1.9)              | 2 (0.9 - 5.5)

Since version 2.44.1, we have added many new columns and additional logic there, and PMM is also generally more resource-intensive. Because the metrics table is shared across all technologies, new MongoDB columns, for example, can also affect PostgreSQL and others.

Overall, it looks like adding indexes, fixing race conditions in HA, and other improvements in the current version helped reduce the average and bring it closer to the average in version 2.

No matter what I tried, I was not able to reproduce 12 seconds for getReport and 6 seconds for getFilters. From the HAR file attached by the customer, I can see that out of the total 12.1 seconds for getReport, the waiting time for the server response is only 6.28 seconds, while content download takes almost the entire other half. See the attached image.
[Screenshot: HAR timing breakdown for the getReport request, captured 2025-12-15]
So, in summary, there might be an issue with the network or with system resources. Let’s have the customer update to 3.6.0 and see how it differs for them.

Originally, I wanted to create a specialized view within the scope of this PR, both to improve performance and to address the problem of unrelated fields showing up for a specific technology, but it turned out to be a much larger task. It would require creating multiple views specialized for specific actions/technologies. A separate ticket would probably need to be created if we decide to go this way.

@JiriCtvrtka (Contributor, Author)

The down migration is empty because once the indexes are removed, nothing else needs to be done to revert the OPTIMIZE.

@JiriCtvrtka JiriCtvrtka marked this pull request as ready for review December 15, 2025 16:05
@JiriCtvrtka JiriCtvrtka requested a review from a team as a code owner December 15, 2025 16:05
@JiriCtvrtka JiriCtvrtka requested review from idoqo and maxkondr and removed request for a team December 15, 2025 16:05
@@ -0,0 +1 @@
OPTIMIZE TABLE metrics FINAL;
@JiriCtvrtka (Contributor, Author)

This applies the indexes immediately to the current (historical) data.

Member

Let's drop it because we are not able to reliably test it.

@JiriCtvrtka (Contributor, Author)

After a discussion with Alex, we agreed to remove the OPTIMIZE; the merge will happen in the background once the data has been in use for a while, or the user can run it manually.

@@ -0,0 +1 @@
ALTER TABLE metrics ADD INDEX idx_metrics_queryid queryid TYPE set(100);
Member

Any reason why we chose the specific number of granules for the index?

@JiriCtvrtka (Contributor, Author)

If I understand correctly, the set(N) parameter is the maximum number of distinct values stored per indexed block; a higher value lets the index skip more blocks but increases memory consumption. In this case, there should be a balance between performance and memory consumption. Feel free to suggest a different value.
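To make the trade-off measurable, one minimal check is to ask ClickHouse whether the index actually prunes granules for a representative lookup; a sketch (the queryid literal is a placeholder, not a real value):

-- Show which skip indexes are applied and how many granules they drop.
EXPLAIN indexes = 1
SELECT count()
FROM metrics
WHERE queryid = 'some-queryid-value';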

@JiriCtvrtka JiriCtvrtka requested a review from ademidoff January 5, 2026 08:09

@@ -0,0 +1 @@
ALTER TABLE metrics ADD INDEX idx_metrics_period_start period_start TYPE minmax;
Contributor

Suggested change
ALTER TABLE metrics ADD INDEX idx_metrics_period_start period_start TYPE minmax;
ALTER TABLE metrics ADD INDEX IF NOT EXISTS idx_metrics_period_start period_start TYPE minmax;

@JiriCtvrtka (Contributor, Author) commented Jan 6, 2026

All migrations are applied from scratch, so the index should not exist yet, but let's discuss this. The same applies to the other migrations.

Contributor

It is always a good approach to make changes idempotent. It costs nothing but prevents possible errors.

@@ -0,0 +1 @@
ALTER TABLE metrics ADD INDEX idx_metrics_service_id service_id TYPE set(100);
Contributor

Suggested change
ALTER TABLE metrics ADD INDEX idx_metrics_service_id service_id TYPE set(100);
ALTER TABLE metrics ADD INDEX IF NOT EXISTS idx_metrics_service_id service_id TYPE set(100);

@@ -0,0 +1 @@
ALTER TABLE metrics ADD INDEX idx_metrics_queryid queryid TYPE set(100);
Contributor

Suggested change
ALTER TABLE metrics ADD INDEX idx_metrics_queryid queryid TYPE set(100);
ALTER TABLE metrics ADD INDEX IF NOT EXISTS idx_metrics_queryid queryid TYPE set(100);

@@ -0,0 +1 @@
ALTER TABLE metrics DROP INDEX idx_metrics_queryid;
Contributor

Suggested change
ALTER TABLE metrics DROP INDEX idx_metrics_queryid;
ALTER TABLE metrics DROP INDEX IF EXISTS idx_metrics_queryid;

@@ -0,0 +1 @@
ALTER TABLE metrics DROP INDEX idx_metrics_period_start;
Contributor

Suggested change
ALTER TABLE metrics DROP INDEX idx_metrics_period_start;
ALTER TABLE metrics DROP INDEX IF EXISTS idx_metrics_period_start;

@@ -0,0 +1 @@
ALTER TABLE metrics DROP INDEX idx_metrics_service_id;
Contributor

Suggested change
ALTER TABLE metrics DROP INDEX idx_metrics_service_id;
ALTER TABLE metrics DROP INDEX IF EXISTS idx_metrics_service_id;

@@ -0,0 +1 @@
ALTER TABLE metrics ADD INDEX idx_metrics_queryid queryid TYPE set(100);
Contributor

What about materialising the index?

@JiriCtvrtka (Contributor, Author) commented Jan 6, 2026

Alex suggested not applying the new indexes to existing data, since we currently can't properly test the impact/safety of materializing them. See the discussion here: #4837 (comment)
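For reference, the materialization itself would be a one-liner; a sketch of what we decided not to run (untestable for us right now, as noted above):

-- Rebuild the skip index over already-written parts.
ALTER TABLE metrics MATERIALIZE INDEX idx_metrics_queryid;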

Our data TTL is 30 days, but it looks like our query that drops old partitions doesn’t work reliably. I created a ticket for this: https://perconadev.atlassian.net/browse/PMM-14670

This definitely happens in HA, and it's likely caused by our approach of taking the partition name, converting it to UInt32, and comparing it to a timestamp. I suspect this logic might be broken even without HA.
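A quick way to confirm stale data is still around would be to list active parts older than the TTL; a sketch, assuming the partition key includes a DateTime column so min_time/max_time are populated (add a database filter if needed):

-- List active parts whose newest row is past the 30-day TTL.
SELECT partition, name, max_time
FROM system.parts
WHERE table = 'metrics' AND active AND max_time < now() - INTERVAL 30 DAY;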

Because of that, my original idea was to run an OPTIMIZE to force merges (and indirectly rebuild indexes / clean up parts); see the suggested query here: 07ff5eb

but again, we cannot test it.

@maxkondr (Contributor) commented Jan 7, 2026

Creating an index with TYPE bloom_filter or TYPE set(n) on a DateTime column, for queries like SELECT ... WHERE start_time >= <..> AND start_time <= <...>, will have no effect, since set() and bloom_filter don't work with ranges (see https://clickhouse.com/docs/optimize/skipping-indexes#skip-index-functions):

In general, set indexes and Bloom filter based indexes (another type of set index) are both unordered and therefore do not work with ranges. In contrast, minmax indexes work particularly well with ranges since determining whether ranges intersect is very fast.

So for the period_start column, your choice of TYPE minmax is correct.
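For example, this is the kind of range filter a minmax index can prune (the 30-day window is a placeholder):

-- minmax stores per-block min/max of period_start, so whole blocks
-- outside the requested window are skipped.
SELECT count()
FROM metrics
WHERE period_start >= now() - INTERVAL 30 DAY AND period_start <= now();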
Next, here is what I found:

Summary of use cases:

- Use a set(N) index when a column has a limited number of discrete values within each data block (e.g., status codes, error types) and you need exact data skipping without any false positives.
- Use a bloom_filter index (including variants like ngrambf_v1 for partial text search) for columns with a large number of distinct values (e.g., email addresses, UUIDs, full text). It is a space-efficient way to filter "needle in a haystack" queries, even with the possibility of minor false positives.

Applied to our columns:

- service_id: in this case set() can really be useful.
- queryid: this column (I assume) has very high cardinality, so set will not give much benefit here; bloom_filter looks like the better choice.

BUT! Everything should be measured before making a final decision.
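If measurements point that way, a sketch of the alternative index for queryid (the 0.025 false-positive rate and GRANULARITY 4 are placeholder values to tune, not measured choices):

-- Bloom-filter skip index for a high-cardinality column.
ALTER TABLE metrics ADD INDEX IF NOT EXISTS idx_metrics_queryid queryid TYPE bloom_filter(0.025) GRANULARITY 4;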
