PMM-14566 Slower QAN v3 compared to v2. #4837
base: v3
Conversation
Empty down migration: once the indexes are removed, nothing else needs to be done to revert the "OPTIMIZE".
```sql
OPTIMIZE TABLE metrics FINAL;
```
To apply indexes right now on current (historical) data.
Let's drop it because we are not able to reliably test it.
After a discussion with Alex, we agreed to remove the OPTIMIZE; merges will happen in the background over time as the data is used, or can be triggered manually by the user.
```sql
ALTER TABLE metrics ADD INDEX idx_metrics_queryid queryid TYPE set(100);
```
Any reason why we chose the specific number of granules for the index?
If I understand correctly, higher granularity improves speed but increases memory consumption. In this case, there should be a balance between performance and memory consumption. Feel free to suggest a different value.
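One rough way to evaluate whether a given `set()` size is worth its memory cost is to ask ClickHouse how many granules the skip index actually prunes for a typical query. A sketch (the table and index names match the migrations in this PR; the filter value is a made-up example, and the output depends entirely on the data):

```sql
-- EXPLAIN with indexes = 1 reports which skip indexes were consulted and
-- how many granules they dropped; compare the numbers across different
-- set() sizes to find the balance between speed and memory.
EXPLAIN indexes = 1
SELECT count()
FROM metrics
WHERE queryid = 'abc123';  -- hypothetical queryid value
```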
```sql
OPTIMIZE TABLE metrics FINAL;
```
Let's drop it because we are not able to reliably test it.
```sql
ALTER TABLE metrics ADD INDEX idx_metrics_period_start period_start TYPE minmax;
```
Suggested change:
```diff
- ALTER TABLE metrics ADD INDEX idx_metrics_period_start period_start TYPE minmax;
+ ALTER TABLE metrics ADD INDEX IF NOT EXISTS idx_metrics_period_start period_start TYPE minmax;
```
All migrations are applied from scratch, so the index should not already exist, but let's discuss this. The same applies to the other migrations.
It is always a good approach to make changes idempotent. It costs nothing and prevents possible errors.
```sql
ALTER TABLE metrics ADD INDEX idx_metrics_service_id service_id TYPE set(100);
```
Suggested change:
```diff
- ALTER TABLE metrics ADD INDEX idx_metrics_service_id service_id TYPE set(100);
+ ALTER TABLE metrics ADD INDEX IF NOT EXISTS idx_metrics_service_id service_id TYPE set(100);
```
```sql
ALTER TABLE metrics ADD INDEX idx_metrics_queryid queryid TYPE set(100);
```
Suggested change:
```diff
- ALTER TABLE metrics ADD INDEX idx_metrics_queryid queryid TYPE set(100);
+ ALTER TABLE metrics ADD INDEX IF NOT EXISTS idx_metrics_queryid queryid TYPE set(100);
```
```sql
ALTER TABLE metrics DROP INDEX idx_metrics_queryid;
```
Suggested change:
```diff
- ALTER TABLE metrics DROP INDEX idx_metrics_queryid;
+ ALTER TABLE metrics DROP INDEX IF EXISTS idx_metrics_queryid;
```
```sql
ALTER TABLE metrics DROP INDEX idx_metrics_period_start;
```
Suggested change:
```diff
- ALTER TABLE metrics DROP INDEX idx_metrics_period_start;
+ ALTER TABLE metrics DROP INDEX IF EXISTS idx_metrics_period_start;
```
```sql
ALTER TABLE metrics DROP INDEX idx_metrics_service_id;
```
Suggested change:
```diff
- ALTER TABLE metrics DROP INDEX idx_metrics_service_id;
+ ALTER TABLE metrics DROP INDEX IF EXISTS idx_metrics_service_id;
```
```sql
ALTER TABLE metrics ADD INDEX idx_metrics_queryid queryid TYPE set(100);
```
what about materialising the index?
Alex suggested not applying the new indexes to existing data, since we currently can't properly test the impact/safety of materializing them. See the discussion here: #4837 (comment)
Our data TTL is 30 days, but it looks like our query that drops old partitions doesn’t work reliably. I created a ticket for this: https://perconadev.atlassian.net/browse/PMM-14670
This definitely happens in HA, and it's likely caused by our approach of taking the partition name, converting it to UInt32, and comparing it to a timestamp. I suspect this logic might be broken even without HA.
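For illustration, a minimal sketch of the suspected failure mode (this is not the actual PMM cleanup query; the table name is from this PR, the WHERE logic is my assumption). Partition names in `system.parts` are strings, so `toUInt32(partition)` only yields something comparable to a timestamp if the partition key really is a raw numeric date:

```sql
-- Hypothetical reproduction of the suspected cleanup pattern, NOT the real query.
-- If the partition expression is e.g. toYYYYMMDD(period_start), the partition
-- name is a '20250101'-style string, and comparing toUInt32(partition) against
-- a Unix timestamp silently selects the wrong partitions (or none at all).
SELECT DISTINCT partition
FROM system.parts
WHERE table = 'metrics'
  AND active
  AND toUInt32(partition) < toUnixTimestamp(now() - INTERVAL 30 DAY);
```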
Because of that, my original idea was to run an OPTIMIZE to force merges (and indirectly rebuild indexes and clean up parts); see the suggested query here: 07ff5eb. But again, we cannot test it.
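If materializing ever becomes testable, ClickHouse also offers a narrower tool than a full `OPTIMIZE TABLE ... FINAL`: rebuilding a single skip index over existing parts. A sketch, using one of the index names from the migrations in this PR:

```sql
-- Rebuild only the named skip index over existing data parts, instead of
-- rewriting whole parts with OPTIMIZE TABLE metrics FINAL. This runs as an
-- asynchronous mutation; progress can be tracked in system.mutations.
ALTER TABLE metrics MATERIALIZE INDEX idx_metrics_queryid;
```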
> Creating index with […] so for column […] BUT! Everything shall be measured in order to make a final decision.
PMM-14566
FB: Percona-Lab/pmm-submodules#4151
The interval was 5 years, and the instances were running for around 12 hours. Version 2 had a different (simpler) load because of different scripts on our instances. Locally, both endpoints took around 0.3 seconds, even when they were running for days under load.
Since version 2.44.1, we have added many new columns and additional logic there, and PMM is also generally more resource-intensive. Because the metrics table is shared across all technologies, new MongoDB columns, for example, can also affect PostgreSQL and others.
Overall, it looks like adding indexes, fixing race conditions in HA, and other improvements in the current version helped reduce the average and bring it closer to the average in version 2.
No matter what I tried, I was not able to reproduce 12 seconds for getReport and 6 seconds for getFilter. From the HAR file attached by the customer, I can see that out of the total 12.1 seconds for getReport, the waiting time for the server response is only 6.28 seconds, while content download takes almost the other half. See the attached image.

So, in summary, there might be an issue with the network or with system resources. Let’s have the customer update to 3.6.0 and see how it differs for them.
Originally, I wanted to create a specialized view in the scope of this PR to improve performance and address the problem of unrelated fields for a specific technology, but it turned out to be a much larger task. It would require creating multiple views specialized for specific actions/technologies. A separate ticket would probably need to be created if we decide to go this way.