-
Notifications
You must be signed in to change notification settings - Fork 147
Split metrics documentation table by source #1025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for crossplane ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
01af529 to
27e0e50
Compare
content/master/guides/metrics.md
Outdated
|
|
||
| ## Upjet provider metrics | ||
|
|
||
| These metrics are only emitted by Upjet-based providers (such as provider-aws, provider-azure, provider-gcp). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provider-upjet-aws, provider-upjet-azure, provider-upjet-gcp, wonder if we want to use the repo links here
| prometheus.io/scrape: "true" | ||
| ``` | ||
| ## Crossplane core metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the beginning of the document we describe how to enable metrics in crossplane core and adding prometheus specifics to enable the scraping
wonder if it makes sense to describe the same for the providers - per default we have the http-prom port and you need a PodMonitor or the prometheus annotations..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that does sound useful - looks like we do have a bit of that already in the provider section, e.g.:
Providers expose metrics on the
metricsport (default8080). To scrape these metrics, configure aPodMonitoror add Prometheus annotations to the provider'sDeploymentRuntimeConfig.
27e0e50 to
510678d
Compare
|
thanks for make this more clear - my approval is without binding we need @jbw976 for an additional look |
jbw976
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome @kaessert, thank you for taking the initiative to make these metrics easier to understand! I do have a couple comments for potential further improvements, let me know what you think of them. Feel free to ping directly so we can iterate fast too 😇
content/master/guides/metrics.md
Outdated
| These metrics are emitted by the Crossplane pod itself. | ||
| {{< table "table table-hover table-striped table-sm">}} | ||
| | Metric Name | Description | Further Explanation | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the "Further Explanation" column has no entries in it, so it's just an empty column and it causes some values from the previous column to wrap unnecessarily - should the column header just be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, i dropped it :)
content/master/guides/metrics.md
Outdated
| Providers expose metrics on the `metrics` port (default `8080`). To scrape these metrics, configure a `PodMonitor` or add Prometheus annotations to the provider's `DeploymentRuntimeConfig`. | ||
|
|
||
| {{< table "table table-hover table-striped table-sm">}} | ||
| | Metric Name | Description | Further Explanation | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, the "further explanation" column is empty and can be removed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
content/master/guides/metrics.md
Outdated
| These metrics come from the controller-runtime framework and Kubernetes client libraries. They are emitted by both Crossplane and providers. | ||
|
|
||
| {{< table "table table-hover table-striped table-sm">}} | ||
| | Metric Name | Description | Further Explanation | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, even in a table that is using the "further explanation" column, i'm a bit skeptical on the value it's providing. It seems like most of the information there could just be included in the description column. What do you think, do you believe a dedicated column for "further explanation" is the right approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merged into description
| prometheus.io/scrape: "true" | ||
| ``` | ||
| ## Crossplane core metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that does sound useful - looks like we do have a bit of that already in the provider section, e.g.:
Providers expose metrics on the
metricsport (default8080). To scrape these metrics, configure aPodMonitoror add Prometheus annotations to the provider'sDeploymentRuntimeConfig.
content/master/guides/metrics.md
Outdated
| | {{<hover label="circuit_breaker_opens_total" line="6">}}circuit_breaker_opens_total{{</hover>}} | Number of times the XR circuit breaker transitioned from closed to open | | | ||
| | {{<hover label="circuit_breaker_closes_total" line="7">}}circuit_breaker_closes_total{{</hover>}} | Number of times the XR circuit breaker transitioned from open to closed | | | ||
| | {{<hover label="circuit_breaker_events_total" line="8">}}circuit_breaker_events_total{{</hover>}} | Number of XR watch events handled by the circuit breaker, labelled by outcome | | | ||
| | {{<hover label="engine_controllers_started_total" line="9">}}engine_controllers_started_total{{</hover>}} | Total number of controllers started | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good find on getting these names correct! i screwed this up in the release notes for v2.0, so i just retroactively went back and updated that now. Thank you! 😇
https://github.com/crossplane/crossplane/releases/tag/v2.0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome 😊
content/master/guides/metrics.md
Outdated
| | {{<hover label="function_run_function_request_total" line="1">}}function_run_function_request_total{{</hover>}} | Total number of RunFunctionRequests sent | | | ||
| | {{<hover label="function_run_function_response_total" line="2">}}function_run_function_response_total{{</hover>}} | Total number of RunFunctionResponses received | | | ||
| | {{<hover label="function_run_function_seconds" line="3">}}function_run_function_seconds{{</hover>}} | Histogram of RunFunctionResponse latency (seconds) | | | ||
| | {{<hover label="function_run_function_response_cache_hits_total" line="4">}}function_run_function_response_cache_hits_total{{</hover>}} | Total number of RunFunctionResponse cache hits | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some more cache related metrics not listed here but are in the code in https://github.com/crossplane/crossplane/blob/release-2.1/internal/xfn/cached/cached_runner_metrics.go#L76-L134 - consider adding those too. did they show up for your in your testing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
content/master/guides/metrics.md
Outdated
| | {{<hover label="controller_runtime_active_workers" line="3">}}controller_runtime_active_workers{{</hover>}} | Number of used workers per controller | The number of threads processing jobs from the work queue. | | ||
| | {{<hover label="controller_runtime_max_concurrent_reconciles" line="4">}}controller_runtime_max_concurrent_reconciles{{</hover>}} | Maximum number of concurrent reconciles per controller | Describes how reconciles can happen in parallel. | | ||
| | {{<hover label="controller_runtime_reconcile_errors_total" line="5">}}controller_runtime_reconcile_errors_total{{</hover>}} | Total number of reconciliation errors per controller | A counter that counts reconcile errors. Sharp or non stop rising of this metric might be a problem. | | ||
| | {{<hover label="controller_runtime_reconcile_time_seconds_bucket" line="6">}}controller_runtime_reconcile_time_seconds_bucket{{</hover>}} | Length of time per reconciliation per controller | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you want to remove these _bucket suffixes also, to match what was in the PR body? (there's like 4 of them I think)
Removed _bucket suffix from histogram metric names (added by Prometheus)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Removed
|
@jbw976 |
905ca02 to
39bc1b0
Compare
yeah for sure, it wouldn't hurt to mention it in the top section as well! |
jbw976
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome @kaessert, this is looking really good now! just another comment though on the older versions and how best we can approach those. What do you think?
| prometheus.io/scrape: "true" | ||
| ``` | ||
| ## Crossplane core metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
come to think of it, i don't think we should update old versions like v1.20 and v2.0-preview, because a lot of these values had different names or didn't even exist in those previous versions. but let's also try not to overcomplicate it either, so what do you think of the following?
- don't bother updating
v2.0-preview, that release is a bit of a one-off and those docs aren't even exposed in the version picker anymore - don't bother updating
v1.20, unless you really want to get every old name correct forcrossplane_composition_watches_*etc. as shown in https://github.com/crossplane/crossplane/releases/tag/v2.0.0 - remove
circuit_breaker_*metrics fromv2.0, those were added inv2.1 - keep
v2.1andmasteras is 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great points!
v2.0-previewchanges i threw away- put an effort to get every
v1.20old name correct -> we already went too far for shortcuts 😅 circuit_breaker_*was removed from everywhere exceptv2.1v2.1master kept as is ;)
|
oh also looks like there are a number of Vale issues to fix for this PR too 😇 thanks for continuing to make progress @kaessert! |
39bc1b0 to
9e7eecf
Compare
The metrics documentation previously mixed metrics from different sources in a single table, making it unclear where each metric originates. Split into four sections: - Crossplane core metrics: function runner, response cache, circuit breaker, and engine metrics emitted by the Crossplane pod - Provider metrics: crossplane_managed_resource_* metrics from crossplane-runtime, emitted by all providers - Upjet provider metrics: upjet_resource_* metrics only from Upjet-based providers (provider-aws, provider-azure, provider-gcp) - Controller-runtime and Kubernetes client metrics: external dependency metrics emitted by both Crossplane and providers Additional changes: - Fixed metric name composition_run_function_seconds to function_run_function_seconds (matching actual code) - Added missing metrics: cache metrics, engine metrics, crossplane_managed_resource_drift_seconds - Added missing upjet metrics: cli_duration, active_cli_invocations, running_processes - Removed _bucket suffix from histogram metric names (added by Prometheus) Applied to all doc versions: v1.20, v2.0-preview, v2.0, v2.1, master Signed-off-by: Tobias Kässer <tobias.kasser@upbound.io>
9e7eecf to
a46d215
Compare
|
@jbw976 I think we should be good now. Everything should be addressed and vale is passing for me at least locally too :) |
jbw976
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already went too far for shortcuts
i love this sentiment 😂
this looks great to me @kaessert, thanks for getting this to the finish line!! 🏁

The metrics documentation previously mixed metrics from different sources in a single table, making it unclear where each metric originates.
Split into four sections:
Additional changes:
Applied to all doc versions: v1.20, v2.0-preview, v2.0, v2.1, master