Feature Request: Global validateHealth for Managed Resources
Is your feature request related to a problem? Please describe.
Currently, validateHealth checks can only be defined within individual ClusterProfiles. This leaves an observability gap between a successful Helm chart deployment (indicated by the Provisioned status in ClusterSummary) and the deployed resources actually being healthy and running as expected.
This per-profile approach is error-prone because:
- Users may forget to define health checks for resources they deploy
- Health validation logic must be duplicated across multiple ClusterProfiles
- There's no centralized way to ensure consistent health monitoring across all managed resources
Essentially, we lack an ArgoCD-style health status that automatically tracks whether deployed workloads (Deployments, StatefulSets, DaemonSets, etc.) are actually healthy.
Describe the solution you'd like
Allow Sveltos to automatically detect the resources it manages and let users define validateHealth rules that match on resource types globally, rather than per-ClusterProfile.
The desired behavior would be:
- Sveltos tracks all resources it deploys across ClusterProfiles
- Users can define global health validation rules that apply to specific resource types (e.g., all Deployments, all StatefulSets)
- These global rules automatically apply to matching resources deployed by any ClusterProfile
- Health status is exposed in a way that can be used for metrics, alerting, and Grafana dashboards
This would provide a single source of truth for workload health without requiring users to remember to configure validateHealth in every profile.
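To make the desired behavior concrete, here is a minimal sketch of how global rules keyed by resource kind could be evaluated against everything Sveltos has deployed. It is only an illustration of the idea, not a proposed API: `GlobalHealthRule`, `deployment_is_healthy`, and `evaluate` are hypothetical names that do not exist in Sveltos, and the resources are plain dictionaries shaped like Kubernetes manifests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; none of these names exist in Sveltos.


@dataclass
class GlobalHealthRule:
    kind: str                            # resource kind the rule matches, e.g. "Deployment"
    is_healthy: Callable[[dict], bool]   # predicate over the resource manifest


def deployment_is_healthy(obj: dict) -> bool:
    """Treat a Deployment as healthy when all desired replicas are available."""
    desired = obj.get("spec", {}).get("replicas", 1)  # Kubernetes defaults replicas to 1
    available = obj.get("status", {}).get("availableReplicas", 0)
    return available >= desired


def evaluate(resources: List[dict], rules: List[GlobalHealthRule]) -> Dict[str, str]:
    """Apply the rule matching each resource's kind, regardless of which profile deployed it."""
    by_kind = {rule.kind: rule for rule in rules}
    result: Dict[str, str] = {}
    for obj in resources:
        meta = obj["metadata"]
        key = f'{obj["kind"]}/{meta["namespace"]}/{meta["name"]}'
        rule = by_kind.get(obj["kind"])
        if rule is None:
            result[key] = "Unknown"      # no global rule covers this kind
        else:
            result[key] = "Healthy" if rule.is_healthy(obj) else "Degraded"
    return result
```

In this sketch a user would register one rule per kind once, and every matching resource would be evaluated without any per-profile validateHealth entry. Resources of a kind with no rule are reported as "Unknown" rather than silently treated as healthy.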
Describe alternatives you've considered
- ClusterHealthCheck/HealthCheck CRDs: These can monitor resources and send notifications when health state changes. However, they are independent of the ClusterProfile deployment lifecycle and don't gate the progression of dependent profiles. They're designed for notifications rather than deployment status tracking.
- Manual validateHealth per ClusterProfile: The current approach works but doesn't scale well and is prone to configuration drift and human error.
Additional context
The goal is to enable:
- Metrics emission for health status (for alerting and Grafana dashboards)
- Consistent health monitoring without per-profile configuration overhead
- Similar UX to ArgoCD's automatic health assessment for common Kubernetes resource types
This feature would significantly improve observability for fleet management use cases where many ClusterProfiles deploy similar resource types across multiple clusters.
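As a rough illustration of the metrics goal, the sketch below aggregates per-resource health results into counts and renders them in the Prometheus text exposition format. The metric name `sveltos_managed_resource_health`, its labels, and the function `render_health_metrics` are assumptions made up for this example; Sveltos does not expose such a metric today.

```python
from collections import Counter
from typing import Iterable, Tuple


def render_health_metrics(statuses: Iterable[Tuple[str, str, str]]) -> str:
    """Render (cluster, kind, status) tuples as a gauge in Prometheus text format.

    The metric name and label set are hypothetical.
    """
    counts = Counter(statuses)
    lines = [
        "# HELP sveltos_managed_resource_health Managed resources by health status (hypothetical).",
        "# TYPE sveltos_managed_resource_health gauge",
    ]
    # Sort for stable output so scrapes and diffs are deterministic.
    for (cluster, kind, status), n in sorted(counts.items()):
        lines.append(
            f'sveltos_managed_resource_health{{cluster="{cluster}",kind="{kind}",status="{status}"}} {n}'
        )
    return "\n".join(lines) + "\n"
```

A gauge per (cluster, kind, status) combination would let an alert fire whenever the count of "Degraded" resources is above zero, and would map directly onto a Grafana panel grouped by cluster.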