[feature request] Global validateHealth for Managed Resources

## Feature Request: Global validateHealth for Managed Resources

**Is your feature request related to a problem? Please describe.**

Currently, `validateHealth` checks can only be defined within individual ClusterProfiles. This leaves an observability gap between a successful Helm chart deployment (indicated by the `Provisioned` status in ClusterSummary) and confirmation that the deployed resources are actually healthy and running as expected.

This per-profile approach is error-prone because:
- Users may forget to define health checks for resources they deploy
- Health validation logic must be duplicated across multiple ClusterProfiles
- There's no centralized way to ensure consistent health monitoring across all managed resources

Essentially, we lack an ArgoCD-style health status that automatically tracks whether deployed workloads (Deployments, StatefulSets, DaemonSets, etc.) are actually healthy.

**Describe the solution you'd like**

Allow Sveltos to automatically detect the resources it manages and let users define `validateHealth` rules that match on resource types globally, rather than per-ClusterProfile.

The desired behavior would be:
1. Sveltos tracks all resources it deploys across ClusterProfiles
2. Users can define global health validation rules that apply to specific resource types (e.g., all Deployments, all StatefulSets)
3. These global rules automatically apply to matching resources deployed by any ClusterProfile
4. Health status is exposed in a way that can be used for metrics, alerting, and Grafana dashboards

This would provide a single source of truth for workload health without requiring users to remember to configure `validateHealth` in every profile.
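To make the requested behavior concrete, here is a minimal Python sketch of the matching logic, not Sveltos code. Everything in it is an assumption made for illustration: the `GlobalHealthRule` type, the `evaluate` function, and the `deployment_available` check are hypothetical names, and resources are modeled as plain dicts. The idea is that a rule is keyed on `apiVersion` and `kind` only, so it applies to every managed resource of that type regardless of which ClusterProfile deployed it.

```python
from dataclasses import dataclass
from typing import Callable

# A check takes a resource (as a dict) and returns (healthy, message).
HealthCheck = Callable[[dict], tuple[bool, str]]


@dataclass(frozen=True)
class GlobalHealthRule:
    """Hypothetical global rule: selects resources by apiVersion and kind."""
    api_version: str
    kind: str
    check: HealthCheck


def deployment_available(obj: dict) -> tuple[bool, str]:
    """Healthy when available replicas cover the requested replicas."""
    wanted = obj.get("spec", {}).get("replicas", 1)
    available = obj.get("status", {}).get("availableReplicas", 0)
    if available >= wanted:
        return True, "all replicas available"
    return False, f"{available}/{wanted} replicas available"


def evaluate(rules: list[GlobalHealthRule], managed: list[dict]) -> dict:
    """Apply every matching rule to every managed resource.

    Returns {(kind, namespace, name): (healthy, message)}. If several rules
    match one resource, the first failing result is the one that is kept.
    """
    results: dict = {}
    for obj in managed:
        meta = obj.get("metadata", {})
        for rule in rules:
            if (obj.get("apiVersion"), obj.get("kind")) != (rule.api_version, rule.kind):
                continue
            key = (rule.kind, meta.get("namespace", ""), meta.get("name", ""))
            previous = results.get(key)
            if previous is None or previous[0]:
                results[key] = rule.check(obj)
    return results


# Resources deployed by two different (imaginary) ClusterProfiles.
managed = [
    {"apiVersion": "apps/v1", "kind": "Deployment",
     "metadata": {"namespace": "kyverno", "name": "kyverno"},
     "spec": {"replicas": 3}, "status": {"availableReplicas": 3}},
    {"apiVersion": "apps/v1", "kind": "Deployment",
     "metadata": {"namespace": "ingress", "name": "controller"},
     "spec": {"replicas": 2}, "status": {"availableReplicas": 1}},
    {"apiVersion": "v1", "kind": "ConfigMap",
     "metadata": {"namespace": "kyverno", "name": "settings"}},
]
rules = [GlobalHealthRule("apps/v1", "Deployment", deployment_available)]
health = evaluate(rules, managed)
```

In this sketch the single Deployment rule covers both Deployments without either profile declaring a check, and the ConfigMap is skipped because no rule matches its type.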

**Describe alternatives you've considered**

- **ClusterHealthCheck/HealthCheck CRDs**: These can monitor resources and send notifications when health state changes. However, they are independent of the ClusterProfile deployment lifecycle and don't gate the progression of dependent profiles. They're designed for notifications rather than deployment status tracking.

- **Manual validateHealth per ClusterProfile**: The current approach works but doesn't scale well and is prone to configuration drift and human error.

**Additional context**

The goal is to enable:
- Metrics emission for health status (for alerting and Grafana dashboards)
- Consistent health monitoring without per-profile configuration overhead
- Similar UX to ArgoCD's automatic health assessment for common Kubernetes resource types

This feature would significantly improve observability for fleet management use cases where many ClusterProfiles deploy similar resource types across multiple clusters.
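As an illustration of the metrics goal, the following Python sketch renders per-resource health as a gauge in the Prometheus text exposition format, which Grafana dashboards and alert rules could then consume. The metric name `sveltos_managed_resource_healthy` and its labels are hypothetical, chosen only for this example, and label values are not escaped here.

```python
def render_health_metrics(health: dict[tuple[str, str, str], bool]) -> str:
    """Render {(kind, namespace, name): healthy} as Prometheus text format.

    The metric name and label set are assumptions for illustration only.
    """
    lines = [
        "# HELP sveltos_managed_resource_healthy 1 if the managed resource is healthy, 0 otherwise.",
        "# TYPE sveltos_managed_resource_healthy gauge",
    ]
    for (kind, namespace, name), healthy in sorted(health.items()):
        labels = f'kind="{kind}",namespace="{namespace}",name="{name}"'
        lines.append(f"sveltos_managed_resource_healthy{{{labels}}} {int(healthy)}")
    return "\n".join(lines) + "\n"


text = render_health_metrics({
    ("Deployment", "kyverno", "kyverno"): True,
    ("Deployment", "ingress", "controller"): False,
})
```

With a gauge of this shape, an alert could fire on any series whose value stays at 0 for longer than a chosen grace period.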

[feature request] Global validateHealth for Managed Resources #1598
