ClusterPolicy fails with "missing required field defaultRuntime" when installed via ArgoCD #1986

@taejune

Describe the bug
When installing GPU Operator via ArgoCD, the ClusterPolicy creation fails with the following error:

ClusterPolicy.spec.operator missing required field "defaultRuntime"

However, the same chart installs successfully when using helm install directly.

Environment

  • GPU Operator Version: [e.g., v25.3.4]
  • Kubernetes Version: [e.g., v1.33.5]
  • Installation Method: ArgoCD (Helm chart)
  • Container Runtime: containerd

Root Cause Analysis

1. Missing Template Rendering

The Helm chart template (templates/clusterpolicy.yaml) does not render the defaultRuntime field from values:

# Current template
spec:
  operator:
    {{- if .Values.operator.runtimeClass }}
    runtimeClass: {{ .Values.operator.runtimeClass }}
    {{- end }}
    {{- if .Values.operator.defaultGPUMode }}
    defaultGPUMode: {{ .Values.operator.defaultGPUMode }}
    {{- end }}
    # ❌ No defaultRuntime rendering!

Verification:

helm template gpu-operator nvidia/gpu-operator --version v24.9.0 | grep -A 20 "kind: ClusterPolicy"
# Result: No defaultRuntime field in the rendered manifest
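
A stricter form of this check, as a minimal sketch: the hypothetical helper check_default_runtime (not part of Helm or the chart) reads rendered manifests on stdin and looks for defaultRuntime only inside the ClusterPolicy document, so a match elsewhere in the output cannot mask the problem. It assumes documents are separated by --- and that kind: appears before spec:, which is how helm template normally prints manifests.

```shell
# check_default_runtime: hypothetical helper, reads rendered YAML on stdin.
# Prints whether a ClusterPolicy document contains a defaultRuntime key and
# returns a non-zero status when it does not.
check_default_runtime() {
  awk '
    /^---/                      { in_cp = 0 }   # a new YAML document starts
    /^kind: *ClusterPolicy/     { in_cp = 1 }   # we are inside the ClusterPolicy
    in_cp && /^ +defaultRuntime:/ { found = 1 } # indented key inside that document
    END {
      if (found) print "defaultRuntime rendered"
      else       print "defaultRuntime missing"
      exit !found
    }
  '
}
```

Usage, reusing the command above: helm template gpu-operator nvidia/gpu-operator --version v24.9.0 | check_default_runtime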

2. CRD Schema

The CRD defines defaultRuntime with a default value:

defaultRuntime:
  type: string
  default: docker
  enum:
    - docker
    - crio
    - containerd

The field also appears to be required, either listed explicitly in the schema's required list or enforced implicitly through schema validation. I have not confirmed which.
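
Whether the field is explicitly required can be checked against the installed CRD. The sketch below is a hypothetical helper, and both the CRD name clusterpolicies.nvidia.com and the schema path are assumptions about how the ClusterPolicy CRD is laid out; adjust them if your cluster differs.

```shell
# operator_required_fields: hypothetical helper that prints the "required"
# list declared under spec.operator in the CRD's OpenAPI schema.
# Assumption: the CRD is named clusterpolicies.nvidia.com.
operator_required_fields() {
  local crd="${1:-clusterpolicies.nvidia.com}"
  kubectl get crd "$crd" \
    -o jsonpath='{.spec.versions[*].schema.openAPIV3Schema.properties.spec.properties.operator.required}'
}
```

If defaultRuntime shows up in the output, the field is explicitly required; empty output would mean the constraint comes from somewhere else.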

3. Why Helm Install Works But ArgoCD Fails

Helm Direct Install (Client-Side Apply):

  1. Helm renders manifest without defaultRuntime
  2. Helm sends the manifest to the API server through its own client (it does not call kubectl), without server-side apply
  3. The API server applies CRD defaults before validating required fields
  4. Default value docker is applied automatically
  5. ✅ Success

ArgoCD Install (Server-Side Apply):

  1. ArgoCD renders manifest without defaultRuntime
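  2. ArgoCD applies using server-side apply (upstream ArgoCD enables this with the ServerSideApply=true sync option, so check whether it is set for the Application)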
  3. Server-side apply performs stricter validation
  4. Required field check happens before defaulting can occur
  5. ❌ Fails with "missing required field"

This suggests that server-side apply is stricter about required fields than client-side apply, although I have not verified this outside of ArgoCD.
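
One way to test this explanation outside ArgoCD is to dry-run the same ClusterPolicy manifest with both apply strategies and compare the results. This is a sketch only: compare_apply_modes is a hypothetical helper, and it assumes kubectl access to a cluster where the ClusterPolicy CRD is already installed.

```shell
# compare_apply_modes: hypothetical helper that dry-runs one manifest with
# client-side and then server-side apply, reporting which one is rejected.
# Nothing is persisted because both calls use --dry-run=server.
compare_apply_modes() {
  local manifest="$1"

  echo "== client-side apply (dry run) =="
  kubectl apply --dry-run=server -f "$manifest" \
    || echo "client-side apply was rejected"

  echo "== server-side apply (dry run) =="
  kubectl apply --server-side --dry-run=server -f "$manifest" \
    || echo "server-side apply was rejected"
}
```

The manifest can be produced with helm template gpu-operator nvidia/gpu-operator --version v24.9.0 --show-only templates/clusterpolicy.yaml > clusterpolicy.yaml and then passed as compare_apply_modes clusterpolicy.yaml. If both modes behave the same, the difference lies elsewhere, for example in client-side schema validation performed by the tool.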

Steps to Reproduce

  1. Install ArgoCD in a cluster
  2. Create an ArgoCD Application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gpu-operator
  namespace: argocd
spec:
  project: default
  source:
    chart: gpu-operator
    repoURL: https://helm.ngc.nvidia.com/nvidia
    targetRevision: v24.9.0
    helm:
      values: |
        driver:
          enabled: true
  destination:
    server: https://kubernetes.default.svc
    namespace: gpu-operator
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
  3. Sync the application
  4. Observe the error: ClusterPolicy.spec.operator missing required field "defaultRuntime"

Current Workaround

The obvious workaround is to set the value explicitly in the ArgoCD Application:

helm:
  values: |
    operator:
      defaultRuntime: containerd

However, this doesn't actually work because the template doesn't render it!

Alternative workaround - disable server-side apply:

syncPolicy:
  syncOptions:
    - ServerSideApply=false

Proposed Solution

Fix 1: Add defaultRuntime to Template (Recommended)

Update templates/clusterpolicy.yaml:

spec:
  operator:
    {{- if .Values.operator.defaultRuntime }}
    defaultRuntime: {{ .Values.operator.defaultRuntime }}
    {{- end }}
    {{- if .Values.operator.runtimeClass }}
    runtimeClass: {{ .Values.operator.runtimeClass }}
    {{- end }}

And ensure values.yaml has a default:

operator:
  defaultRuntime: docker  # or detect from cluster

Fix 2: Remove Required Constraint from CRD

If defaulting should handle this, consider making the field optional in the CRD and relying on the default value.

Fix 3: Add Mutating Webhook

Implement a mutating admission webhook to inject the default value before validation occurs.

Expected Behavior

GPU Operator should install successfully via ArgoCD without requiring users to:

  1. Explicitly set defaultRuntime in values (when template doesn't render it)
  2. Disable server-side apply
  3. Use workarounds

Additional Context

This issue likely affects any GitOps tool that applies manifests with server-side apply (ArgoCD, Flux, etc.).

The combination of the following creates an incompatibility that only manifests in GitOps scenarios:

  • CRD with required + default fields
  • Helm template not rendering the field
  • Server-side apply's strict validation

Suggested Priority

High - This breaks GPU Operator installation for all ArgoCD/GitOps users, which is a common deployment pattern in production environments.
