
[BUG] OT-CONTAINER-KIT/redis-operator: applying a non-existent TLS certificate to Redis is not rejected by the operator #378

@songlkkevin

Description

What happened?

Why did Acto raise this alarm?

It's a misoperation. The alarm file contains the following message:

statefulset: test-cluster-follower replicas [3] ready_replicas [2], test-cluster-leader replicas [3] ready_replicas [2], pod: test-cluster-follower-2, test-cluster-leader-2
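The message suggests the alarm fires when a StatefulSet's ready replica count falls below its desired replica count. Below is a minimal Go sketch of that kind of health check, only to illustrate the comparison behind the message. The type `statefulSetStatus` and the function `unhealthyStatefulSets` are hypothetical names; this is not Acto's actual implementation.

```go
package main

import "fmt"

// statefulSetStatus is a hypothetical, trimmed-down view of a StatefulSet:
// only its name, desired replica count, and number of Ready pods.
type statefulSetStatus struct {
	Name          string
	Replicas      int32
	ReadyReplicas int32
}

// unhealthyStatefulSets returns one message per StatefulSet whose ready
// replica count lags its desired replica count, using the same
// "name replicas [N] ready_replicas [M]" shape as the alarm above.
func unhealthyStatefulSets(sets []statefulSetStatus) []string {
	var msgs []string
	for _, s := range sets {
		if s.ReadyReplicas < s.Replicas {
			msgs = append(msgs, fmt.Sprintf("%s replicas [%d] ready_replicas [%d]",
				s.Name, s.Replicas, s.ReadyReplicas))
		}
	}
	return msgs
}

func main() {
	// Values taken from the alarm: each StatefulSet has one pod that is not Ready.
	sets := []statefulSetStatus{
		{Name: "test-cluster-follower", Replicas: 3, ReadyReplicas: 2},
		{Name: "test-cluster-leader", Replicas: 3, ReadyReplicas: 2},
	}
	for _, m := range unhealthyStatefulSets(sets) {
		fmt.Println(m)
	}
}
```

In this issue, both the leader and follower StatefulSets report 2 of 3 replicas ready, so both appear in the alarm.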

What happened in the state transition?

  1. Deploy a simple Redis cluster using the following YAML file:
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisCluster
metadata:
  name: test-cluster
spec:
  clusterSize: 3
  kubernetesConfig:
    image: quay.io/opstree/redis:v6.2.5
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 101m
        memory: 128Mi
      requests:
        cpu: 101m
        memory: 128Mi
  redisExporter:
    enabled: true
    image: quay.io/opstree/redis-exporter:1.0
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  2. Add a TLS certificate to Redis by applying the following YAML file, which references a Secret named ACTOKEY that does not exist in the cluster:
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisCluster
metadata:
  name: test-cluster
spec:
  TLS:
    secret:
      secretName: ACTOKEY
  clusterSize: 3
  kubernetesConfig:
    image: quay.io/opstree/redis:v6.2.5
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 101m
        memory: 128Mi
      requests:
        cpu: 101m
        memory: 128Mi
  redisExporter:
    enabled: true
    image: quay.io/opstree/redis-exporter:1.0
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
  storage:
    volumeClaimTemplate:
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

The Pod then emits an error event with the message: MountVolume.SetUp failed for volume "tls-certs" : secret "ACTOKEY" not found

What did you expect to happen?

The operator should reject this erroneous desired state instead of applying it to the running cluster.
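One way the operator could reject this state is to validate spec.TLS.secret.secretName before updating the StatefulSets. The Go sketch below is hypothetical: `validateTLSSecret`, `secretLookup`, the namespace "default", and the Secret name "redis-tls" are illustrative assumptions, not part of redis-operator. The lookup is injected as a function so the sketch runs without a cluster; in a real operator it would typically be a controller-runtime client Get on a corev1.Secret. The sketch also checks the name format, because Kubernetes requires Secret names to be lowercase RFC 1123 subdomains, so a Secret named "ACTOKEY" could never exist.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// dns1123Subdomain matches a lowercase RFC 1123 subdomain, the format
// Kubernetes requires for Secret names (at most 253 characters).
var dns1123Subdomain = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`)

// secretLookup is a hypothetical stand-in for a Kubernetes client call that
// reports whether a Secret exists in the given namespace.
type secretLookup func(namespace, name string) (bool, error)

// validateTLSSecret (hypothetical) rejects the desired state when the
// referenced Secret cannot be mounted, instead of rolling it out.
func validateTLSSecret(namespace, secretName string, exists secretLookup) error {
	if secretName == "" {
		return errors.New("TLS.secret.secretName must not be empty")
	}
	if len(secretName) > 253 || !dns1123Subdomain.MatchString(secretName) {
		return fmt.Errorf("TLS.secret.secretName %q is not a valid Secret name", secretName)
	}
	ok, err := exists(namespace, secretName)
	if err != nil {
		return fmt.Errorf("looking up Secret %q: %w", secretName, err)
	}
	if !ok {
		return fmt.Errorf("secret %q not found in namespace %q", secretName, namespace)
	}
	return nil
}

func main() {
	// Fake lookup for illustration: only "redis-tls" exists in "default".
	lookup := func(ns, name string) (bool, error) {
		return ns == "default" && name == "redis-tls", nil
	}
	fmt.Println(validateTLSSecret("default", "ACTOKEY", lookup))     // invalid name
	fmt.Println(validateTLSSecret("default", "missing-tls", lookup)) // valid name, no Secret
	fmt.Println(validateTLSSecret("default", "redis-tls", lookup))   // <nil>
}
```

Running the check before the StatefulSet update, and surfacing the error in the custom resource's status or events, would leave the existing replicas untouched instead of replacing a healthy pod with one stuck on the missing volume.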

Root Cause

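The root cause is that the desired TLS.secret.secretName cannot be satisfied in the current cluster state, because the referenced Secret does not exist. The redis-operator does not reject this erroneous desired state; it updates the Redis cluster with the unsatisfiable TLS.secret.secretName, and the restarted pods cannot mount the "tls-certs" volume. As a result, both the leader and the follower StatefulSets lose one ready replica (2 of 3 ready).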
