Wednesday, January 22, 2025

Change storage class in a running StatefulSet (sts)

Change the storage class of a running STS in k8s

K8s doesn’t allow changing the StorageClass of an active StatefulSet, because the volumeClaimTemplates of an sts are immutable.

However, there are times when this is needed.

This article explains how to change the storage class for a running sts.

For this we will use

  • an sts workload
  • the sc currently in use: default
  • the sc we want to change to: managed-csi-premium

As the environment I’m using Azure Kubernetes Service (AKS).

Storage classes

Default storage class using Standard SSD with ZRS (zone-redundant storage).

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  name: default
parameters:
  skuname: StandardSSD_ZRS
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Premium storage class using Premium SSD

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  name: managed-csi-premium
parameters:
  skuname: Premium_ZRS
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

The two classes above are created by AKS. In case more control over the IOPS is needed, a custom class will do.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium2-disk-sc
parameters:
  # fine-tune caching, IOPS and throughput
  cachingMode: None
  skuName: PremiumV2_LRS
  DiskIOPSReadWrite: "4000"
  DiskMBpsReadWrite: "1000"
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate  # note: the volume is provisioned as soon as the PVC is created
allowVolumeExpansion: true

Current state

Manifests

A StatefulSet with two nginx pods using the default storage class, in the namespace sts-change-sc.

---
apiVersion: v1
kind: Namespace
metadata:
  name: sts-change-sc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
  namespace: sts-change-sc
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  namespace: sts-change-sc
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 2 # by default is 1
  minReadySeconds: 10 # by default is 0
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.24
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      # initial storage class
      storageClassName: "default"  # technically not needed; specified explicitly for this demo.
      resources:
        requests:
          storage: 1Gi

Volumes

$ kubectl get pv | grep web
pvc-87813cdf-2f48-4e4e-b121-8ba2bfa04aca  1Gi RWO  Delete Bound  sts-change-sc/www-web-0  default  <unset>  28m
pvc-ad285352-93bc-422c-be14-3628482fa9ba  1Gi RWO  Delete Bound  sts-change-sc/www-web-1  default  <unset>  119s

PVC

$ kubectl get pvc -n sts-change-sc| grep web
www-web-0  Bound  pvc-87813cdf-2f48-4e4e-b121-8ba2bfa04aca  1Gi  RWO  default  <unset>  29m
www-web-1  Bound  pvc-ad285352-93bc-422c-be14-3628482fa9ba  1Gi  RWO  default  <unset>  2m8s

STS

$ kubectl get sts -n sts-change-sc
NAME   READY   AGE
web    2/2     6m4s

SC used by STS

$ kubectl get sts -n sts-change-sc -o json | jq '.items[].spec.volumeClaimTemplates[].spec.storageClassName'
"default"

Change storage class used by sts

Changing from default to managed-csi-premium.

Manifests

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  namespace: sts-change-sc
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 2 # by default is 1
  minReadySeconds: 10 # by default is 0
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.24
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      # changed storage class
      storageClassName: "managed-csi-premium"
      resources:
        requests:
          storage: 1Gi

Try to apply the updated sts

$ kubectl apply -f changed.yaml -n sts-change-sc
The StatefulSet "web" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
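The rejection above can also be provoked without touching the cluster. The following is a minimal sketch, assuming a kubectl that supports server-side dry-run; the function name would_apply is only illustrative and not part of any tool.

```shell
# Ask the API server to validate a manifest without persisting anything.
# Arguments: <manifest-file> <namespace>
# Exit status 0 means the change would be accepted. For an immutable field
# such as volumeClaimTemplates the server should answer with the same
# "Forbidden" message as a real apply.
would_apply() {
  kubectl apply -f "$1" -n "$2" --dry-run=server
}
```

For example, would_apply changed.yaml sts-change-sc || echo "in-place update rejected" tells you up front whether the orphan-delete route is needed.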

Since that didn’t work, we need to do the following:

  1. Delete the sts with the --cascade=orphan option, which leaves its pods running.
  2. Apply the sts manifest with the changed storage class "managed-csi-premium".
  3. Delete the pods of the sts one by one, each together with its pvc (with reclaimPolicy: Delete, deleting the pvc also deletes its referenced pv, so the old data is not carried over).
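The three steps can be sketched as a small script. This is only a sketch under stated assumptions: the variable and function names are made up for illustration, pods are assumed to be named <sts>-<ordinal> and pvcs <template>-<sts>-<ordinal>, and the data on the old volumes is assumed to be disposable.

```shell
# Defaults match the demo in this article; override them through the environment.
NAMESPACE="${NAMESPACE:-sts-change-sc}"
STS="${STS:-web}"
CLAIM_TEMPLATE="${CLAIM_TEMPLATE:-www}"
MANIFEST="${MANIFEST:-changed.yaml}"
REPLICAS="${REPLICAS:-2}"

# Wait until the recreated pod is Ready. Retries because right after the
# delete the pod may not exist yet, which makes "kubectl wait" fail.
wait_for_pod() {
  attempts=0
  until kubectl wait --for=condition=Ready "pod/$1" -n "$NAMESPACE" --timeout=60s; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 10 ]; then
      return 1
    fi
    sleep 3
  done
}

# Replace one pod (argument: ordinal) together with its pvc.
replace_pod() {
  # --wait=false returns at once; the pvc stays Terminating until the pod is gone.
  kubectl delete "pvc/${CLAIM_TEMPLATE}-${STS}-$1" -n "$NAMESPACE" --wait=false || return 1
  kubectl delete "pod/${STS}-$1" -n "$NAMESPACE" || return 1
  wait_for_pod "${STS}-$1"
}

change_storage_class() {
  kubectl delete "sts/$STS" --cascade=orphan -n "$NAMESPACE" || return 1  # step 1
  kubectl apply -f "$MANIFEST" -n "$NAMESPACE" || return 1                # step 2
  i=0
  while [ "$i" -lt "$REPLICAS" ]; do                                      # step 3
    replace_pod "$i" || return 1
    i=$((i + 1))
  done
}
```

Calling change_storage_class runs the whole sequence and stops at the first failing command, so a pod that never becomes Ready halts the rotation before the next pvc is deleted.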

Applying this looks like

Step 1

$ kubectl delete sts/web --cascade=orphan -n sts-change-sc
statefulset.apps "web" deleted

At this point the pods are still running but the sts is gone. Let’s see the pods.

$ kubectl get pods -n sts-change-sc
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          24m
web-1   1/1     Running   0          24m

Step 2

Apply the changed manifest.

$ kubectl apply -f changed.yaml -n sts-change-sc
statefulset.apps/web created

Since the initial pods are still running, the updated sts adopts them and no new pods are created.

$ kubectl get pods -n sts-change-sc
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          28m
web-1   1/1     Running   0          28m

Step 3

Replace the pods one by one, starting with web-0.

First you need to find its pvc, so it can be deleted.

$ kubectl get pvc -n sts-change-sc| grep web-0
www-web-0   Bound    pvc-87813cdf-2f48-4e4e-b121-8ba2bfa04aca   1Gi        RWO            default        <unset>                 56m

As you can see, its status is Bound, so the deletion will not complete until the pod using it is deleted.

$ kubectl delete pvc/www-web-0 -n sts-change-sc
persistentvolumeclaim "www-web-0" deleted

$ kubectl get pvc -n sts-change-sc
NAME        STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
www-web-0   Terminating   pvc-87813cdf-2f48-4e4e-b121-8ba2bfa04aca   1Gi        RWO            default        <unset>                 63m
www-web-1   Bound         pvc-ad285352-93bc-422c-be14-3628482fa9ba   1Gi        RWO            default        <unset>                 36m

If a finalizer is attached to the pvc (this one has one), you can see it with

$ kubectl describe pvc/www-web-0 -n sts-change-sc | grep Finalizers
Finalizers:    [kubernetes.io/pvc-protection]

Because k8s protects a pvc that is still in use, the command kubectl delete pvc/www-web-0 -n sts-change-sc does not return until the pod using the pvc is gone. Even if you interrupt it, the pvc stays in Terminating.

So we will delete the pod as well.

$ kubectl delete pod/web-0 -n sts-change-sc
pod "web-0" deleted

At this point the initial delete of the pvc finishes, and since the pod web-0 is under the control of the updated sts, it is recreated with the updated storage class.
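When a pvc hangs in Terminating it helps to know which pod is still holding it. The following is a minimal sketch, assuming kubectl's jsonpath output and a standard awk; the function name pods_using_pvc is only illustrative.

```shell
# List the pods in a namespace that mount a given pvc.
# Arguments: <pvc-name> <namespace>
pods_using_pvc() {
  # One line per pod: "<pod-name><TAB><claim names separated by spaces>".
  # Volumes that are not backed by a pvc contribute an empty entry.
  kubectl get pods -n "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.volumes[*]}{.persistentVolumeClaim.claimName}{" "}{end}{"\n"}{end}' |
    awk -F '\t' -v claim="$1" '{
      n = split($2, claims, " ")
      for (i = 1; i <= n; i++) if (claims[i] == claim) { print $1; break }
    }'
}
```

In this demo, pods_using_pvc www-web-0 sts-change-sc should name web-0 for as long as the old pod exists, which is exactly the pod the pvc-protection finalizer is waiting for.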

List the pods

$ kubectl get pods -n sts-change-sc
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          88s  ### NEW
web-1   1/1     Running   0          47m

Along with the new pod, a new pvc and volume were created under the new storage class.

$ kubectl get pvc -n sts-change-sc| grep web-0
www-web-0  Bound  pvc-40de012d-98d9-4f76-810f-7bcfb2970b37  1Gi RWO managed-csi-premium  <unset>  2m15s

Continue with the pod rotation until all pods are using the updated storage class.

NOTE: if you skip the pvc deletion and only delete the pod, this will not work. For example:

$ kubectl delete pod/web-1 -n sts-change-sc
pod "web-1" deleted

$ kubectl get pvc -n sts-change-sc| grep web-1
www-web-1   Bound    pvc-ad285352-93bc-422c-be14-3628482fa9ba   1Gi        RWO            default               <unset>                 55m

So the pvc is still on the initial class default.

Final state

Looking again at resources

$ kubectl get pvc -n sts-change-sc
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          VOLUMEATTRIBUTESCLASS   AGE
www-web-0   Bound    pvc-40de012d-98d9-4f76-810f-7bcfb2970b37   1Gi        RWO            managed-csi-premium   <unset>                 27m
www-web-1   Bound    pvc-21600a0f-85ff-4d06-9a60-9958a45177ff   1Gi        RWO            managed-csi-premium   <unset>                 9s
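With more replicas it is easy to miss a pvc during the rotation. The following is a minimal sketch of a check that prints every pvc not yet on the target class, assuming kubectl's custom-columns output; the function name pvcs_not_on_class is only illustrative.

```shell
# Print the pvcs in a namespace whose storage class differs from the expected one.
# Arguments: <namespace> <expected-storage-class>
pvcs_not_on_class() {
  kubectl get pvc -n "$1" --no-headers \
    -o custom-columns=NAME:.metadata.name,CLASS:.spec.storageClassName |
    awk -v want="$2" '$2 != want { print $1 }'
}
```

An empty result from pvcs_not_on_class sts-change-sc managed-csi-premium means every pvc in the namespace has been recreated on the new class.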

Sts storage class

$ kubectl get sts -n sts-change-sc -o json | jq '.items[].spec.volumeClaimTemplates[].spec.storageClassName'
"managed-csi-premium"

Final notes

I’ve been using the images provided by registry.k8s.io which is the official registry used by k8s these days.

Note that there is no public web interface to list images and their tags. Instead you can use

curl -sL "https://registry.k8s.io/v2/nginx-slim/tags/list" | jq .

PS: a useful tool to further inspect images is gcrane.

If the finalizer is causing any issues, it can be removed before deletion with kubectl patch pvc my-pvc -p '{"metadata":{"finalizers":null}}'. Deleting the pvc will then return immediately.