Change the storage class of a running STS in k8s
Kubernetes doesn’t allow changing the StorageClass of an active StatefulSet (sts).
However, there are times when this is needed.
This article explains how to change the storage class for a running sts.
For this we will use:
- sts: workload (named web in the manifests below)
- sc (currently in use): default
- sc (we want to change to): premium (managed-csi-premium)
The environment is Azure Kubernetes Service (AKS).
Storage classes
Default storage class using Standard SSD with ZRS (zone-redundant storage).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  name: default
parameters:
  skuname: StandardSSD_ZRS
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Premium storage class using Premium SSD
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    kubernetes.io/cluster-service: "true"
  name: managed-csi-premium
parameters:
  skuname: Premium_ZRS
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
The two classes above are created by AKS. If more control over IOPS is needed, a custom class will do:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium2-disk-sc
parameters:
  # fine tune
  cachingMode: None
  skuName: PremiumV2_LRS
  DiskIOPSReadWrite: "4000"
  DiskMBpsReadWrite: "1000"
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate # note - creates a volume when a PVC is present.
allowVolumeExpansion: true
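To see which of these classes exist in a cluster and which one is marked as default, something like the following can help. This is a minimal sketch: the function name list_storage_classes is made up for illustration, and it assumes kubectl is already pointed at the AKS cluster.

```shell
# Hypothetical helper: list storage classes with their provisioner,
# binding mode and the is-default-class annotation.
list_storage_classes() {
  kubectl get storageclass \
    -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner,BINDING:.volumeBindingMode,DEFAULT:.metadata.annotations.storageclass\.kubernetes\.io/is-default-class'
}
```

Classes without the annotation show <none> in the DEFAULT column.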
Current state
Manifests
A StatefulSet with two nginx pods using the default storage class, in the namespace sts-change-sc.
---
apiVersion: v1
kind: Namespace
metadata:
  name: sts-change-sc
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
  namespace: sts-change-sc
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  namespace: sts-change-sc
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 2 # by default is 1
  minReadySeconds: 10 # by default is 0
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.24
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      # initial storage class
      storageClassName: "default" # technically not needed; specified for this demo.
      resources:
        requests:
          storage: 1Gi
Volumes
$ kubectl get pv -n sts-change-sc| grep web
pvc-87813cdf-2f48-4e4e-b121-8ba2bfa04aca 1Gi RWO Delete Bound sts-change-sc/www-web-0 default <unset> 28m
pvc-ad285352-93bc-422c-be14-3628482fa9ba 1Gi RWO Delete Bound sts-change-sc/www-web-1 default <unset> 119s
PVC
$ kubectl get pvc -n sts-change-sc| grep web
www-web-0 Bound pvc-87813cdf-2f48-4e4e-b121-8ba2bfa04aca 1Gi RWO default <unset> 29m
www-web-1 Bound pvc-ad285352-93bc-422c-be14-3628482fa9ba 1Gi RWO default <unset> 2m8s
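The pvc names in the listing follow the pattern <claim template name>-<sts name>-<ordinal>, which is why the claims are called www-web-0 and www-web-1. A small sketch of that naming rule, with a hypothetical helper name pvc_name:

```shell
# Hypothetical helper: build the pvc name an sts uses for a given replica.
# $1 = volumeClaimTemplates name, $2 = sts name, $3 = pod ordinal
pvc_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

pvc_name www web 0   # prints www-web-0
```

Knowing the name in advance is handy later, when each pod has to be deleted together with its pvc.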
STS
$ kubectl get sts -n sts-change-sc
NAME READY AGE
web 2/2 6m4s
SC used by STS
$ kubectl get sts -n sts-change-sc -o json | jq '.items[].spec.volumeClaimTemplates[].spec.storageClassName'
"default"
Change storage class used by sts
Changing from default to managed-csi-premium.
Manifests
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  namespace: sts-change-sc
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 2 # by default is 1
  minReadySeconds: 10 # by default is 0
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.24
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      # changed storage class
      storageClassName: "managed-csi-premium"
      resources:
        requests:
          storage: 1Gi
Try to apply the updated sts
$ kubectl apply -f changed.yaml -n sts-change-sc
The StatefulSet "web" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden
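The same rejection can be observed without touching the cluster state by using a server-side dry run, which sends the manifest through API server validation but persists nothing. A minimal sketch, assuming the manifest file is changed.yaml as above; the function name check_sts_update is hypothetical:

```shell
# Hypothetical helper: report whether the API server would accept
# the manifest as an in-place update.
# $1 = manifest file, $2 = namespace
check_sts_update() {
  if kubectl apply --dry-run=server -f "$1" -n "$2" >/dev/null 2>&1; then
    echo "in-place update accepted"
  else
    echo "in-place update rejected"
  fi
}
```

For the changed volumeClaimTemplates this should report a rejection, which is the signal that the sts has to be recreated instead.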
Since that didn’t work, we need to do the following:
- Delete the sts with the --cascade=orphan option.
- Apply the sts manifest with the changed storage class "managed-csi-premium".
- Delete the pods of the sts one by one, each together with its pvc (deleting the pvc cascades to its referenced pv).
Applying this looks like
Step 1
$ kubectl delete sts/web --cascade=orphan -n sts-change-sc
statefulset.apps "web" deleted
At this point the pods are still running but the sts is gone. Let’s see the pods.
$ kubectl get pods -n sts-change-sc
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 24m
web-1 1/1 Running 0 24m
Step 2
Apply the changed manifest.
$ kubectl apply -f changed.yaml -n sts-change-sc
statefulset.apps/web created
Since the initial pods are still running no new pods are created by the updated sts.
$ kubectl get pods -n sts-change-sc
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 28m
web-1 1/1 Running 0 28m
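One way to check that the recreated sts has taken over the running pods is to look at each pod's ownerReferences. The sketch below is only an illustration (pod_owner is a made-up name), and it assumes that the controller adopts the orphaned pods because their labels match the sts selector.

```shell
# Hypothetical helper: print the kind and name of a pod's owner.
# $1 = pod name, $2 = namespace
pod_owner() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.metadata.ownerReferences[*].kind}/{.metadata.ownerReferences[*].name}'
}
```

Once adoption has happened, pod_owner web-0 sts-change-sc would be expected to print StatefulSet/web.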
Step 3
Replace the pods one by one, starting with web-0.
First, find its pvc so it can be deleted.
$ kubectl get pvc -n sts-change-sc| grep web-0
www-web-0 Bound pvc-87813cdf-2f48-4e4e-b121-8ba2bfa04aca 1Gi RWO default <unset> 56m
As you can see, its status is Bound, so it will not actually be deleted until the pod is deleted.
$ kubectl delete pvc/www-web-0 -n sts-change-sc
persistentvolumeclaim "www-web-0" deleted
$ kubectl get pvc -n sts-change-sc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
www-web-0 Terminating pvc-87813cdf-2f48-4e4e-b121-8ba2bfa04aca 1Gi RWO default <unset> 63m
www-web-1 Bound pvc-ad285352-93bc-422c-be14-3628482fa9ba 1Gi RWO default <unset> 36m
This pvc has a finalizer attached; you can see it with
$ kubectl describe pvc/www-web-0 -n sts-change-sc | grep Finalizers
Finalizers: [kubernetes.io/pvc-protection]
Because k8s protects a pvc that is still in use by a pod, the pvc stays in Terminating and kubectl delete pvc/www-web-0 -n sts-change-sc never returns.
So we will delete the pod as well, for example from a second terminal.
$ kubectl delete pod/web-0 -n sts-change-sc
pod "web-0" deleted
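To avoid the blocked terminal, the pvc delete can be issued with --wait=false, which marks the pvc for deletion and returns right away; the pod delete then releases it. A minimal sketch of that pair of commands, wrapped in a hypothetical function replace_pod_volume:

```shell
# Hypothetical helper: delete a pod's pvc without blocking, then the pod.
# $1 = pod name, $2 = pvc name, $3 = namespace
replace_pod_volume() {
  # Marks the pvc as Terminating and returns immediately.
  kubectl delete "pvc/$2" -n "$3" --wait=false
  # Once the pod is gone, the pvc-protection finalizer no longer holds the pvc.
  kubectl delete "pod/$1" -n "$3"
}
```

Usage for the first replica would be replace_pod_volume web-0 www-web-0 sts-change-sc.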
At this point the initial delete of the pvc is finished and since the pod web-0
is under the control of the updated sts it will be recreated with the updated storage class.
List the pods
$ kubectl get pods -n sts-change-sc
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 88s ### NEW
web-1 1/1 Running 0 47m
Along with the new pod, a new pvc and volume were created under the new storage class.
$ kubectl get pvc -n sts-change-sc| grep web-0
www-web-0 Bound pvc-40de012d-98d9-4f76-810f-7bcfb2970b37 1Gi RWO managed-csi-premium <unset> 2m15s
Continue with the pod rotation until all pods are using the updated storage class.
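The rotation can be scripted along these lines. This is a sketch under assumptions rather than a tested procedure: rotate_sts_pods is a hypothetical name, the data on the old volumes is considered disposable (as in this demo), and the sts is expected to recreate each pod as it did in the manual steps.

```shell
# Hypothetical helper: rotate every replica of an sts onto fresh pvcs.
# $1 = sts name, $2 = claim template name, $3 = replicas, $4 = namespace
rotate_sts_pods() {
  i=0
  while [ "$i" -lt "$3" ]; do
    pod="$1-$i"
    pvc="$2-$1-$i"
    kubectl delete "pvc/$pvc" -n "$4" --wait=false || return 1
    kubectl delete "pod/$pod" -n "$4" || return 1
    # Wait for the sts to recreate the pod, then for it to become Ready.
    until kubectl get "pod/$pod" -n "$4" >/dev/null 2>&1; do sleep 2; done
    kubectl wait --for=condition=Ready "pod/$pod" -n "$4" --timeout=300s || return 1
    i=$((i + 1))
  done
}
```

For this demo the call would be rotate_sts_pods web www 2 sts-change-sc. Handling one pod at a time keeps the other replica serving while its neighbour is replaced.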
NOTE: if you skip the pvc deletion and only delete the pod, this will not work. For example:
$ kubectl delete pod/web-1 -n sts-change-sc
pod "web-1" deleted
$ kubectl get pvc -n sts-change-sc| grep web-1
www-web-1 Bound pvc-ad285352-93bc-422c-be14-3628482fa9ba 1Gi RWO default <unset> 55m
So the initial class default is still in use.
Final state
Looking again at resources
$ kubectl get pvc -n sts-change-sc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
www-web-0 Bound pvc-40de012d-98d9-4f76-810f-7bcfb2970b37 1Gi RWO managed-csi-premium <unset> 27m
www-web-1 Bound pvc-21600a0f-85ff-4d06-9a60-9958a45177ff 1Gi RWO managed-csi-premium <unset> 9s
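With more replicas it is easy to miss a pvc, so a quick filter for claims that are not yet on the target class can be useful. A minimal sketch, with a hypothetical function name pvcs_not_on_class; it relies on awk being available:

```shell
# Hypothetical helper: print pvcs whose storage class differs from the expected one.
# $1 = expected storage class, $2 = namespace
pvcs_not_on_class() {
  kubectl get pvc -n "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.storageClassName}{"\n"}{end}' |
    awk -F '\t' -v want="$1" '$2 != want { print $1 }'
}
```

An empty result from pvcs_not_on_class managed-csi-premium sts-change-sc means every claim in the namespace has been migrated.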
Sts storage class
$ kubectl get sts -n sts-change-sc -o json | jq '.items[].spec.volumeClaimTemplates[].spec.storageClassName'
"managed-csi-premium"
Final notes
I’ve been using the images provided by registry.k8s.io, which is the official registry used by k8s these days.
Note that there is no public web interface for listing images and their tags; instead you can use
curl -sL "https://registry.k8s.io/v2/tags/list" | jq .
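To list the tags of a single image, the standard registry API path /v2/<name>/tags/list can be tried. The sketch below assumes registry.k8s.io answers on that path for the image in question; both function names are hypothetical.

```shell
# Hypothetical helper: build the tags URL for one image.
# $1 = image path inside the registry, e.g. nginx-slim
image_tags_url() {
  printf 'https://registry.k8s.io/v2/%s/tags/list\n' "$1"
}

# Hypothetical helper: print one tag per line (needs curl and jq).
list_image_tags() {
  curl -sL "$(image_tags_url "$1")" | jq -r '.tags[]'
}
```

For the image used in this article, that would be list_image_tags nginx-slim.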
PS: a useful tool to further inspect images is gcrane.
If the finalizer is causing any issues, it can be removed before deletion with kubectl patch pvc my-pvc -p '{"metadata":{"finalizers":null}}'
The effect is that deleting the pvc then returns immediately.