Posted on :: 560 Words :: Source Code
This post was originally written as a guest blog somewhere else. It is mirrored here for safekeeping. It has not been modified, so broken links are still broken etc.
https://blog.winter-software.com/2024/03/10/migrating-pvs-in-kubernetes

Migrating PVs in Kubernetes

In Kubernetes, data storage is provided by so-called Persistent Volumes (PVs). Kubernetes itself ships a bunch of different options out of the box, notably HostPath and NFS.

HostPath has the advantage of being incredibly simple: you just take a directory from your host and make it available in a pod. But because it is host-specific, it does not work very well if you have more than one host and want to be able to tolerate host failures.
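A minimal sketch of what that looks like (the pod name, image, and host directory here are made up for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-example
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - mountPath: /data       # where the directory appears in the pod
          name: local-dir
  volumes:
    - name: local-dir
      hostPath:
        path: /srv/app-data      # hypothetical directory on the node
        type: Directory
```

If the pod gets rescheduled to another node, /srv/app-data on that node is a different, probably empty, directory, which is exactly the failure-tolerance problem described above.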

NFS has the advantage of being standard, to the extent that every NAS or storage solution from the last 20 years supports it. But performance often isn't great. And, unless you are using NFSv4, NFS has some challenges surrounding things like file locking. But those aren't important for anything except rare cases like databases...
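For reference, an NFS-backed PV can be declared directly like this (the server address and export path are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany          # NFS allows mounting from multiple nodes
  nfs:
    server: nas.example.com  # hypothetical NAS address
    path: /exports/data      # hypothetical export on that NAS
```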

Kubernetes supports having multiple types of PVs, and groups them together in StorageClasses. When requesting a PV from Kubernetes (using a PersistentVolumeClaim, or PVC), you can specify which StorageClass you want. If you leave it empty, Kubernetes uses the default one.
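A PVC requesting a specific class might look like this (the class name fast-ssd is a made-up example):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim
spec:
  storageClassName: fast-ssd  # hypothetical class; omit to get the default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```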

If you later want to change the storage class, your data needs to be copied from the old volume to the new one. In a perfect world, Kubernetes would do that transparently for you, but as of this writing it does not.

So, let's think about how we can do such a migration.

Application based

Assuming our application supports this, we could just use the application's replication feature. E.g. if our application is PostgreSQL, we could deploy a new PostgreSQL server on the new storage class, and just replicate all the data over.

But, because that's not specific to Kubernetes, we will not look at this further.

Using backups

Assuming we have adequate backups, we could use those: just stop our application, take a backup, and restore it to a different PV.

E.g. when using Velero, you can configure a storage class mapping that gets applied when the restore runs.

https://velero.io/docs/v1.9/restore-reference/#changing-pvpvc-storage-classes
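Per the linked docs, the mapping is expressed as a ConfigMap in Velero's namespace; roughly like this, where old-class and new-class stand in for your actual StorageClass names:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: change-storage-class-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/change-storage-class: RestoreItemAction
data:
  old-class: new-class  # restore PVs/PVCs of class old-class as new-class
```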

Doing it by hand

Now it's getting interesting.

What we basically want is to copy some files from one volume to another. So basically, we want a container that copies files from A to B.

So, let's just create a container that mounts these two PVs.

It's gonna be easy, right? kubectl run should just be able to mount a PV into a container, right?


No, that would be too easy.

When we want to mount a PV into a kubectl run pod, we need to specify that by manually writing a JSON spec for the pod.

Which basically looks like this:

$ kubectl run -i --rm --tty arch --overrides='
{
  "apiVersion": "v1",
  "spec": {
    "containers": [
      {
        "name": "arch",
        "image": "archlinux:latest",
        "args": [
          "bash"
        ],
        "stdin": true,
        "stdinOnce": true,
        "tty": true,
        "volumeMounts": [{
          "mountPath": "/pva",
          "name": "the-pv"
        },{
          "mountPath": "/pvb",
          "name": "the-other-pv"
        }]
      }
    ],
    "volumes": [{
      "name": "the-pv",
      "persistentVolumeClaim": {
        "claimName": "the-pv-claim"
      }
    },{
      "name": "the-other-pv",
      "persistentVolumeClaim": {
        "claimName": "the-other-pv-claim"
      }
    }]
  }
}
' --restart=Never -- bash

While it works, it's also kind of unwieldy.

We could now just use rsync or something to move the data to the new volume.
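A sketch of that copy step, run inside the pod above (assuming the /pva and /pvb mount paths from the spec; the Arch image does not ship rsync, so it has to be installed first):

```shell
# inside the kubectl run shell:
pacman -Sy --noconfirm rsync  # install rsync in the Arch container

# -a preserves permissions, ownership, timestamps and symlinks;
# the trailing slashes copy the contents of /pva into /pvb
rsync -a /pva/ /pvb/
```

Afterwards, point the application's PVC at the new volume and verify the data before deleting the old claim.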

Using pv-migrate

Someone else had this problem before, and made a tool to make this easier. It's called pv-migrate.

$ pv-migrate migrate old-pvc new-pvc

It's found at https://github.com/utkuozdemir/pv-migrate/