Chronicles of a Kubernetes Storage Adventure

by nick.schuch / 31 August 2017

This post chronicles our adventure of implementing (and migrating) our Kubernetes storage provisioning from FlexVolumes to Third Party Resources to Storage Classes.

Some quick notes before we begin:

We are running on AWS in the Asia Pacific region.
We have a high level API called “Skipper” which we use as a “man in the middle” for easing developer usage on the cluster.
We are an Ops team of 3.

FlexVolumes

When we first started hosting our sites on Kubernetes we did not have access to ideal “managed storage” options. The only options we had available to us were:

EBS - Could only be used for a single Pod deployment, locking us into a solution without high availability.
S3 - This is OK if you have 1 or 2 applications per dev team, but the effort required to migrate a large number of applications was very daunting.
Roll our own - We are a small team and didn’t want to have to maintain our own solution.

Around this time FlexVolumes were added to Kubernetes.

FlexVolumes are a very low level entrypoint which allows for a large amount of control over the storage which is mounted on a node.

The options that FlexVolumes opened up for us were very exciting, so we decided to go with a sneaky fourth option, “fuse filesystems”.

We had a choice between 3 fuse filesystems:

S3FS - Stable, but very slow for our needs.
RioFS - Faster, less support.
Goofys - Fastest, but at the time it was experimental.

We chose RioFS. At the time it seemed like a good compromise.

The FlexVolume we implemented was awesome! We wrote it to auto provision S3 buckets and then shell out to RioFS to mount the volume.

Given we run our high-level "Skipper" API over the top of Kubernetes, this made rolling out the storage very easy.

Here is an example of a Deployment definition using our FlexVolume.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  namespace: test-project
  name: production
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: web
        image: nginx:latest
        volumeMounts:
        - mountPath: /var/www/files
          name: files
      volumes:
      - name: files
        flexVolume:
          driver: "pnx/volume"
          fsType: "fuse"
          options:
            name: "previousnext-test-project-files"
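For readers who have not written one, a FlexVolume driver is an executable that the kubelet calls with a command (such as init, mount or unmount) and that answers with a small JSON status object. The sketch below is not our driver (that code is linked further down); it is a minimal Python illustration of the call shape. The argument order for mount, the RioFS command line and the helper names are assumptions for illustration, and the function only builds the command instead of running it.

```python
import json


def riofs_command(bucket, mount_dir):
    # Assumption: RioFS takes the bucket name followed by the mount point.
    return ["riofs", bucket, mount_dir]


def failure(message):
    return {"status": "Failure", "message": message}, None


def handle(argv):
    """Handle one FlexVolume call; return (JSON-able response, command to run)."""
    if not argv:
        return failure("no command given")
    command = argv[0]
    if command == "init":
        return {"status": "Success"}, None
    if command == "mount":
        if len(argv) < 3:
            return failure("mount needs a directory and JSON options")
        # Assumption: the JSON options are always the last argument.
        mount_dir, options = argv[1], json.loads(argv[-1])
        bucket = options.get("name")
        if not bucket:
            return failure("option 'name' (the bucket) is required")
        # A real driver would create the bucket if needed, then run the command.
        return {"status": "Success"}, riofs_command(bucket, mount_dir)
    if command == "unmount":
        if len(argv) < 2:
            return failure("unmount needs a directory")
        return {"status": "Success"}, ["umount", argv[1]]
    return {"status": "Not supported"}, None


response, command = handle(
    ["mount", "/mnt/files", '{"name": "previousnext-test-project-files"}']
)
print(json.dumps(response), command)
```

The "name" option is the same one set under options in the Deployment above, which is how the bucket name reaches the driver.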

Over time we found that this solution was not viable. We ran into multiple random issues with replication across nodes, and the dev team started to lose faith in the solution.

While I really liked the idea of a fuse mounted filesystem, we saw this as a means to an end: we were waiting for AWS EFS to launch in Australia.

If you are interested in the FlexVolume code, you can find it here:

https://github.com/previousnext/flexvolume-riofs

Third Party Resources

I now reference my “Kubernetes life” as “before” and “after” the AWS EFS launch in Sydney.

The first implementation we wrote for provisioning EFS volumes was done via a ThirdPartyResource, which consisted of 3 components:

Provisioner - Daemon to provision new AWS EFS resources.
Status - Daemon to monitor the AWS EFS API for changes to the state of the volumes.
CLI - Simple command line client for listing all the EFS Third Party Resources (cluster admin only).

With these components installed, admins were able to provision new AWS EFS resources with the following definition:

apiVersion: skpr.io/v1
kind: Efs
metadata:
  name: public-files
  namespace: test-project
spec:
  performance: "generalPurpose"
  region: ap-southeast-2
  securityGroup: sg-xxxxxxxx
  subnets:
  - subnet-xxxxxxxx
  - subnet-xxxxxxxx
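To give a feel for what a provisioner daemon does with a definition like this, here is a rough Python sketch of the planning step: compare the Efs resources in the cluster with the file systems that already exist and work out what still needs creating. It is an illustration only, not our implementation. The "namespace/name" creation token and the shape of the returned plan are assumptions; the CreationToken, PerformanceMode, SubnetId and SecurityGroups names mirror parameters of the AWS EFS API, but nothing here calls AWS.

```python
def creation_token(resource):
    # Hypothetical convention: one file system per "namespace/name".
    meta = resource["metadata"]
    return f'{meta["namespace"]}/{meta["name"]}'


def plan(resources, existing_tokens):
    """Return one create request per Efs resource that has no file system yet."""
    actions = []
    for resource in resources:
        token = creation_token(resource)
        if token in existing_tokens:
            continue  # already provisioned, nothing to do
        spec = resource["spec"]
        actions.append({
            "CreationToken": token,
            "PerformanceMode": spec["performance"],
            "Region": spec["region"],
            # One mount target per subnet, all sharing the security group.
            "MountTargets": [
                {"SubnetId": subnet, "SecurityGroups": [spec["securityGroup"]]}
                for subnet in spec["subnets"]
            ],
        })
    return actions


public_files = {
    "metadata": {"name": "public-files", "namespace": "test-project"},
    "spec": {
        "performance": "generalPurpose",
        "region": "ap-southeast-2",
        "securityGroup": "sg-xxxxxxxx",
        "subnets": ["subnet-xxxxxxxx", "subnet-xxxxxxxx"],
    },
}
print(plan([public_files], existing_tokens=set()))
```

Running the same plan again with the token already present returns an empty list, which is what makes a loop like this safe to repeat.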

We then integrated this ThirdPartyResource into our Skipper API, allowing our developers to automatically have EFS backed deployments.

At the time of writing this post we are managing 100 AWS EFS volumes with this implementation.

While this has worked great for us, we acknowledged that this approach would not be ideal for the Kubernetes community, as developers would be required to have knowledge of the infrastructure the cluster is running on, such as:

Security Group
VPC Subnets
Region

We have now marked this implementation as v1.

Storage Classes

After reading through the Kubernetes Blog post, Dynamic Provisioning and Storage Classes in Kubernetes, we knew that this architecture was for us.

This is how I think of Storage Classes:

Developer submits a PersistentVolumeClaim which contains a StorageClass reference
StorageClass references a Provisioner
Provisioner creates our SourceVolumes and returns PersistentVolumeSource information (how to mount)
Developer references PersistentVolumeClaim in Pod definition
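The chain above can be modelled in a few lines of Python. This is a toy model of the lookups rather than how Kubernetes implements them: plain dicts stand in for API objects, the provisioner is a hypothetical stub that returns a fixed NFS source, and the storage class annotation and provisioner name are borrowed from the manifests later in this post.

```python
STORAGE_CLASS_ANNOTATION = "volume.beta.kubernetes.io/storage-class"


def resolve_provisioner(claim, storage_classes):
    """First two steps: claim -> StorageClass -> provisioner name."""
    class_name = claim["metadata"]["annotations"][STORAGE_CLASS_ANNOTATION]
    return storage_classes[class_name]["provisioner"]


def bind(claim, storage_classes, provisioners):
    """Third step: the provisioner returns how to mount the new volume."""
    provisioner = provisioners[resolve_provisioner(claim, storage_classes)]
    return {"claim": claim["metadata"]["name"], "source": provisioner(claim)}


def pod_volume(claim_name):
    """Last step: the Pod only names the claim, never the mount details."""
    return {"name": claim_name, "persistentVolumeClaim": {"claimName": claim_name}}


storage_classes = {"general": {"provisioner": "efs.aws.skpr.io/generalPurpose"}}
# Hypothetical stub provisioner: always returns the same NFS volume source.
provisioners = {
    "efs.aws.skpr.io/generalPurpose": lambda claim: {
        "nfs": {"server": "nfs.example.invalid", "path": "/"}
    }
}
claim = {
    "metadata": {
        "name": "files",
        "annotations": {STORAGE_CLASS_ANNOTATION: "general"},
    }
}

binding = bind(claim, storage_classes, provisioners)
print(binding["source"], pod_volume(binding["claim"]))
```

The point of the indirection is visible in pod_volume: nothing in the Pod's volume entry mentions NFS, EFS or AWS.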

Not only has this approach allowed us to decouple our applications from the storage layer, but it also allowed us to move away from our ThirdPartyResource definition and command-line client, meaning less code to maintain!

So, how do you use our AWS EFS Storage Class?

First, we declare our Storage Classes. I think of these as aliases for our storage, allowing us to decouple the hosting provider from the deployment.

For example, a Storage Class named “general” could be provisioned by “EFS: General Purpose” on AWS or “Azure File Storage” on Microsoft Azure.

kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: general
provisioner: efs.aws.skpr.io/generalPurpose
---
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: fast
provisioner: efs.aws.skpr.io/maxIO

Now that we have our Storage Classes declared, we need provisioners to do the work.

The provisioner was very easy to implement with the help of the Kubernetes incubator project External Storage.

To implement a provisioner with this library all you need to do is satisfy the interface functions Provision() and Delete().
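The real interface is written in Go, so treat the following as a Python analogue of those two functions rather than the library's API. It runs against an in-memory stand-in for EFS (FakeEFS is hypothetical) and returns a dict shaped like a PersistentVolume with an NFS source. Naming the volume after the file system ID and reporting 8E capacity are assumptions based on the kubectl output later in this post.

```python
import itertools


class FakeEFS:
    """In-memory stand-in for the AWS EFS API (hypothetical, for illustration)."""

    def __init__(self):
        self.file_systems = {}
        self._ids = itertools.count(1)

    def create(self, performance):
        fs_id = f"fs-{next(self._ids):08x}"
        self.file_systems[fs_id] = {"performance": performance}
        return fs_id

    def delete(self, fs_id):
        del self.file_systems[fs_id]


class EFSProvisioner:
    def __init__(self, efs, region, performance):
        self.efs = efs
        self.region = region
        self.performance = performance

    def provision(self, claim):
        """Create a file system and describe how to mount it."""
        fs_id = self.efs.create(self.performance)
        return {
            "metadata": {"name": fs_id},
            "spec": {
                "accessModes": claim["spec"]["accessModes"],
                "capacity": {"storage": "8E"},
                # EFS file systems are mounted over NFS at this DNS name.
                "nfs": {
                    "server": f"{fs_id}.efs.{self.region}.amazonaws.com",
                    "path": "/",
                },
            },
        }

    def delete(self, volume):
        """Remove the file system that backs a released volume."""
        self.efs.delete(volume["metadata"]["name"])


efs = FakeEFS()
provisioner = EFSProvisioner(efs, "ap-southeast-2", "generalPurpose")
volume = provisioner.provision({"spec": {"accessModes": ["ReadWriteMany"]}})
print(volume["spec"]["nfs"])
provisioner.delete(volume)
```

Everything the controller needs to hand back to Kubernetes lives in the returned volume, which is why the interface can stay this small.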

The examples I used to bootstrap our provisioner can be found here:

Our implementation can be found here:

https://github.com/previousnext/k8s-aws-efs/tree/master/workspace/src/github.com/previousnext/k8s-aws-efs

To deploy the provisioners we used the following manifest file:

kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: aws-efs-gp
  namespace: kube-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: aws-efs-gp
    spec:
      containers:
        - name: provisioner
          image: previousnext/k8s-aws-efs:2.0.0
          env:
            - name:  EFS_PERFORMANCE
              value: "generalPurpose"
            - name:  AWS_REGION
              value: "ap-southeast-2"
            - name:  AWS_SECURITY_GROUP
              value: "sg-xxxxxxxxx"
            - name:  AWS_SUBNETS
              value: "subnet-xxxxxx,subnet-xxxxxx"
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: aws-efs-max-io
  namespace: kube-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: aws-efs-max-io
    spec:
      containers:
        - name: provisioner
          image: previousnext/k8s-aws-efs:2.0.0
          env:
            - name:  EFS_PERFORMANCE
              value: "maxIO"
            - name:  AWS_REGION
              value: "ap-southeast-2"
            - name:  AWS_SECURITY_GROUP
              value: "sg-xxxxxxxxx"
            - name:  AWS_SUBNETS
              value: "subnet-xxxxxx,subnet-xxxxxx"
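Each provisioner is configured entirely through the four environment variables in these manifests. As an illustration of how a process might read and validate them at startup, here is a small Python sketch. The Config type and load_config function are hypothetical names, and our actual provisioner may parse these differently.

```python
from dataclasses import dataclass

REQUIRED = ("EFS_PERFORMANCE", "AWS_REGION", "AWS_SECURITY_GROUP", "AWS_SUBNETS")
PERFORMANCE_MODES = ("generalPurpose", "maxIO")


@dataclass(frozen=True)
class Config:
    performance: str
    region: str
    security_group: str
    subnets: tuple


def load_config(env):
    """Build a Config from a mapping of environment variables."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    if env["EFS_PERFORMANCE"] not in PERFORMANCE_MODES:
        raise ValueError("unknown performance mode: " + env["EFS_PERFORMANCE"])
    # AWS_SUBNETS is a comma-separated list, as in the manifest above.
    subnets = tuple(s.strip() for s in env["AWS_SUBNETS"].split(",") if s.strip())
    return Config(
        performance=env["EFS_PERFORMANCE"],
        region=env["AWS_REGION"],
        security_group=env["AWS_SECURITY_GROUP"],
        subnets=subnets,
    )


config = load_config({
    "EFS_PERFORMANCE": "maxIO",
    "AWS_REGION": "ap-southeast-2",
    "AWS_SECURITY_GROUP": "sg-xxxxxxxxx",
    "AWS_SUBNETS": "subnet-xxxxxx,subnet-xxxxxx",
})
print(config)
```

In a running container the mapping passed in would be os.environ. Failing fast on a missing or misspelled value is cheaper than discovering it on the first claim.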

Now we can provision some storage!

In the following definition, we are requesting one of each Storage Class type.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: files
  namespace: test-project
  annotations:
    volume.beta.kubernetes.io/storage-class: "general"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: files-fast
  namespace: test-project
  annotations:
    volume.beta.kubernetes.io/storage-class: "fast"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi

We can inspect the status of these PersistentVolumeClaim objects with the following command:

$ kubectl -n test-project get pvc
NAME             STATUS    VOLUME        CAPACITY   ACCESSMODES   STORAGECLASS   AGE
files            Bound     fs-xxxxxxxx   8E         RWX           general        5m
files-fast       Bound     fs-xxxxxxxx   8E         RWX           fast           5m

Consuming a PersistentVolumeClaim is super easy. We now only need to reference what storage we want, not how we mount it (e.g. NFS mount details).

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  namespace: test-project
  name: production
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: web
        image: nginx:latest
        volumeMounts:
        - mountPath: /var/www/files
          name: files
      volumes:
      - name: files
        persistentVolumeClaim:
          claimName: files

Conclusion

Each of these APIs serves a different use case:

FlexVolumes are for how a volume is mounted. In this case, we already had access to the NFS volume mount in Kubernetes.
Storage Classes are for what you want, e.g. give me some storage please with X speed and X size.
Third Party Resources allowed us to prototype early and we still use this for other custom API definitions on our clusters.

I am very grateful for the contributors working on these APIs. What we have been able to achieve with them is a testament to their excellent design.

Any feedback or contributions on the AWS EFS project are most welcome.

https://github.com/previousnext/k8s-aws-efs

Further discussion is also welcome on this site and on this Hacker News discussion.

https://news.ycombinator.com/item?id=15138339

Tagged

Kubernetes, AWS

Nick Schuch, Operations Lead