It's a poorly hidden fact that I love Kubernetes. After spending months running everything from Marathon on DC/OS and CoreOS to Rancher and Docker Swarm in production, Kubernetes is the only container orchestration platform that has truly struck me as "production ready", and I have been running it for the past year as a result.

While functionality when I first started using it (v1.4) was somewhat patchy and uninteresting, some of the more recent updates have been making sizeable strides towards addressing the operations challenges we face on a daily basis.

With v1.8, Kubernetes has promoted the CronJob controller to batch/v1beta1, making it broadly available for people to play with. It sounds like the perfect time to show you how we use CronJobs to manage automated, scheduled backups within our environments.

Introduction to CronJob

The Kubernetes CronJob controller is responsible for creating Jobs on a schedule. No, really, it is exactly that simple. The Jobs themselves take care of ensuring that the task runs correctly, handling crashes, completion deadlines and so on.

This allows you to ensure that a container is run on a 0 0 * * * schedule - or every day at midnight, for those who don't speak cron.
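If you're new to that syntax, the five fields of a standard cron schedule (which is the format Kubernetes expects) break down as follows:

# ┌───────── minute (0)
# │ ┌─────── hour (0)
# │ │ ┌───── day of month (*, every day)
# │ │ │ ┌─── month (*, every month)
# │ │ │ │ ┌─ day of week (*, every day of the week)
  0 0 * * *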

Let's take a simple example that shows how one would convert a Job to a CronJob.

Converting Jobs
Job

apiVersion: batch/v1
kind: Job
metadata:
  name: say-hi
spec:
  template:
    spec:
      containers:
      - name: hello-world
        image: hello-world
      restartPolicy: Never

As you can see from this example, it is actually pretty trivial to convert an existing Kubernetes Job to a CronJob, making migrations quick and simple. You'll also notice that defining a job is no more complex than defining your Deployments.
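For comparison, here is the same Job converted to a CronJob - the original spec simply moves under jobTemplate and picks up a schedule (using the daily-at-midnight schedule from earlier):

CronJob

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: say-hi
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello-world
            image: hello-world
          restartPolicy: Never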

Building a Backup Container

Now that you're familiar with how to define a Kubernetes CronJob, you probably want to know how to build the container that is going to run your backups for you. Because of the transient nature of a Kubernetes Job, you don't need to worry about problems like keeping the container running, internal scheduling etc.

This means that your backup container can really just run $YOUR_BACKUP_EXECUTABLE and exit when it is done. This removes a huge amount of the complexity that was previously involved with building backup containers and lets you focus on exactly the task you want to perform.

But let's not make this too easy - I personally want my backups to end up somewhere safe, otherwise what's the point? To achieve that, let's toss them over to S3 when we're done, giving us a pretty reliable place to keep track of them.

Backup Container
Dockerfile

# Fetch the mc command line client
FROM alpine:latest
RUN apk update && apk add ca-certificates wget && update-ca-certificates
RUN wget -O /tmp/mc https://dl.minio.io/client/mc/release/linux-amd64/mc
RUN chmod +x /tmp/mc
# Then build our backup image
FROM postgres:9.6
LABEL maintainer="Benjamin Pannell <admin@sierrasoftworks.com>"
COPY --from=0 /tmp/mc /usr/bin/mc
ENV MINIO_SERVER=""
ENV MINIO_BUCKET="backups"
ENV MINIO_ACCESS_KEY=""
ENV MINIO_SECRET_KEY=""
ENV MINIO_API_VERSION="S3v4"
ENV DATE_FORMAT="+%Y-%m-%d"
ADD entrypoint.sh /app/entrypoint.sh
ENTRYPOINT [ "/app/entrypoint.sh" ]

We're going to use the Minio command line client (mc), a fully standards-compliant S3 client, to upload our backup as it is created. We grab the official binary and use Docker's new multi-stage build functionality to toss that binary into the postgres image (which includes pg_dump).

All that is left to do is put together an entrypoint which will run pg_dump and pipe the result to S3.
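The contents of that entrypoint.sh aren't reproduced here, but a minimal sketch might look something like the following - assuming the environment variables from the Dockerfile above, and that the first argument to the container is the database name:

entrypoint.sh

#!/bin/sh
set -e

# Register the S3 server with mc under the alias "backup"
mc config host add backup "$MINIO_SERVER" "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" --api "$MINIO_API_VERSION"

# Name the backup after the database and the current date, e.g. my_db.2017-10-12.sql
BACKUP_FILE="$1.$(date "$DATE_FORMAT").sql"

# Run pg_dump with whatever arguments were passed to the container and
# stream the dump straight into the bucket, never touching local disk
pg_dump "$@" | mc pipe "backup/$MINIO_BUCKET/$BACKUP_FILE"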

Defining our Backup Job

In the real world, you're going to want to draw things like your ACCESS_KEY and SECRET_KEY from the Kubernetes Secrets API and provide some additional metadata for tracking and organization. Here's how that looks using one of our pre-built backup containers (this one backs up MongoDB) - the result isn't much more complicated than what we started with.

CronJob
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-backup
  labels:
    app: my-app
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: my-app
            name: my-backup
        spec:
          containers:
          - image: minback/mongo:latest
            name: backup
            args:
            - my_db
            - -h
            - mongodb
            env:
            - name: MINIO_SERVER
              value: http://minio:9000/
            - name: MINIO_BUCKET
              value: backups
            - name: MINIO_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: minio-secrets
                  key: access-key
            - name: MINIO_SECRET_KEY
              valueFrom:
                secretKeyRef:
                  name: minio-secrets
                  key: secret-key
          restartPolicy: OnFailure
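
The minio-secrets secret referenced above needs to exist before the job fires; one way to create it is with kubectl, substituting your own credentials:

kubectl create secret generic minio-secrets \
    --from-literal=access-key=YOUR_ACCESS_KEY \
    --from-literal=secret-key=YOUR_SECRET_KEY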

Existing Backup Containers

In the interest of speeding up adoption, we have open sourced some of the backup containers we use in our infrastructure. These containers will run a backup of a given datastore and push the resulting backup to S3 using the Minio CLI.
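
Once deployed, the usual kubectl tooling makes it easy to confirm that your backups are firing on schedule - a quick sketch, assuming the manifest above was saved as my-backup.yaml:

# Deploy the CronJob and confirm it was created
kubectl apply -f my-backup.yaml
kubectl get cronjob my-backup

# Watch for the Jobs it spawns and check the backup pods' logs
kubectl get jobs --watch
kubectl logs -l name=my-backup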