Backups

Here, we describe the data and file backups that are kept to enable recovery of the installation. In order to provide sufficient material to accomplish a complete recovery of the installation we keep copies of the following: -

  1. The rancher server. A docker-based service running in a asmake RKE cluster

  2. The production and development cluster _infrastructure_ database, used by Keycloak for authentication and AWX, the ansible playbook server

  3. The production stack database

  4. The production stack media files

Rancher Server

This service is responsible for managing the cluster VMs and provides kubectl/k9s/lens access to the clusters. Backup is relatively complex and is currently a manual operation that requires the server to be stopped. We could invest time in an automated backup but this would have to be one that can detect errors and alert a human operator. Probably a day or two to develop.

  • Backup process: Manual

  • Backup schedule: We recommend a manual backup is taken on Fridays at the end of the day (prior to a Sunday-night/Monday-morning fs-trim issue)

  • Backup location: STFC S3 Echo bucket (/nw-rancher)

Infrastructure Database (Development)

This database manages Fragalysis & Squonk2 application logins in the development cluster, providing federated access to CAS.

  • Backup process: This is handled by a CronJob container using the informaticsmatters/sql-backup container image. As the server hosts multiple databases it uses the pg_dumpall utility to dump the server contents to a backup volume (in the cluster) and rclone to copy this off cluster to an S3 bucket

  • Backup schedule: Every day at 03:07, keeping 28 copies

  • Backup location: STFC S3 Echo bucket (/im-infra-backup)

Infrastructure Database (Production)

This database manages Fragalysis & Squonk2 application logins for the production cluster, providing federated access to CAS. It is also used by the AWX ansible playbook server.

  • Backup process: This is handled by a CronJob container using the informaticsmatters/sql-backup conatiner image. As the server hosts multiple databases it uses the pg_dumpall utility to dump the server contents to a backup volume (in the cluster) and rclone to copy this off cluster to an S3 bucket

  • Backup schedule: Every day at 03:07, keeping 28 copies

  • Backup location: STFC S3 Echo bucket (/im-infra-production-backup)

Production Stack Database

This database is the Fragalysis django application database.

  • Backup process: This is handled by a CronJob container using the informaticsmatters/sql-backup container image. This backup only consists of the frag database, collected using the pg_dump utility to dump the server contents to a backup volume (in the cluster) and rclone to copy this off cluster to an S3 bucket

  • Backup schedule Every hour, keeping 24 copies

  • Backup location: STFC S3 Echo bucket (/nw-xch-prod-v2-production-stack-backup)

Production Stack Media

The production Fragalysis media directory, consisting of about 135,000 individula files consuming about 240GiB of disk space (September 2025).

  • Backup process: This is handled by a CronJob container using the /informaticsmatters/volume-replicator container image. The container uses rsync to synchronise data to an NFS volume and rclone to copy the data off the cluster to an S3 bucket

  • Backup schedule: Every day at 04:04, only the latest copy is kept

  • Backup location: STFC S3 Echo bucket (/fragalysis-stack-production-media)