############# Data Recovery ############# In the event of data loss you will need to reconstruct the missing components (databases and files) using installation instructions and backups. What follows is a brief outline of steps to recover lost systems, based on what's been lost. We outline the recovery of: - 1. Production stack database 2. Production stack media directory 3. Infrastructure database (Development or Production) 4. The Rancher server data In all cases we assume that you have kubernetes clusters and the applications. This section does not cover the creation of the underlying clusters or the installation of the original applications. This section is simply about restoring data to a pre-existing installation. .. epigraph:: You can read detailed documentation relating to the provisioning of a cluster, and installation of the key applications by referring to our :doc:`installation guide <../installation/index>`. ************** Stack database ************** A convenient ansible playbook that can be used to restore backed-up databases can be found in the Informatics Matters `bandr-ansible`_ repository. From a clone of the repository you should create a suitable Python environment and install the required packages. With this done you should prepare a suitable set of ``parameters.yaml`` variables to control the playbook. Here is a set used recently (replace the values as appropriate):: recovery_image_tag: 15.7 recovery_host: database recovery_database: frag recovery_database_secret: database recovery_database_admin_user: admin recovery_namespace: production-stack recovery_volume_pvc: recovery recovery_volume_size_g: 40 recovery_volume_storageclass: csi-cinder-sc-delete recovery_volume_pvc_name: recovery recovery_sa: stack recovery_use_rclone_bucket_and_path: /nw-xch-prod-v2-production-stack-backup recovery_rclone_s3_endpoint: https://s3.echo.stfc.ac.uk recovery_rclone_s3_provider: Ceph You then need to provide Kubernetes cluster credentials and bucket credentials via a few key environment variables:: export K8S_AUTH_HOST=https://???? export K8S_AUTH_API_KEY=???? export K8S_AUTH_VERIFY_SSL=false export AWS_ACCESS_KEY_ID=???? export AWS_SECRET_ACCESS_KEY=???? And then run the recovery playbook:: ansible-playbook site-recovery.yaml -e @parameters.yaml Recovery of the ``frag`` database will only take a few minutes, with most of the time consumed by the recovery process copying files from the backup bucket. *********** Stack media *********** This is most easily accomplished from within a shell in the Production stack **Pod**. From there you should move to the Django media directory (``/code/media``). You will need to install the Python ``awscli`` package and know the S3 credentials that give you access the bucket where the media files are kept:: pip install awscli export AWS_ACCESS_KEY_ID=???? export AWS_SECRET_ACCESS_KEY=???? export AWS_DEFAULT_REGION= export AWS_ENDPOINT_URL_S3=https://s3.echo.stfc.ac.uk aws s3 cp --recursive s3://fragalysis-stack-production-media /code/media Be prepared for the recovery of the media volume to take significant time. With 240Gi of files to transfer (September 2025), at about 50-60MiB/s expect recovery to take about an hour. ************************ Infrastructure databases ************************ As the infrastructure database server contains multiple databases we currently rely on the `pg_dumpall` utility in order to get a complete copy of the server. backups are performed every day, and are kept for a number of days, perfomed by a **CronJob** operating ion the corresponding ``im-infra`` **Namespaces**. Backups are located in an Echo S3 bucket: - - Development cluster: ``/im-infra-backup`` - Prodcution cluster: ``/im-infra-production-backup`` Armed with the prevailing Postgres admin user and password, recovery can be performed manually via a Pod shell or using an AWX playbook. We test recovery using the ``site-recovery.yaml`` playbook (version **2024.1**) from our `bandr-ansible`_ repository. ************** Rancher server ************** Recovery of the Rancher server relies on manual backups that are kept on an S3 bucket (typically ``/nw-rancher``). You can follow the Rancher instructions for recovery of data on a docker installation using their own instructions:: - See `restore-docker-installed-rancher`_ .. _bandr-ansible: https://github.com/InformaticsMatters/bandr-ansible .. _restore-docker-installed-rancher: https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/backup-restore-and-disaster-recovery/restore-docker-installed-rancher