##################
The Rancher Server
##################

The **DEVELOPMENT** and **PRODUCTION** clusters have been created with, and
are managed by, `Rancher`_, deployed to the STFC/OpenStack cluster on
a dedicated Kubernetes cluster configured using `RKE`_:

*   https://rancher-xchem.informaticsmatters.org

...where you will need suitable credentials in order to log in.

A simplified depiction of the clusters can be seen in this diagram. Each
cluster consists of key ``etcd`` and ``control plane`` nodes and various
worker (``app`` and ``graph``) nodes. The instances are created and managed
by the Rancher server.

.. image:: ../images/frag-actions/frag-actions.016.png

A full description of the Rancher installation and its configuration can be
found in the following external (GoogleDoc) document:

*   `OpenStack K8S clusters with Rancher`_ (AWS)

.. warning::

    The cluster instances are created automatically by the Rancher server.
    **DO NOT edit or delete any compute instance that may be a Rancher-managed
    Kubernetes instance via the STFC/OpenStack console**.

    To help you identify them, the instances use a naming convention.
    In our case, instance names that belong to the cluster hosting Rancher
    begin with ``rke-``. Instance names that belong to the **DEVELOPMENT** or
    **PRODUCTION** cluster begin with ``xch-``.

********************
Cluster etcd backups
********************

All Kubernetes objects are stored in `etcd`_. Periodically backing up the etcd
cluster data is important for recovering Kubernetes clusters in disaster
scenarios, such as losing all master nodes. The snapshot file contains all the
Kubernetes state and critical information.

The backup of etcd for the RKE and Application clusters is automated using
features built into RKE and Rancher, with snapshots written to an
Informatics Matters AWS S3 bucket (detailed in the following sections).

.. epigraph::

    The credentials used to create AWS S3 backups are those of the user
    ``fragalysis-loader`` on the Informatics Matters AWS account. The
    secret access key is stored in the Informatics Matters `KeePassXC`_
    application (under ``AWS -> AWS S3 (Fragalysis) User``).

Application clusters
====================

A snapshot of each cluster's etcd content is taken regularly, with the local
copy also uploaded to an Informatics Matters AWS S3 target in the
``eu-central-1`` region. Backups of the application clusters are performed by
Rancher.

Typical automated AWS S3 backup settings are illustrated in the following
Rancher cluster configuration screenshot:

.. image:: ../images/rancher-s3-backup-configuration.png

Cluster bucket and path details are as follows:

*   The **PRODUCTION** cluster's etcd is backed up to
    ``im-rancher/xchem-production``. This occurs every 6 hours and 28 copies
    are kept (an approximate 7-day history)
*   The **DEVELOPMENT** cluster's etcd is backed up to
    ``im-rancher/xchem``. This occurs every 6 hours and 42 copies
    are kept (an approximate 10-day history)

Individual backup file size [#f1]_ is approximately 7MB for **DEVELOPMENT**
and 21MB for **PRODUCTION**.

Rancher (RKE) cluster
=====================

The etcd material for the RKE-formed Kubernetes cluster that hosts the
Rancher server is also backed up. This backup is performed by RKE
and configured using the RKE ``cluster.yml``, which can be found in the
``rancher`` directory of the Fragalysis `deployment repository`_.

Cluster bucket and path details are as follows:

*   The **RKE** cluster's etcd is backed up to
    ``im-rancher/rancher-xchem``. This occurs every 6 hours and 21 copies
    are kept (an approximate 5-day history)

Individual backup file size [#f1]_ is approximately 8MB,
with 3 files created per backup.

For details of backup configuration refer to the `Rancher RKE backup`_
documentation.

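
As an illustration of the RKE settings described above, the following is a
minimal sketch of the etcd backup section of an RKE ``cluster.yml``. It is not
a copy of the file in the deployment repository, which remains the
authoritative source. The interval and retention values are those quoted
above. The split of ``im-rancher/rancher-xchem`` into a bucket name and a
folder, the ``eu-central-1`` region, the endpoint and the placeholder
credentials are assumptions made for the example.

.. code-block:: yaml

    services:
      etcd:
        backup_config:
          # Take a snapshot every 6 hours and keep 21 of them
          # (values quoted in this document)
          interval_hours: 6
          retention: 21
          s3backupconfig:
            # Placeholders only: the real key belongs to the
            # fragalysis-loader user and is held in KeePassXC
            access_key: "<access-key-id>"
            secret_key: "<secret-access-key>"
            # Assumption: bucket/folder split of im-rancher/rancher-xchem
            bucket_name: im-rancher
            folder: rancher-xchem
            # Assumption: same region as the application cluster backups
            region: eu-central-1
            # Assumption: the standard AWS S3 endpoint
            endpoint: s3.amazonaws.com

With settings of this shape, RKE takes the recurring snapshots on the etcd
nodes and uploads each one to the S3 location, so no separate backup job has
to be scheduled.
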
*********************
Cluster etcd recovery
*********************

Application clusters
====================

Restoring an application cluster (**PRODUCTION** or **DEVELOPMENT**)
from a backup is relatively straightforward. It can be done from within the
Rancher console for the chosen cluster. Follow the Rancher
`Restoring a Cluster from Backup`_ documentation, remembering that
we're using a Rancher installation later than v2.4.0.

Rancher (RKE) cluster
=====================

Restoring the RKE-based cluster is a little more complicated than
restoring an application cluster. Follow the Rancher
`Restoring Backups—Kubernetes`_ installs documentation, using the
appropriate S3-based instructions.

Some `example RKE scenarios`_ are illustrated [#f2]_.

.. _deployment repository: https://github.com/InformaticsMatters/dls-fragalysis-stack-kubernetes
.. _etcd: https://etcd.io
.. _example rke scenarios: https://rancher.com/docs/rke/latest/en/etcd-snapshots/example-scenarios/
.. _keepassxc: https://keepassxc.org
.. _rancher: https://rancher.com/products/rancher/
.. _restoring a cluster from backup: https://rancher.com/docs/rancher/v2.x/en/cluster-admin/restoring-etcd/
.. _restoring backups—kubernetes: https://rancher.com/docs/rancher/v2.x/en/backups/restorations/ha-restoration/
.. _rancher rke backup: https://rancher.com/docs/rancher/v2.x/en/backups/backups/ha-backups/
.. _rke: https://rancher.com/products/rke/
.. _OpenStack K8S clusters with Rancher: https://docs.google.com/document/d/15ffwm5daCW5gJ1ZNpX6A9mYP-rw3Bvyr9nZyiWKte00/edit?usp=sharing

.. rubric:: Footnotes

.. [#f1] July 2020

.. [#f2] We use an RKE version later than v0.2.0