Cluster Requirements (AWS)
The following minimum (preliminary) cluster requirements will need to be satisfied before the Fragalysis Stack can be deployed and used.
Cluster
Kubernetes Admin User. The Kubernetes cluster must provide a non-tokenised (non-expiring) user with cluster admin privileges. This is the user that the deployment playbooks will use to maintain the cluster.
An AWS IAM user capable of reading from an AWS S3 bucket, used to provision fragment graph and Fragalysis media data (i.e. a user with at least AmazonS3ReadOnlyAccess permissions).
AWS S3. The cluster must allow READ access to AWS S3, where fragment data for the neo4j graph database and loader (media) data for the Stacks are expected to reside. The bucket name can be configured during deployment.
In EKS the Graph Node (see below) is likely to need the arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess policy, typically assigned to the node using the iam -> attachPolicyARNs block in the cluster definition file. You can see this in the cluster example in our ansible-infrastructure repository.
We can provide open access to the existing Fragalysis Stack graph data, but if you want to use your own fragment data you will need to publish it to a suitable bucket that the cluster can access.
One Application Node. A compute instance with the following minimum specification: -
8 cores
32Gi RAM
40Gi root volume
Kubernetes node labels
purpose=application
Kubernetes node taints
(none)
One Graph Node. A compute instance with the following minimum specification: -
8 cores
>128Gi RAM
40Gi root volume
Kubernetes node labels
purpose=bigmem
Kubernetes node taints
purpose=bigmem:NoSchedule
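The two node specifications above can be captured as data and checked before deployment. The following is a minimal Python sketch, not part of the deployment playbooks: the Node class, the MINIMUMS table and the check_node function are hypothetical names introduced here for illustration. It compares nominal instance specifications (what the instance type advertises), on the assumption that you supply those values yourself; the capacity reported by a running node is usually slightly lower than the nominal figure and is not what this sketch inspects.

```python
from dataclasses import dataclass, field

# Binary suffixes used by Kubernetes-style quantities such as "32Gi".
_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as '32Gi' to bytes (binary suffixes only)."""
    for suffix, factor in _FACTORS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


@dataclass
class Node:
    """Hypothetical description of a node's nominal specification."""
    cores: int
    memory: str
    root_volume: str
    labels: dict = field(default_factory=dict)
    taints: list = field(default_factory=list)  # e.g. ["purpose=bigmem:NoSchedule"]


# Minimums transcribed from the lists above, keyed by the 'purpose' label.
MINIMUMS = {
    "application": {"cores": 8, "memory": "32Gi", "memory_strict": False,
                    "root_volume": "40Gi", "taints": []},
    "bigmem": {"cores": 8, "memory": "128Gi", "memory_strict": True,
               "root_volume": "40Gi", "taints": ["purpose=bigmem:NoSchedule"]},
}


def check_node(node: Node) -> list:
    """Return a list of problems; an empty list means the minimum is met."""
    purpose = node.labels.get("purpose")
    if purpose not in MINIMUMS:
        return [f"unexpected purpose label: {purpose!r}"]
    spec = MINIMUMS[purpose]
    problems = []
    if node.cores < spec["cores"]:
        problems.append(f"cores: {node.cores} < {spec['cores']}")
    memory, floor = parse_quantity(node.memory), parse_quantity(spec["memory"])
    # The Graph Node needs strictly more than 128Gi; the Application Node
    # needs at least 32Gi.
    if memory < floor or (spec["memory_strict"] and memory == floor):
        problems.append(f"memory: {node.memory} does not satisfy {spec['memory']}")
    if parse_quantity(node.root_volume) < parse_quantity(spec["root_volume"]):
        problems.append(f"root volume: {node.root_volume} < {spec['root_volume']}")
    if sorted(node.taints) != sorted(spec["taints"]):
        problems.append(f"taints: {node.taints} != {spec['taints']}")
    return problems
```

For example, check_node(Node(8, "32Gi", "40Gi", {"purpose": "application"})) returns an empty list, while a Graph Node described with exactly 128Gi of memory is reported as too small because the requirement is for more than 128Gi.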
GitHub Access. The cluster must be able to access Ansible playbooks and roles located in publicly accessible repositories on GitHub. The current repositories are listed below: -
InformaticsMatters/dls-fragalysis-stack-kubernetes
InformaticsMatters/docker-neo4j-ansible
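One way to confirm that these repositories are reachable is to request each repository page from a machine inside the cluster network. The following is a minimal Python sketch using only the standard library. The function names repo_url, is_reachable and report are hypothetical, and the sketch assumes the repositories are served from https://github.com/ followed by the names listed above; it only tests HTTPS reachability, not whether a git clone would succeed.

```python
import urllib.error
import urllib.request

# Repository names taken from the list above.
REPOSITORIES = [
    "InformaticsMatters/dls-fragalysis-stack-kubernetes",
    "InformaticsMatters/docker-neo4j-ansible",
]


def repo_url(slug: str) -> str:
    """Build the public GitHub URL for an 'owner/name' repository slug."""
    return f"https://github.com/{slug}"


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if an HTTP GET on the URL ends in a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # DNS failure, blocked egress, timeout or HTTP error status.
        return False


def report() -> None:
    """Print one line per repository indicating whether it could be reached."""
    for slug in REPOSITORIES:
        url = repo_url(slug)
        print(f"{url}: {'ok' if is_reachable(url) else 'UNREACHABLE'}")
```

Calling report() from a host or Pod that shares the cluster's outbound network rules gives a quick indication of whether egress to GitHub is blocked.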
Hostnames. You will need to provide routing to your cluster for at least two hostnames, one for the Fragalysis Stack (e.g. fragalysis.example.com) and one for the AWX server (e.g. awx.example.com).
Networking (ingress). We deploy the nginx ingress controller as a DaemonSet, so it runs on each compute instance. This acts as an internal load-balancer and routing service. It directs HTTPS traffic to the corresponding container (Pod).
Networking (load balancing). We need to load-balance traffic to the cluster. On AWS, rather than create an Application Load Balancer (ALB), which would normally result in an ALB instance for each ingress, we create a LoadBalancer Service, which creates a single layer-4 AWS Network Load Balancer (NLB) for the entire cluster.
If the use of an NLB is not acceptable and you want to use an ALB or your own load-balancing solution instead, you will be responsible for its installation and management.
Networking (certificates). The Fragalysis Stack is a web-based application that the user normally interacts with using a resolvable hostname, e.g. fragalysis.example.com.
To simplify and streamline deployment, and avoid users having to provide their own certificates, our solution deploys the cert-manager, a native Kubernetes certificate management controller. We use it to automatically issue and renew certificates to allow SSL (HTTPS) connection to the stack using Let’s Encrypt. This relies on the certificate manager’s ability to connect to the Let’s Encrypt service.
If this is not possible in your cluster and you need HTTPS connections to the stack you’re deploying you will need to provide your own certificate solution.
A ReadWriteOnce (RWO) storage class. This is typically GP2 and is often built-in, especially if you’re using EKS.
Control Machine
You will also need a suitable Control Machine, from which you will be running at least some Ansible playbooks. The control machine’s requirements are covered in the Command-Line Requirements (Control Machine) document.