Installing Monitoring (Basic)
We rely on Prometheus and Grafana for monitoring, which are typically installed
using the kube-prometheus-stack Helm chart. Once you have a KUBECONFIG and
access to the cluster, installation is relatively simple, requiring just a release name
(the unimaginative kube-prometheus-stack in the following example).
Run the following from the project root /monitoring directory,
where you will find a suitable dev-values.yaml configuration file.
kubectl create namespace monitoring
helm install -f dev-values.yaml kube-prometheus-stack \
    oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack
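To confirm that the install took effect, a quick check along the following lines may help. This is a minimal sketch rather than part of the original procedure: the function name check_monitoring is hypothetical, and it assumes helm and kubectl are on your PATH and use the same KUBECONFIG.

```shell
# Hypothetical helper: report the Helm release and what it created.
# Defaults to the release name and namespace used in this guide.
check_monitoring() {
  release="${1:-kube-prometheus-stack}"
  namespace="${2:-monitoring}"

  # Find the release regardless of which namespace Helm recorded it in.
  helm list --all-namespaces --filter "^${release}\$" || return 1

  # Pods should eventually reach Running; the ingress listing shows
  # the hostnames that were created.
  kubectl get pods --namespace "$namespace" || return 1
  kubectl get ingress --namespace "$namespace"
}
```

Calling check_monitoring with no arguments uses the names from the example above.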
This will install the application in the monitoring Namespace with some
persistence, and provide the following ingresses: -
To get the Grafana admin password run the following: -
kubectl get secret --namespace monitoring \
    -l app.kubernetes.io/component=admin-secret \
    -o jsonpath="{.items[0].data.admin-password}" | base64 --decode ; echo
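Kubernetes stores Secret values base64-encoded, which is why the command ends by decoding them; the trailing echo only adds the newline that the decoded value lacks. The round trip can be tried without a cluster. In this sketch the password is a made-up placeholder, not a real credential.

```shell
# Made-up placeholder standing in for the value held in the Secret.
plain='example-password'

# What the Secret's data field would hold: the base64 form of the value.
encoded="$(printf '%s' "$plain" | base64)"

# What the final stage of the pipeline does: recover the original text.
decoded="$(printf '%s' "$encoded" | base64 --decode)"

echo "$decoded"
```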
You can update/upgrade the installation with: -
helm upgrade -f dev-values.yaml kube-prometheus-stack \
    oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack
If using Lens, you will need to set the following properties in its “Metrics” Cluster Settings in order to see _live_ CPU and Memory stats: -
METRICS SOURCE: Prometheus
PROMETHEUS: Helm
Filter empty containers: Un-checked
PROMETHEUS SERVICE ADDRESS: monitoring/kube-prometheus-stack-prometheus:9090
A similar set of values, for the production cluster, provides the following ingresses: -
Scraping new metrics
If your application exposes its own metrics, you can instruct Prometheus to scrape them by adding a suitable configuration under prometheus.prometheusSpec.additionalScrapeConfigs in the values file.
For example, we can collect Fragalysis Stack metrics from Alan’s development stack
with the following additionalScrapeConfigs declaration: -
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
    - job_name: stack-alan-default
      scheme: https
      scrape_interval: 10s
      static_configs:
      - targets:
        - fragalysis-alan-default.xchem-dev.diamond.ac.uk
        labels:
          app: alan-default
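It may help to see which URL this job makes Prometheus request. Prometheus builds it from the scheme, the target, and metrics_path, which defaults to /metrics when the job does not set it. The sketch below assembles that URL. The function name scrape_url is hypothetical, and the curl line is only a suggestion for checking the endpoint by hand.

```shell
# Hypothetical helper: print the URL Prometheus would scrape for a static target.
# metrics_path falls back to /metrics, the Prometheus default.
scrape_url() {
  scheme="$1"
  target="$2"
  metrics_path="${3:-/metrics}"
  printf '%s://%s%s\n' "$scheme" "$target" "$metrics_path"
}

url="$(scrape_url https fragalysis-alan-default.xchem-dev.diamond.ac.uk)"
echo "$url"

# To check the endpoint by hand (needs network access to the stack):
#   curl --silent --show-error "$url" | head
```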
If we then install a Django dashboard (like 17658) into Grafana we can see the
metrics generated, and restrict them to Alan’s stack by using the application value
alan-default.
Useful Dashboards
Node Exporter Full (1860)
Django (17658)
Removing Monitoring
To remove monitoring, refer to the official uninstall guide.
You might also need to remove the Alert Manager PVC. Check the namespace and delete if necessary: -
kubectl delete pvc alertmanager-kube-prometheus-stack-alertmanager-db-alertmanager-kube-prometheus-stack-alertmanager-0 -n monitoring
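The check itself is not spelled out here. One way to do it, as a sketch, is to list whatever PVCs remain in the namespace before deleting anything. The function name list_monitoring_pvcs is hypothetical, and the namespace defaults to the one used in this guide.

```shell
# Hypothetical helper: show PVCs left behind in the monitoring namespace.
list_monitoring_pvcs() {
  namespace="${1:-monitoring}"
  kubectl get pvc --namespace "$namespace"
}
```

If the Alert Manager claim appears in the output of list_monitoring_pvcs, the delete command above applies; otherwise there is nothing to remove.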
And then delete the _custom_ namespace: -
kubectl delete namespace monitoring