#############################
Installing Monitoring (Basic)
#############################

We rely on Prometheus and Grafana for monitoring, which are typically installed
using the `kube-prometheus-stack`_ Helm chart. Once you have a ``KUBECONFIG``
and access to the cluster, installation is relatively simple and requires just
a release name (the unimaginative **kube-prometheus-stack** in the following
example).

Run the following from the project root ``/monitoring`` directory, where you
will find a suitable ``dev-values.yaml`` configuration file::

    kubectl create namespace monitoring
    helm install -f dev-values.yaml kube-prometheus-stack \
        oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack

This will install the application in the ``monitoring`` **Namespace** with some
persistence, and provide the following ingresses:

- https://alartmanager.xchem-dev.diamond.ac.uk/
- https://grafana.xchem-dev.diamond.ac.uk/

To get the Grafana ``admin`` password, run the following::

    kubectl get secret --namespace monitoring \
        -l app.kubernetes.io/component=admin-secret \
        -o jsonpath="{.items[0].data.admin-password}" | base64 --decode ; echo

You can update or upgrade the installation with::

    helm upgrade -f dev-values.yaml kube-prometheus-stack \
        oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack

If you are using Lens, you will need to set the following properties in its
"Metrics" Cluster Settings in order to see *live* CPU and Memory stats:

- **METRICS SOURCE** : ``Prometheus``
- **PROMETHEUS** : ``Helm``
- **Filter empty containers** : ``Un-checked``
- **PROMETHEUS SERVICE ADDRESS** : ``monitoring/kube-prometheus-stack-prometheus:9090``

A similar set of values for the production cluster provides the following
ingresses:

- https://alartmanager.xchem.diamond.ac.uk/
- https://grafana.xchem.diamond.ac.uk/

********************
Scraping new metrics
********************

If you have your own application metrics, you can instruct Prometheus to scrape
them by adding a suitable configuration to
``prometheus.prometheusSpec.additionalScrapeConfigs`` in the values file. For
example, we can collect Fragalysis Stack metrics from Alan's development stack
with the following ``additionalScrapeConfigs`` declaration::

    prometheus:
      prometheusSpec:
        additionalScrapeConfigs:
        - job_name: stack-alan-default
          scheme: https
          scrape_interval: 10s
          static_configs:
          - targets:
            - fragalysis-alan-default.xchem-dev.diamond.ac.uk
            labels:
              app: alan-default

If we then install a `Django dashboard`_ (like ``17658``) into Grafana, we can
see the metrics generated and restrict them to Alan's stack by using the
``application`` value ``alan-default``.

*****************
Useful Dashboards
*****************

- **Node Exporter Full** (1860)
- **Django** (17658)

*******************
Removing Monitoring
*******************

To remove monitoring, refer to the official `uninstall`_ guide.

You might also need to remove the Alert Manager PVC. Check the namespace and
delete it if necessary::

    kubectl delete pvc \
        alertmanager-kube-prometheus-stack-alertmanager-db-alertmanager-kube-prometheus-stack-alertmanager-0 \
        -n monitoring

And then delete the *custom* namespace::

    kubectl delete namespace monitoring

.. _django dashboard: https://grafana.com/grafana/dashboards/17658-django/
.. _kube-prometheus-stack: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
.. _uninstall: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#uninstall-helm-chart
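
Once the removal steps have run, a quick check can confirm that nothing was
left behind. This is only a minimal sketch, assuming the release name
``kube-prometheus-stack`` and the ``monitoring`` namespace used throughout this
page. The chart's CRDs belong to the ``monitoring.coreos.com`` API group and
are not removed by ``helm uninstall``, so the last command may still list
them::

    # Expect no kube-prometheus-stack release in any namespace
    helm list --all-namespaces --filter kube-prometheus-stack

    # Expect a NotFound error once the namespace has been deleted
    kubectl get namespace monitoring

    # List any Prometheus Operator CRDs that remain on the cluster
    kubectl get crd | grep monitoring.coreos.com

Whether to delete the remaining CRDs depends on whether anything else on the
cluster still uses them; the uninstall guide covers that step.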