If you go to the stacks menu, you will notice that both `traefik` and `portainer` stacks are flagged as *Limited* control, because they were created outside of Portainer. We will create and deploy the next stacks directly from the Portainer GUI.
{{< /alert >}}
## CLI tools
[`ctop`](https://github.com/bcicen/ctop) is a very useful CLI tool that works like `htop`, but dedicated to Docker containers. Install it on every Docker host:

```sh
echo "deb http://packages.azlux.fr/debian/ buster main" | sudo tee /etc/apt/sources.list.d/azlux.list
wget -qO - https://azlux.fr/repo.gpg.key | sudo apt-key add -
sudo apt update
sudo apt install -y docker-ctop
```
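
Once installed, a quick sanity check (the `-a` flag comes from ctop's own help and filters to active containers; exact flags may vary by version):

```sh
# htop-like live overview of all containers on this host
ctop

# show only running containers
ctop -a
```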
## Keep the container images up-to-date ⬆️

It's finally time to test our new cluster environment by deploying some images through the Portainer GUI. We'll start by installing [`Diun`](https://crazymax.dev/diun/), a very useful tool that notifies us when a Docker image we use has an update available in its registry.
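
To give an idea of what we'll deploy, here is a minimal sketch of a Diun service (the environment variable names follow Diun's documented configuration, but treat the schedule and timezone values as assumptions to adapt):

```yml
version: '3.7'

services:
  diun:
    image: crazymax/diun:latest
    environment:
      - TZ=Europe/Paris
      # check watched images for updates every 6 hours (cron syntax)
      - DIUN_WATCH_SCHEDULE=0 */6 * * *
      # watch services deployed in the swarm
      - DIUN_PROVIDERS_SWARM=true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager
```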
This is **Part V** of a more global tutorial. [Back to the first part]({{< ref "/posts/02-build-your-own-docker-swarm-cluster" >}}) to start from the beginning.
{{< alert >}}
This part is totally optional, as it's mainly focused on monitoring. Feel free to skip it.
{{< /alert >}}
## Metrics with Prometheus 🔦
Prometheus has become the de facto standard for self-hosted monitoring, in part thanks to its architecture. It's a TSDB (Time Series Database) that polls (aka scrapes) standard metrics REST endpoints, exposed by the tools to be monitored. That's the case for Traefik, as we saw in [part III]({{< ref "04-build-your-own-docker-swarm-cluster-part-3#traefik-" >}}). For tools that don't support it natively, like databases, you'll find many exporters that will do the job for you.
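
Concretely, a `/metrics` endpoint is just plain text in the Prometheus exposition format. Here is an illustrative excerpt (the metric name is a real Traefik one, the values are made up):

```txt
# HELP traefik_entrypoint_requests_total How many HTTP requests processed on an entrypoint, partitioned by status code, protocol, and method.
# TYPE traefik_entrypoint_requests_total counter
traefik_entrypoint_requests_total{code="200",entrypoint="https",method="GET",protocol="http"} 2047
```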
### Prometheus install 💽
I won't use a GlusterFS volume for storing Prometheus data, because:

* Only 1 instance is needed, on the master
* No critical data, it's just metrics
* No need for backup, and the data can grow pretty huge

First, go to the `master-01` node settings in Portainer inside *Swarm Cluster overview*, and apply a new label indicating that this node hosts the Prometheus data.

It's the equivalent of doing:
```sh
export NODE_ID=$(docker info -f '{{.Swarm.NodeID}}')
docker node update --label-add prometheus.data=true $NODE_ID
```
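
You can verify that the label is in place:

```sh
# should print a map containing prometheus.data:true
docker node inspect $NODE_ID --format '{{ .Spec.Labels }}'
```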
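Note that the `/etc/prometheus` directory must exist on the node; create it first if needed:

```sh
sudo mkdir -p /etc/prometheus
```
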
Then create a config file at `/etc/prometheus/prometheus.yml` on the `master-01` node:
```yml
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "traefik"
    static_configs:
      - targets: ["traefik_traefik:8080"]
```
It consists of 2 scrape jobs; the `targets` entries tell Prometheus where the `/metrics` endpoints live. I configure a `5s` interval, which means Prometheus will scrape the `/metrics` endpoints every 5 seconds.
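
Optionally, you can validate the file before deploying, thanks to the `promtool` binary shipped in the `prom/prometheus` image:

```sh
# parse and check the config without starting Prometheus
docker run --rm -v /etc/prometheus:/etc/prometheus \
  --entrypoint promtool prom/prometheus \
  check config /etc/prometheus/prometheus.yml
```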
Finally, create a `prometheus` stack in Portainer:
```yml
version: '3.7'

services:

  prometheus:
    image: prom/prometheus
    networks:
      - private
      - traefik_public
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.size=5GB
      - --storage.tsdb.retention.time=15d
    volumes:
      - /etc/hosts:/etc/hosts
      - /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - data:/prometheus
    deploy:
      placement:
        constraints:
          - node.labels.prometheus.data == true
      labels:
        - traefik.enable=true
        - traefik.http.routers.prometheus.middlewares=admin-ip,admin-auth
        - traefik.http.services.prometheus.loadbalancer.server.port=9090

networks:
  private:
  traefik_public:
    external: true

volumes:
  data:
```
The `private` network will serve us later for the exporters. The following settings are useful to control DB usage, as metrics can grow very quickly:
| argument                      | description                     |
| ----------------------------- | ------------------------------- |
| `storage.tsdb.retention.size` | The max DB size                 |
| `storage.tsdb.retention.time` | The max data retention duration |
Deploy it, and <https://prometheus.sw.okami101.io> should be available after a few seconds. Use the same Traefik credentials for login.
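
As a quick smoke test from a whitelisted IP (replace `admin` with your actual basic auth user; `-u` will prompt for the password), the HTTP API should answer:

```sh
# every healthy scrape target reports a value of 1 for "up"
curl -su admin 'https://prometheus.sw.okami101.io/api/v1/query?query=up'
```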
You should now have access to some metrics!
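
For example, assuming Traefik metrics are enabled as in part III, the following PromQL query typed in the *Graph* tab shows the per-second HTTP request rate over the last 5 minutes:

```txt
rate(traefik_entrypoint_requests_total[5m])
```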

In *Status > Targets*, you should have 2 endpoints enabled, corresponding to the above scrape config.

### Nodes & Containers metrics with cAdvisor & Node exporter
## Visualization with Grafana 📈