Merge branch 'k8s'

2023-08-30 21:06:48 +02:00
77 changed files with 7808 additions and 0 deletions

---
title: "A beautiful GitOps day - Build your self-hosted Kubernetes cluster"
date: 2023-08-18
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
## The goal 🎯
This guide is mainly intended for developers and SREs who want to build a Kubernetes cluster that respects the following conditions:
1. **On-Premise management** (The Hard Way), so no vendor lock-in to any managed Kubernetes provider (KaaS/CaaS)
2. Hosted on affordable VPS provider (**Hetzner**), with strong **Terraform support**, allowing **GitOps** principles
3. **High Availability** with cloud Load Balancer, resilient storage and DB with replication, allowing automatic upgrades or maintenance without any downtime for production apps
4. Include complete **monitoring**, **logging** and **tracing** stacks
5. Complete **CI/CD pipeline**
6. Budget target of **~$60/month** for the complete cluster with all the above tools; it can be far less if you don't need HA, CI or monitoring features
### What you'll learn 📚
* Use Terraform to manage your infrastructure, for both the cloud provider and Kubernetes, following GitOps principles
* Set up a resilient on-premise Kubernetes cluster from the ground up, with automatic upgrades and reboots
* Use [K3s](https://k3s.io/) as a lightweight Kubernetes distribution
* Use [Traefik](https://traefik.io/) as ingress controller, combined with [cert-manager](https://cert-manager.io/) for distributed SSL certificates, and secure access to our cluster through a Hetzner Load Balancer
* Use [Longhorn](https://longhorn.io/) as resilient storage, installed on a dedicated storage node pool with dedicated volumes, including incremental PVC backups to S3
* Install and configure stateful data components such as **PostgreSQL** and **Redis** clusters on a specific node pool via the well-known [Bitnami Helm charts](https://bitnami.com/stacks/helm)
* Test our resilient storage with some no-code apps, such as [n8n](https://n8n.io/) and [nocodb](https://nocodb.com/), managed by Flux
* Complete monitoring and logging stack with [Prometheus](https://prometheus.io/), [Grafana](https://grafana.com/), [Loki](https://grafana.com/oss/loki/)
* Set up a complete self-hosted CI pipeline with the lightweight [Gitea](https://gitea.io/) + [Concourse CI](https://concourse-ci.org/) combo
* Test the above CI tools with a sample **.NET API app** using the database cluster, with automatic CD via Flux
* Integrate the app into our monitoring stack with [OpenTelemetry](https://opentelemetry.io/), and use [Tempo](https://grafana.com/oss/tempo/) for distributed tracing
* Go further with [SonarQube](https://www.sonarsource.com/products/sonarqube/) for Continuous Inspection of code quality, including automatic code coverage reports
* Do some load testing scenarios with [k6](https://k6.io/) and deploy a frontend SPA sample using the .NET API
### You probably don't need Kubernetes 🪧
All of this is of course overkill for any personal usage, and is only intended for learning purposes or for getting a low-cost semi-pro grade K3s cluster.
**Docker Swarm** is probably the best solution for 99% of people who need a simple container orchestration system. Swarm remains an officially supported project, as it's built into the Docker Engine, even if we shouldn't expect any new features.
I wrote a [complete dedicated 2022 guide here]({{< ref "/posts/02-build-your-own-docker-swarm-cluster" >}}) that explains all the steps to get a semi-pro grade Swarm cluster (not GitOps oriented though, using only the Portainer UI).
## Cluster Architecture 🏘️
Here are the node pools that we'll need for a complete self-hosted Kubernetes cluster, where each node pool is scalable independently:
| Node pool | Description |
| ------------ | ---------------------------------------------------------------------------------------------------------- |
| `controller` | The control plane nodes; use at least 3, or any greater odd number (required by etcd), for an HA kube API server |
| `worker` | Workers for your production/staging stateless apps |
| `storage`    | Dedicated nodes running Longhorn for resilient storage and DB, in case you don't use managed databases |
| `monitor` | Workers dedicated for monitoring, optional |
| `runner` | Workers dedicated for CI/CD pipelines execution, optional |
Here is an HA architecture sample with replicated storage (via Longhorn) and PostgreSQL DB (controllers, monitoring and runners are excluded for simplicity):
{{< mermaid >}}
flowchart TB
client((Client))
client -- Port 80 + 443 --> lb{LB}
lb{LB}
lb -- Port 80 --> worker-01
lb -- Port 80 --> worker-02
lb -- Port 80 --> worker-03
subgraph worker-01
direction TB
traefik-01{Traefik}
app-01([My App replica 1])
traefik-01 --> app-01
end
subgraph worker-02
direction TB
traefik-02{Traefik}
app-02([My App replica 2])
traefik-02 --> app-02
end
subgraph worker-03
direction TB
traefik-03{Traefik}
app-03([My App replica 3])
traefik-03 --> app-03
end
overlay(Overlay network)
worker-01 --> overlay
worker-02 --> overlay
worker-03 --> overlay
overlay --> db-primary
overlay --> db-read
db-primary((Primary SVC))
db-primary -- Port 5432 --> storage-01
db-read((Read SVC))
db-read -- Port 5432 --> storage-02
db-read -- Port 5432 --> storage-03
subgraph storage-01
direction TB
pg-primary([PostgreSQL primary])
longhorn-01[(Longhorn<br>volume)]
pg-primary --> longhorn-01
end
subgraph storage-02
direction TB
pg-replica-01([PostgreSQL replica 1])
longhorn-02[(Longhorn<br>volume)]
pg-replica-01 --> longhorn-02
end
subgraph storage-03
direction TB
pg-replica-02([PostgreSQL replica 2])
longhorn-03[(Longhorn<br>volume)]
pg-replica-02 --> longhorn-03
end
db-streaming(Streaming replication)
storage-01 --> db-streaming
storage-02 --> db-streaming
storage-03 --> db-streaming
{{</ mermaid >}}
### Cloud provider choice ☁️
As an HA Kubernetes cluster can quickly become expensive, a good cloud provider is an essential part.
After testing many providers, such as DigitalOcean, Vultr, Linode, Civo, OVH and Scaleway, it seems like **Hetzner** is very well suited **in my opinion**:
* Very competitive price for mid-range performance (a plan with 2 CPU / 4 GB costs only around **$6** per node)
* No frills, just the basics: VMs, block volumes, load balancers, firewalls, and that's it
* Nice UI + efficient CLI tool
* Official strong [Terraform support](https://registry.terraform.io/providers/hetznercloud/hcloud/latest), so GitOps ready
Please let me know in the comments below if you have better suggestions!
### Final cost estimate 💰
| Server Name | Type | Quantity | Unit Price |
| ------------ | -------- | --------------------- | ---------- |
| `worker` | **LB1** | 1 | 5.39 |
| `manager-0x` | **CX21** | 1 or 3 for HA cluster | 0.5 + 4.85 |
| `worker-0x` | **CX21** | 2 or 3 | 0.5 + 4.85 |
| `storage-0x` | **CX21** | 2 for HA database | 0.5 + 4.85 |
| `monitor-0x` | **CX21** | 1 | 0.5 + 4.85 |
| `runner-0x` | **CX21** | 1 | 0.5 + 4.85 |
**€0.5** is for primary IPs.
We will also need some expandable block volumes for our storage nodes. Let's start with **20 GB** each, i.e. **2\*0.88**.
(5.39+**8**\*(0.5+4.85)+**2**\*0.88)\*1.2 = **€59.94** / month
We targeted **€60/month** for a minimal working CI/CD cluster, so we're good!
You may also prefer to take **2 larger** CX31 worker nodes (**8 GB** RAM) instead of **3 smaller** ones, which [will optimize resource usage](https://learnk8s.io/kubernetes-node-size), so:
(5.39+**7**\*0.5+**5**\*4.85+**2**\*9.2+**2**\*0.88)\*1.2 = **€63.96** / month
For an HA cluster, you'll need to add 2 more CX21 controllers, so **€72.78** (for the 3 small workers option) or **€76.80** / month (for the 2 big workers option).
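If you want to double-check the math, the same computation can be done with any calculator, for example `bc` (the unit prices above are assumptions based on current Hetzner pricing):
```sh
# 3-small-workers option: 1 LB + 8 CX21 servers with primary IPs + 2 volumes, 20% VAT
echo "(5.39 + 8*(0.5+4.85) + 2*0.88) * 1.2" | bc -l
# ≈ 59.94 €/month
```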
## Let's party 🎉
Enough talk, [let's go Charles !]({{< ref "/posts/11-a-beautiful-gitops-day-1" >}})

---
title: "A beautiful GitOps day I - Cluster initialization with Terraform and K3s"
date: 2023-08-19
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "terraform", "hetzner", "k3s", "gitops"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is **Part I** of a more global tutorial. [Back to the guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for the intro.
## The boring part (prerequisites)
Before attacking the next parts of this guide, I'll assume you have the following hard prerequisites.
### External providers
* A valid domain name with access to the DNS zone administration, I'll use [Cloudflare](https://www.cloudflare.com/) and `kube.rocks` as the sample domain
* [Hetzner Cloud](https://www.hetzner.com/cloud) account
* Any S3 bucket for long-term storage (backups, logs), I'll use [Scaleway](https://www.scaleway.com/) for this guide
* Any working SMTP account for transactional emails, not strictly required but handy
### Terraform variables
For better fluidity, here is the expected list of variables you'll need to prepare. Store them in a secure place.
| Variable | Sample value | Note |
| ----------------- | ------------------------------- | ------------------------------------------------------------------------------- |
| `hcloud_token` | xxx | Token of existing **empty** Hetzner Cloud project <sup>1</sup> |
| `domain_name`     | kube.rocks                      | Valid registered domain name                                                     |
| `acme_email` | <me@kube.rocks> | Valid email for Let's Encrypt registration |
| `dns_api_token` | xxx | Token of your DNS provider for issuing certificates <sup>2</sup> |
| `ssh_public_key` | ssh-ed25519 xxx <me@kube.rocks> | Your public SSH key for cluster OS level access <sup>3</sup> |
| `whitelisted_ips` | [82.82.82.82] | List of dedicated public IPs allowed for cluster management access <sup>4</sup> |
| `s3_endpoint` | s3.fr-par.scw.cloud | Custom endpoint if not using AWS |
| `s3_region` | fr-par | |
| `s3_bucket` | kuberocks | |
| `s3_access_key` | xxx | |
| `s3_secret_key` | xxx | |
| `smtp_host` | smtp-relay.brevo.com | |
| `smtp_port` | 587 | |
| `smtp_user` | <me@kube.rocks> | |
| `smtp_password` | xxx | |
<sup>1</sup> Check [this link](https://github.com/hetznercloud/cli#getting-started) for generating a token
<sup>2</sup> Check cert-manager documentation to generate the token for supporting DNS provider, [example for Cloudflare](https://cert-manager.io/docs/configuration/acme/dns01/cloudflare/#api-tokens)
<sup>3</sup> Generate a new SSH key with `ssh-keygen -t ed25519 -C "me@kube.rocks"`
<sup>4</sup> If your ISP doesn't provide a static IP, you may need to use a custom VPN; fortunately, Hetzner provides a self-hostable [one-click solution](https://github.com/hetznercloud/apps/tree/main/apps/hetzner/wireguard).
For a more enterprise-grade solution, check [Teleport](https://goteleport.com/), which is not covered by this guide. Whatever the solution, it's essential to have at least one of them for obvious security reasons.
### Local tools
* Git and SSH obviously
* [Terraform](https://www.terraform.io/downloads.html) >= 1.5.0
* [Hcloud CLI](https://github.com/hetznercloud/cli#getting-started) >= 1.35.0, already connected to an **empty** project
* Kubernetes CLI (`kubectl`)
* [Flux CLI](https://fluxcd.io/flux/cmd/) for CD
* [Fly CLI](https://github.com/concourse/concourse/releases/latest) for CI
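As a quick sanity check, you can verify that everything is installed and recent enough (a sketch, assuming all CLIs are on your PATH):
```sh
terraform version
hcloud version
kubectl version --client
flux --version
fly --version
```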
## Cluster initialization using Terraform
For that, we'll use the official [Hetzner Cloud provider](https://registry.terraform.io/providers/hetznercloud/hcloud) for Terraform.
However, writing all the Terraform logic from scratch is a bit tedious, even more so when including the initial K3s setup, so a better approach is to use a dedicated module that will considerably reduce code boilerplate.
### Choosing K3s Terraform module
We have mainly 2 options:
* Using the strongest community-driven module for Hetzner: [Kube Hetzner](https://registry.terraform.io/modules/kube-hetzner/kube-hetzner/hcloud/latest)
* Writing our own reusable module, or using my [existing starter-kit module](https://registry.terraform.io/modules/okami101/k3s)
Here are the pros and cons of each module:
| | [Kube Hetzner](https://registry.terraform.io/modules/kube-hetzner/kube-hetzner/hcloud/latest) | [Okami101 K3s](https://registry.terraform.io/modules/okami101/k3s) |
| ----------------------- | -------------------------------------------------------------- | -------------------------------------------------------------- |
| **Support** | Strong community | Just intended as a reusable starter-kit |
| **Included Helm charts** | Traefik, Longhorn, Cert Manager, Kured | None, just the initial K3s setup, as it's generally preferable to manage these Helm dependencies in a separate Terraform project, allowing easier upgrades |
| **Hetzner integration** | Complete, uses [Hcloud Controller](https://github.com/hetznercloud/hcloud-cloud-controller-manager) internally, allowing dynamic load balancing, autoscaling, cleaner node deletion | Basic, the public load balancer is statically managed by the node pool configuration, no autoscaling support |
| **OS** | openSUSE MicroOS, optimized for container workloads | Debian 11 or Ubuntu 22.04 |
| **Initial setup** | Requires Packer for initial snapshot creation, and slower node creation | Only about ~1 minute for complete cluster creation, 1 more for the initialization setup |
| **Client support** | POSIX-based OS only, requires WSL on Windows | All, including PowerShell |
| **Internal complexity** | Huge, hard to get your head around | Very accessible, easy to extend and fork, better for learning |
| **Upgrades** | You may need to follow new versions regularly | As a simple starter-kit, no need to support every community issue, so very few updates |
| **Quality** | Uses many hacks to satisfy all community needs, plenty of remote-exec and file provisioners, which are not recommended by HashiCorp themselves | Uses standard **cloud-config** for initial provisioning, then **Salt** for cluster OS management |
| **Security** | Needs an SSH private key because of local provisioners, and the SSH port opened on every node | Requires only a public SSH key, SSH ports opened only on controllers, uses an SSH jump from a controller to access any internal worker node |
| **Reusability** | Vendor-locked to Hetzner Cloud | Easy to adapt to a different cloud provider as long as it supports **cloud-config** (as 99% of them do) |
To summarize, choose the Kube Hetzner module if:
* You want to use an OS optimized for containers, keeping in mind that it uses more RAM than a Debian-like distro (230 MB vs 120 MB)
* Strong community support is important to you
* You need [Hcloud Controller](https://github.com/hetznercloud/hcloud-cloud-controller-manager) functionalities from the ground up, giving support for **autoscaling** and **dynamic load balancing**
Choose the starter-kit module if:
* You want to use a more standard OS, such as Debian or Ubuntu, which consumes less RAM, managed by a preinstalled Salt
* You prefer to start with a simplistic module, without internal hacks, giving you a better step-by-step understanding of the cluster setup and easier portability to another cloud provider
* You want a very quick setup, as it doesn't require any Packer image creation and uses cloud-config for the initial setup, without any client OS dependencies
* You prefer to manage additional Helm dependencies in a separate Terraform project
For this guide, I'll use the starter kit as it's more suited for tutorials and allows a better understanding of all the steps of the cluster creation process. You can more easily switch to the Kube Hetzner version later.
### 1st Terraform project
Let's initialize a basic cluster setup. Create an empty folder (I name it `demo-kube-hcloud` here) for our Terraform project, and create the following `kube.tf` file:
{{< highlight host="demo-kube-hcloud" file="kube.tf" >}}
```tf
terraform {
required_providers {
hcloud = {
source = "hetznercloud/hcloud"
}
}
backend "local" {}
}
variable "hcloud_token" {
type = string
sensitive = true
}
variable "my_public_ssh_keys" {
type = list(string)
sensitive = true
}
variable "my_ip_addresses" {
type = list(string)
sensitive = true
}
variable "s3_access_key" {
type = string
sensitive = true
}
variable "s3_secret_key" {
type = string
sensitive = true
}
provider "hcloud" {
token = var.hcloud_token
}
module "hcloud_kube" {
providers = {
hcloud = hcloud
}
source = "okami101/k3s/hcloud"
server_image = "ubuntu-22.04"
server_timezone = "Europe/Paris"
server_locale = "fr_FR.UTF-8"
server_packages = ["nfs-common"]
ssh_port = 2222
cluster_name = "kube"
cluster_user = "rocks"
my_public_ssh_keys = var.my_public_ssh_keys
my_ip_addresses = var.my_ip_addresses
k3s_channel = "stable"
tls_sans = ["cp.kube.rocks"]
disabled_components = ["traefik"]
kubelet_args = [
"eviction-hard=memory.available<250Mi"
]
etcd_s3_backup = {
etcd-s3-endpoint = "s3.fr-par.scw.cloud"
etcd-s3-access-key = var.s3_access_key
etcd-s3-secret-key = var.s3_secret_key
etcd-s3-region = "fr-par"
etcd-s3-bucket = "mykuberocks"
etcd-snapshot-schedule-cron = "0 0 * * *"
}
control_planes = {
server_type = "cx21"
location = "nbg1"
count = 1
private_interface = "ens10"
labels = []
taints = [
"node-role.kubernetes.io/control-plane:NoSchedule"
]
}
agent_nodepools = [
{
name = "worker"
server_type = "cx21"
location = "nbg1"
count = 1
private_interface = "ens10"
labels = []
taints = []
}
]
}
output "ssh_config" {
value = module.hcloud_kube.ssh_config
}
```
{{</ highlight >}}
#### Explanation
Get a complete description of the above file [here](https://github.com/okami101/terraform-hcloud-k3s/blob/main/kube.tf.example).
{{< tabs >}}
{{< tab tabName="State" >}}
```tf
backend "local" {}
```
I'm using a local backend for simplicity, but for team sharing, you may use a more appropriate backend, like S3 or Terraform Cloud (the most secure option, with encryption at rest, versioning and centralized locking).
Treat the Terraform state very carefully and keep it in a secure place, as it's the only source of truth for your cluster. If it leaks, consider the cluster as **compromised and activate your DRP (disaster recovery plan)**. The first vital action is at least to renew the Hetzner Cloud and S3 tokens immediately.
In any case, consider any leak of a writeable Hetzner Cloud token as a **Game Over**. Indeed, even if the attacker has no direct access to existing servers, mainly because the cluster SSH private key as well as the kube config are not stored in the Terraform state, they still have full control of the infrastructure and can do the following:
1. Create a new server in the same cluster network with their own SSH access.
2. Install a new K3s agent and connect it to the controllers thanks to the generated K3s token stored in the Terraform state.
3. Sniff any data from the cluster that reaches the compromised server, including secrets, thanks to the new agent.
4. Get access to remote S3 backups.
In order to mitigate any risk of critical data leak, you may use data encryption whenever possible. K3s offers it [natively for etcd](https://docs.k3s.io/security/secrets-encryption). Longhorn also offers it [natively for volumes](https://longhorn.io/docs/latest/advanced-resources/security/volume-encryption/) (including backups).
{{</ tab >}}
{{< tab tabName="Global" >}}
```tf
server_image = "ubuntu-22.04"
server_timezone = "Europe/Paris"
server_locale = "fr_FR.UTF-8"
server_packages = ["nfs-common"]
ssh_port = 2222
cluster_name = "kube"
cluster_user = "rocks"
my_public_ssh_keys = var.my_public_ssh_keys
my_ip_addresses = var.my_ip_addresses
```
Choose between `ubuntu-22.04` or `debian-11`, and set the timezone, locale and the default packages you want to install on initial node provisioning. Once the servers are created, you may use Salt to change them globally across the cluster.
Why not `debian-12`? Because it's sadly not yet supported by the [Salt project](https://github.com/saltstack/salt/issues/64223)...
{{< alert >}}
`nfs-common` package is required for Longhorn in order to support RWX volumes.
{{</ alert >}}
`cluster_name` is the prefix of the node names, which have the format `{cluster_name}-{pool_name}-{index}`, for example `kube-storage-01`. `cluster_user` is the username (UID 1000) for SSH access with sudo rights. The `root` user is disabled for remote access, for security reasons.
{{</ tab >}}
{{< tab tabName="K3s" >}}
```tf
k3s_channel = "stable"
tls_sans = ["cp.kube.rocks"]
disabled_components = ["traefik"]
kubelet_args = [
"eviction-hard=memory.available<250Mi"
]
```
This is the K3s-specific configuration, where you can choose the channel (stable or latest), the TLS SANs, and the kubelet arguments.
I'm disabling the included Traefik because we'll use the more flexible official Helm chart later.
I also prefer to increase the eviction threshold to 250Mi, in order to avoid the OS OOM killer.
{{</ tab >}}
{{< tab tabName="Backup" >}}
```tf
etcd_s3_backup = {
etcd-s3-endpoint = "s3.fr-par.scw.cloud"
etcd-s3-access-key = var.s3_access_key
etcd-s3-secret-key = var.s3_secret_key
etcd-s3-region = "fr-par"
etcd-s3-bucket = "mykuberocks"
etcd-snapshot-schedule-cron = "0 0 * * *"
}
```
This will enable automatic daily backups of the etcd database to the S3 bucket, which is useful for faster disaster recovery. See the official guide [here](https://docs.k3s.io/datastore/backup-restore).
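Once the cluster is up, you can verify that snapshots actually work directly from a controller node, as a quick sketch (K3s CLI commands, run with sudo):
```sh
# list existing etcd snapshots, local and S3
sudo k3s etcd-snapshot list
# trigger an on-demand snapshot, handy before any risky operation
sudo k3s etcd-snapshot save
```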
{{</ tab >}}
{{< tab tabName="Cluster" >}}
```tf
control_planes = {
server_type = "cx21"
location = "nbg1"
count = 1
private_interface = "ens10"
labels = []
taints = [
"node-role.kubernetes.io/control-plane:NoSchedule"
]
}
agent_nodepools = [
{
name = "worker"
server_type = "cx21"
location = "nbg1"
count = 1
private_interface = "ens10"
labels = []
taints = []
}
]
```
This is the heart of the cluster configuration, where you can define the number of control plane and worker nodes, their type, and their network interface. We'll start with 1 master and 1 worker.
The interface `ens10` is correct for Intel CPUs; use `enp7s0` for AMD.
Use the taint `node-role.kubernetes.io/control-plane:NoSchedule` to prevent any workload from being scheduled on the control plane.
{{</ tab >}}
{{< tab tabName="SSH" >}}
```tf
output "ssh_config" {
value = module.hcloud_kube.ssh_config
}
```
This will print the SSH config needed to access the cluster after creation.
{{</ tab >}}
{{</ tabs >}}
#### Inputs
For the input variables, you can choose between environment variables or a separate `terraform.tfvars` file.
{{< tabs >}}
{{< tab tabName="terraform.tfvars file" >}}
{{< highlight host="demo-kube-hcloud" file="terraform.tfvars" >}}
```tf
hcloud_token = "xxx"
my_public_ssh_keys = [
  "ssh-ed25519 xxx"
]
my_ip_addresses = [
  "82.82.82.82/32"
]
s3_access_key = "xxx"
s3_secret_key = "xxx"
```
{{</ highlight >}}
{{</ tab >}}
{{< tab tabName="Environment variables" >}}
```sh
export TF_VAR_hcloud_token="xxx"
export TF_VAR_my_public_ssh_keys='["xxx"]'
export TF_VAR_my_ip_addresses='["ssh-ed25519 xxx me@kube.rocks"]'
export TF_VAR_s3_access_key="xxx"
export TF_VAR_s3_secret_key="xxx"
```
{{</ tab >}}
{{</ tabs >}}
#### Terraform apply
It's finally time to initialize the cluster:
```sh
terraform init
terraform apply
```
Check the printed plan and confirm. The cluster creation will take about 1 minute. When finished, the following SSH configuration should appear:
```sh
Host kube
HostName xxx.xxx.xxx.xxx
User rocks
Port 2222
Host kube-controller-01
HostName 10.0.0.2
HostKeyAlias kube-controller-01
User rocks
Port 2222
ProxyJump kube
Host kube-worker-01
HostName 10.0.1.1
HostKeyAlias kube-worker-01
User rocks
Port 2222
ProxyJump kube
```
#### Git-able project
As we are doing GitOps, you'll need to version the Terraform project. With a proper gitignore generator tool like [gitignore.io](https://docs.gitignore.io/install/command-line), it's just a matter of:
```sh
git init
gig terraform
```
And the project is ready to be pushed to any Git repository.
#### Cluster access
Merge above SSH config into your `~/.ssh/config` file, then test the connection with `ssh kube`.
{{< alert >}}
If you get "Connection refused", it's probably because the server is still on cloud-init phase. Wait a few minutes and try again. Be sure to have the same public IPs as the one you whitelisted in the Terraform variables. You can edit them and reapply the Terraform configuration at any moment.
{{</ alert >}}
Before using K3s, let's enable Salt for OS management by typing `sudo salt-key -A -y`. This will accept all pending keys and allow Salt to connect to all nodes. To upgrade all nodes at once, just type `sudo salt '*' pkg.upgrade`.
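A few other handy Salt commands, as a sketch (run them from the node hosting the Salt master, like `salt-key` above):
```sh
# list accepted and pending minion keys
sudo salt-key -L
# check that every node responds
sudo salt '*' test.ping
# upgrade packages on a single node only
sudo salt 'kube-worker-01' pkg.upgrade
```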
In order to access the 1st worker node, you only have to use `ssh kube-worker-01`.
### K3s access and usage
It's time to log in to K3s and check the cluster status from your local machine.
From the controller, copy `/etc/rancher/k3s/k3s.yaml` to your machine outside the cluster as `~/.kube/config`. Then replace the value of the `server` field with the IP or hostname of your K3s server. `kubectl` can now manage your K3s cluster.
{{< alert >}}
If `~/.kube/config` already exists, you have to properly [merge the new config into it](https://able8.medium.com/how-to-merge-multiple-kubeconfig-files-into-one-36fc987c2e2f). You can use `kubectl config view --flatten` for that.
Then use `kubectl config use-context kube` to switch to your new cluster (see the sketch below).
{{</ alert >}}
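Here is one possible way to do all of this from your workstation, as a sketch only: it assumes passwordless sudo for the cluster user on the controller, and that `cp.kube.rocks` (the TLS SAN declared earlier) resolves to your control plane.
```sh
# fetch the kubeconfig from the first controller
ssh kube-controller-01 "sudo cat /etc/rancher/k3s/k3s.yaml" > k3s.yaml
# point it to the control plane endpoint instead of localhost
sed -i 's/127.0.0.1/cp.kube.rocks/g' k3s.yaml
# rename the default context/cluster/user to "kube" (purely cosmetic)
sed -i 's/\bdefault\b/kube/g' k3s.yaml
# merge it into your existing kubeconfig and switch to it
KUBECONFIG="$HOME/.kube/config:$PWD/k3s.yaml" kubectl config view --flatten > /tmp/config
mv /tmp/config "$HOME/.kube/config" && kubectl config use-context kube
```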
Type `kubectl get nodes` and you should see the 2 nodes of your cluster in **Ready** state.
```txt
NAME STATUS ROLES AGE VERSION
kube-controller-01 Ready control-plane,etcd,master 153m v1.27.4+k3s1
kube-worker-01 Ready <none> 152m v1.27.4+k3s1
```
#### Kubectl Aliases
As we'll use `kubectl` a lot, I highly encourage you to use aliases for better productivity:
* <https://github.com/ahmetb/kubectl-aliases> for bash
* <https://github.com/shanoor/kubectl-aliases-powershell> for Powershell
After installing them, the equivalent of `kubectl get nodes` is `kgno`.
#### Test adding new workers
Now, adding new workers is as simple as incrementing the `count` value of the worker node pool 🚀
{{< highlight host="demo-kube-hcloud" file="kube.tf" >}}
```tf
agent_nodepools = [
{
name = "worker"
// ...
count = 3
// ...
}
]
```
{{</ highlight >}}
Then apply the Terraform configuration again. After a few minutes, you should see 2 new nodes in **Ready** state.
```txt
NAME STATUS ROLES AGE VERSION
kube-controller-01 Ready control-plane,etcd,master 166m v1.27.4+k3s1
kube-worker-01 Ready <none> 165m v1.27.4+k3s1
kube-worker-02 Ready <none> 42s v1.27.4+k3s1
kube-worker-03 Ready <none> 25s v1.27.4+k3s1
```
{{< alert >}}
You'll have to run `sudo salt-key -A -y` each time you add a new node to the cluster, to keep global OS management working.
{{</ alert >}}
#### Deleting workers
Simply decrement the `count` value of the worker node pool and apply the Terraform configuration again. After a few minutes, you should see the node in **NotReady** state.
To finalize the deletion, remove the node from the cluster with `krm no kube-worker-03`.
{{< alert >}}
If the node has workloads running, consider a proper [draining](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/) before deleting it, as sketched below.
{{</ alert >}}
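A possible sequence for a clean removal, using the kubectl aliases set up earlier (node name taken from this example):
```sh
# cordon and evict workloads from the node that will be removed
k drain kube-worker-03 --ignore-daemonsets --delete-emptydir-data
# once pods are rescheduled elsewhere, remove the node from the cluster
krm no kube-worker-03
```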
## 1st check ✅
We now have a working cluster, fully GitOps-managed and easy to scale up. Let's install [a load balanced ingress controller for external access through SSL]({{< ref "/posts/12-a-beautiful-gitops-day-2" >}}).

---
title: "A beautiful GitOps day II - Load Balancer & Ingress with SSL"
date: 2023-08-20
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "traefik", "cert-manager"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is **Part II** of a more global tutorial. [Back to the guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for the intro.
## Cluster maintenance
### 2nd Terraform project
For this part, let's create a new Terraform project (`demo-kube-k3s` here) that will be dedicated to Kubernetes infrastructure provisioning. Start from a new empty folder, create the following `main.tf` file, then run `terraform init`.
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
terraform {
backend "local" {}
}
```
{{</ highlight >}}
Let's begin with automatic upgrades management.
### CRD prerequisites
Before we move to the next steps, we need to install critical monitoring CRDs that will be used by many components for monitoring, a subject that will be covered later.
```sh
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.67.1/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.67.1/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
```
### Automatic reboot
When the OS kernel is upgraded, the system needs to be rebooted to apply it. This is a critical operation for a Kubernetes cluster as it can cause downtime. To avoid this, we'll use [kured](https://github.com/kubereboot/kured), which will take care of cordoning & draining nodes before rebooting them one by one.
{{< highlight host="demo-kube-k3s" file="kured.tf" >}}
```tf
resource "helm_release" "kubereboot" {
chart = "kured"
version = "5.1.0"
repository = "https://kubereboot.github.io/charts"
name = "kured"
namespace = "kube-system"
set {
name = "configuration.period"
value = "1m"
}
set {
name = "tolerations[0].effect"
value = "NoSchedule"
}
set {
name = "tolerations[0].operator"
value = "Exists"
}
set {
name = "metrics.create"
value = "true"
}
}
```
{{</ highlight >}}
For every `helm_release` resource you'll see in this guide, you may check the latest available chart version. Example for `kured`:
```sh
helm repo add kured https://kubereboot.github.io/charts
helm search repo kured
```
After applying this with `terraform apply`, ensure that the `daemonset` is running on all nodes with `kg ds -n kube-system`.
`tolerations` will ensure all tainted nodes receive the daemonset.
`metrics.create` will create a `ServiceMonitor` custom k8s resource that allows Prometheus to scrape all kured metrics. You can check it with `kg smon -n kube-system -o yaml`. The monitoring subject will be covered in a future post, but let's be monitoring-ready from the start.
You can test it by running `touch /var/run/reboot-required` on a specific node to trigger a reboot. Use `klo ds/kured -n kube-system` to check the kured logs. After about 1 minute, a reboot should be triggered after the node is drained.
### Automatic K3s upgrade
Now let's take care of K3s upgrades. We'll use [system-upgrade-controller](https://github.com/rancher/system-upgrade-controller). It will take care of upgrading the K3s binary automatically on all nodes, one by one.
However, as Terraform doesn't offer a proper way to apply a remote multi-document YAML file natively, the simplest way is to sacrifice some GitOps by installing system-upgrade-controller manually.
{{< alert >}}
Don't push yourself to be 100% GitOps everywhere if the remedy brings far more code complexity. Sometimes simple documentation of manual steps in a README is better.
{{</ alert >}}
```sh
# installing system-upgrade-controller
ka https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
# checking system-upgrade-controller deployment status
kg deploy -n system-upgrade
```
Next apply the following upgrade plans for servers and agents.
{{< highlight host="demo-kube-k3s" file="plans.tf" >}}
```tf
resource "kubernetes_manifest" "server_plan" {
manifest = {
apiVersion = "upgrade.cattle.io/v1"
kind = "Plan"
metadata = {
name = "server-plan"
namespace = "system-upgrade"
}
spec = {
concurrency = 1
cordon = true
nodeSelector = {
matchExpressions = [
{
key = "node-role.kubernetes.io/control-plane"
operator = "Exists"
}
]
}
tolerations = [
{
operator = "Exists"
effect = "NoSchedule"
}
]
serviceAccountName = "system-upgrade"
upgrade = {
image = "rancher/k3s-upgrade"
}
channel = "https://update.k3s.io/v1-release/channels/stable"
}
}
}
resource "kubernetes_manifest" "agent_plan" {
manifest = {
apiVersion = "upgrade.cattle.io/v1"
kind = "Plan"
metadata = {
name = "agent-plan"
namespace = "system-upgrade"
}
spec = {
concurrency = 1
cordon = true
nodeSelector = {
matchExpressions = [
{
key = "node-role.kubernetes.io/control-plane"
operator = "DoesNotExist"
}
]
}
tolerations = [
{
operator = "Exists"
effect = "NoSchedule"
}
]
prepare = {
args = ["prepare", "server-plan"]
image = "rancher/k3s-upgrade"
}
serviceAccountName = "system-upgrade"
upgrade = {
image = "rancher/k3s-upgrade"
}
channel = "https://update.k3s.io/v1-release/channels/stable"
}
}
}
```
{{</ highlight >}}
{{< alert >}}
You may set the same channel as in the previous step of the hcloud cluster creation.
{{</ alert >}}
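You can roughly check that both plans are registered, and later watch the upgrade jobs that the controller spawns when a new K3s version is published on the channel:
```sh
# both plans should be listed
kg plans -n system-upgrade
# upgrade jobs appear here once a new version is available
kg jobs -n system-upgrade
```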
## External access
Now it's time to expose our cluster to the outside world. We'll use Traefik as the ingress controller and cert-manager for SSL certificate management.
### Traefik
Apply the following file:
{{< highlight host="demo-kube-k3s" file="traefik.tf" >}}
```tf
locals {
certificate_secret_name = "tls-cert"
}
resource "kubernetes_namespace_v1" "traefik" {
metadata {
name = "traefik"
}
}
resource "helm_release" "traefik" {
chart = "traefik"
version = "24.0.0"
repository = "https://traefik.github.io/charts"
name = "traefik"
namespace = kubernetes_namespace_v1.traefik.metadata[0].name
set {
name = "ports.web.redirectTo"
value = "websecure"
}
set {
name = "ports.websecure.forwardedHeaders.trustedIPs"
value = "{127.0.0.1/32,10.0.0.0/8}"
}
set {
name = "ports.websecure.proxyProtocol.trustedIPs"
value = "{127.0.0.1/32,10.0.0.0/8}"
}
set {
name = "logs.access.enabled"
value = "true"
}
set {
name = "providers.kubernetesCRD.allowCrossNamespace"
value = "true"
}
set {
name = "tlsStore.default.defaultCertificate.secretName"
value = local.certificate_secret_name
}
set {
name = "metrics.prometheus.serviceMonitor.namespaceSelector"
value = ""
}
}
```
{{</ highlight >}}
`ports.web.redirectTo` will redirect all HTTP traffic to HTTPS.
`forwardedHeaders` and `proxyProtocol` will allow Traefik to get the real client IP.
`providers.kubernetesCRD.allowCrossNamespace` will allow Traefik to read ingresses from all namespaces.
`tlsStore.default.defaultCertificate.secretName` will be used to store the default certificate, used by every ingress that doesn't have a specific certificate.
`metrics.prometheus.serviceMonitor.namespaceSelector` will allow Prometheus to scrape Traefik metrics from all namespaces.
By default, the chart deploys a single replica of Traefik. But don't worry: when upgrading, the default update strategy is `RollingUpdate`, so it will be upgraded without any downtime. Increase `deployment.replicas` if you need more performance, as shown below.
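For example, adding the following value to the above `helm_release` should give you 2 replicas (chart value name taken from the official Traefik chart, to be checked against the version you deploy):
```tf
resource "helm_release" "traefik" {
  //...

  set {
    name  = "deployment.replicas"
    value = "2"
  }
}
```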
### Load balancer
Traefik should now be running, check it with `kg deploy -n traefik`. Then check with `kg svc -n traefik` whether the Traefik `LoadBalancer` service is available on all nodes.
```txt
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
traefik LoadBalancer 10.43.134.216 10.0.0.2,10.0.1.1,10.0.1.2,10.0.1.3 80:32180/TCP,443:30273/TCP 21m
```
The external IPs are the private IPs of all nodes. In order to reach Traefik from the outside, we need to put a load balancer in front of the workers. It's time to get back to our 1st Terraform project.
{{< highlight host="demo-kube-hcloud" file="kube.tf" >}}
```tf
//...
module "hcloud_kube" {
//...
agent_nodepools = [
{
name = "worker"
// ...
lb_type = "lb11"
}
]
}
resource "hcloud_load_balancer_service" "http_service" {
load_balancer_id = module.hcloud_kube.lbs.worker.id
protocol = "tcp"
listen_port = 80
destination_port = 80
}
resource "hcloud_load_balancer_service" "https_service" {
load_balancer_id = module.hcloud_kube.lbs.worker.id
protocol = "tcp"
listen_port = 443
destination_port = 443
proxyprotocol = true
}
```
{{</ highlight >}}
Use `hcloud load-balancer-type list` to get the list of available load balancer types.
{{< alert >}}
Don't forget to add a `hcloud_load_balancer_service` resource for each service (aka port) you want to serve.
We use the `tcp` protocol as Traefik will handle SSL termination. Set `proxyprotocol` to true to allow Traefik to get the real client IP.
{{</ alert >}}
Once applied, use `hcloud load-balancer list` to get the public IP of the load balancer and try to curl it. You should be properly redirected to HTTPS and get a certificate error. It's time to get proper SSL certificates.
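A rough way to validate this from your terminal, assuming `203.0.113.10` is the public IP returned by the previous command (a placeholder, replace it with yours):
```sh
# HTTP should answer with a redirect to HTTPS
curl -I http://203.0.113.10
# HTTPS currently serves a default self-signed certificate, hence -k
curl -kI https://203.0.113.10
```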
### cert-manager
First, we need to install cert-manager for proper distributed SSL management. Start by installing the CRDs manually.
```sh
ka https://github.com/cert-manager/cert-manager/releases/download/v1.12.3/cert-manager.crds.yaml
```
Then apply the following Terraform code.
{{< highlight host="demo-kube-k3s" file="cert-manager.tf" >}}
```tf
resource "kubernetes_namespace_v1" "cert_manager" {
metadata {
name = "cert-manager"
}
}
resource "helm_release" "cert_manager" {
chart = "cert-manager"
version = "v1.12.3"
repository = "https://charts.jetstack.io"
name = "cert-manager"
namespace = kubernetes_namespace_v1.cert_manager.metadata[0].name
set {
name = "prometheus.servicemonitor.enabled"
value = true
}
}
```
{{</ highlight >}}
{{< alert >}}
You can use the `installCRDs` option to install the CRDs automatically. But uninstalling cert-manager would then delete all associated resources, including generated certificates. That's why I generally prefer to install the CRDs manually.
As always, we enable `prometheus.servicemonitor.enabled` to allow Prometheus to scrape cert-manager metrics.
{{</ alert >}}
Everything should be OK with `kg deploy -n cert-manager`.
#### Wildcard certificate via DNS01
We'll use the [DNS01 challenge](https://cert-manager.io/docs/configuration/acme/dns01/) to get a wildcard certificate for our domain. This is the most convenient way to get a certificate for a domain without having to expose the cluster.
{{< alert >}}
You must use a DNS provider supported by cert-manager. Check the [list of supported providers](https://cert-manager.io/docs/configuration/acme/dns01/#supported-dns01-providers). As cert-manager is highly extensible, you can create your own provider with some effort. Check the [available contrib webhooks](https://cert-manager.io/docs/configuration/acme/dns01/#webhook).
{{</ alert >}}
First prepare variables and set them accordingly:
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "domain" {
type = string
}
variable "acme_email" {
type = string
}
variable "dns_api_token" {
type = string
sensitive = true
}
```
{{</ highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
acme_email = "me@kube.rocks"
domain = "kube.rocks"
dns_api_token = "xxx"
```
{{</ highlight >}}
Then we need to create a default `Certificate` k8s resource associated with a valid `ClusterIssuer` resource that will manage its generation. Apply the following Terraform code to issue the new wildcard certificate for your domain.
{{< highlight host="demo-kube-k3s" file="certificates.tf" >}}
```tf
resource "kubernetes_secret_v1" "cloudflare_api_token" {
metadata {
name = "cloudflare-api-token-secret"
namespace = kubernetes_namespace_v1.cert_manager.metadata[0].name
}
data = {
"api-token" = var.dns_api_token
}
}
resource "kubernetes_manifest" "letsencrypt_production_issuer" {
manifest = {
apiVersion = "cert-manager.io/v1"
kind = "ClusterIssuer"
metadata = {
name = "letsencrypt-production"
}
spec = {
acme = {
email = var.acme_email
privateKeySecretRef = {
name = "letsencrypt-production"
}
server = "https://acme-v02.api.letsencrypt.org/directory"
solvers = [
{
dns01 = {
cloudflare = {
apiTokenSecretRef = {
name = kubernetes_secret_v1.cloudflare_api_token.metadata[0].name
key = "api-token"
}
}
}
}
]
}
}
}
}
resource "kubernetes_manifest" "tls_certificate" {
manifest = {
apiVersion = "cert-manager.io/v1"
kind = "Certificate"
metadata = {
name = "default-certificate"
namespace = kubernetes_namespace_v1.traefik.metadata[0].name
}
spec = {
commonName = var.domain
dnsNames = [
var.domain,
"*.${var.domain}",
]
issuerRef = {
kind = kubernetes_manifest.letsencrypt_production_issuer.manifest.kind
name = kubernetes_manifest.letsencrypt_production_issuer.manifest.metadata.name
}
secretName = local.certificate_secret_name
privateKey = {
rotationPolicy = "Always"
}
}
}
}
```
{{</ highlight >}}
{{< alert >}}
You can set `acme.privateKeySecretRef.name` to **letsencrypt-staging** for testing purposes and avoid wasting the LE quota limit.
Set `privateKey.rotationPolicy` to `Always` to ensure that the certificate will be [renewed automatically](https://cert-manager.io/docs/usage/certificate/) 30 days before it expires, without downtime.
{{</ alert >}}
In the meantime, go to your DNS provider and add a new `*.kube.rocks` entry pointing to the load balancer IP.
Try `test.kube.rocks` to check certificate validity. If it's not valid, check the certificate status with `kg cert -n traefik` and get the challenge status with `kg challenges -n traefik`. The certificate should be in `Ready` state after a few minutes.
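Once ready, you can also inspect what is actually served, for example with `openssl` (domain assumed from this guide):
```sh
# show issuer, subject and validity dates of the certificate presented by the LB
openssl s_client -connect test.kube.rocks:443 -servername test.kube.rocks </dev/null 2>/dev/null \
  | openssl x509 -noout -issuer -subject -dates
```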
### Access to Traefik dashboard
The Traefik dashboard is useful for checking all active ingresses and their status. Let's expose it with a simple ingress, protected by an IP whitelist and basic auth, which can be done with middlewares.
First prepare variables and set them accordingly:
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "http_username" {
type = string
}
variable "http_password" {
type = string
sensitive = true
}
variable "whitelisted_ips" {
type = list(string)
sensitive = true
}
resource "null_resource" "encrypted_admin_password" {
triggers = {
orig = var.http_password
pw = bcrypt(var.http_password)
}
lifecycle {
ignore_changes = [triggers["pw"]]
}
}
```
{{</ highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
http_username = "admin"
http_password = "xxx"
whitelisted_ips = ["82.82.82.82"]
```
{{</ highlight >}}
{{< alert >}}
Note on `encrypted_admin_password`: we generate a bcrypt hash of the password, compatible with HTTP basic auth, and keep the original to avoid regenerating it on each apply (see the sketch below).
{{</ alert >}}
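If you prefer generating or double-checking the bcrypt hash outside Terraform, a possible sketch with Apache's `htpasswd` utility (from the `apache2-utils` package):
```sh
# -n prints to stdout, -B forces bcrypt, -b takes the password from the command line
htpasswd -nbB admin 'xxx'
# admin:$2y$...
```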
Then apply the following Terraform code:
{{< highlight host="demo-kube-k3s" file="traefik.tf" >}}
```tf
resource "helm_release" "traefik" {
//...
set {
name = "ingressRoute.dashboard.entryPoints"
value = "{websecure}"
}
set {
name = "ingressRoute.dashboard.matchRule"
value = "Host(`traefik.${var.domain}`)"
}
set {
name = "ingressRoute.dashboard.middlewares[0].name"
value = "middleware-ip"
}
set {
name = "ingressRoute.dashboard.middlewares[1].name"
value = "middleware-auth"
}
}
resource "kubernetes_secret_v1" "traefik_auth_secret" {
metadata {
name = "auth-secret"
namespace = kubernetes_namespace_v1.traefik.metadata[0].name
}
data = {
"users" = "${var.http_username}:${null_resource.encrypted_admin_password.triggers.pw}"
}
}
resource "kubernetes_manifest" "traefik_middleware_auth" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "Middleware"
metadata = {
name = "middleware-auth"
namespace = kubernetes_namespace_v1.traefik.metadata[0].name
}
spec = {
basicAuth = {
secret = kubernetes_secret_v1.traefik_auth_secret.metadata[0].name
}
}
}
}
resource "kubernetes_manifest" "traefik_middleware_ip" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "Middleware"
metadata = {
name = "middleware-ip"
namespace = kubernetes_namespace_v1.traefik.metadata[0].name
}
spec = {
ipWhiteList = {
sourceRange = var.whitelisted_ips
}
}
}
}
```
{{</ highlight >}}
Now go to `https://traefik.kube.rocks` and you should be asked for credentials. After login, you should see the dashboard.
[![Traefik Dashboard](traefik-dashboard.png)](traefik-dashboard.png)
In the meantime, it allows us to validate that the `auth` and `ip` middlewares are working properly.
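A quick check from a terminal, as a sketch (domain and credentials from this guide):
```sh
# without credentials: expect 401 from the auth middleware
# (or 403 if your IP is not whitelisted)
curl -I https://traefik.kube.rocks/dashboard/
# with credentials: expect 200
curl -I -u admin:xxx https://traefik.kube.rocks/dashboard/
```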
#### Forbidden troubleshooting
If you get `Forbidden`, it's because `middleware-ip` can't get your real IP. First try disabling it to confirm that you have dashboard access with credentials. Then re-enable it while changing the [IP strategy](https://doc.traefik.io/traefik/middlewares/http/ipwhitelist/#ipstrategy). For example, if you're behind another reverse proxy like Cloudflare, set `depth` to 1:
{{< highlight host="demo-kube-k3s" file="traefik.tf" >}}
```tf
//...
resource "kubernetes_manifest" "traefik_middleware_ip" {
manifest = {
//...
spec = {
ipWhiteList = {
sourceRange = var.whitelisted_ips
ipStrategy = {
depth = 1
}
}
}
}
}
```
{{</ highlight >}}
In the case of Cloudflare, you may also need to trust the [Cloudflare IP ranges](https://www.cloudflare.com/ips-v4) in addition to the Hetzner load balancer. Just set `ports.websecure.forwardedHeaders.trustedIPs` and `ports.websecure.proxyProtocol.trustedIPs` accordingly.
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "cloudflare_ips" {
type = list(string)
sensitive = true
}
```
{{</ highlight >}}
{{< highlight host="demo-kube-k3s" file="traefik.tf" >}}
```tf
locals {
trusted_ips = concat(["127.0.0.1/32", "10.0.0.0/8"], var.cloudflare_ips)
}
resource "helm_release" "traefik" {
//...
set {
name = "ports.websecure.forwardedHeaders.trustedIPs"
value = "{${join(",", local.trusted_ips)}}"
}
set {
name = "ports.websecure.proxyProtocol.trustedIPs"
value = "{${join(",", local.trusted_ips)}}"
}
}
```
{{</ highlight >}}
Or, for testing purposes, set `ports.websecure.forwardedHeaders.insecure` and `ports.websecure.proxyProtocol.insecure` to `true`.
## 2nd check ✅
Our cluster is now securely accessible from the outside with automatic routing. The next important part is to have [resilient storage and a database]({{< ref "/posts/13-a-beautiful-gitops-day-3" >}}).

---
title: "A beautiful GitOps day III - HA storage & DB"
date: 2023-08-21
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "longhorn", "bitnami", "postgresql", "redis"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is **Part III** of a more global tutorial. [Back to the guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for the intro.
## Resilient storage with Longhorn
In the Kubernetes world, the most difficult yet essential part is probably storage. It's not easy to find a solution that combines resiliency, scalability and performance.
{{< alert >}}
If you are not familiar with Kubernetes storage, you must at least be aware of the pros and cons of `RWO` and `RWX` volumes when creating a `PVC`.
In general, `RWO` is more performant, but only one pod can mount it, while `RWX` is slower but allows sharing between multiple pods.
`RWO` is a single-node volume, while `RWX` can be shared between multiple nodes.
{{</ alert >}}
`K3s` comes with a built-in `local-path` provisioner, which is the most performant `RWO` solution as it directly uses the local NVMe SSD. But it's neither resilient nor scalable. I think it's a good solution for data you don't consider critical.
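As an illustration, a minimal `RWO` claim backed by this provisioner could look like the sketch below (resource and claim names are hypothetical, `local-path` being the storage class shipped with K3s):
```tf
resource "kubernetes_persistent_volume_claim_v1" "scratch" {
  metadata {
    name      = "scratch-data"
    namespace = "default"
  }
  spec {
    access_modes       = ["ReadWriteOnce"]
    storage_class_name = "local-path"
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
  # local-path binds on first consumer, so don't block the apply waiting for it
  wait_until_bound = false
}
```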
A dedicated NFS server is a good `RWX` solution, using [this provisioner](https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner). It allows scalability and resiliency with [GlusterFS](https://www.gluster.org/). But it remains a single point of failure in case of network problems, and of course offers low IOPS. It's also a separate server to maintain.
For Hetzner, the easiest `RWO` solution is to use the [official CSI](https://github.com/hetznercloud/csi-driver) for automatic block volume mounting. It's far more performant than NFS (but still less than local SSD), but there is neither resiliency nor scalability. It's really easy to use and very resource-efficient for the cluster. Multiple pods can [reference the same volume](https://github.com/hetznercloud/csi-driver/issues/146), which allows reusability without wasting 10 GB each time.
As a more advanced storage solution, [Longhorn](https://longhorn.io/) seems to be gaining traction by combining most requirements with a nice UI, at the price of high resource usage inside the cluster. Moreover, it offers an integrated backup solution with snapshots and remote S3, which saves us from having to manage a dedicated backup solution like [velero](https://velero.io/) and from adding backup-specific annotations everywhere.
### Storage node pool
When it comes to storage management, it's generally recommended to have a separate node pool for it, for dedicated scalability.
{{< mermaid >}}
flowchart TB
subgraph worker-01
app-01([My App replica 1])
end
subgraph worker-02
app-02([My App replica 2])
end
subgraph worker-03
app-03([My App replica 3])
end
overlay(Overlay network)
worker-01 --> overlay
worker-02 --> overlay
worker-03 --> overlay
overlay --> storage-01
overlay --> storage-02
subgraph storage-01
longhorn-01[(Longhorn<br>volume)]
end
subgraph storage-02
longhorn-02[(Longhorn<br>volume)]
end
streaming(Data replication)
storage-01 --> streaming
storage-02 --> streaming
{{</ mermaid >}}
Let's get back to our 1st Hcloud Terraform project and add a new node pool for storage:
{{< highlight host="demo-kube-hcloud" file="kube.tf" >}}
```tf
module "hcloud_kube" {
//...
agent_nodepools = [
//...
{
name = "storage"
server_type = "cx21"
location = "nbg1"
count = 2
private_interface = "ens10"
labels = [
"node.kubernetes.io/server-usage=storage"
]
taints = [
"node-role.kubernetes.io/storage:NoSchedule"
]
}
]
}
```
{{< /highlight >}}
Be sure to have the labels and taints correctly set, as we'll use them later for the Longhorn installation. This node pool is dedicated to storage, so the taint will prevent any other workload from being scheduled on it.
After `terraform apply`, check that the new storage nodes are ready with `kgno`. Now we'll also attach a configurable dedicated block volume to each node for more flexible space management.
{{< highlight host="demo-kube-hcloud" file="kube.tf" >}}
```tf
module "hcloud_kube" {
//...
agent_nodepools = [
//...
{
name = "storage"
//...
volume_size = 20
}
]
}
```
{{< /highlight >}}
SSH into both storage nodes and check that a 20GB volume is correctly mounted with the `df -h` command. It should look like this:
```txt
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 38G 4,2G 32G 12% /
...
/dev/sdb 20G 24K 19,5G 1% /mnt/HC_Volume_XXXXXXXX
```
The volume is automatically remounted on each node reboot thanks to `/etc/fstab`. Note the `/mnt/HC_Volume_XXXXXXXX` path on both storage nodes, as we'll use it later for the Longhorn configuration.
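If you want to double-check the persistent mount entry, a simple grep is enough:

```sh
grep HC_Volume /etc/fstab
# should print one line per attached volume, referencing /mnt/HC_Volume_XXXXXXXX
```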
{{< alert >}}
Note that if you create the volume at the same time as the node pool, Hetzner doesn't seem to mount the volume automatically. So it's preferable to create the node pool first, then add the volume as soon as the node is in ready state. You can always detach / re-attach volumes manually through the UI, which will force a proper remount.
{{</ alert >}}
### Longhorn variables
Let's add the S3-related variables in order to preconfigure the Longhorn backup target:
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "s3_endpoint" {
type = string
}
variable "s3_region" {
type = string
}
variable "s3_bucket" {
type = string
}
variable "s3_access_key" {
type = string
sensitive = true
}
variable "s3_secret_key" {
type = string
sensitive = true
}
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
s3_endpoint = "s3.fr-par.scw.cloud"
s3_region = "fr-par"
s3_bucket = "mykuberocks"
s3_access_key = "xxx"
s3_secret_key = "xxx"
```
{{< /highlight >}}
### Longhorn installation
Return to the 2nd Kubernetes terraform project, and add Longhorn installation:
{{< highlight host="demo-kube-k3s" file="longhorn.tf" >}}
```tf
resource "kubernetes_namespace_v1" "longhorn" {
metadata {
name = "longhorn-system"
}
}
resource "kubernetes_secret_v1" "longhorn_backup_credential" {
metadata {
name = "longhorn-backup-credential"
namespace = kubernetes_namespace_v1.longhorn.metadata[0].name
}
data = {
AWS_ENDPOINTS = "https://${var.s3_endpoint}"
AWS_ACCESS_KEY_ID = var.s3_access_key
AWS_SECRET_ACCESS_KEY = var.s3_secret_key
AWS_REGION = var.s3_region
}
}
resource "helm_release" "longhorn" {
chart = "longhorn"
version = "1.5.1"
repository = "https://charts.longhorn.io"
name = "longhorn"
namespace = kubernetes_namespace_v1.longhorn.metadata[0].name
set {
name = "persistence.defaultClass"
value = "false"
}
set {
name = "persistence.defaultClassReplicaCount"
value = "2"
}
set {
name = "defaultSettings.defaultReplicaCount"
value = "2"
}
set {
name = "defaultSettings.backupTarget"
value = "s3://${var.s3_bucket}@${var.s3_region}/"
}
set {
name = "defaultSettings.backupTargetCredentialSecret"
value = kubernetes_secret_v1.longhorn_backup_credential.metadata[0].name
}
set {
name = "defaultSettings.taintToleration"
value = "node-role.kubernetes.io/storage:NoSchedule"
}
set {
name = "longhornManager.tolerations[0].key"
value = "node-role.kubernetes.io/storage"
}
set {
name = "longhornManager.tolerations[0].effect"
value = "NoSchedule"
}
}
```
{{< /highlight >}}
{{< alert >}}
Set both `persistence.defaultClassReplicaCount` (used by the Longhorn storage class) and `defaultSettings.defaultReplicaCount` (used for volumes created from the UI) to 2, as we have 2 storage nodes.
The toleration is required to allow Longhorn pods (managers and drivers) to be scheduled on storage nodes in addition to workers.
Note that we need Longhorn deployed on workers too, otherwise pods scheduled on those nodes can't attach Longhorn volumes.
{{</ alert >}}
Use `kgpo -n longhorn-system -o wide` to check that Longhorn pods are correctly running on storage nodes as well as worker nodes. You should have `instance-manager` deployed on each node.
### Monitoring
The Longhorn Helm chart doesn't include Prometheus integration yet; in this case, all we have to do is deploy a `ServiceMonitor`, which enables metrics scraping of the Longhorn pods.
{{< highlight host="demo-kube-k3s" file="longhorn.tf" >}}
```tf
resource "kubernetes_manifest" "longhorn_service_monitor" {
manifest = {
apiVersion = "monitoring.coreos.com/v1"
kind = "ServiceMonitor"
metadata = {
name = "metrics"
namespace = kubernetes_namespace_v1.longhorn.metadata[0].name
}
spec = {
endpoints = [
{
port = "manager"
}
]
selector = {
matchLabels = {
app = "longhorn-manager"
}
}
}
}
}
```
{{< /highlight >}}
Monitoring will get its own dedicated post later.
### Ingress
Now we only have to expose the Longhorn UI. We'll use the `IngressRoute` CRD provided by Traefik.
{{< highlight host="demo-kube-k3s" file="longhorn.tf" >}}
```tf
resource "kubernetes_manifest" "longhorn_ingress" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "IngressRoute"
metadata = {
name = "longhorn"
namespace = kubernetes_namespace_v1.longhorn.metadata[0].name
}
spec = {
entryPoints = ["websecure"]
routes = [
{
match = "Host(`longhorn.${var.domain}`)"
kind = "Rule"
middlewares = [
{
namespace = "traefik"
name = "middleware-ip"
},
{
namespace = "traefik"
name = "middleware-auth"
}
]
services = [
{
name = "longhorn-frontend"
port = "http"
}
]
}
]
}
}
}
```
{{< /highlight >}}
{{< alert >}}
It's vital to protect Longhorn UI access with at least the IP and AUTH middlewares and a strong password, as it concerns the most critical part of the cluster.
Of course, you can also skip this ingress and simply use `kpf svc/longhorn-frontend -n longhorn-system 8000:80` to access the Longhorn UI securely through port-forwarding.
{{</ alert >}}
### Nodes and volumes configuration
Longhorn is now installed and accessible, but we still have to configure it. Let's disable volume scheduling on worker nodes, as we want to use only the storage nodes for volumes. Everything can be done via the Longhorn UI, but let's do it the CLI way.
```sh
k patch nodes.longhorn.io kube-worker-01 kube-worker-02 kube-worker-03 -n longhorn-system --type=merge --patch '{"spec": {"allowScheduling": false}}'
```
By default, Longhorn uses the local disk for storage, which is great for IOPS-critical workloads such as databases, but we also want to use our expandable dedicated block volume as the default for larger datasets.
Run these commands for both storage nodes, or use the Longhorn UI from the **Node** tab:
```sh
# get the default-disk-xxx identifier
kg nodes.longhorn.io kube-storage-0x -n longhorn-system -o yaml
# patch main default-disk-xxx as fast storage
k patch nodes.longhorn.io kube-storage-0x -n longhorn-system --type=merge --patch '{"spec": {"disks": {"default-disk-xxx": {"tags": ["fast"]}}}}'
# add a new schedulable disk by adding HC_Volume_XXXXXXXX path
k patch nodes.longhorn.io kube-storage-0x -n longhorn-system --type=merge --patch '{"spec": {"disks": {"disk-mnt": {"allowScheduling": true, "evictionRequested": false, "path": "/mnt/HC_Volume_XXXXXXXX/", "storageReserved": 0}}}}'
```
Now all that's left is to create a dedicated storage class for fast local volumes. We'll use it for IOPS-critical statefulset workloads like PostgreSQL and Redis. Apply the next `StorageClass` configuration and check it with `kg sc`:
{{< highlight host="demo-kube-k3s" file="longhorn.tf" >}}
```tf
resource "kubernetes_storage_class_v1" "longhorn_fast" {
metadata {
name = "longhorn-fast"
}
storage_provisioner = "driver.longhorn.io"
allow_volume_expansion = true
reclaim_policy = "Delete"
volume_binding_mode = "Immediate"
parameters = {
numberOfReplicas = "1"
staleReplicaTimeout = "30"
fromBackup = ""
fsType = "ext4"
diskSelector = "fast"
}
}
```
{{< /highlight >}}
Longhorn is now ready for volume creation, on both block volumes and fast local disks.
{{< alert >}}
If you need automatically encrypted volumes, which is highly recommended for critical data, add `encrypted: "true"` under the `parameters` section. You'll need to [set up a proper encryption](https://longhorn.io/docs/latest/advanced-resources/security/volume-encryption/) passphrase inside a k8s `Secret`. As a bonus, backups will be encrypted as well, so you don't have to worry about that separately.
{{< /alert >}}
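As a quick illustration of how workloads will consume it later, a claim only has to reference the class (hypothetical name, namespace and size):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-critical-data
  namespace: my-app
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-fast # lands on disks tagged "fast" thanks to diskSelector
  resources:
    requests:
      storage: 5Gi
```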
[![Longhorn UI](longhorn-ui.png)](longhorn-ui.png)
## PostgreSQL with replication
Now it's time to set up some critical stateful workloads that need persistence. Let's begin with a PostgreSQL cluster with replication.
### PostgreSQL variables
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "pgsql_user" {
type = string
}
variable "pgsql_admin_password" {
type = string
sensitive = true
}
variable "pgsql_password" {
type = string
sensitive = true
}
variable "pgsql_replication_password" {
type = string
sensitive = true
}
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
pgsql_user = "kube"
pgsql_password = "xxx"
pgsql_admin_password = "xxx"
pgsql_replication_password = "xxx"
```
{{< /highlight >}}
### PostgreSQL installation
Before continuing, it's important to identify which storage node will host the primary database and which one will host the read replica, by adding these labels:
```sh
k label nodes kube-storage-01 node-role.kubernetes.io/primary=true
k label nodes kube-storage-02 node-role.kubernetes.io/read=true
```
We can finally apply next Terraform configuration:
{{< highlight host="demo-kube-k3s" file="postgresql.tf" >}}
```tf
resource "kubernetes_namespace_v1" "postgres" {
metadata {
name = "postgres"
}
}
resource "kubernetes_secret_v1" "postgresql_auth" {
metadata {
name = "postgresql-auth"
namespace = kubernetes_namespace_v1.postgres.metadata[0].name
}
data = {
"postgres-password" = var.pgsql_admin_password
"password" = var.pgsql_password
"replication-password" = var.pgsql_replication_password
}
}
resource "helm_release" "postgresql" {
chart = "postgresql"
version = var.chart_postgresql_version
repository = "https://charts.bitnami.com/bitnami"
name = "postgresql"
namespace = kubernetes_namespace_v1.postgres.metadata[0].name
set {
name = "architecture"
value = "replication"
}
set {
name = "auth.username"
value = var.pgsql_user
}
set {
name = "auth.database"
value = var.pgsql_user
}
set {
name = "auth.existingSecret"
value = kubernetes_secret_v1.postgresql_auth.metadata[0].name
}
set {
name = "auth.replicationUsername"
value = "replication"
}
set {
name = "metrics.enabled"
value = "true"
}
set {
name = "metrics.serviceMonitor.enabled"
value = "true"
}
set {
name = "primary.tolerations[0].key"
value = "node-role.kubernetes.io/storage"
}
set {
name = "primary.tolerations[0].effect"
value = "NoSchedule"
}
set {
name = "primary.nodeSelector.node-role\\.kubernetes\\.io/primary"
type = "string"
value = "true"
}
set {
name = "primary.persistence.size"
value = "10Gi"
}
set {
name = "primary.persistence.storageClass"
value = "longhorn-fast"
}
set {
name = "readReplicas.tolerations[0].key"
value = "node-role.kubernetes.io/storage"
}
set {
name = "readReplicas.tolerations[0].effect"
value = "NoSchedule"
}
set {
name = "readReplicas.nodeSelector.node-role\\.kubernetes\\.io/read"
type = "string"
value = "true"
}
set {
name = "readReplicas.persistence.size"
value = "10Gi"
}
set {
name = "readReplicas.persistence.storageClass"
value = "longhorn-fast"
}
}
```
{{</ highlight >}}
{{< alert >}}
Don't forget to use fast storage by setting `primary.persistence.storageClass` and `readReplicas.persistence.storageClass` accordingly.
{{</ alert >}}
Now check that PostgreSQL pods are correctly running on storage nodes with `kgpo -n postgres -o wide`.
```txt
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
postgresql-primary-0 2/2 Running 0 151m 10.42.5.253 okami-storage-01 <none> <none>
postgresql-read-0 2/2 Running 0 152m 10.42.2.216 okami-storage-02 <none> <none>
```
And that's it, we have a replicated PostgreSQL cluster ready to use! Go to the Longhorn UI and make sure the 2 volumes are created on the fast disks under the **Volume** menu.
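For a quick replication sanity check, you can reuse the primary pod's injected `POSTGRES_PASSWORD` (the same variable used by the dump cron later in this post) and query the read service; user and service names below assume the values from this guide:

```sh
k exec -it postgresql-primary-0 -n postgres -- /bin/sh -c \
  'PGPASSWORD="$POSTGRES_PASSWORD" psql -U kube -h postgresql-read.postgres -c "SELECT pg_is_in_recovery();"'
# should return "t", confirming the replica is in read-only recovery mode
```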
## Redis cluster
After PostgreSQL, setting up a master/replica Redis is a piece of cake. You may prefer [Redis cluster](https://redis.io/docs/management/scaling/) using the [Bitnami redis cluster](https://artifacthub.io/packages/helm/bitnami/redis-cluster) chart, but it [doesn't work](https://github.com/bitnami/charts/issues/12901) at the time of writing this guide.
### Redis variables
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "redis_password" {
type = string
sensitive = true
}
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
redis_password = "xxx"
```
{{< /highlight >}}
### Redis installation
{{< highlight host="demo-kube-k3s" file="redis.tf" >}}
```tf
resource "kubernetes_namespace_v1" "redis" {
metadata {
name = "redis"
}
}
resource "kubernetes_secret_v1" "redis_auth" {
metadata {
name = "redis-auth"
namespace = kubernetes_namespace_v1.redis.metadata[0].name
}
data = {
"redis-password" = var.redis_password
}
}
resource "helm_release" "redis" {
chart = "redis"
version = "17.15.6"
repository = "https://charts.bitnami.com/bitnami"
name = "redis"
namespace = kubernetes_namespace_v1.redis.metadata[0].name
set {
name = "architecture"
value = "standalone"
}
set {
name = "auth.existingSecret"
value = kubernetes_secret_v1.redis_auth.metadata[0].name
}
set {
name = "auth.existingSecretPasswordKey"
value = "redis-password"
}
set {
name = "metrics.enabled"
value = "true"
}
set {
name = "metrics.serviceMonitor.enabled"
value = "true"
}
set {
name = "master.tolerations[0].key"
value = "node-role.kubernetes.io/storage"
}
set {
name = "master.tolerations[0].effect"
value = "NoSchedule"
}
set {
name = "master.nodeSelector.node-role\\.kubernetes\\.io/primary"
type = "string"
value = "true"
}
set {
name = "master.persistence.size"
value = "10Gi"
}
set {
name = "master.persistence.storageClass"
value = "longhorn-fast"
}
set {
name = "replica.replicaCount"
value = "1"
}
set {
name = "replica.tolerations[0].key"
value = "node-role.kubernetes.io/storage"
}
set {
name = "replica.tolerations[0].effect"
value = "NoSchedule"
}
set {
name = "replica.nodeSelector.node-role\\.kubernetes\\.io/read"
type = "string"
value = "true"
}
set {
name = "replica.persistence.size"
value = "10Gi"
}
set {
name = "replica.persistence.storageClass"
value = "longhorn-fast"
}
}
```
{{< /highlight >}}
And that's it, job done! As always, check that the Redis pods are correctly running on storage nodes with `kgpo -n redis -o wide` and that the volumes are ready in Longhorn.
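If you want a quick sanity check beyond pod status, a `PING` through the master pod does the trick (assuming the Bitnami chart's usual `REDIS_PASSWORD` environment variable is present in the container):

```sh
k exec -it redis-master-0 -n redis -- /bin/sh -c 'redis-cli -a "$REDIS_PASSWORD" ping'
# PONG
```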
## Backups
The final essential step is to set up S3 backups for volumes. We already configured the S3 backup location in the [Longhorn variables step](#longhorn-variables), so we only have to configure the backup strategy. We could use the UI for that, but aren't we doing GitOps? So let's do it with Terraform.
{{< highlight host="demo-kube-k3s" file="longhorn.tf" >}}
```tf
locals {
job_backups = {
daily = {
cron = "15 0 * * *"
retain = 7
},
weekly = {
cron = "30 0 * * 1"
retain = 4
}
monthly = {
cron = "45 0 1 * *"
retain = 3
}
}
}
resource "kubernetes_manifest" "longhorn_jobs" {
for_each = local.job_backups
manifest = {
apiVersion = "longhorn.io/v1beta2"
kind = "RecurringJob"
metadata = {
name = each.key
namespace = kubernetes_namespace_v1.longhorn.metadata[0].name
}
spec = {
concurrency = 1
cron = each.value.cron
groups = ["default"]
name = each.key
retain = each.value.retain
task = "backup"
}
}
depends_on = [
helm_release.longhorn
]
}
```
{{< /highlight >}}
{{< alert >}}
`depends_on` is required to ensure that the Longhorn CRDs are installed before creating the jobs when relaunching the whole Terraform project from scratch.
{{< /alert >}}
Bam, it's done! After apply, check through the UI under the **Recurring Job** menu that the backup jobs are created. The `default` group is the default one, which backs up all volumes. You can of course assign custom groups to specific volumes, allowing very flexible backup strategies.
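For reference, assigning a volume to a specific job or group is done through Longhorn volume labels; a hedged sketch with a hypothetical `pvc-xxx` volume and a `critical` group (check the Longhorn recurring job docs for the exact label keys of your version):

```sh
k label volumes.longhorn.io pvc-xxx -n longhorn-system \
  recurring-job-group.longhorn.io/critical=enabled
```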
Thanks to GitOps, the default backup strategy described by `job_backups` is versioned and self-explanatory:
* Daily backups retained for **7 days**
* Weekly backups retained for **4 weeks**
* Monthly backups retained for **3 months**
Adjust this variable according to your needs.
### DB dumps
If you need regular dumps of your database without a dedicated Kubernetes `CronJob`, you can simply use the following crontab line on a control plane node:
```sh
0 */8 * * * root /usr/local/bin/k3s kubectl exec sts/postgresql-primary -n postgres -- /bin/sh -c 'PGUSER="kube" PGPASSWORD="$POSTGRES_PASSWORD" pg_dumpall -c | gzip > /bitnami/postgresql/dump_$(date "+\%H")h.sql.gz'
```
It will generate 3 dumps per day, one every 8 hours, on the same primary DB volume, allowing an easy `psql` restore from within the same container.
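For completeness, restoring one of these dumps from the same container could look like this (a rough sketch: `dump_00h` is one of the generated files, and depending on what the dump contains you may need superuser credentials instead of the application user):

```sh
k exec -it sts/postgresql-primary -n postgres -- /bin/sh -c \
  'gunzip -c /bitnami/postgresql/dump_00h.sql.gz | PGUSER="kube" PGPASSWORD="$POSTGRES_PASSWORD" psql'
```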
## 3rd check ✅
Persistence is now ensured by Longhorn as our main resilient storage, and we have a production-grade replicated DB cluster. It's finally time to play with all of this by testing some [real world apps]({{< ref "/posts/14-a-beautiful-gitops-day-4" >}}) with a proper CD solution.

---
title: "A beautiful GitOps day IV - CD with Flux"
date: 2023-08-22
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "cd", "flux", "nocode", "n8n", "nocodb"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is **Part IV** of a more global tutorial. [Back to guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for the intro.
## Flux
In the GitOps world, 2 tools are leading for CD on k8s: **Flux** and **ArgoCD**. As Flux is CLI-first and more lightweight, it's my personal go-to. You may wonder why we don't simply continue with the current k3s Terraform project?
You've already noticed that by adding more and more Helm dependencies to Terraform, the plan time increases, as does the state file. So it's not very scalable.
It's the perfect moment to draw a clear line between **IaC** (Infrastructure as Code) and **CD** (Continuous Delivery). IaC is for infrastructure, CD is for applications. So to summarize our GitOps stack:
1. IaC for Hcloud cluster initialization (*the basement*): **Terraform**
2. IaC for Kubernetes configuration (*the walls*): **Helm** through **Terraform**
3. CD for any application deployments (*the furniture*): **Flux**
{{< alert >}}
With some effort, you could probably eliminate the 2nd stack by letting `Kube-Hetzner`, which takes care of ingress and storage, do more, and by using Flux directly for the remaining Helm charts like the database cluster. Or maybe you could add custom Helm charts to `Kube-Hetzner`.
But as it increases complexity and dependency problems, I personally prefer to keep a clear separation between the middle part and the rest, as it's more straightforward for me. Just a matter of taste 🥮
{{< /alert >}}
### Flux bootstrap
Create a dedicated Git repository for Flux somewhere. I'm using GitHub, which with [its CLI](https://cli.github.com/) is just a matter of:
```sh
gh repo create demo-kube-flux --private --add-readme
gh repo clone demo-kube-flux
```
{{< alert >}}
Use the `--add-readme` option to get a non-empty repo, otherwise the Flux bootstrap will give you an error.
{{< /alert >}}
Let's get back to the `demo-kube-k3s` Terraform project and add a Flux bootstrap connected to the above repository:
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
terraform {
//...
required_providers {
flux = {
source = "fluxcd/flux"
}
github = {
source = "integrations/github"
}
}
}
//...
variable "github_token" {
sensitive = true
type = string
}
variable "github_org" {
type = string
}
variable "github_repository" {
type = string
}
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
github_org = "mykuberocks"
github_repository = "demo-kube-flux"
github_token = "xxx"
```
{{< /highlight >}}
{{< alert >}}
Create a [Github token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) with repo permissions and add it to `github_token` variable.
{{< /alert >}}
{{< highlight host="demo-kube-k3s" file="flux.tf" >}}
```tf
provider "github" {
owner = var.github_org
token = var.github_token
}
resource "tls_private_key" "flux" {
algorithm = "ECDSA"
ecdsa_curve = "P256"
}
resource "github_repository_deploy_key" "this" {
title = "Flux"
repository = var.github_repository
key = tls_private_key.flux.public_key_openssh
read_only = false
}
provider "flux" {
kubernetes = {
config_path = "~/.kube/config"
}
git = {
url = "ssh://git@github.com/${var.github_org}/${var.github_repository}.git"
ssh = {
username = "git"
private_key = tls_private_key.flux.private_key_pem
}
}
}
resource "flux_bootstrap_git" "this" {
path = "clusters/demo"
components_extra = [
"image-reflector-controller",
"image-automation-controller"
]
depends_on = [github_repository_deploy_key.this]
}
```
{{< /highlight >}}
Note that we use `components_extra` to add the `image-reflector-controller` and `image-automation-controller` to Flux, as they will serve us later for new image tag detection.
After applying this, use `kg deploy -n flux-system` to check that Flux is correctly installed and running.
### Managing secrets
As always with GitOps, secure secrets management is critical. Nobody wants to expose sensitive data in a Git repository. An easy-to-go solution is [Bitnami Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets), which deploys a dedicated controller in your cluster that automatically decrypts sealed secrets.
Open the `demo-kube-flux` project and create the Helm deployment for Sealed Secrets.
{{< highlight host="demo-kube-flux" file="clusters/demo/flux-add-ons/sealed-secrets.yaml" >}}
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
name: sealed-secrets
namespace: flux-system
spec:
interval: 1h0m0s
url: https://bitnami-labs.github.io/sealed-secrets
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
name: sealed-secrets
namespace: flux-system
spec:
chart:
spec:
chart: sealed-secrets
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: sealed-secrets
version: ">=2.12.0"
interval: 1m
releaseName: sealed-secrets-controller
targetNamespace: flux-system
install:
crds: Create
upgrade:
crds: CreateReplace
```
{{< /highlight >}}
{{< alert >}}
Don't touch the manifests under the `flux-system` folder, as they're managed by Flux itself and overwritten on each Flux bootstrap.
{{< /alert >}}
Then push it and check that the sealed secrets controller is correctly deployed with `kg deploy sealed-secrets-controller -n flux-system`.
The private key is automatically generated, so the last step is to fetch the public key. Run this from the project root to include it in your Git repository:
```sh
kpf svc/sealed-secrets-controller -n flux-system 8080
curl http://localhost:8080/v1/cert.pem > pub-sealed-secrets.pem
```
{{< alert >}}
By the way, install the `kubeseal` client with `brew install kubeseal` (Mac / Linux) or `scoop install kubeseal` (Windows).
{{< /alert >}}
## Install some tools
It's now finally time to install some tools to help us in our CD journey.
### pgAdmin
A good 1st example is pgAdmin, a web UI for Postgres. We'll use it to manage our database cluster. It requires a local PVC to store its user data and settings.
{{< highlight host="demo-kube-flux" file="clusters/demo/postgres/deploy-pgadmin.yaml" >}}
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: pgadmin
namespace: postgres
spec:
strategy:
type: Recreate
selector:
matchLabels:
app: pgadmin
template:
metadata:
labels:
app: pgadmin
spec:
securityContext:
runAsUser: 5050
runAsGroup: 5050
fsGroup: 5050
fsGroupChangePolicy: "OnRootMismatch"
containers:
- name: pgadmin
image: dpage/pgadmin4:latest
ports:
- containerPort: 80
env:
- name: PGADMIN_DEFAULT_EMAIL
valueFrom:
secretKeyRef:
name: pgadmin-auth
key: default-email
- name: PGADMIN_DEFAULT_PASSWORD
valueFrom:
secretKeyRef:
name: pgadmin-auth
key: default-password
volumeMounts:
- name: pgadmin-data
mountPath: /var/lib/pgadmin
volumes:
- name: pgadmin-data
persistentVolumeClaim:
claimName: pgadmin-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pgadmin-data
namespace: postgres
spec:
resources:
requests:
storage: 128Mi
volumeMode: Filesystem
storageClassName: longhorn
accessModes:
- ReadWriteOnce
---
apiVersion: v1
kind: Service
metadata:
name: pgadmin
namespace: postgres
spec:
selector:
app: pgadmin
ports:
- port: 80
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: pgadmin
namespace: postgres
spec:
entryPoints:
- websecure
routes:
- match: Host(`pgadmin.kube.rocks`)
kind: Rule
middlewares:
- name: middleware-ip
namespace: traefik
services:
- name: pgadmin
port: 80
```
{{< /highlight >}}
Here are the secrets to adapt to your needs:
{{< highlight host="demo-kube-flux" file="clusters/demo/postgres/secret-pgadmin.yaml" >}}
```yaml
apiVersion: v1
kind: Secret
metadata:
name: pgadmin-auth
namespace: postgres
type: Opaque
data:
default-email: YWRtaW5Aa3ViZS5yb2Nrcw==
default-password: YWRtaW4=
```
{{< /highlight >}}
Now be sure to encrypt it with `kubeseal` and remove original file:
```sh
cat clusters/demo/postgres/secret-pgadmin.yaml | kubeseal --format=yaml --cert=pub-sealed-secrets.pem > clusters/demo/postgres/sealed-secret-pgadmin.yaml
rm clusters/demo/postgres/secret-pgadmin.yaml
```
{{< alert >}}
Don't forget to remove the original secret file before committing, for obvious reasons! If it's too late, consider the password leaked and generate a new one.
You may use the [VSCode extension](https://github.com/codecontemplator/vscode-kubeseal) to make sealing easier.
{{< /alert >}}
Push it, wait a minute, then go to `pgadmin.kube.rocks` and log in with the chosen credentials. Now try to register a new server with `postgresql-primary.postgres` as hostname, and the rest with the PostgreSQL credentials from the previous installation. It should work!
{{< alert >}}
If you don't want to wait after each code push, run `flux reconcile kustomization flux-system --with-source` (requires `flux-cli`). It also allows easy debugging by printing any syntax errors in your manifests.
{{< /alert >}}
You can test the read replica too by registering a new server using the hostname `postgresql-read.postgres`. Try some updates on the primary and check that they're replicated on the read replica. Any modification attempted on the replica should be rejected, as it runs in read-only transaction mode.
It's time to use some useful apps.
### n8n
Let's try an app that requires a bit more configuration and a real database connection: n8n, a workflow automation tool.
{{< highlight host="demo-kube-flux" file="clusters/demo/n8n/deploy-n8n.yaml" >}}
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: n8n
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: n8n
namespace: n8n
spec:
strategy:
type: Recreate
selector:
matchLabels:
app: n8n
template:
metadata:
labels:
app: n8n
spec:
containers:
- name: n8n
image: n8nio/n8n:latest
ports:
- containerPort: 5678
env:
- name: N8N_PROTOCOL
value: https
- name: N8N_HOST
value: n8n.kube.rocks
- name: N8N_PORT
value: "5678"
- name: NODE_ENV
value: production
- name: WEBHOOK_URL
value: https://n8n.kube.rocks/
- name: DB_TYPE
value: postgresdb
- name: DB_POSTGRESDB_DATABASE
value: n8n
- name: DB_POSTGRESDB_HOST
value: postgresql-primary.postgres
- name: DB_POSTGRESDB_USER
value: n8n
- name: DB_POSTGRESDB_PASSWORD
valueFrom:
secretKeyRef:
name: n8n-db
key: password
- name: N8N_EMAIL_MODE
value: smtp
- name: N8N_SMTP_HOST
value: smtp.mailgun.org
- name: N8N_SMTP_PORT
value: "587"
- name: N8N_SMTP_USER
valueFrom:
secretKeyRef:
name: n8n-smtp
key: user
- name: N8N_SMTP_PASS
valueFrom:
secretKeyRef:
name: n8n-smtp
key: password
- name: N8N_SMTP_SENDER
value: n8n@kube.rocks
volumeMounts:
- name: n8n-data
mountPath: /home/node/.n8n
volumes:
- name: n8n-data
persistentVolumeClaim:
claimName: n8n-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: n8n-data
namespace: n8n
spec:
resources:
requests:
storage: 1Gi
volumeMode: Filesystem
storageClassName: longhorn
accessModes:
- ReadWriteOnce
---
apiVersion: v1
kind: Service
metadata:
name: n8n
namespace: n8n
labels:
app: n8n
spec:
selector:
app: n8n
ports:
- port: 5678
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: n8n
namespace: n8n
spec:
entryPoints:
- websecure
routes:
- match: Host(`n8n.kube.rocks`)
kind: Rule
services:
- name: n8n
port: 5678
```
{{< /highlight >}}
Here are the secrets to adapt to your needs:
{{< highlight host="demo-kube-flux" file="clusters/demo/n8n/secret-n8n-db.yaml" >}}
```yaml
apiVersion: v1
kind: Secret
metadata:
name: n8n-db
namespace: n8n
type: Opaque
data:
password: YWRtaW4=
```
{{< /highlight >}}
{{< highlight host="demo-kube-flux" file="clusters/demo/n8n/secret-n8n-smtp.yaml" >}}
```yaml
apiVersion: v1
kind: Secret
metadata:
name: n8n-smtp
namespace: n8n
type: Opaque
data:
user: YWRtaW4=
password: YWRtaW4=
```
{{< /highlight >}}
While writing these secrets, create the `n8n` database and a `n8n` user with proper credentials as its owner.
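If you prefer doing it from the CLI rather than pgAdmin, here is a hypothetical one-liner from the primary pod (you'll be prompted for the `postgres` admin password; replace `xxx` with the password you put in the `n8n-db` secret):

```sh
k exec -it postgresql-primary-0 -n postgres -- psql -U postgres -W \
  -c "CREATE ROLE n8n LOGIN PASSWORD 'xxx';" \
  -c "CREATE DATABASE n8n OWNER n8n;"
```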
Then don't forget to seal the secrets and remove the original files, the same way as for pgAdmin. Once pushed, n8n should deploy, automatically migrate the DB, and soon after `n8n.kube.rocks` should be available, allowing you to create your 1st account.
### NocoDB
Let's try a final candidate with NocoDB, an Airtable-like generator for Postgres. It's very similar to n8n.
{{< highlight host="demo-kube-flux" file="clusters/demo/nocodb/deploy-nocodb.yaml" >}}
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: nocodb
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: nocodb
namespace: nocodb
spec:
strategy:
type: Recreate
selector:
matchLabels:
app: nocodb
template:
metadata:
labels:
app: nocodb
spec:
containers:
- name: nocodb
image: nocodb/nocodb:latest
ports:
- containerPort: 8080
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: nocodb-db
key: password
- name: DATABASE_URL
value: postgresql://nocodb:$(DB_PASSWORD)@postgresql-primary.postgres/nocodb
- name: NC_AUTH_JWT_SECRET
valueFrom:
secretKeyRef:
name: nocodb-auth
key: jwt-secret
- name: NC_SMTP_HOST
value: smtp.mailgun.org
- name: NC_SMTP_PORT
value: "587"
- name: NC_SMTP_USERNAME
valueFrom:
secretKeyRef:
name: nocodb-smtp
key: user
- name: NC_SMTP_PASSWORD
valueFrom:
secretKeyRef:
name: nocodb-smtp
key: password
- name: NC_SMTP_FROM
value: nocodb@kube.rocks
volumeMounts:
- name: nocodb-data
mountPath: /usr/app/data
volumes:
- name: nocodb-data
persistentVolumeClaim:
claimName: nocodb-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nocodb-data
namespace: nocodb
spec:
resources:
requests:
storage: 1Gi
volumeMode: Filesystem
storageClassName: longhorn
accessModes:
- ReadWriteOnce
---
apiVersion: v1
kind: Service
metadata:
name: nocodb
namespace: nocodb
spec:
selector:
app: nocodb
ports:
- port: 8080
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: nocodb
namespace: nocodb
spec:
entryPoints:
- websecure
routes:
- match: Host(`nocodb.kube.rocks`)
kind: Rule
services:
- name: nocodb
port: 8080
```
{{< /highlight >}}
Here are the secrets to adapt to your needs:
{{< highlight host="demo-kube-flux" file="clusters/demo/nocodb/secret-nocodb-db.yaml" >}}
```yaml
apiVersion: v1
kind: Secret
metadata:
name: nocodb-db
namespace: nocodb
type: Opaque
data:
password: YWRtaW4=
```
{{< /highlight >}}
{{< highlight host="demo-kube-flux" file="clusters/demo/nocodb/secret-nocodb-auth.yaml" >}}
```yaml
apiVersion: v1
kind: Secret
metadata:
name: nocodb-auth
namespace: nocodb
type: Opaque
data:
jwt-secret: MDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMDAwMDAwMDAwMDAw
```
{{< /highlight >}}
{{< highlight host="demo-kube-flux" file="clusters/demo/nocodb/secret-nocodb-smtp.yaml" >}}
```yaml
apiVersion: v1
kind: Secret
metadata:
name: nocodb-smtp
namespace: nocodb
type: Opaque
data:
user: YWRtaW4=
password: YWRtaW4=
```
{{< /highlight >}}
The final process is identical to n8n.
## 4th check ✅
We now have functional continuous delivery with some nice no-code tools to play with! The final missing piece for a production grade cluster is a complete monitoring stack, which is the subject of the [next part]({{< ref "/posts/15-a-beautiful-gitops-day-5" >}}).

---
title: "A beautiful GitOps day V - Monitoring and Logging Stack"
date: 2023-08-23
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "monitoring", "logging", "prometheus", "loki", "grafana"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is **Part V** of a more global tutorial. [Back to guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for the intro.
## Monitoring
Monitoring is a critical part of any production grade platform. It allows you to be proactive and react before your users are impacted. It also helps get a quick visualization of cluster architecture and current usage.
### Monitoring node pool
As with the storage pool, creating a dedicated node pool for the monitoring stack is a good practice, in order to scale it separately from the apps.
You now have a good understanding of how to create a node pool, so apply next configuration from our 1st Terraform project:
{{< highlight host="demo-kube-hcloud" file="kube.tf" >}}
```tf
module "hcloud_kube" {
//...
agent_nodepools = [
//...
{
name = "monitor"
server_type = "cx21"
location = "nbg1"
count = 1
private_interface = "ens10"
labels = [
"node.kubernetes.io/server-usage=monitor"
]
taints = [
"node-role.kubernetes.io/monitor:NoSchedule"
]
}
]
}
```
{{< /highlight >}}
### Prometheus Stack
When using k8s, the de facto standard is to install the [Prometheus stack](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack). It includes all the necessary CRDs and components for a proper monitoring stack.
You have 2 choices to install it: Flux or Terraform. Flux provides full documentation on [how to install it that way](https://fluxcd.io/flux/guides/monitoring/).
But remember the previous chapter and the house analogy. I personally consider monitoring part of my infrastructure, and I prefer to keep all infrastructure configuration in Terraform, using Flux only for apps. Moreover, the Prometheus stack is a pretty big Helm chart and upgrading it can be a bit tricky, so I prefer to keep full control of it with Terraform.
Go back to the 2nd Terraform project and let's apply this pretty big boy:
{{< highlight host="demo-kube-k3s" file="monitoring.tf" >}}
```tf
resource "kubernetes_namespace_v1" "monitoring" {
metadata {
name = "monitoring"
}
}
resource "helm_release" "kube_prometheus_stack" {
chart = "kube-prometheus-stack"
version = "49.2.0"
repository = "https://prometheus-community.github.io/helm-charts"
name = "kube-prometheus-stack"
namespace = kubernetes_namespace_v1.monitoring.metadata[0].name
set {
name = "prometheus.prometheusSpec.retention"
value = "15d"
}
set {
name = "prometheus.prometheusSpec.retentionSize"
value = "5GB"
}
set {
name = "prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues"
value = "false"
}
set {
name = "prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues"
value = "false"
}
set {
name = "prometheus.prometheusSpec.enableRemoteWriteReceiver"
value = "true"
}
set {
name = "prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.accessModes[0]"
value = "ReadWriteOnce"
}
set {
name = "prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage"
value = "8Gi"
}
set {
name = "prometheus.prometheusSpec.tolerations[0].key"
value = "node-role.kubernetes.io/monitor"
}
set {
name = "prometheus.prometheusSpec.tolerations[0].operator"
value = "Exists"
}
set {
name = "prometheus.prometheusSpec.nodeSelector.node\\.kubernetes\\.io/server-usage"
value = "monitor"
}
set {
name = "alertmanager.enabled"
value = "false"
}
set {
name = "grafana.enabled"
value = "false"
}
set {
name = "grafana.forceDeployDatasources"
value = "true"
}
set {
name = "grafana.forceDeployDashboards"
value = "true"
}
}
```
{{< /highlight >}}
The application is deployed in the `monitoring` namespace. It takes a few minutes to be fully up and running. You can check the status with `kgpo -n monitoring`.
Important notes:
* We set a retention of **15 days** and **5 GB** of storage for Prometheus. Adjust this according to your needs.
* We allow `serviceMonitorSelector` and `podMonitorSelector` to scrape monitor CRDs from all namespaces.
* We set `enableRemoteWriteReceiver` to allow remote writes to the database for advanced specific usage, as by default Prometheus works with a pull model on its own.
* As we don't set any storage class, the default one will be used, which is `local-path` with K3s. If you want to use Longhorn instead and benefit from automatic monitoring data backup, you can set it with `...volumeClaimTemplate.spec.storageClassName`. But don't forget to deploy the Longhorn manager on monitor nodes by adding the proper toleration.
* As it's a huge chart, I want to minimize dependencies by disabling Grafana, which I prefer to manage separately. However, in this case we should set `grafana.forceDeployDatasources` and `grafana.forceDeployDashboards` to `true` in order to benefit from all the included Kubernetes dashboards and automatic Prometheus datasource injection; they are deployed as config maps that the upcoming Grafana install can consume through provisioning.
And finally the ingress for external access:
{{< highlight host="demo-kube-k3s" file="monitoring.tf" >}}
```tf
resource "kubernetes_manifest" "prometheus_ingress" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "IngressRoute"
metadata = {
name = "prometheus"
namespace = kubernetes_namespace_v1.monitoring.metadata[0].name
}
spec = {
entryPoints = ["websecure"]
routes = [
{
match = "Host(`prometheus.${var.domain}`)"
kind = "Rule"
middlewares = [
{
name = "middleware-ip"
namespace = "traefik"
},
{
name = "middleware-auth"
namespace = "traefik"
}
]
services = [
{
name = "prometheus-operated"
port = "http-web"
}
]
}
]
}
}
}
```
{{< /highlight >}}
Now go to `prometheus.kube.rocks`; after login you should reach the Prometheus UI. Check under `/targets` that all targets are up and running. Because we enabled monitoring for all our metrics-capable apps in previous chapters, you should see the following targets:
* 1 instance of Traefik
* 1 instance of cert-manager
* 1 instance of each PostgreSQL primary and read
* 2 instances of Redis
* 5 instances of Longhorn manager
This is exactly how it works: the `ServiceMonitor` custom resource is responsible for discovering and centralizing all metrics for Prometheus, allowing automatic discovery without touching the Prometheus config. Use `kg smon -A` to list them all.
### Monitoring Flux
One is missing however; let's add monitoring for Flux. Go back to the Flux project and push the following manifests:
{{< highlight host="demo-kube-flux" file="clusters/demo/flux-add-ons/flux-monitoring.yaml" >}}
```yaml
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
name: flux-monitoring
namespace: flux-system
spec:
interval: 30m0s
ref:
branch: main
url: https://github.com/fluxcd/flux2-monitoring-example
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: monitoring-config
namespace: flux-system
spec:
interval: 1h0m0s
path: ./monitoring/configs
prune: true
sourceRef:
kind: GitRepository
name: flux-monitoring
```
{{< /highlight >}}
The `spec.path` under `Kustomization` tells Flux to pull the [remote monitoring manifests](https://github.com/fluxcd/flux2-monitoring-example/tree/main/monitoring/configs), saving us from writing them all manually. It includes the `PodMonitor` as well as Grafana dashboards.
After a few minutes, Flux should appear among the Prometheus targets.
[![Prometheus targets](prometheus-targets.png)](prometheus-targets.png)
### Grafana
We have the basement of our monitoring stack; it's time to get a UI to visualize all these metrics. Grafana is the most popular tool for that, and it's also available as a Helm chart. Prepare some variables:
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "smtp_host" {
sensitive = true
}
variable "smtp_port" {
type = string
}
variable "smtp_user" {
type = string
sensitive = true
}
variable "smtp_password" {
type = string
sensitive = true
}
variable "grafana_db_password" {
type = string
sensitive = true
}
```
{{< /highlight >}}
Create a `grafana` database through pgAdmin, with a dedicated `grafana` user and `grafana_db_password` as its password.
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
smtp_host           = "smtp.mailgun.org"
smtp_port           = "587"
smtp_user           = "xxx"
smtp_password       = "xxx"
grafana_db_password = "xxx"
```
{{< /highlight >}}
Apply next configuration to Terraform project:
{{< highlight host="demo-kube-k3s" file="grafana.tf" >}}
```tf
resource "helm_release" "grafana" {
chart = "grafana"
version = "6.58.9"
repository = "https://grafana.github.io/helm-charts"
name = "grafana"
namespace = kubernetes_namespace_v1.monitoring.metadata[0].name
set {
name = "serviceMonitor.enabled"
value = "true"
}
set {
name = "sidecar.datasources.enabled"
value = "true"
}
set {
name = "sidecar.dashboards.enabled"
value = "true"
}
set {
name = "env.GF_SERVER_DOMAIN"
value = var.domain
}
set {
name = "env.GF_SERVER_ROOT_URL"
value = "https://grafana.${var.domain}"
}
set {
name = "env.GF_SMTP_ENABLED"
value = "true"
}
set {
name = "env.GF_SMTP_HOST"
value = "${var.smtp_host}:${var.smtp_port}"
}
set {
name = "env.GF_SMTP_USER"
value = var.smtp_user
}
set {
name = "env.GF_SMTP_PASSWORD"
value = var.smtp_password
}
set {
name = "env.GF_SMTP_FROM_ADDRESS"
value = "grafana@${var.domain}"
}
set {
name = "env.GF_DATABASE_TYPE"
value = "postgres"
}
set {
name = "env.GF_DATABASE_HOST"
value = "postgresql-primary.postgres"
}
set {
name = "env.GF_DATABASE_NAME"
value = "grafana"
}
set {
name = "env.GF_DATABASE_USER"
value = "grafana"
}
set {
name = "env.GF_DATABASE_PASSWORD"
value = var.grafana_db_password
}
}
resource "kubernetes_manifest" "grafana_ingress" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "IngressRoute"
metadata = {
name = "grafana"
namespace = kubernetes_namespace_v1.monitoring.metadata[0].name
}
spec = {
entryPoints = ["websecure"]
routes = [
{
match = "Host(`grafana.${var.domain}`)"
kind = "Rule"
services = [
{
name = "grafana"
port = "service"
}
]
}
]
}
}
}
```
{{< /highlight >}}
We enable both data source and dashboard sidecars by setting `sidecar.datasources.enabled` and `sidecar.dashboards.enabled`. These sidecars will automatically inject all dashboards and data sources from `ConfigMap`, like those provided by Prometheus stack and Flux. `serviceMonitor.enabled` will create a `ServiceMonitor` for Prometheus to scrape Grafana metrics.
Grafana should deploy and migrate its database successfully. Right after, log in at `https://grafana.kube.rocks/login` with the admin account. You can get the password with `kg secret -n monitoring grafana -o jsonpath='{.data.admin-password}' | base64 -d`.
### Native dashboards
If you go to `https://grafana.kube.rocks/dashboards`, you should see many dashboards available that already work perfectly, giving you a complete view of:
* Some core components of K8s, like coredns, kube api server, all kubelets
* Detail of pods, namespace, workloads
* Nodes thanks to Node exporter
* Prometheus and Grafana's own stats
* Flux stats
{{< alert >}}
Some other core components like etcd, scheduler, proxy, and controller manager need to have metrics enabled to be scraped. See K3s docs or [this issue](https://github.com/k3s-io/k3s/issues/3619)
{{< /alert >}}
#### Prometheus
[![Prometheus](dashboards-prometheus.png)](dashboards-prometheus.png)
#### Nodes
[![Nodes](dashboards-nodes.png)](dashboards-nodes.png)
#### Cluster
[![Cluster compute](dashboards-cluster-compute.png)](dashboards-cluster-compute.png)
[![Cluster networks](dashboards-cluster-network.png)](dashboards-cluster-network.png)
[![Pods](dashboards-pods.png)](dashboards-pods.png)
#### Kube components
[![Kube API Server](dashboards-api-server.png)](dashboards-api-server.png)
[![Kubelets](dashboards-kubelets.png)](dashboards-kubelets.png)
[![CoreDNS](dashboards-coredns.png)](dashboards-coredns.png)
#### Flux
[![Flux](dashboards-flux.png)](dashboards-flux.png)
### Additional dashboards
You can easily add extra dashboards by importing them from the Grafana marketplace, or by including them in a `ConfigMap` for automatic provisioning.
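For the provisioning route, the dashboard sidecar picks up config maps labeled `grafana_dashboard` (mirroring the `grafana_datasource` label used for the Loki data source later in this post). A sketch with a hypothetical local JSON file, to be appended to the existing Grafana Terraform configuration:

```tf
resource "kubernetes_config_map_v1" "my_dashboard" {
  metadata {
    name      = "my-dashboard"
    namespace = kubernetes_namespace_v1.monitoring.metadata[0].name
    labels = {
      grafana_dashboard = "1"
    }
  }

  data = {
    # hypothetical dashboard JSON exported from Grafana or the marketplace
    "my-dashboard.json" = file("dashboards/my-dashboard.json")
  }
}
```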
#### Traefik
[Link](https://grafana.com/grafana/17346)
[![Traefik](dashboards-traefik.png)](dashboards-traefik.png)
#### cert-manager
[Link](https://github.com/monitoring-mixins/website/blob/master/assets/cert-manager/dashboards/cert-manager.json)
[![cert-manager](dashboards-cert-manager.png)](dashboards-cert-manager.png)
#### Longhorn
[Link](https://grafana.com/grafana/16888)
[![Longhorn](dashboards-longhorn.png)](dashboards-longhorn.png)
#### PostgreSQL
[Link](https://grafana.com/grafana/9628)
[![PostgreSQL](dashboards-postgresql.png)](dashboards-postgresql.png)
#### Redis
[Link](https://grafana.com/grafana/dashboards/763)
[![Redis](dashboards-redis.png)](dashboards-redis.png)
## Logging
Last but not least, we need to add a logging stack. The most popular one is [Elastic Stack](https://www.elastic.co/elastic-stack), but it's very resource intensive. A more lightweight option is to use [Loki](https://grafana.com/oss/loki/), also part of Grafana Labs.
In order to run in scalable mode, we need an S3 storage backend. We'll reuse the same S3-compatible storage as for the Longhorn backups here, but it's recommended to use a separate bucket and credentials.
### Loki
Let's install it now:
{{< highlight host="demo-kube-k3s" file="logging.tf" >}}
```tf
resource "kubernetes_namespace_v1" "logging" {
metadata {
name = "logging"
}
}
resource "helm_release" "loki" {
chart = "loki"
version = "5.15.0"
repository = "https://grafana.github.io/helm-charts"
name = "loki"
namespace = kubernetes_namespace_v1.logging.metadata[0].name
set {
name = "loki.auth_enabled"
value = "false"
}
set {
name = "loki.compactor.retention_enabled"
value = "true"
}
set {
name = "loki.limits_config.retention_period"
value = "24h"
}
set {
name = "loki.storage.bucketNames.chunks"
value = var.s3_bucket
}
set {
name = "loki.storage.bucketNames.ruler"
value = var.s3_bucket
}
set {
name = "loki.storage.bucketNames.admin"
value = var.s3_bucket
}
set {
name = "loki.storage.s3.endpoint"
value = var.s3_endpoint
}
set {
name = "loki.storage.s3.region"
value = var.s3_region
}
set {
name = "loki.storage.s3.accessKeyId"
value = var.s3_access_key
}
set {
name = "loki.storage.s3.secretAccessKey"
value = var.s3_secret_key
}
set {
name = "read.replicas"
value = "1"
}
set {
name = "backend.replicas"
value = "1"
}
set {
name = "write.replicas"
value = "2"
}
set {
name = "write.tolerations[0].key"
value = "node-role.kubernetes.io/storage"
}
set {
name = "write.tolerations[0].effect"
value = "NoSchedule"
}
set {
name = "write.nodeSelector.node-role\\.kubernetes\\.io/storage"
type = "string"
value = "true"
}
set {
name = "monitoring.dashboards.namespace"
value = kubernetes_namespace_v1.monitoring.metadata[0].name
}
set {
name = "monitoring.selfMonitoring.enabled"
value = "false"
}
set {
name = "monitoring.selfMonitoring.grafanaAgent.installOperator"
value = "false"
}
set {
name = "monitoring.lokiCanary.enabled"
value = "false"
}
set {
name = "test.enabled"
value = "false"
}
}
```
{{< /highlight >}}
Use `loki.limits_config.retention_period` to set the maximum retention period. You need to set `write.replicas` to at least **2**, or you'll get a 500 API error "*too many unhealthy instances in the ring*". As we force them to be deployed on storage nodes, be sure to have 2 storage nodes.
### Promtail
Okay, so Loki is running but not yet fed. For that we'll deploy [Promtail](https://grafana.com/docs/loki/latest/clients/promtail/), a log collector deployed on each node that gathers logs from all pods and ships them to Loki.
{{< highlight host="demo-kube-k3s" file="logging.tf" >}}
```tf
resource "helm_release" "promtail" {
chart = "promtail"
version = "6.15.0"
repository = "https://grafana.github.io/helm-charts"
name = "promtail"
namespace = kubernetes_namespace_v1.logging.metadata[0].name
set {
name = "tolerations[0].effect"
value = "NoSchedule"
}
set {
name = "tolerations[0].operator"
value = "Exists"
}
set {
name = "serviceMonitor.enabled"
value = "true"
}
}
```
{{< /highlight >}}
Ha, finally a simple Helm chart! Seems too good to be true. We just have to add generic `tolerations` in order to deploy the Promtail `DaemonSet` on every node for proper log scraping.
### Loki data source
Because we do GitOps, we want all Loki dashboards and data sources automatically configured. It's already done for the dashboards, but we still need to add the data source.
Let's apply the next Terraform resource:
{{< highlight host="demo-kube-k3s" file="logging.tf" >}}
```tf
resource "kubernetes_config_map_v1" "loki_grafana_datasource" {
metadata {
name = "loki-grafana-datasource"
namespace = kubernetes_namespace_v1.monitoring.metadata[0].name
labels = {
grafana_datasource = "1"
}
}
data = {
"datasource.yaml" = <<EOF
apiVersion: 1
datasources:
- name: Loki
type: loki
uid: loki
url: http://loki-gateway.logging/
access: proxy
EOF
}
}
```
{{< /highlight >}}
Now go to `https://grafana.kube.rocks/connections/datasources/edit/loki` and ensure that Loki responds correctly by clicking on *Test*.
You can now admire your logs in the Loki UI at `https://grafana.kube.rocks/explore`!
[![Loki explore](loki-explore.png)](loki-explore.png)
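If you're new to LogQL, a first query in the Explore view can be as simple as filtering on the Kubernetes labels Promtail attaches by default, for example:

```txt
{namespace="traefik"} |= "error"
```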
### Loki dashboards
We have nothing more to do, all dashboards are already provided by Loki Helm chart.
[![Loki dashboards](dashboards-loki.png)](dashboards-loki.png)
## Helm Exporter
We have installed many Helm charts so far, but how do we manage upgrade plans? We need to be aware of new versions and security fixes. For that, we can use Helm Exporter:
{{< highlight host="demo-kube-k3s" file="monitoring.tf" >}}
```tf
resource "helm_release" "helm_exporter" {
chart = "helm-exporter"
version = "1.2.5+1cbc9c5"
repository = "https://shanestarcher.com/helm-charts"
name = "helm-exporter"
namespace = kubernetes_namespace_v1.monitoring.metadata[0].name
set {
name = "serviceMonitor.create"
value = "true"
}
set {
name = "grafanaDashboard.enabled"
value = "true"
}
set {
name = "grafanaDashboard.grafanaDashboard.namespace"
value = kubernetes_namespace_v1.monitoring.metadata[0].name
}
values = [
file("values/helm-exporter-values.yaml")
]
}
```
{{< /highlight >}}
As the Helm Exporter config is a bit tedious, it's more straightforward to use a separate Helm values file. Here is a sample configuration for scraping all the charts we'll need:
{{< highlight host="demo-kube-k3s" file="values/helm-exporter-values.yaml" >}}
```yaml
config:
helmRegistries:
registryNames:
- bitnami
override:
- registry:
url: "https://concourse-charts.storage.googleapis.com"
charts:
- concourse
- registry:
url: "https://dl.gitea.io/charts"
charts:
- gitea
- registry:
url: "https://grafana.github.io/helm-charts"
charts:
- grafana
- loki
- promtail
- tempo
- registry:
url: "https://charts.longhorn.io"
charts:
- longhorn
- registry:
url: "https://charts.jetstack.io"
charts:
- cert-manager
- registry:
url: "https://traefik.github.io/charts"
charts:
- traefik
- registry:
url: "https://bitnami-labs.github.io/sealed-secrets"
charts:
- sealed-secrets
- registry:
url: "https://prometheus-community.github.io/helm-charts"
charts:
- kube-prometheus-stack
- registry:
url: "https://SonarSource.github.io/helm-chart-sonarqube"
charts:
- sonarqube
- registry:
url: "https://kubereboot.github.io/charts"
charts:
- kured
- registry:
url: "https://shanestarcher.com/helm-charts"
charts:
- helm-exporter
```
{{< /highlight >}}
You can easily start from the provisioned dashboard and customize it to use `helm_chart_outdated` instead of `helm_chart_info` in order to list all outdated charts.
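As a hypothetical starting point for such a panel, the query can be as simple as the metric itself (exact label names depend on the Helm Exporter version, so adjust the legend accordingly):

```txt
helm_chart_outdated
```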
## 5th check ✅
We now have a full monitoring suite with a performant logging collector! That was quite a massive subject. At this stage, you have a good starting point to run many apps on your cluster with high scalability and observability. We're done with the pure **operational** part. It's finally time to tackle the **building** part for a complete development stack. Go to the [next part]({{< ref "/posts/16-a-beautiful-gitops-day-6" >}}) to begin with continuous integration.

---
title: "A beautiful GitOps day VI - CI tools"
date: 2023-08-24
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "ci", "gitea", "concourse"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is **Part VI** of a more global tutorial. [Back to guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for the intro.
## Self-hosted VCS
It's finally time to build our CI stack. Let's start with a self-hosted VCS. We'll use [Gitea](https://gitea.io/), a lightweight GitHub clone that is far less resource intensive than GitLab. You can of course skip this entire chapter and stay with GitHub/GitLab if you prefer, but one of the goals of this tutorial is to maximize self-hosting, so let's go!
As I consider CI part of the infrastructure, I'll use the dedicated Terraform project for Helm management. But again, it's up to you; if you prefer using Flux, it'll work too.
### Gitea
The Gitea Helm Chart is a bit tricky to configure properly. Let's begin with some additional required variables:
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "gitea_admin_username" {
type = string
}
variable "gitea_admin_password" {
type = string
sensitive = true
}
variable "gitea_admin_email" {
type = string
}
variable "gitea_db_password" {
type = string
sensitive = true
}
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
gitea_admin_username = "kuberocks"
gitea_admin_password = "xxx"
gitea_admin_email = "admin@kube.rocks"
gitea_db_password = "xxx"
```
{{< /highlight >}}
Then the Helm chart itself:
{{< highlight host="demo-kube-k3s" file="gitea.tf" >}}
```tf
locals {
redis_connection = "redis://:${urlencode(var.redis_password)}@redis-master.redis:6379/0"
}
resource "kubernetes_namespace_v1" "gitea" {
metadata {
name = "gitea"
}
}
resource "helm_release" "gitea" {
chart = "gitea"
version = "9.2.0"
repository = "https://dl.gitea.io/charts"
name = "gitea"
namespace = kubernetes_namespace_v1.gitea.metadata[0].name
set {
name = "gitea.admin.username"
value = var.gitea_admin_username
}
set {
name = "gitea.admin.password"
value = var.gitea_admin_password
}
set {
name = "gitea.admin.email"
value = var.gitea_admin_email
}
set {
name = "strategy.type"
value = "Recreate"
}
set {
name = "postgresql-ha.enabled"
value = "false"
}
set {
name = "redis-cluster.enabled"
value = "false"
}
set {
name = "persistence.storageClass"
value = "longhorn"
}
set {
name = "persistence.size"
value = "5Gi"
}
set {
name = "gitea.metrics.enabled"
value = "true"
}
set {
name = "gitea.metrics.serviceMonitor.enabled"
value = "true"
}
set {
name = "gitea.config.server.DOMAIN"
value = "gitea.${var.domain}"
}
set {
name = "gitea.config.server.SSH_DOMAIN"
value = "gitea.${var.domain}"
}
set {
name = "gitea.config.server.ROOT_URL"
value = "https://gitea.${var.domain}"
}
set {
name = "gitea.config.database.DB_TYPE"
value = "postgres"
}
set {
name = "gitea.config.database.HOST"
value = "postgresql-primary.postgres"
}
set {
name = "gitea.config.database.NAME"
value = "gitea"
}
set {
name = "gitea.config.database.USER"
value = "gitea"
}
set {
name = "gitea.config.database.PASSWD"
value = var.gitea_db_password
}
set {
name = "gitea.config.indexer.REPO_INDEXER_ENABLED"
value = "true"
}
set {
name = "gitea.config.mailer.ENABLED"
value = "true"
}
set {
name = "gitea.config.mailer.FROM"
value = "gitea@${var.domain}"
}
set {
name = "gitea.config.mailer.SMTP_ADDR"
value = var.smtp_host
}
set {
name = "gitea.config.mailer.SMTP_PORT"
value = var.smtp_port
}
set {
name = "gitea.config.mailer.USER"
value = var.smtp_user
}
set {
name = "gitea.config.mailer.PASSWD"
value = var.smtp_password
}
set {
name = "gitea.config.cache.ADAPTER"
value = "redis"
}
set {
name = "gitea.config.cache.HOST"
value = local.redis_connection
}
set {
name = "gitea.config.session.PROVIDER"
value = "redis"
}
set {
name = "gitea.config.session.PROVIDER_CONFIG"
value = local.redis_connection
}
set {
name = "gitea.config.queue.TYPE"
value = "redis"
}
set {
name = "gitea.config.queue.CONN_STR"
value = local.redis_connection
}
set {
name = "gitea.config.service.DISABLE_REGISTRATION"
value = "true"
}
set {
name = "gitea.config.repository.DEFAULT_BRANCH"
value = "main"
}
set {
name = "gitea.config.metrics.ENABLED_ISSUE_BY_REPOSITORY"
value = "true"
}
set {
name = "gitea.config.metrics.ENABLED_ISSUE_BY_LABEL"
value = "true"
}
set {
name = "gitea.config.webhook.ALLOWED_HOST_LIST"
value = "*"
}
}
```
{{< /highlight >}}
Note that we disable the included Redis and PostgreSQL sub-charts, because we'll reuse our existing ones. Also note the use of the `urlencode` function for the Redis password, as it can contain special characters.
The related ingress:
{{< highlight host="demo-kube-k3s" file="gitea.tf" >}}
```tf
resource "kubernetes_manifest" "gitea_ingress" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "IngressRoute"
metadata = {
name = "gitea-http"
namespace = kubernetes_namespace_v1.gitea.metadata[0].name
}
spec = {
entryPoints = ["websecure"]
routes = [
{
match = "Host(`gitea.${var.domain}`)"
kind = "Rule"
services = [
{
name = "gitea-http"
port = "http"
}
]
}
]
}
}
}
```
{{< /highlight >}}
You should be able to log in at `https://gitea.kube.rocks` with the chosen admin credentials.
### Push a basic Web API project
Let's generate a basic .NET Web API project. Create a new dotnet project as follows (you may need to install the [latest .NET SDK](https://dotnet.microsoft.com/en-us/download)):
```sh
mkdir kuberocks-demo
cd kuberocks-demo
dotnet new sln
dotnet new gitignore
dotnet new editorconfig
dotnet new webapi -o src/KubeRocks.WebApi
dotnet sln add src/KubeRocks.WebApi
git init
git add .
git commit -m "first commit"
```
Then create a new repo `kuberocks/demo` on Gitea, and follow the instructions of the *existing repository* section to push your code.
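For reference, the push commands look roughly like this; the exact remote URL is the one Gitea displays on the empty repository page (assumed here to be `https://gitea.kube.rocks/kuberocks/demo.git`):
```sh
# add the Gitea remote and push the initial commit
git remote add origin https://gitea.kube.rocks/kuberocks/demo.git
git push -u origin main
```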
[![Gitea repo](gitea-repo.png)](gitea-repo.png)
Everything should work as expected over HTTPS, even the fuzzy repo search. But what if you prefer SSH?
### Pushing via SSH
We'll use SSH to push our code to Gitea. Put your public SSH key in your Gitea profile and follow the push instructions from the sample repo. Here the SSH remote is `git@gitea.kube.rocks:kuberocks/demo.git`.
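A quick sketch of switching the existing remote to SSH, assuming the repo created above:
```sh
# replace the HTTPS remote with the SSH one, then test it with a pull
git remote set-url origin git@gitea.kube.rocks:kuberocks/demo.git
git pull
```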
When you try to pull, you'll get a connection timeout error. It's time to tackle SSH access to our cluster.
Firstly, we have to open the SSH port on our load balancer. Go back to the 1st Hcloud Terraform project and create a new service for SSH:
{{< highlight host="demo-kube-hcloud" file="kube.tf" >}}
```tf
resource "hcloud_load_balancer_service" "ssh_service" {
load_balancer_id = module.hcloud_kube.lbs.worker.id
protocol = "tcp"
listen_port = 22
destination_port = 22
}
```
{{< /highlight >}}
The SSH port is now open, but we get a new **connection refused** error. Let's configure SSH access from Traefik to the Gitea pod.
{{< highlight host="demo-kube-k3s" file="traefik.tf" >}}
```tf
resource "helm_release" "traefik" {
//...
set {
name = "ports.ssh.port"
value = "2222"
}
set {
name = "ports.ssh.expose"
value = "true"
}
set {
name = "ports.ssh.exposedPort"
value = "22"
}
set {
name = "ports.ssh.protocol"
value = "TCP"
}
}
```
{{< /highlight >}}
And finally, the ingress route:
{{< highlight host="demo-kube-k3s" file="gitea.tf" >}}
```tf
resource "kubernetes_manifest" "gitea_ingress_ssh" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "IngressRouteTCP"
metadata = {
name = "gitea-ssh"
namespace = kubernetes_namespace_v1.gitea.metadata[0].name
}
spec = {
entryPoints = ["ssh"]
routes = [
{
match = "HostSNI(`*`)"
services = [
{
name = "gitea-ssh"
port = "ssh"
}
]
}
]
}
}
}
```
{{< /highlight >}}
Now retry the pull and it should work seamlessly!
### Gitea monitoring
A ready-made Gitea dashboard for Grafana is available on [Grafana Labs](https://grafana.com/grafana/dashboards/17802); import it to get something like this:
[![Gitea monitoring](gitea-monitoring.png)](gitea-monitoring.png)
## CI
Now that we have a working self-hosted VCS, let's add a CI tool. We'll use [Concourse CI](https://concourse-ci.org/), which is optimized for Kubernetes and highly scalable (and open source of course), at the price of some configuration and a slight learning curve.
{{< alert >}}
If you prefer to have CI directly included in your VCS, which drastically simplifies configuration, although limited to the same Gitea host, note that the Gitea team is working on a built-in CI, see [Gitea Actions](https://docs.gitea.com/usage/actions/overview) (not production ready at the time of writing).
I personally prefer to have a dedicated CI tool, as it's more flexible and can be used with any external VCS if needed.
{{< /alert >}}
### CI node pool
Concourse CI is composed of 2 main components:
* **Web UI**: the main UI, which is used to configure pipelines and visualize jobs, persisted in a PostgreSQL database
* **Worker**: the actual CI worker, which will execute jobs for any app building
It's obvious that the workers, which are the most resource-intensive component, should be scaled independently, without any impact on the other critical components of our cluster. So, as you already guessed, we'll use a dedicated node pool for building. Let's apply this:
{{< highlight host="demo-kube-hcloud" file="kube.tf" >}}
```tf
module "hcloud_kube" {
//...
agent_nodepools = [
//...
{
name = "runner"
server_type = "cx21"
location = "nbg1"
count = 1
private_interface = "ens10"
labels = [
"node.kubernetes.io/server-usage=runner"
]
taints = [
"node-role.kubernetes.io/runner:NoSchedule"
]
}
]
}
```
{{< /highlight >}}
### Concourse CI
The variables:
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "concourse_user" {
type = string
}
variable "concourse_password" {
type = string
sensitive = true
}
variable "concourse_db_password" {
type = string
sensitive = true
}
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
concourse_user = "kuberocks"
concourse_password = "xxx"
concourse_db_password = "xxx"
```
{{< /highlight >}}
Let's apply Concourse Helm Chart:
{{< highlight host="demo-kube-k3s" file="concourse.tf" >}}
```tf
resource "kubernetes_namespace_v1" "concourse" {
metadata {
name = "concourse"
}
}
resource "helm_release" "concourse" {
chart = "concourse"
version = "17.2.0"
repository = "https://concourse-charts.storage.googleapis.com"
name = "concourse"
namespace = kubernetes_namespace_v1.concourse.metadata[0].name
set {
name = "concourse.web.externalUrl"
value = "https://concourse.${var.domain}"
}
set {
name = "postgresql.enabled"
value = "false"
}
set {
name = "secrets.postgresUser"
value = "concourse"
}
set {
name = "secrets.postgresPassword"
value = var.concourse_db_password
}
set {
name = "concourse.web.auth.mainTeam.localUser"
value = var.concourse_user
}
set {
name = "secrets.localUsers"
value = "${var.concourse_user}:${var.concourse_password}"
}
set {
name = "concourse.web.postgres.host"
value = "postgresql-primary.postgres"
}
set {
name = "concourse.web.postgres.database"
value = "concourse"
}
set {
name = "concourse.web.auth.cookieSecure"
value = "true"
}
set {
name = "concourse.web.prometheus.enabled"
value = "true"
}
set {
name = "concourse.web.prometheus.serviceMonitor.enabled"
value = "true"
}
set {
name = "concourse.worker.runtime"
value = "containerd"
}
set {
name = "worker.replicas"
value = "1"
}
set {
name = "worker.minAvailable"
value = "0"
}
set {
name = "worker.tolerations[0].key"
value = "node-role.kubernetes.io/runner"
}
set {
name = "worker.tolerations[0].effect"
value = "NoSchedule"
}
set {
name = "worker.nodeSelector.node\\.kubernetes\\.io/server-usage"
value = "runner"
}
}
resource "kubernetes_manifest" "concourse_ingress" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "IngressRoute"
metadata = {
name = "concourse"
namespace = kubernetes_namespace_v1.concourse.metadata[0].name
}
spec = {
entryPoints = ["websecure"]
routes = [
{
match = "Host(`concourse.${var.domain}`)"
kind = "Rule"
services = [
{
name = "concourse-web"
port = "atc"
}
]
}
]
}
}
}
```
{{< /highlight >}}
Be sure to disable the PostgreSQL sub-chart via `postgresql.enabled`.
You may set `worker.replicas` to the number of nodes in your runner pool. As usual, note the use of `nodeSelector` and `tolerations` to ensure workers are deployed only on runner nodes.
Then go to `https://concourse.kube.rocks` and log in with the chosen credentials.
## 6th check ✅
We have everything we need to build apps with automatic deployment! Go to the [next part]({{< ref "/posts/17-a-beautiful-gitops-day-7" >}}) to create a complete CI/CD workflow!

View File

@ -0,0 +1,571 @@
---
title: "A beautiful GitOps day VII - Create a CI+CD workflow"
date: 2023-08-25
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "ci", "cd", "concourse", "flux"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is the **Part VII** of more global topic tutorial. [Back to guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for intro.
## Workflow
It's now time to step back and think about how we'll use our CI. Our goal is to build the above dotnet Web API with Concourse CI into a container image, ready to deploy to our cluster through Flux. This completes the CI/CD pipeline. To summarize the scenario:
1. Concourse CI checks the Gitea repo periodically (pull model) for any new code and triggers a build if applicable
2. When the container image build passes, Concourse CI pushes the new image to our private registry, which is already included in Gitea
3. Image Automation, a component of Flux, checks the registry periodically (pull model); if a new image tag is detected, it writes the latest tag into the Flux repository
4. Flux checks the Flux GitHub repository periodically (pull model); if any new or updated manifest is detected, it deploys it automatically to our cluster
{{< alert >}}
Although the default pull model, which generally checks every minute, is the most secure and configuration-free approach, it's possible to use webhooks instead in order to reduce the time between code push and deployment.
{{< /alert >}}
The pipeline flow is pretty straightforward:
{{< mermaid >}}
graph RL
subgraph R [Private registry]
C[/Container Registry/]
end
S -- scan --> R
S -- push --> J[(Flux repository)]
subgraph CD
D{Flux} -- check --> J
D -- deploy --> E((Kube API))
end
subgraph S [Image Scanner]
I[Image Reflector] -- trigger --> H[Image Automation]
end
subgraph CI
A{Concourse} -- check --> B[(Code repository)]
A -- push --> C
F((Worker)) -- build --> A
end
{{< /mermaid >}}
## CI part
### The credentials
We need to:
1. Give Concourse read/write access to our Gitea repo and container registry. Note that we need write access to the code repository because Concourse will store the new image tag there. We'll use the [semver resource](https://github.com/concourse/semver-resource) for that.
2. Give read-only registry credentials to Flux for regular image tag checking, as well as to Kubernetes in order to allow image pulling from the private registry.
Let's create 2 new users on Gitea: `concourse` with admin access and `container` as a standard user. Store these credentials in new variables:
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "concourse_git_username" {
type = string
}
variable "concourse_git_password" {
type = string
sensitive = true
}
variable "container_registry_username" {
type = string
}
variable "container_registry_password" {
type = string
sensitive = true
}
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
concourse_git_username = "concourse"
concourse_git_password = "xxx"
container_registry_username = "container"
container_registry_password = "xxx"
```
{{< /highlight >}}
Apply the credentials for Concourse:
{{< highlight host="demo-kube-k3s" file="concourse.tf" >}}
```tf
resource "kubernetes_secret_v1" "concourse_registry" {
metadata {
name = "registry"
namespace = "concourse-main"
}
data = {
name = "gitea.${var.domain}"
username = var.concourse_git_username
password = var.concourse_git_password
}
depends_on = [
helm_release.concourse
]
}
resource "kubernetes_secret_v1" "concourse_git" {
metadata {
name = "git"
namespace = "concourse-main"
}
data = {
url = "https://gitea.${var.domain}"
username = var.concourse_git_username
password = var.concourse_git_password
git-user = "Concourse CI <concourse@kube.rocks>"
commit-message = "bump to %version% [ci skip]"
}
depends_on = [
helm_release.concourse
]
}
```
{{< /highlight >}}
Note that we use the `concourse-main` namespace, already created by the Concourse Helm installer, which is a dedicated namespace for the default team `main`. Because of that, we keep `depends_on` to ensure the namespace is created before the secrets.
{{< alert >}}
Don't forget the `[ci skip]` in the commit message used for version bumping, otherwise you'll end up with an infinite build loop!
{{< /alert >}}
Then do the same for Flux and the namespace that will receive the app:
{{< highlight host="demo-kube-k3s" file="flux.tf" >}}
```tf
resource "kubernetes_secret_v1" "image_pull_secrets" {
for_each = toset(["flux-system", "kuberocks"])
metadata {
name = "dockerconfigjson"
namespace = each.value
}
type = "kubernetes.io/dockerconfigjson"
data = {
".dockerconfigjson" = jsonencode({
auths = {
"gitea.${var.domain}" = {
auth = base64encode("${var.container_registry_username}:${var.container_registry_password}")
}
}
})
}
}
```
{{< /highlight >}}
{{< alert >}}
Create the `kuberocks` namespace first with `k create namespace kuberocks`, or you'll get an error.
{{< /alert >}}
### The Dockerfile
Now that all required credentials are in place, we have to tell Concourse how to check our repo and build our container image. This is done through a pipeline, which is a specific Concourse YAML file.
Firstly, create the following files in the root of your repo, which we'll use for building a production-ready container image:
{{< highlight host="kuberocks-demo" file=".dockerignore" >}}
```txt
**/bin/
**/obj/
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="Dockerfile" >}}
```Dockerfile
FROM mcr.microsoft.com/dotnet/aspnet:7.0
WORKDIR /publish
COPY /publish .
EXPOSE 80
ENTRYPOINT ["dotnet", "KubeRocks.WebApi.dll"]
```
{{< /highlight >}}
### The pipeline
Let's reuse our Flux repository and create a file `pipelines/demo.yaml` with the following content:
{{< highlight host="demo-kube-flux" file="pipelines/demo.yaml" >}}
```yml
resources:
- name: version
type: semver
source:
driver: git
uri: ((git.url))/kuberocks/demo
branch: main
file: version
username: ((git.username))
password: ((git.password))
git_user: ((git.git-user))
commit_message: ((git.commit-message))
- name: source-code
type: git
icon: coffee
source:
uri: ((git.url))/kuberocks/demo
branch: main
username: ((git.username))
password: ((git.password))
- name: docker-image
type: registry-image
icon: docker
source:
repository: ((registry.name))/kuberocks/demo
tag: latest
username: ((registry.username))
password: ((registry.password))
jobs:
- name: build
plan:
- get: source-code
trigger: true
- task: build-source
config:
platform: linux
image_resource:
type: registry-image
source:
repository: mcr.microsoft.com/dotnet/sdk
tag: "7.0"
inputs:
- name: source-code
path: .
outputs:
- name: binaries
path: publish
caches:
- path: /root/.nuget/packages
run:
path: /bin/sh
args:
- -ec
- |
dotnet format --verify-no-changes
dotnet build -c Release
dotnet publish src/KubeRocks.WebApi -c Release -o publish --no-restore --no-build
- task: build-image
privileged: true
config:
platform: linux
image_resource:
type: registry-image
source:
repository: concourse/oci-build-task
inputs:
- name: source-code
path: .
- name: binaries
path: publish
outputs:
- name: image
run:
path: build
- put: version
params: { bump: patch }
- put: docker-image
params:
additional_tags: version/number
image: image/image.tar
```
{{< /highlight >}}
A bit verbose compared to other CI systems, but it gets the job done; that's the price of maximum flexibility. Now, in order to apply it, we need to install the `fly` CLI tool. On Windows it's just a matter of `scoop install concourse-fly`. Then:
```sh
# login to your Concourse instance
fly -t kuberocks login -c https://concourse.kube.rocks
# create the pipeline and activate it
fly -t kuberocks set-pipeline -p demo -c pipelines/demo.yaml
fly -t kuberocks unpause-pipeline -p demo
```
A build will be triggered immediately. You can follow it on the Concourse UI.
[![Concourse pipeline](concourse-pipeline.png)](concourse-pipeline.png)
If everything is ok, check `https://gitea.kube.rocks/admin/packages`: you should see a new image appear in the list! A new `version` file is automatically pushed to the code repo in order to keep track of the image tag version.
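Back in the demo repository, you can verify the bump commit pushed by Concourse:
```sh
# fetch the version bump commit made by the semver resource
git pull
# the file should contain the new tag, e.g. 0.0.1
cat version
```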
[![Concourse build](concourse-build.png)](concourse-build.png)
### Automatic pipeline update
If you don't want to use the fly CLI for every pipeline update, you may be interested in the `set_pipeline` feature. Create the following file:
{{< highlight host="demo-kube-flux" file="pipelines/main.yaml" >}}
```yml
resources:
- name: ci
type: git
icon: git
source:
uri: https://github.com/kuberocks/demo-kube-flux
jobs:
- name: configure-pipelines
plan:
- get: ci
trigger: true
- set_pipeline: demo
file: ci/pipelines/demo.yaml
```
{{< /highlight >}}
Then apply it:
```sh
fly -t kuberocks set-pipeline -p main -c pipelines/main.yaml
```
Now you can manually trigger the pipeline, or wait for the next check, and it will update the demo pipeline automatically. If you're using a private repo for your pipelines, you may need to add a new secret for the git credentials and set `username` and `password` accordingly.
You almost don't need the fly CLI anymore, except for adding new pipelines! You can even go further with `set_pipeline: self`, which is still an experimental feature.
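If you don't want to wait for the next periodic check of the `ci` resource, you can trigger the job by hand, for example:
```sh
# manually run the job that (re)applies all pipelines
fly -t kuberocks trigger-job -j main/configure-pipelines
```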
## CD part
### The deployment
If you followed the previous parts of this tutorial, you should have a clue about how to deploy our app. Let's deploy it with Flux:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: demo
namespace: kuberocks
spec:
replicas: 1
selector:
matchLabels:
app: demo
template:
metadata:
labels:
app: demo
spec:
imagePullSecrets:
- name: dockerconfigjson
containers:
- name: api
image: gitea.kube.rocks/kuberocks/demo:latest
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: demo
namespace: kuberocks
labels:
app: demo
spec:
selector:
app: demo
ports:
- name: http
port: 80
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: demo
namespace: kuberocks
spec:
entryPoints:
- websecure
routes:
- match: Host(`demo.kube.rocks`)
kind: Rule
services:
- name: demo
port: http
```
{{< /highlight >}}
Note that we have set `imagePullSecrets` in order to use the previously created credentials for private registry access. The rest is pretty straightforward. Once pushed, after about 1 minute, you should see your app deployed at `https://demo.kube.rocks`. Check the API response on `https://demo.kube.rocks/WeatherForecast`.
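A quick way to follow the rollout from the command line, assuming your kubectl context points to the cluster:
```sh
# watch the deployment roll out, then list the demo pods
kubectl -n kuberocks rollout status deployment/demo
kubectl -n kuberocks get pods -l app=demo
```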
However, one last thing is missing: the automatic deployment.
### Image automation
If you check the above flowchart, you'll note that Image Automation is a separate process from Flux that only scans the registry for new image tags and pushes any new tag to the Flux repository. Then Flux detects the new commit in the Git repository, including the new tag, and automatically deploys it to K8s.
By default, if no strategy is set, K8s will do a **rolling deployment**, i.e. it creates the new replica first before terminating the old one. This prevents any downtime, on the condition that you also set a **readiness probe** in your pod spec, which is a later topic.
Let's define the image update automation task for the main Flux repository:
{{< highlight host="demo-kube-flux" file="clusters/demo/flux-add-ons/image-update-automation.yaml" >}}
```yaml
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
name: flux-system
namespace: flux-system
spec:
interval: 1m0s
sourceRef:
kind: GitRepository
name: flux-system
git:
checkout:
ref:
branch: main
commit:
author:
email: fluxcdbot@kube.rocks
name: fluxcdbot
messageTemplate: "{{range .Updated.Images}}{{println .}}{{end}}"
push:
branch: main
update:
path: ./clusters/demo
strategy: Setters
```
{{< /highlight >}}
Now we need to tell Image Reflector how to scan the repository, as well as the attached policy for tag update:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/images-demo.yaml" >}}
```yaml
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
name: demo
namespace: flux-system
spec:
image: gitea.kube.rocks/kuberocks/demo
interval: 1m0s
secretRef:
name: dockerconfigjson
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
name: demo
namespace: flux-system
spec:
imageRepositoryRef:
name: demo
namespace: flux-system
policy:
semver:
range: 0.0.x
```
{{< /highlight >}}
{{< alert >}}
As usual, don't forget `dockerconfigjson` for private registry access.
{{< /alert >}}
And finally edit the deployment to use the policy by adding a specific marker next to the image tag:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
# ...
containers:
- name: api
image: gitea.kube.rocks/kuberocks/demo:latest # {"$imagepolicy": "flux-system:demo"}
# ...
```
{{< /highlight >}}
It tells `Image Automation` where to update the tag in the Flux repository. The format is `{"$imagepolicy": "<policy-namespace>:<policy-name>"}`.
Push the changes, wait about 1 minute, then pull the Flux repo. You should see a new commit arrive, and `latest` should be replaced by an explicit tag like so:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
# ...
containers:
- name: api
image: gitea.kube.rocks/kuberocks/demo:0.0.1 # {"$imagepolicy": "flux-system:demo"}
# ...
```
{{< /highlight >}}
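You can also inspect the image automation objects directly; a quick sketch, assuming the Flux CLI is installed locally:
```sh
# list image repositories, policies and update automations with their latest scan results
flux get images all -n flux-system
```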
Check that the pod has been correctly updated with `kgpo -n kuberocks`. Use `kd -n kuberocks deploy/demo` to check that the same tag is there and not `latest`.
```txt
Pod Template:
Labels: app=demo
Containers:
api:
Image: gitea.kube.rocks/kuberocks/demo:0.0.1
Port: 80/TCP
```
### Retest all workflow
Damn, I think we're done 🎉! It's time to retest the full process. Add a new controller endpoint to our demo project and push the code:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Controllers/WeatherForecastController.cs" >}}
```cs
//...
public class WeatherForecastController : ControllerBase
{
//...
[HttpGet("{id}", Name = "GetWeatherForecastById")]
public WeatherForecast GetById(int id)
{
return new WeatherForecast
{
Date = DateOnly.FromDateTime(DateTime.Now.AddDays(id)),
TemperatureC = Random.Shared.Next(-20, 55),
Summary = Summaries[Random.Shared.Next(Summaries.Length)]
};
}
}
```
{{< /highlight >}}
Wait for the pod to be updated, then check the new endpoint `https://demo.kube.rocks/WeatherForecast/1`. The API should return a single random weather forecast with tomorrow's date.
## 7th check ✅
We are done with the setup of our automated CI/CD workflow. Go to the [next part]({{< ref "/posts/18-a-beautiful-gitops-day-8" >}}) to go further with a real DB-backed app that handles automatic migrations.

View File

@ -0,0 +1,653 @@
---
title: "A beautiful GitOps day VIII - Further deployment with DB"
date: 2023-08-26
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "postgresql", "efcore"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is the **Part VIII** of more global topic tutorial. [Back to guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for intro.
## Real DB App sample
Let's add some DB usage to our sample app. We'll use the classical `Articles<->Authors<->Comments` relationships. First, create a `docker-compose.yml` file in the root of the demo project:
{{< highlight host="kuberocks-demo" file="docker-compose.yml" >}}
```yaml
version: "3"
services:
db:
image: postgres:15
environment:
POSTGRES_USER: main
POSTGRES_PASSWORD: main
POSTGRES_DB: main
ports:
- 5432:5432
```
{{< /highlight >}}
Launch it with `docker compose up -d` and check that the database is running with `docker ps`.
Time to create some basic code that lists plenty of articles from an API endpoint. Go back to `kuberocks-demo` and create a new separate project dedicated to the app logic:
```sh
dotnet new classlib -o src/KubeRocks.Application
dotnet sln add src/KubeRocks.Application
dotnet add src/KubeRocks.WebApi reference src/KubeRocks.Application
dotnet add src/KubeRocks.Application package Microsoft.EntityFrameworkCore
dotnet add src/KubeRocks.Application package Npgsql.EntityFrameworkCore.PostgreSQL
dotnet add src/KubeRocks.WebApi package Microsoft.EntityFrameworkCore.Design
```
{{< alert >}}
This is not a DDD course ! We will keep it simple and focus on Kubernetes part.
{{< /alert >}}
### Define the entities
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Application/Entities/Article.cs" >}}
```cs
using System.ComponentModel.DataAnnotations;
namespace KubeRocks.Application.Entities;
public class Article
{
public int Id { get; set; }
public required User Author { get; set; }
[MaxLength(255)]
public required string Title { get; set; }
[MaxLength(255)]
public required string Slug { get; set; }
public required string Description { get; set; }
public required string Body { get; set; }
public DateTime CreatedAt { get; set; } = DateTime.UtcNow;
public DateTime UpdatedAt { get; set; } = DateTime.UtcNow;
public ICollection<Comment> Comments { get; } = new List<Comment>();
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Application/Entities/Comment.cs" >}}
```cs
namespace KubeRocks.Application.Entities;
public class Comment
{
public int Id { get; set; }
public required Article Article { get; set; }
public required User Author { get; set; }
public required string Body { get; set; }
public DateTime CreatedAt { get; set; } = DateTime.UtcNow;
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Application/Entities/User.cs" >}}
```cs
using System.ComponentModel.DataAnnotations;
namespace KubeRocks.Application.Entities;
public class User
{
public int Id { get; set; }
[MaxLength(255)]
public required string Name { get; set; }
[MaxLength(255)]
public required string Email { get; set; }
public ICollection<Article> Articles { get; } = new List<Article>();
public ICollection<Comment> Comments { get; } = new List<Comment>();
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Application/Contexts/AppDbContext.cs" >}}
```cs
namespace KubeRocks.Application.Contexts;
using KubeRocks.Application.Entities;
using Microsoft.EntityFrameworkCore;
public class AppDbContext : DbContext
{
public DbSet<User> Users => Set<User>();
public DbSet<Article> Articles => Set<Article>();
public DbSet<Comment> Comments => Set<Comment>();
public AppDbContext(DbContextOptions<AppDbContext> options) : base(options)
{
}
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
base.OnModelCreating(modelBuilder);
modelBuilder.Entity<User>()
.HasIndex(u => u.Email).IsUnique()
;
modelBuilder.Entity<Article>()
.HasIndex(u => u.Slug).IsUnique()
;
}
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Application/Extensions/ServiceExtensions.cs" >}}
```cs
using KubeRocks.Application.Contexts;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
namespace KubeRocks.Application.Extensions;
public static class ServiceExtensions
{
public static IServiceCollection AddKubeRocksServices(this IServiceCollection services, IConfiguration configuration)
{
return services.AddDbContext<AppDbContext>((options) =>
{
options.UseNpgsql(configuration.GetConnectionString("DefaultConnection"));
});
}
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
using KubeRocks.Application.Extensions;
//...
// Add services to the container.
builder.Services.AddKubeRocksServices(builder.Configuration);
//...
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/appsettings.Development.json" >}}
```json
{
//...
"ConnectionStrings": {
"DefaultConnection": "Host=localhost;Username=main;Password=main;Database=main;"
}
}
```
{{< /highlight >}}
Now that all models are created, we can generate the migrations and update the database accordingly:
```sh
dotnet new tool-manifest
dotnet tool install dotnet-ef
dotnet dotnet-ef -p src/KubeRocks.Application -s src/KubeRocks.WebApi migrations add InitialCreate
dotnet dotnet-ef -p src/KubeRocks.Application -s src/KubeRocks.WebApi database update
```
### Inject some dummy data
We'll use Bogus on a separate console project:
```sh
dotnet new console -o src/KubeRocks.Console
dotnet sln add src/KubeRocks.Console
dotnet add src/KubeRocks.Console reference src/KubeRocks.Application
dotnet add src/KubeRocks.Console package Bogus
dotnet add src/KubeRocks.Console package ConsoleAppFramework
dotnet add src/KubeRocks.Console package Respawn
```
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Console/appsettings.json" >}}
```json
{
"ConnectionStrings": {
"DefaultConnection": "Host=localhost;Username=main;Password=main;Database=main;"
}
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Console/KubeRocks.Console.csproj" >}}
```xml
<Project Sdk="Microsoft.NET.Sdk">
<!-- ... -->
<PropertyGroup>
<!-- ... -->
<RunWorkingDirectory>$(MSBuildProjectDirectory)</RunWorkingDirectory>
</PropertyGroup>
<ItemGroup>
<None Update="appsettings.json">
<CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
</None>
</ItemGroup>
</Project>
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Console/Commands/DbCommand.cs" >}}
```cs
using Bogus;
using KubeRocks.Application.Contexts;
using KubeRocks.Application.Entities;
using Microsoft.EntityFrameworkCore;
using Npgsql;
using Respawn;
using Respawn.Graph;
namespace KubeRocks.Console.Commands;
[Command("db")]
public class DbCommand : ConsoleAppBase
{
private readonly AppDbContext _context;
public DbCommand(AppDbContext context)
{
_context = context;
}
[Command("migrate", "Migrate database")]
public async Task Migrate()
{
await _context.Database.MigrateAsync();
}
[Command("fresh", "Wipe data")]
public async Task FreshData()
{
await Migrate();
using var conn = new NpgsqlConnection(_context.Database.GetConnectionString());
await conn.OpenAsync();
var respawner = await Respawner.CreateAsync(conn, new RespawnerOptions
{
TablesToIgnore = new Table[] { "__EFMigrationsHistory" },
DbAdapter = DbAdapter.Postgres
});
await respawner.ResetAsync(conn);
}
[Command("seed", "Fake data")]
public async Task SeedData()
{
await Migrate();
await FreshData();
var users = new Faker<User>()
.RuleFor(m => m.Name, f => f.Person.FullName)
.RuleFor(m => m.Email, f => f.Person.Email)
.Generate(50);
await _context.Users.AddRangeAsync(users);
await _context.SaveChangesAsync();
var articles = new Faker<Article>()
.RuleFor(a => a.Title, f => f.Lorem.Sentence().TrimEnd('.'))
.RuleFor(a => a.Description, f => f.Lorem.Paragraphs(1))
.RuleFor(a => a.Body, f => f.Lorem.Paragraphs(5))
.RuleFor(a => a.Author, f => f.PickRandom(users))
.RuleFor(a => a.CreatedAt, f => f.Date.Recent(90).ToUniversalTime())
.RuleFor(a => a.Slug, (f, a) => a.Title.Replace(" ", "-").ToLowerInvariant())
.Generate(500)
.Select(a =>
{
new Faker<Comment>()
.RuleFor(a => a.Body, f => f.Lorem.Paragraphs(2))
.RuleFor(a => a.Author, f => f.PickRandom(users))
.RuleFor(a => a.CreatedAt, f => f.Date.Recent(7).ToUniversalTime())
.Generate(new Faker().Random.Number(10))
.ForEach(c => a.Comments.Add(c));
return a;
});
await _context.Articles.AddRangeAsync(articles);
await _context.SaveChangesAsync();
}
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Console/Program.cs" >}}
```cs
using KubeRocks.Application.Extensions;
using KubeRocks.Console.Commands;
var builder = ConsoleApp.CreateBuilder(args);
builder.ConfigureServices((ctx, services) =>
{
services.AddKubeRocksServices(ctx.Configuration);
});
var app = builder.Build();
app.AddSubCommands<DbCommand>();
app.Run();
```
{{< /highlight >}}
Then launch the command:
```sh
dotnet run --project src/KubeRocks.Console db seed
```
Check with your favorite DB client that the data is correctly inserted.
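If you don't have a GUI client at hand, a rough check through the compose container works too (the quoted table name is an assumption based on EF Core's default naming):
```sh
# count the seeded articles directly inside the postgres container
docker compose exec db psql -U main -d main -c 'SELECT COUNT(*) FROM "Articles";'
```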
### Define endpoint access
All that's left is to create the endpoint. Let's define all the DTOs first:
```sh
dotnet add src/KubeRocks.WebApi package Mapster
```
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Models/ArticleListDto.cs" >}}
```cs
namespace KubeRocks.WebApi.Models;
public class ArticleListDto
{
public required string Title { get; set; }
public required string Slug { get; set; }
public required string Description { get; set; }
public required string Body { get; set; }
public DateTime CreatedAt { get; set; }
public DateTime UpdatedAt { get; set; }
public required AuthorDto Author { get; set; }
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Models/ArticleDto.cs" >}}
```cs
namespace KubeRocks.WebApi.Models;
public class ArticleDto : ArticleListDto
{
public List<CommentDto> Comments { get; set; } = new();
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Models/AuthorDto.cs" >}}
```cs
namespace KubeRocks.WebApi.Models;
public class AuthorDto
{
public required string Name { get; set; }
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Models/CommentDto.cs" >}}
```cs
namespace KubeRocks.WebApi.Models;
public class CommentDto
{
public required string Body { get; set; }
public DateTime CreatedAt { get; set; }
public required AuthorDto Author { get; set; }
}
```
{{< /highlight >}}
And finally the controller:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Controllers/ArticlesController.cs" >}}
```cs
using KubeRocks.Application.Contexts;
using KubeRocks.WebApi.Models;
using Mapster;
using Microsoft.AspNetCore.Mvc;
using Microsoft.EntityFrameworkCore;
namespace KubeRocks.WebApi.Controllers;
[ApiController]
[Route("[controller]")]
public class ArticlesController
{
private readonly AppDbContext _context;
public record ArticlesResponse(IEnumerable<ArticleListDto> Articles, int ArticlesCount);
public ArticlesController(AppDbContext context)
{
_context = context;
}
[HttpGet(Name = "GetArticles")]
public async Task<ArticlesResponse> Get([FromQuery] int page = 1, [FromQuery] int size = 10)
{
var articles = await _context.Articles
.OrderByDescending(a => a.Id)
.Skip((page - 1) * size)
.Take(size)
.ProjectToType<ArticleListDto>()
.ToListAsync();
var articlesCount = await _context.Articles.CountAsync();
return new ArticlesResponse(articles, articlesCount);
}
[HttpGet("{slug}", Name = "GetArticleBySlug")]
public async Task<ActionResult<ArticleDto>> GetBySlug(string slug)
{
var article = await _context.Articles
.Include(a => a.Author)
.Include(a => a.Comments.OrderByDescending(c => c.Id))
.ThenInclude(c => c.Author)
.FirstOrDefaultAsync(a => a.Slug == slug);
if (article is null)
{
return new NotFoundResult();
}
return article.Adapt<ArticleDto>();
}
}
```
{{< /highlight >}}
Launch the app and check that `/Articles` and `/Articles/{slug}` endpoints are working as expected.
## Deployment with database
### Database connection
It's time to connect our app to the production database. Create a demo DB & user through pgAdmin and create the appropriate secret:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/secrets-demo-db.yaml" >}}
```yaml
apiVersion: v1
kind: Secret
metadata:
name: demo-db
type: Opaque
data:
password: ZGVtbw==
```
{{< /highlight >}}
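As a side note, if you prefer the command line over pgAdmin for creating the database and user, here is a rough equivalent through a port-forward (the `postgres` superuser name and a local `psql` install are assumptions):
```sh
# in a first terminal: forward the production PostgreSQL port locally
kpf svc/postgresql -n postgres 54321:tcp-postgresql
# in a second terminal: create the role and database (password must match the secret above)
psql -h localhost -p 54321 -U postgres -c "CREATE ROLE demo WITH LOGIN PASSWORD 'demo';"
psql -h localhost -p 54321 -U postgres -c "CREATE DATABASE demo OWNER demo;"
```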
Generate the corresponding sealed secret with `kubeseal`, as in previous chapters, into a `sealed-secrets-demo-db.yaml` file and delete `secrets-demo-db.yaml`.
```sh
cat clusters/demo/kuberocks/secrets-demo-db.yaml | kubeseal --format=yaml --cert=pub-sealed-secrets.pem > clusters/demo/kuberocks/sealed-secrets-demo-db.yaml
rm clusters/demo/kuberocks/secrets-demo-db.yaml
```
Let's inject the appropriate connection string as an environment variable:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
# ...
spec:
# ...
template:
# ...
spec:
# ...
containers:
- name: api
# ...
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: demo-db
key: password
- name: ConnectionStrings__DefaultConnection
value: Host=postgresql-primary.postgres;Username=demo;Password='$(DB_PASSWORD)';Database=demo;
#...
```
{{< /highlight >}}
### Database migration
The DB connection should now be configured, but the database isn't migrated yet. The easiest approach is to add a migration step directly at app startup:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
// ...
var app = builder.Build();
using var scope = app.Services.CreateScope();
await using var dbContext = scope.ServiceProvider.GetRequiredService<AppDbContext>();
await dbContext.Database.MigrateAsync();
// ...
```
{{< /highlight >}}
The database should be migrated on the first app launch at the next deploy. Go to `https://demo.kube.rocks/Articles` to confirm all is ok. It should return the following empty response:
```json
{
  "articles": [],
  "articlesCount": 0
}
```
{{< alert >}}
Don't hesitate to use `klo -n kuberocks deploy/demo` liberally to troubleshoot whenever the pod is in an error state.
{{< /alert >}}
### Database seeding
We'll seed the database directly from our local machine. Temporarily change the connection string in `appsettings.json` to point to the production database:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Console/appsettings.json" >}}
```json
{
"ConnectionStrings": {
"DefaultConnection": "Host=localhost:54321;Username=demo;Password='xxx';Database=demo;"
}
}
```
{{< /highlight >}}
Then:
```sh
# forward the production database port to local
kpf svc/postgresql -n postgres 54321:tcp-postgresql
# launch the seeding command
dotnet run --project src/KubeRocks.Console db seed
```
{{< alert >}}
We should obviously never do this against a real production database, but as it's only for seeding a demo database, it's acceptable here.
{{< /alert >}}
Return to `https://demo.kube.rocks/Articles` to confirm articles are correctly returned.
## 8th check ✅
We now have a slightly more realistic app. Go to the [next part]({{< ref "/posts/19-a-beautiful-gitops-day-9" >}}); we'll talk about further monitoring integration and tracing with OpenTelemetry.

View File

@ -0,0 +1,492 @@
---
title: "A beautiful GitOps day IX - Monitoring & Tracing with OpenTelemetry"
date: 2023-08-27
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "serilog", "metrics", "opentelemetry", "tracing", "tempo"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is the **Part IX** of more global topic tutorial. [Back to guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for intro.
## Better logging
Default ASP.NET logging is not very standard; let's add Serilog for proper request logging with duration and status code:
```sh
dotnet add src/KubeRocks.WebApi package Serilog.AspNetCore
```
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
// ...
builder.Host.UseSerilog((ctx, cfg) => cfg
.ReadFrom.Configuration(ctx.Configuration)
.WriteTo.Console()
);
var app = builder.Build();
app.UseSerilogRequestLogging();
// ...
```
{{< /highlight >}}
Logs viewed through the Loki explorer should now be far more readable.
## Zero-downtime deployment
Every real production app should have liveness & readiness probes. They generally consist of a particular URL which returns the current app health status. We'll also include the DB access health. Let's add the standard `/healthz` endpoint, which is dead simple in ASP.NET Core:
```sh
dotnet add src/KubeRocks.WebApi package Microsoft.Extensions.Diagnostics.HealthChecks.EntityFrameworkCore
```
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
// ...
builder.Services
.AddHealthChecks()
.AddDbContextCheck<AppDbContext>();
var app = builder.Build();
// ...
app.MapControllers();
app.MapHealthChecks("/healthz");
app.Run();
```
{{< /highlight >}}
And you're done! Go to `https://localhost:xxxx/healthz` to confirm it's working. Try to stop the database with `docker compose stop` and check the healthz endpoint again: it should return a `503` status code. Then push the code.
{{< alert >}}
The `Microsoft.Extensions.Diagnostics.HealthChecks` package is very extensible, you can add any custom check to enrich the health app status.
{{< /alert >}}
And finally the probes:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
# ...
spec:
# ...
template:
# ...
spec:
# ...
containers:
- name: api
# ...
livenessProbe:
httpGet:
path: /healthz
port: 80
initialDelaySeconds: 10
periodSeconds: 10
readinessProbe:
httpGet:
path: /healthz
port: 80
initialDelaySeconds: 10
periodSeconds: 10
```
{{< /highlight >}}
{{< alert >}}
Be aware of the difference between `liveness` and `readiness` probes. The first one is used to restart the pod if it's not responding; the second one tells Kubernetes that the pod is not ready to receive traffic yet, which is vital for preventing any downtime.
When the **Rolling Update** strategy is used (the default), the old pod is not killed until the new one is ready (aka healthy).
{{< /alert >}}
## Telemetry
Last but not least, for a total integration with our monitored Kubernetes cluster, let's add some telemetry to our app. We'll use `OpenTelemetry` for that, which has become the standard library for metrics and tracing, providing good integrations for many languages.
### Application metrics
Installing minimal ASP.NET Core metrics is really a no-brainer:
```sh
dotnet add src/KubeRocks.WebApi package OpenTelemetry.AutoInstrumentation --prerelease
dotnet add src/KubeRocks.WebApi package OpenTelemetry.Extensions.Hosting --prerelease
dotnet add src/KubeRocks.WebApi package OpenTelemetry.Exporter.Prometheus.AspNetCore --prerelease
```
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
//...
builder.Services.AddOpenTelemetry()
.WithMetrics(b =>
{
b
.AddAspNetCoreInstrumentation()
.AddPrometheusExporter();
});
var app = builder.Build();
app.UseOpenTelemetryPrometheusScrapingEndpoint();
//...
```
{{< /highlight >}}
Relaunch the app and go to `https://demo.kube.rocks/metrics` to confirm it's working. It should show metrics after each endpoint call; simply try `https://demo.kube.rocks/Articles`.
{{< alert >}}
.NET metrics are currently pretty basic, but the next .NET 8 version will provide far better metrics from internal components allowing some [useful dashboard](https://github.com/JamesNK/aspnetcore-grafana).
{{< /alert >}}
#### Hide internal endpoints
After the push, you should see `/metrics` live. Let's step back and exclude this internal path from external public access. We have 2 options:
* Force the app to listen only on the private network for the `/metrics` and `/healthz` endpoints
* Push all the app logic under the `/api` path and let Traefik expose only this path
Let's go with option 2. Add the `api/` prefix to the controllers to expose:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Controllers/ArticlesController.cs" >}}
```cs
//...
[ApiController]
[Route("api/[controller]")]
public class ArticlesController {
//...
}
```
{{< /highlight >}}
Let's move Swagger UI under `/api` path too:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
//...
if (app.Environment.IsDevelopment())
{
app.UseSwagger(c =>
{
c.RouteTemplate = "/api/{documentName}/swagger.json";
});
app.UseSwaggerUI(c =>
{
c.SwaggerEndpoint("v1/swagger.json", "KubeRocks v1");
c.RoutePrefix = "api";
});
}
//...
```
{{< /highlight >}}
{{< alert >}}
You may use ASP.NET API versioning, which works the same way with [URL path versioning](https://github.com/dotnet/aspnet-api-versioning/wiki/Versioning-via-the-URL-Path).
{{< /alert >}}
All that's left is to include only the endpoints under the `/api` prefix in the Traefik IngressRoute:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
#...
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
#...
spec:
#...
routes:
- match: Host(`demo.kube.rocks`) && PathPrefix(`/api`)
#...
```
{{< /highlight >}}
Now the new URL is `https://demo.kube.rocks/api/Articles`. Any path outside `api` will return the Traefik 404 page, and internal paths such as `https://demo.kube.rocks/metrics` are no longer accessible. Another advantage of this config is that it becomes simple to put a separate frontend project under the `/` path (covered later), which can consume the API natively without any CORS problem.
#### Prometheus integration
It's only a matter of a new ServiceMonitor config:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: demo
namespace: kuberocks
spec:
endpoints:
- targetPort: 80
selector:
matchLabels:
app: demo
```
{{< /highlight >}}
After some time, you can finally use the Prometheus dashboard to query your app metrics. Use the `{namespace="kuberocks",job="demo"}` PromQL query to list all available metrics:
[![Prometheus metrics](prometheus-graph.png)](prometheus-graph.png)
### Application tracing
A more useful use case for OpenTelemetry is integration with a tracing backend. [Tempo](https://grafana.com/oss/tempo/) is a good candidate: a free open-source alternative to Jaeger, simpler to install as it only requires S3 as storage, and compatible with many protocols such as Jaeger, OTLP and Zipkin.
#### Installing Tempo
It's another Helm chart to install, as well as the related Grafana datasource:
{{< highlight host="demo-kube-k3s" file="tracing.tf" >}}
```tf
resource "kubernetes_namespace_v1" "tracing" {
metadata {
name = "tracing"
}
}
resource "helm_release" "tempo" {
chart = "tempo"
version = "1.5.1"
repository = "https://grafana.github.io/helm-charts"
name = "tempo"
namespace = kubernetes_namespace_v1.tracing.metadata[0].name
set {
name = "tempo.storage.trace.backend"
value = "s3"
}
set {
name = "tempo.storage.trace.s3.bucket"
value = var.s3_bucket
}
set {
name = "tempo.storage.trace.s3.endpoint"
value = var.s3_endpoint
}
set {
name = "tempo.storage.trace.s3.region"
value = var.s3_region
}
set {
name = "tempo.storage.trace.s3.access_key"
value = var.s3_access_key
}
set {
name = "tempo.storage.trace.s3.secret_key"
value = var.s3_secret_key
}
set {
name = "serviceMonitor.enabled"
value = "true"
}
}
resource "kubernetes_config_map_v1" "tempo_grafana_datasource" {
metadata {
name = "tempo-grafana-datasource"
namespace = kubernetes_namespace_v1.monitoring.metadata[0].name
labels = {
grafana_datasource = "1"
}
}
data = {
"datasource.yaml" = <<EOF
apiVersion: 1
datasources:
- name: Tempo
type: tempo
uid: tempo
url: http://tempo.tracing:3100/
access: proxy
EOF
}
}
```
{{< /highlight >}}
Use the *Test* button on `https://grafana.kube.rocks/connections/datasources/edit/tempo` to confirm it's working.
#### OpenTelemetry
Let's first add another instrumentation package, specialized for the Npgsql driver used by EF Core to talk to PostgreSQL:
```sh
dotnet add src/KubeRocks.WebApi package Npgsql.OpenTelemetry
```
Then wire up all the needed instrumentation as well as the OTLP exporter:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
//...
builder.Services.AddOpenTelemetry()
//...
.WithTracing(b =>
{
b
.SetResourceBuilder(ResourceBuilder
.CreateDefault()
.AddService("KubeRocks.Demo")
.AddTelemetrySdk()
)
.AddAspNetCoreInstrumentation(b =>
{
b.Filter = ctx =>
{
return ctx.Request.Path.StartsWithSegments("/api");
};
})
.AddEntityFrameworkCoreInstrumentation()
.AddNpgsql()
.AddOtlpExporter();
});
//...
```
{{< /highlight >}}
Then add the exporter endpoint config in order to push traces to Tempo:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
#...
spec:
#...
template:
#...
spec:
#...
containers:
- name: api
#...
env:
#...
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: http://tempo.tracing:4317
```
{{< /highlight >}}
Call some API URLs, then go back to Grafana / Explore, select the Tempo data source and search for traces. You should see something like this:
[![Tempo search](tempo-search.png)](tempo-search.png)
Click on one specific trace to get its details. You can go through the HTTP requests, EF Core response times, and even the underlying SQL queries thanks to the Npgsql instrumentation:
[![Tempo traces](tempo-trace.png)](tempo-trace.png)
#### Correlation with Loki
It would be nice to have direct access to a trace from the logs through Loki search, as it's clearly more seamless than searching inside Tempo.
For that we need to do 2 things:
* Add the `TraceId` to the logs in order to correlate traces with logs. In ASP.NET Core, a `TraceId` corresponds to a unique request, allowing isolated analysis of each request.
* Create a link in Grafana from the generated `TraceId` inside the log to the Tempo trace detail view.
So firstly, let's take care of the app part by attaching the OpenTelemetry TraceId to Serilog:
```sh
dotnet add src/KubeRocks.WebApi package Serilog.Enrichers.Span
```
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
//...
builder.Host.UseSerilog((ctx, cfg) => cfg
.ReadFrom.Configuration(ctx.Configuration)
.Enrich.WithSpan()
.WriteTo.Console(
outputTemplate: "[{Timestamp:HH:mm:ss} {Level:u3}] |{TraceId}| {Message:lj}{NewLine}{Exception}"
)
);
//...
```
{{< /highlight >}}
It should now generate this kind of log:
```txt
[23:22:57 INF] |aa51c7254aaa10a3f679a511444a5da5| HTTP GET /api/Articles responded 200 in 301.7052 ms
```
Now let's adapt the Loki datasource by creating a derived field inside the `jsonData` property:
{{< highlight host="demo-kube-k3s" file="logging.tf" >}}
```tf
resource "kubernetes_config_map_v1" "loki_grafana_datasource" {
#...
data = {
"datasource.yaml" = <<EOF
apiVersion: 1
datasources:
- name: Loki
#...
jsonData:
derivedFields:
- datasourceName: Tempo
matcherRegex: "\\|(\\w+)\\|"
name: TraceID
url: "$$${__value.raw}"
datasourceUid: tempo
EOF
}
}
```
{{< /highlight >}}
This is where the magic happens. The `\|(\w+)\|` regex matches and extracts the `TraceId` inside the log, which is enclosed in pipes, and creates a link to the Tempo trace detail view.
[![Derived fields](loki-derived-fields.png)](loki-derived-fields.png)
This gives us a nice link button as soon as you click on a log detail:
[![Derived fields](loki-tempo-link.png)](loki-tempo-link.png)
## 9th check ✅
We are done with the basic functional telemetry! There are countless things to cover in this subject, but it's enough for this endless guide. Go to the [next part]({{< ref "/posts/20-a-beautiful-gitops-day-10" >}}); we'll talk about feature testing, code metrics and code coverage.

View File

@ -0,0 +1,750 @@
---
title: "A beautiful GitOps day X - QA with testing & code metrics"
date: 2023-08-28
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "testing", "sonarqube", "xunit", "coverage"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is the **Part X** of more global topic tutorial. [Back to guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for intro.
## Code Metrics
SonarQube has been leading the code metrics industry for a long time, embracing a full Open Core model, and the Community Edition is completely free of charge even for commercial use. It covers advanced code analysis, code coverage, code duplication, code smells, security vulnerabilities, etc. It ensures high-quality code and helps to keep it that way.
### SonarQube installation
SonarQube has its dedicated Helm chart, which is perfect for us. However, it's the most resource-hungry component of our development stack so far (because it's built with Java? End of troll), so be sure to deploy it on an almost empty node (which should be ok with 3 workers), maybe a dedicated one. In fact, it's the last Helm chart for this tutorial, I promise!
Create a dedicated database for SonarQube, same as usual.
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "sonarqube_db_password" {
type = string
sensitive = true
}
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
sonarqube_db_password = "xxx"
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="sonarqube.tf" >}}
```tf
resource "kubernetes_namespace_v1" "sonarqube" {
metadata {
name = "sonarqube"
}
}
resource "helm_release" "sonarqube" {
chart = "sonarqube"
version = "10.1.0+628"
repository = "https://SonarSource.github.io/helm-chart-sonarqube"
name = "sonarqube"
namespace = kubernetes_namespace_v1.sonarqube.metadata[0].name
set {
name = "prometheusMonitoring.podMonitor.enabled"
value = "true"
}
set {
name = "postgresql.enabled"
value = "false"
}
set {
name = "jdbcOverwrite.enabled"
value = "true"
}
set {
name = "jdbcOverwrite.jdbcUrl"
value = "jdbc:postgresql://postgresql-primary.postgres/sonarqube"
}
set {
name = "jdbcOverwrite.jdbcUsername"
value = "sonarqube"
}
set {
name = "jdbcOverwrite.jdbcPassword"
value = var.sonarqube_db_password
}
}
resource "kubernetes_manifest" "sonarqube_ingress" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "IngressRoute"
metadata = {
name = "sonarqube"
namespace = kubernetes_namespace_v1.sonarqube.metadata[0].name
}
spec = {
entryPoints = ["websecure"]
routes = [
{
match = "Host(`sonarqube.${var.domain}`)"
kind = "Rule"
services = [
{
name = "sonarqube-sonarqube"
port = "http"
}
]
}
]
}
}
}
```
{{< /highlight >}}
Be sure to disable the PostgreSQL sub-chart and use our self-hosted cluster with both `postgresql.enabled` and `jdbcOverwrite.enabled`. If needed, set proper `tolerations` and `nodeSelector` for deploying on a dedicated node.
The installation takes several minutes, be patient. Once done, you can access SonarQube at `https://sonarqube.kube.rocks` and log in with `admin` / `admin`.
### Project configuration
Firstly, create a new project through the SonarQube UI and note the project key, which is its identifier. Then, from your user account under `/account/security`, create a **global analysis token** named `Concourse CI` that will be used for CI integration.
Now we need to create a Kubernetes secret which contains this token value for Concourse CI, for usage inside the pipeline. The token is the one generated above.
Add a new Concourse Terraform variable for the token:
{{< highlight host="demo-kube-k3s" file="main.tf" >}}
```tf
variable "concourse_analysis_token" {
type = string
sensitive = true
}
```
{{< /highlight >}}
{{< highlight host="demo-kube-k3s" file="terraform.tfvars" >}}
```tf
concourse_analysis_token = "xxx"
```
{{< /highlight >}}
The secret:
{{< highlight host="demo-kube-k3s" file="concourse.tf" >}}
```tf
resource "kubernetes_secret_v1" "concourse_sonarqube" {
metadata {
name = "sonarqube"
namespace = "concourse-main"
}
data = {
url = "https://sonarqube.${var.domain}"
analysis-token = var.concourse_analysis_token
}
depends_on = [
helm_release.concourse
]
}
```
{{< /highlight >}}
We are now ready to tackle the pipeline integration.
### SonarScanner for .NET
As we use a .NET project, we will use the official SonarQube scanner for .NET. Sadly, as it's only a .NET CLI wrapper, it requires a Java runtime to run, and there is no official SonarQube Docker image containing both the .NET SDK and a Java runtime. But we have a CI now, so we can build our own QA image and push it to our private registry.
Create a new Gitea repo dedicated to any custom Docker images, with this single Dockerfile:
{{< highlight host="demo-kube-images" file="dotnet-qa.dockerfile" >}}
```Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:7.0
RUN apt-get update && apt-get install -y ca-certificates-java && apt-get install -y \
openjdk-17-jre-headless \
unzip \
&& rm -rf /var/lib/apt/lists/*
RUN dotnet tool install --global dotnet-sonarscanner
RUN dotnet tool install --global dotnet-coverage
ENV PATH="${PATH}:/root/.dotnet/tools"
```
{{< /highlight >}}
Note that as we add the `dotnet-sonarscanner` tool to the path, we can use it directly in the pipeline without any extra step. I also add the `dotnet-coverage` global tool for code coverage generation, which we'll use later.
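Before wiring it into the pipeline, you can optionally build and sanity-check the image locally (a quick sketch, the local tag is arbitrary):

```sh
# Build the QA image locally and verify both runtimes and the installed global tools
docker build -f dotnet-qa.dockerfile -t dotnet-qa:7.0 .
docker run --rm dotnet-qa:7.0 sh -c "dotnet --version && java -version && dotnet tool list -g"
```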
Then the pipeline:
{{< highlight host="demo-kube-flux" file="pipelines/images.yaml" >}}
```yml
resources:
- name: docker-images-git
type: git
icon: coffee
source:
uri: https://gitea.kube.rocks/kuberocks/docker-images
branch: main
- name: dotnet-qa-image
type: registry-image
icon: docker
source:
repository: ((registry.name))/kuberocks/dotnet-qa
tag: "7.0"
username: ((registry.username))
password: ((registry.password))
jobs:
- name: dotnet-qa
plan:
- get: docker-images-git
- task: build-image
privileged: true
config:
platform: linux
image_resource:
type: registry-image
source:
repository: concourse/oci-build-task
inputs:
- name: docker-images-git
outputs:
- name: image
params:
DOCKERFILE: docker-images-git/dotnet-qa.dockerfile
run:
path: build
- put: dotnet-qa-image
params:
image: image/image.tar
```
{{< /highlight >}}
Update the `main.yaml` pipeline to add the new job, then trigger it manually from the Concourse UI in order to register the new pipeline above:
{{< highlight host="demo-kube-flux" file="pipelines/main.yaml" >}}
```yml
#...
jobs:
- name: configure-pipelines
plan:
#...
- set_pipeline: images
file: ci/pipelines/images.yaml
```
{{< /highlight >}}
The pipeline should now start and build the image; trigger it manually from the Concourse UI if needed. Once done, you can check in your Gitea container packages that the new image `gitea.kube.rocks/kuberocks/dotnet-qa` is there.
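You can also confirm it from any machine logged into the registry, assuming your Gitea registry is reachable at `gitea.kube.rocks`:

```sh
# Pull the freshly built QA image to confirm it's published
docker login gitea.kube.rocks
docker pull gitea.kube.rocks/kuberocks/dotnet-qa:7.0
```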
### Concourse pipeline integration
It's finally time to reuse this QA image in our Concourse demo project pipeline. Update it accordingly:
{{< highlight host="demo-kube-flux" file="pipelines/demo.yaml" >}}
```yml
#...
jobs:
- name: build
plan:
- get: source-code
trigger: true
- task: build-source
config:
platform: linux
image_resource:
type: registry-image
source:
repository: ((registry.name))/kuberocks/dotnet-qa
tag: "7.0"
username: ((registry.username))
password: ((registry.password))
#...
run:
path: /bin/sh
args:
- -ec
- |
dotnet format --verify-no-changes
dotnet sonarscanner begin /k:"KubeRocks-Demo" /d:sonar.host.url="((sonarqube.url))" /d:sonar.token="((sonarqube.analysis-token))"
dotnet build -c Release
dotnet sonarscanner end /d:sonar.token="((sonarqube.analysis-token))"
dotnet publish src/KubeRocks.WebApi -c Release -o publish --no-restore --no-build
#...
```
{{< /highlight >}}
Note that we now use the `dotnet-qa` image and surround the build step with `dotnet sonarscanner begin` and `dotnet sonarscanner end` commands, with the appropriate credentials allowing the Sonar CLI to send the report to our SonarQube instance. Trigger the pipeline manually; everything should pass, and the result will be pushed to SonarQube.
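As a reminder, triggering from the CLI is a one-liner, assuming a `fly` target named `kuberocks` and that the demo pipeline and job are called `demo` and `build` (adapt to your own naming):

```sh
# Trigger the build job manually and follow its output
fly -t kuberocks trigger-job -j demo/build --watch
```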
[![SonarQube](sonarqube-dashboard.png)](sonarqube-dashboard.png)
## Feature testing
Let's cover feature testing by calling the API against a real database. This is also the opportunity to tackle code coverage.
### xUnit
First, add a dedicated test database to the Docker Compose file, so we don't interfere with the development database:
{{< highlight host="kuberocks-demo" file="docker-compose.yml" >}}
```yaml
version: "3"
services:
#...
db_test:
image: postgres:15
environment:
POSTGRES_USER: main
POSTGRES_PASSWORD: main
POSTGRES_DB: main
ports:
- 54320:5432
```
{{< /highlight >}}
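Start it and check connectivity before running the tests (a quick sketch, assuming `psql` is installed locally):

```sh
# Spin up only the test database and verify it answers on the mapped port
docker compose up -d db_test
PGPASSWORD=main psql -h localhost -p 54320 -U main -d main -c "SELECT 1;"
```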
Expose the `Program` class of the minimal API so the test project can reference it:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
//...
public partial class Program
{
protected Program() { }
}
```
{{< /highlight >}}
Then add a testing JSON environment file for accessing our `db_test` database defined in the docker-compose.yml:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/appsettings.Testing.json" >}}
```json
{
"ConnectionStrings": {
"DefaultConnection": "Host=localhost:54320;Username=main;Password=main;Database=main;"
}
}
```
{{< /highlight >}}
Now the test project:
```sh
dotnet new xunit -o tests/KubeRocks.FeatureTests
dotnet sln add tests/KubeRocks.FeatureTests
dotnet add tests/KubeRocks.FeatureTests reference src/KubeRocks.WebApi
dotnet add tests/KubeRocks.FeatureTests package Microsoft.AspNetCore.Mvc.Testing
dotnet add tests/KubeRocks.FeatureTests package Respawn
dotnet add tests/KubeRocks.FeatureTests package FluentAssertions
```
The `WebApplicationFactory` that will use our testing environment:
{{< highlight host="kuberocks-demo" file="tests/KubeRocks.FeatureTests/KubeRocksApiFactory.cs" >}}
```cs
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.Extensions.Hosting;
namespace KubeRocks.FeatureTests;
public class KubeRocksApiFactory : WebApplicationFactory<Program>
{
protected override IHost CreateHost(IHostBuilder builder)
{
builder.UseEnvironment("Testing");
return base.CreateHost(builder);
}
}
```
{{< /highlight >}}
The base class for all test classes, which manages database cleanup thanks to `Respawn`:
{{< highlight host="kuberocks-demo" file="tests/KubeRocks.FeatureTests/TestBase.cs" >}}
```cs
using KubeRocks.Application.Contexts;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;
using Npgsql;
using Respawn;
using Respawn.Graph;
namespace KubeRocks.FeatureTests;
[Collection("Sequencial")]
public class TestBase : IClassFixture<KubeRocksApiFactory>, IAsyncLifetime
{
protected KubeRocksApiFactory Factory { get; private set; }
protected TestBase(KubeRocksApiFactory factory)
{
Factory = factory;
}
public async Task RefreshDatabase()
{
using var scope = Factory.Services.CreateScope();
using var conn = new NpgsqlConnection(
scope.ServiceProvider.GetRequiredService<AppDbContext>().Database.GetConnectionString()
);
await conn.OpenAsync();
var respawner = await Respawner.CreateAsync(conn, new RespawnerOptions
{
TablesToIgnore = new Table[] { "__EFMigrationsHistory" },
DbAdapter = DbAdapter.Postgres
});
await respawner.ResetAsync(conn);
}
public Task InitializeAsync()
{
return RefreshDatabase();
}
public Task DisposeAsync()
{
return Task.CompletedTask;
}
}
```
{{< /highlight >}}
Note the `Collection` attribute, which forces the test classes to run sequentially, required as we use the same database for all tests.
Finally, the tests for the 2 endpoints of our articles controller:
{{< highlight host="kuberocks-demo" file="tests/KubeRocks.FeatureTests/Articles/ArticlesListTests.cs" >}}
```cs
using System.Net.Http.Json;
using FluentAssertions;
using KubeRocks.Application.Contexts;
using KubeRocks.Application.Entities;
using KubeRocks.WebApi.Models;
using Microsoft.Extensions.DependencyInjection;
using static KubeRocks.WebApi.Controllers.ArticlesController;
namespace KubeRocks.FeatureTests.Articles;
public class ArticlesListTests : TestBase
{
public ArticlesListTests(KubeRocksApiFactory factory) : base(factory) { }
[Fact]
public async Task Can_Paginate_Articles()
{
using (var scope = Factory.Services.CreateScope())
{
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
var user = db.Users.Add(new User
{
Name = "John Doe",
Email = "john.doe@email.com"
});
db.Articles.AddRange(Enumerable.Range(1, 50).Select(i => new Article
{
Title = $"Test Title {i}",
Slug = $"test-title-{i}",
Description = "Test Description",
Body = "Test Body",
Author = user.Entity,
}));
await db.SaveChangesAsync();
}
var response = await Factory.CreateClient().GetAsync("/api/Articles?page=1&size=20");
response.EnsureSuccessStatusCode();
var body = (await response.Content.ReadFromJsonAsync<ArticlesResponse>())!;
body.Articles.Count().Should().Be(20);
body.ArticlesCount.Should().Be(50);
body.Articles.First().Should().BeEquivalentTo(new
{
Title = "Test Title 50",
Description = "Test Description",
Body = "Test Body",
Author = new
{
Name = "John Doe"
},
});
}
[Fact]
public async Task Can_Get_Article()
{
using (var scope = Factory.Services.CreateScope())
{
var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
db.Articles.Add(new Article
{
Title = $"Test Title",
Slug = $"test-title",
Description = "Test Description",
Body = "Test Body",
Author = new User
{
Name = "John Doe",
Email = "john.doe@email.com"
}
});
await db.SaveChangesAsync();
}
var response = await Factory.CreateClient().GetAsync($"/api/Articles/test-title");
response.EnsureSuccessStatusCode();
var body = (await response.Content.ReadFromJsonAsync<ArticleDto>())!;
body.Should().BeEquivalentTo(new
{
Title = "Test Title",
Description = "Test Description",
Body = "Test Body",
Author = new
{
Name = "John Doe"
},
});
}
}
```
{{< /highlight >}}
Ensure all tests pass with `dotnet test`.
### CI tests & code coverage
Now we need to integrate the tests into our CI pipeline. As we're testing against a real database, create a new `demo_test` database through pgAdmin with basic `test` / `test` credentials.
{{< alert >}}
In a real-world scenario, you should use a dedicated database for testing, and not the same as production.
{{< /alert >}}
Let's edit the pipeline accordingly for tests:
{{< highlight host="demo-kube-flux" file="pipelines/demo.yaml" >}}
```yml
#...
jobs:
- name: build
plan:
#...
- task: build-source
config:
#...
params:
ConnectionStrings__DefaultConnection: "Host=postgres-primary.postgres;Username=test;Password=test;Database=demo_test"
run:
path: /bin/sh
args:
- -ec
- |
dotnet format --verify-no-changes
dotnet sonarscanner begin /k:"KubeRocks-Demo" /d:sonar.host.url="((sonarqube.url))" /d:sonar.token="((sonarqube.analysis-token))" /d:sonar.cs.vscoveragexml.reportsPaths=coverage.xml
dotnet build -c Release
dotnet-coverage collect 'dotnet test -c Release --no-restore --no-build --verbosity=normal' -f xml -o 'coverage.xml'
dotnet sonarscanner end /d:sonar.token="((sonarqube.analysis-token))"
dotnet publish src/KubeRocks.WebApi -c Release -o publish --no-restore --no-build
#...
```
{{< /highlight >}}
Note that we now include code coverage by using the `dotnet-coverage` tool. Don't forget to pass the `coverage.xml` path to the `sonarscanner` CLI as well. It's time to push our code with tests, or trigger the pipeline manually, to check our integration tests.
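You can reproduce the coverage step locally before pushing, with the same tool used in the pipeline:

```sh
# Install the coverage tool once, then wrap the test run to produce coverage.xml
dotnet tool install --global dotnet-coverage
dotnet-coverage collect "dotnet test -c Release" -f xml -o coverage.xml
```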
If all goes well, you should see the test results on SonarQube with some coverage done:
[![SonarQube](sonarqube-tests.png)](sonarqube-tests.png)
Coverage detail:
[![SonarQube](sonarqube-cc.png)](sonarqube-cc.png)
You may exclude some files from analysis by adding some project properties:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Application/KubeRocks.Application.csproj" >}}
```xml
<Project Sdk="Microsoft.NET.Sdk">
<!-- ... -->
<ItemGroup>
<SonarQubeSetting Include="sonar.exclusions">
<Value>appsettings.Testing.json</Value>
</SonarQubeSetting>
</ItemGroup>
</Project>
```
{{< /highlight >}}
Same for coverage:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Application/KubeRocks.Application.csproj" >}}
```xml
<Project Sdk="Microsoft.NET.Sdk">
<!-- ... -->
<ItemGroup>
<SonarQubeSetting Include="sonar.coverage.exclusions">
<Value>Migrations/**/*</Value>
</SonarQubeSetting>
</ItemGroup>
</Project>
```
{{< /highlight >}}
### Sonar Analyzer
You can enforce many default Sonar rules locally, before any code push, by using [Sonar Analyzer](https://github.com/SonarSource/sonar-dotnet).
Create this file at the root of your solution to enable Sonar Analyzer globally:
{{< highlight host="kuberocks-demo" file="Directory.Build.props" >}}
```xml
<Project>
<PropertyGroup>
<AnalysisLevel>latest-Recommended</AnalysisLevel>
<TreatWarningsAsErrors>true</TreatWarningsAsErrors>
<CodeAnalysisTreatWarningsAsErrors>true</CodeAnalysisTreatWarningsAsErrors>
</PropertyGroup>
<ItemGroup>
<PackageReference
Include="SonarAnalyzer.CSharp"
Version="9.8.0.76515"
PrivateAssets="all"
Condition="$(MSBuildProjectExtension) == '.csproj'"
/>
</ItemGroup>
</Project>
```
{{< /highlight >}}
Any rule violation is treated as an error at build time, which blocks the CI before the tests even run. Use `latest-All` as `AnalysisLevel` for psychopath mode.
At this stage, as soon as this file is added, you should see some build errors. If you use VSCode with the C# extension, these errors will be highlighted directly in the editor. Here are some fixes:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
//...
builder.Host.UseSerilog((ctx, cfg) => cfg
.ReadFrom.Configuration(ctx.Configuration)
.Enrich.WithSpan()
.WriteTo.Console(
outputTemplate: "[{Timestamp:HH:mm:ss} {Level:u3}] |{TraceId}| {Message:lj}{NewLine}{Exception}",
// Enforce culture
formatProvider: CultureInfo.InvariantCulture
)
);
//...
```
{{< /highlight >}}
Delete `WeatherForecastController.cs`.
{{< highlight host="kuberocks-demo" file="tests/KubeRocks.FeatureTests/KubeRocks.FeatureTests.csproj" >}}
```xml
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<!-- ... -->
<NoWarn>CA1707</NoWarn>
</PropertyGroup>
<!-- ... -->
</Project>
```
{{< /highlight >}}
## 10th check ✅
We are done with the code quality process. Go to the [final part]({{< ref "/posts/21-a-beautiful-gitops-day-11" >}}) with load testing and some frontend!

---
title: "A beautiful GitOps day XI - Load testing & Frontend"
date: 2023-08-29
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes", "load-testing", "k6", "frontend", "vue", "typescript", "openapi"]
---
{{< lead >}}
Use GitOps workflow for building a production grade on-premise Kubernetes cluster on cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
This is **Part XI** of a more global tutorial series. [Back to the guide summary]({{< ref "/posts/10-a-beautiful-gitops-day" >}}) for the introduction.
## Load testing
When it comes to load testing, k6 is a perfect tool for the job and integrates with many time series databases like Prometheus or InfluxDB. As we already have Prometheus, let's use it and spare ourselves a separate InfluxDB installation. First, be sure to allow remote write by enabling `enableRemoteWriteReceiver` in the Prometheus Helm chart. It should already be done if you followed this tutorial.
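You can double-check that the receiver is enabled on the Prometheus custom resource, assuming the kube-prometheus-stack lives in the `monitoring` namespace:

```sh
# Should print "true" if remote write ingestion is enabled
kubectl get prometheus -n monitoring \
  -o jsonpath='{.items[0].spec.enableRemoteWriteReceiver}'
```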
### K6
We'll reuse our Flux repo and add some manifests to define the load testing scenario. Firstly, describe the scenario inside a `ConfigMap` that scrapes all articles and then each article:
{{< highlight host="demo-kube-flux" file="jobs/demo-k6.yaml" >}}
```yml
apiVersion: v1
kind: ConfigMap
metadata:
name: scenario
namespace: kuberocks
data:
script.js: |
import http from "k6/http";
import { check } from "k6";
export default function () {
const size = 10;
let page = 1;
let articles = []
do {
const res = http.get(`${__ENV.API_URL}/Articles?page=${page}&size=${size}`);
check(res, {
"status is 200": (r) => r.status == 200,
});
articles = res.json().articles;
page++;
articles.forEach((article) => {
const res = http.get(`${__ENV.API_URL}/Articles/${article.slug}`);
check(res, {
"status is 200": (r) => r.status == 200,
});
});
}
while (articles.length > 0);
}
```
{{< /highlight >}}
Finally, add the k6 `Job` in the same file, configured for Prometheus output and mounting the above scenario:
{{< highlight host="demo-kube-flux" file="jobs/demo-k6.yaml" >}}
```yml
#...
---
apiVersion: batch/v1
kind: Job
metadata:
name: k6
namespace: kuberocks
spec:
ttlSecondsAfterFinished: 0
template:
spec:
restartPolicy: Never
containers:
- name: run
image: grafana/k6
env:
- name: API_URL
value: https://demo.kube.rocks/api
- name: K6_VUS
value: "30"
- name: K6_DURATION
value: 1m
- name: K6_PROMETHEUS_RW_SERVER_URL
value: http://prometheus-operated.monitoring:9090/api/v1/write
command:
["k6", "run", "-o", "experimental-prometheus-rw", "script.js"]
volumeMounts:
- name: scenario
mountPath: /home/k6
tolerations:
- key: node-role.kubernetes.io/runner
operator: Exists
effect: NoSchedule
nodeSelector:
node-role.kubernetes.io/runner: "true"
volumes:
- name: scenario
configMap:
name: scenario
```
{{< /highlight >}}
Use appropriate `tolerations` and `nodeSelector` to run the load test on a node with free CPU resources. You can play with the `K6_VUS` and `K6_DURATION` environment variables to change the level of load.
Then you can launch the job with `ka jobs/demo-k6.yaml`. Quickly check that the job is running via `klo -n kuberocks job/k6`:
```txt
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: script.js
output: Prometheus remote write (http://prometheus-operated.monitoring:9090/api/v1/write)
scenarios: (100.00%) 1 scenario, 30 max VUs, 1m30s max duration (incl. graceful stop):
* default: 30 looping VUs for 1m0s (gracefulStop: 30s)
```
After 1 minute of run, the job should finish and show some raw results:
```txt
✓ status is 200
checks.........................: 100.00% ✓ 17748 ✗ 0
data_received..................: 404 MB 6.3 MB/s
data_sent......................: 1.7 MB 26 kB/s
http_req_blocked...............: avg=242.43µs min=223ns med=728ns max=191.27ms p(90)=1.39µs p(95)=1.62µs
http_req_connecting............: avg=13.13µs min=0s med=0s max=9.48ms p(90)=0s p(95)=0s
http_req_duration..............: avg=104.22ms min=28.9ms med=93.45ms max=609.86ms p(90)=162.04ms p(95)=198.93ms
{ expected_response:true }...: avg=104.22ms min=28.9ms med=93.45ms max=609.86ms p(90)=162.04ms p(95)=198.93ms
http_req_failed................: 0.00% ✓ 0 ✗ 17748
http_req_receiving.............: avg=13.76ms min=32.71µs med=6.49ms max=353.13ms p(90)=36.04ms p(95)=51.36ms
http_req_sending...............: avg=230.04µs min=29.79µs med=93.16µs max=25.75ms p(90)=201.92µs p(95)=353.61µs
http_req_tls_handshaking.......: avg=200.57µs min=0s med=0s max=166.91ms p(90)=0s p(95)=0s
http_req_waiting...............: avg=90.22ms min=14.91ms med=80.76ms max=609.39ms p(90)=138.3ms p(95)=169.24ms
http_reqs......................: 17748 276.81409/s
iteration_duration.............: avg=5.39s min=3.97s med=5.35s max=7.44s p(90)=5.94s p(95)=6.84s
iterations.....................: 348 5.427727/s
vus............................: 7 min=7 max=30
vus_max........................: 30 min=30 max=30
```
As we use Prometheus for outputting the results, we can visualize them easily with Grafana. You just have to import [this dashboard](https://grafana.com/grafana/dashboards/18030-official-k6-test-result/):
[![Grafana](grafana-k6.png)](grafana-k6.png)
As we use Kubernetes, increasing the load capacity horizontally is dead easy. Go to the deployment configuration of the demo app to increase the replicas count, as well as Traefik's, and compare the results.
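For a quick experiment you can also scale imperatively, assuming the deployments are named `demo` (in `kuberocks`) and `traefik` (in `traefik`); as Flux manages the manifests, persist the change in Git afterwards or it will be reconciled back:

```sh
# Temporary scaling for comparison runs; the declared state in Git remains the source of truth
kubectl scale deployment demo -n kuberocks --replicas=4
kubectl scale deployment traefik -n traefik --replicas=2
```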
### Load balancing database
So far, we have only load balanced the stateless API, but what about the database part? We have set up a replicated PostgreSQL cluster, yet we make no use of the replica, which sits sadly idle. To change that, we have to distinguish write queries from scalable read queries.
We can make use of the Bitnami [PostgreSQL HA](https://artifacthub.io/packages/helm/bitnami/postgresql-ha) chart instead of the simple one. It adds a new component, [Pgpool-II](https://pgpool.net/mediawiki/index.php/Main_Page), as the main load balancer with failover detection. It's able to separate write queries from read queries in real time and send them to the primary or the replica. The pros: it works natively for all apps without any changes. The cons: it consumes far more resources and adds a new component to maintain.
A second solution is to separate query typologies where it counts: in the application. It requires some code changes, but it's clearly a far more efficient solution. Let's do it this way.
As Npgsql supports load balancing [natively](https://www.npgsql.org/doc/failover-and-load-balancing.html), we don't need to add any Kubernetes service. We just have to create a clear distinction between read and write queries. One simple way is to create a separate read-only `DbContext`.
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Application/Contexts/AppRoDbContext.cs" >}}
```cs
namespace KubeRocks.Application.Contexts;
using KubeRocks.Application.Entities;
using Microsoft.EntityFrameworkCore;
public class AppRoDbContext : DbContext
{
public DbSet<User> Users => Set<User>();
public DbSet<Article> Articles => Set<Article>();
public DbSet<Comment> Comments => Set<Comment>();
public AppRoDbContext(DbContextOptions<AppRoDbContext> options) : base(options)
{
}
}
```
{{< /highlight >}}
Register it in DI:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.Application/Extensions/ServiceExtensions.cs" >}}
```cs
public static class ServiceExtensions
{
public static IServiceCollection AddKubeRocksServices(this IServiceCollection services, IConfiguration configuration)
{
return services
//...
.AddDbContext<AppRoDbContext>((options) =>
{
options.UseNpgsql(
configuration.GetConnectionString("DefaultRoConnection")
??
configuration.GetConnectionString("DefaultConnection")
);
});
}
}
```
{{< /highlight >}}
We fall back to the RW connection string if the RO one is not defined. Then use it in the `ArticlesController`, which has only read endpoints:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Controllers/ArticlesController.cs" >}}
```cs
//...
public class ArticlesController
{
private readonly AppRoDbContext _context;
//...
public ArticlesController(AppRoDbContext context)
{
_context = context;
}
//...
}
```
{{< /highlight >}}
Push and let the CI pass. In the meantime, add the new RO connection string:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
# ...
spec:
# ...
template:
# ...
spec:
# ...
containers:
- name: api
# ...
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: demo-db
key: password
- name: ConnectionStrings__DefaultConnection
value: Host=postgresql-primary.postgres;Username=demo;Password='$(DB_PASSWORD)';Database=demo;
- name: ConnectionStrings__DefaultRoConnection
value: Host=postgresql-primary.postgres,postgresql-read.postgres;Username=demo;Password='$(DB_PASSWORD)';Database=demo;Load Balance Hosts=true;
#...
```
{{< /highlight >}}
We simply add multiple hosts, `postgresql-primary.postgres,postgresql-read.postgres`, to the RO connection string and enable LB mode with `Load Balance Hosts=true`.
Once deployed, relaunch a load test with k6 and admire the DB load balancing in action on both storage servers with `htop`, or directly on the compute pods via the namespace dashboard in Grafana.
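A terminal-only alternative, since K3s ships metrics-server by default:

```sh
# Watch CPU/memory of the primary and read replicas during the load test
kubectl top pods -n postgres
```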
[![Gafana DB load balancing](grafana-db-lb.png)](grafana-db-lb.png)
## Frontend
Let's finish this guide with a quick tour of SPA frontend development, as a separate project from the backend.
### Vue TS
Create a new Vue.js project from the [vitesse starter kit](https://github.com/antfu/vitesse-lite) (be sure to have pnpm installed, just a matter of `scoop/brew install pnpm`):
```sh
npx degit antfu/vitesse-lite kuberocks-demo-ui
cd kuberocks-demo-ui
git init
git add .
git commit -m "Initial commit"
pnpm i
pnpm dev
```
It should launch the app at `http://localhost:3333/`. Create a new `kuberocks-demo-ui` Gitea repo and push this code into it. Now let's quickly wire up the API calls.
### Get around CORS and HTTPS with YARP
As always when the frontend is separated from the backend, we have to deal with CORS. But I prefer to have one single URL for frontend + backend and get rid of the CORS problem by simply calling the API under the `/api` path. Moreover, it'll be production ready without the need to manage any `Vite` variable for the API URL, and we'll get HTTPS provided by dotnet. Back to the API project.
For convenience, let's change the randomly generated ASP.NET ports to predefined ones:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Properties/launchSettings.json" >}}
```json
{
//...
"profiles": {
"http": {
//...
"applicationUrl": "http://localhost:5000",
//...
},
"https": {
//...
"applicationUrl": "https://localhost:5001;http://localhost:5000",
//...
},
//...
}
}
```
{{< /highlight >}}
```sh
dotnet add src/KubeRocks.WebApi package Yarp.ReverseProxy
```
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
//...
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddReverseProxy()
.LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));
//...
var app = builder.Build();
app.MapReverseProxy();
//...
app.UseRouting();
//...
```
{{< /highlight >}}
Note that we must also add `app.UseRouting();` in order to get the Swagger UI working.
The proxy configuration (only for development):
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/appsettings.Development.json" >}}
```json
{
//...
"ReverseProxy": {
"Routes": {
"ServerRouteApi": {
"ClusterId": "Server",
"Match": {
"Path": "/api/{**catch-all}"
},
"Transforms": [
{
"PathRemovePrefix": "/api"
}
]
},
"ClientRoute": {
"ClusterId": "Client",
"Match": {
"Path": "{**catch-all}"
}
}
},
"Clusters": {
"Client": {
"Destinations": {
"Client1": {
"Address": "http://localhost:3333"
}
}
},
"Server": {
"Destinations": {
"Server1": {
"Address": "https://localhost:5001"
}
}
}
}
}
}
```
{{< /highlight >}}
Now your frontend app should appear under `https://localhost:5001`, and API calls under `https://localhost:5001/api`. We now benefit from HTTPS for the whole app. Push the API code.
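A quick smoke test of the proxy from the terminal (the `-k` flag skips validation of the self-signed dev certificate):

```sh
# The SPA is served at the root, the API behind /api
curl -k "https://localhost:5001/api/Articles?page=1&size=10"
```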
### Typescript API generator
As we use OpenAPI, it's possible to generate a TypeScript client for API calls. Before tackling the generation of client models, go back to the backend to force non-nullable attributes to be required by default when using `Swashbuckle.AspNetCore`:
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Filters/RequiredNotNullableSchemaFilter.cs" >}}
```cs
using Microsoft.OpenApi.Models;
using Swashbuckle.AspNetCore.SwaggerGen;
namespace KubeRocks.WebApi.Filters;
public class RequiredNotNullableSchemaFilter : ISchemaFilter
{
public void Apply(OpenApiSchema schema, SchemaFilterContext context)
{
if (schema.Properties is null)
{
return;
}
var notNullableProperties = schema
.Properties
.Where(x => !x.Value.Nullable && !schema.Required.Contains(x.Key))
.ToList();
foreach (var property in notNullableProperties)
{
schema.Required.Add(property.Key);
}
}
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo" file="src/KubeRocks.WebApi/Program.cs" >}}
```cs
//...
builder.Services.AddSwaggerGen(o =>
{
o.SupportNonNullableReferenceTypes();
o.SchemaFilter<RequiredNotNullableSchemaFilter>();
});
//...
```
{{< /highlight >}}
You should now have proper required attributes for models in swagger UI:
[![Frontend](swagger-ui-nullable.png)](swagger-ui-nullable.png)
{{< alert >}}
Sadly, without this boring step, many attributes will be nullable in the generated TypeScript models, which leads to headaches on the client side by forcing us to handle nullability everywhere.
{{< /alert >}}
Now back to the `kuberocks-demo-ui` project and add the following dependencies:
```sh
pnpm add openapi-typescript -D
pnpm add openapi-typescript-fetch
```
Now generate the models by adding this script:
{{< highlight host="kuberocks-demo-ui" file="package.json" >}}
```json
{
//...
"scripts": {
//...
"openapi": "openapi-typescript http://localhost:5000/api/v1/swagger.json --output src/api/openapi.ts"
},
//...
}
```
{{< /highlight >}}
Use the HTTP version of the swagger endpoint, as you'll otherwise get a self-signed certificate error. Then use `pnpm openapi` to generate the full TS model. Finally, describe the API fetchers like so:
{{< highlight host="kuberocks-demo-ui" file="src/api/index.ts" >}}
```ts
import { Fetcher } from 'openapi-typescript-fetch'
import type { components, paths } from './openapi'
const fetcher = Fetcher.for<paths>()
type ArticleList = components['schemas']['ArticleListDto']
type Article = components['schemas']['ArticleDto']
const getArticles = fetcher.path('/api/Articles').method('get').create()
const getArticleBySlug = fetcher.path('/api/Articles/{slug}').method('get').create()
export type { Article, ArticleList }
export {
getArticles,
getArticleBySlug,
}
```
{{< /highlight >}}
We are now fully type-safe against the API.
### Call the API
Let's create pretty basic paginated list and detail Vue pages:
{{< highlight host="kuberocks-demo-ui" file="src/pages/articles/index.vue" >}}
```vue
<script lang="ts" setup>
import { getArticles } from '~/api'
import type { ArticleList } from '~/api'
const articles = ref<ArticleList[]>([])
const articlesCount = ref<number>(0)
const page = ref<number>(1)
const size = ref<number>(10)
async function loadArticles() {
const { data } = await getArticles({
page: page.value,
size: size.value,
})
articles.value = data.articles
articlesCount.value = data.articlesCount
}
function fetchDataOnPage({ currentPage }: { currentPage: number }) {
page.value = currentPage
loadArticles()
}
loadArticles()
</script>
<template>
<div inline-flex flex-col gap-4>
<RouterLink
v-for="(article, i) in articles"
:key="i"
:to="`/articles/${article.slug}`"
inline-block border-1 border-purple-500 rounded p-4
>
<h3>{{ article.title }}</h3>
</RouterLink>
</div>
<div mt-4 flex justify-center>
<OffsetPagination
:page="page"
:size="size"
:total="articlesCount"
:fetch-data="fetchDataOnPage"
/>
</div>
</template>
```
{{< /highlight >}}
The reusable pagination component that uses `useOffsetPagination` from VueUse:
{{< highlight host="kuberocks-demo-ui" file="src/components/OffsetPagination.vue" >}}
```vue
<script lang="ts" setup>
const props = defineProps<{
total: number
size: number
page: number
fetchData: ({ currentPage }: { currentPage: number }) => Promise<void> | void
}>()
const pagination = computed(() =>
useOffsetPagination({
total: props.total,
page: props.page,
pageSize: props.size,
onPageChange: props.fetchData,
onPageSizeChange: props.fetchData,
}),
)
function usePagesBuilder(currentPage: number, pageCount: number) {
const pages = []
const maxPages = 5
const half = Math.floor(maxPages / 2)
const start = Math.max(currentPage - half, 1)
const end = Math.min(start + maxPages, pageCount)
for (let i = start; i <= end; i++)
pages.push(i)
if (start > 1) {
pages.unshift('...')
pages.unshift(1)
}
if (end < pageCount) {
pages.push('...')
pages.push(pageCount)
}
return pages
}
const classes
= 'flex items-center justify-center border rounded-1 text-sm font-sans text-gray-300 border-gray-500 w-8 h-8'
</script>
<template>
<div flex flex-wrap gap-1>
<button
:disabled="pagination.isFirstPage.value"
:class="[
classes,
{
'opacity-50': pagination.isFirstPage.value,
},
]"
@click="pagination.prev"
>
&lt;
</button>
<button
v-for="item in usePagesBuilder(
pagination.currentPage.value,
pagination.pageCount.value,
)"
:key="item"
:disabled="
pagination.currentPage.value === item || !Number.isInteger(item)
"
:class="[
classes,
{
'opacity-50': !Number.isInteger(item),
'text-white border-purple-500 bg-purple-500':
pagination.currentPage.value === item,
},
]"
@click="pagination.currentPage.value = Number(item)"
>
{{ item }}
</button>
<button
:disabled="pagination.isLastPage.value"
:class="[
classes,
{
'opacity-50': pagination.isLastPage.value,
},
]"
@click="pagination.next"
>
&gt;
</button>
</div>
</template>
```
{{< /highlight >}}
The detail view:
{{< highlight host="kuberocks-demo-ui" file="src/pages/articles/[slug].vue" >}}
```vue
<script lang="ts" setup>
import { getArticleBySlug } from '~/api'
import type { Article } from '~/api'
const props = defineProps<{ slug: string }>()
const article = ref<Article>()
const router = useRouter()
async function getArticle() {
const { data } = await getArticleBySlug({ slug: props.slug })
article.value = data
}
getArticle()
</script>
<template>
<div v-if="article">
<h1 mb-6 text-2xl font-bold>
{{ article.title }}
</h1>
<p mb-4 italic>
{{ article.description }}
</p>
<div prose v-html="article.body" />
<div>
<button m-3 mt-8 text-sm btn @click="router.back()">
Back
</button>
</div>
</div>
</template>
```
{{< /highlight >}}
### Frontend CI/CD
The frontend CI is far simpler than the backend one. Create a new `demo-ui` pipeline:
{{< highlight host="demo-kube-flux" file="pipelines/demo-ui.yaml" >}}
```yml
resources:
- name: version
type: semver
source:
driver: git
uri: ((git.url))/kuberocks/demo-ui
branch: main
file: version
username: ((git.username))
password: ((git.password))
git_user: ((git.git-user))
commit_message: ((git.commit-message))
- name: source-code
type: git
icon: coffee
source:
uri: ((git.url))/kuberocks/demo-ui
branch: main
username: ((git.username))
password: ((git.password))
- name: docker-image
type: registry-image
icon: docker
source:
repository: ((registry.name))/kuberocks/demo-ui
tag: latest
username: ((registry.username))
password: ((registry.password))
jobs:
- name: build
plan:
- get: source-code
trigger: true
- task: build-source
config:
platform: linux
image_resource:
type: registry-image
source:
repository: node
tag: 18-buster
inputs:
- name: source-code
path: .
outputs:
- name: dist
path: dist
caches:
- path: .pnpm-store
run:
path: /bin/sh
args:
- -ec
- |
corepack enable
corepack prepare pnpm@latest-8 --activate
pnpm config set store-dir .pnpm-store
pnpm i
pnpm lint
pnpm build
- task: build-image
privileged: true
config:
platform: linux
image_resource:
type: registry-image
source:
repository: concourse/oci-build-task
inputs:
- name: source-code
path: .
- name: dist
path: dist
outputs:
- name: image
run:
path: build
- put: version
params: { bump: patch }
- put: docker-image
params:
additional_tags: version/number
image: image/image.tar
```
{{< /highlight >}}
`pnpm build` takes care of TypeScript type-checking and asset building.
{{< highlight host="demo-kube-flux" file="pipelines/main.yaml" >}}
```yml
#...
jobs:
- name: configure-pipelines
plan:
#...
- set_pipeline: demo-ui
file: ci/pipelines/demo-ui.yaml
```
{{< /highlight >}}
Apply it, then put this nginx config alongside the `Dockerfile` at the frontend project root:
{{< highlight host="kuberocks-demo-ui" file="docker/nginx.conf" >}}
```conf
server {
listen 80;
server_name localhost;
root /usr/share/nginx/html;
location / {
try_files $uri /index.html;
}
}
```
{{< /highlight >}}
{{< highlight host="kuberocks-demo-ui" file="Dockerfile" >}}
```Dockerfile
FROM nginx:alpine
COPY docker/nginx.conf /etc/nginx/conf.d/default.conf
COPY dist /usr/share/nginx/html
```
{{< /highlight >}}
{{< alert >}}
Without this nginx config, as it's an SPA, the JS routes would not be handled properly.
{{< /alert >}}
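You can validate the whole packaging locally before letting the CI do it (a quick sketch, the local tag is arbitrary):

```sh
# Build the static assets, bake them into the nginx image and serve on port 8080
pnpm build
docker build -t demo-ui:local .
docker run --rm -p 8080:80 demo-ui:local
```

Browse `http://localhost:8080` and refresh on a deep route to check the SPA fallback.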
After the push, all CI should build correctly. Then the image policy for auto update:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/images-demo-ui.yaml" >}}
```yml
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
name: demo-ui
namespace: flux-system
spec:
image: gitea.kube.rocks/kuberocks/demo-ui
interval: 1m0s
secretRef:
name: dockerconfigjson
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
name: demo-ui
namespace: flux-system
spec:
imageRepositoryRef:
name: demo-ui
namespace: flux-system
policy:
semver:
range: 0.0.x
```
{{< /highlight >}}
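Once committed, you can check that Flux sees the repository and resolves the expected tag (assuming the Flux CLI is installed):

```sh
# Latest scanned tags and the tag currently selected by the policy
flux get image repository demo-ui
flux get image policy demo-ui
```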
The deployment:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo-ui.yaml" >}}
```yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: demo-ui
namespace: kuberocks
spec:
replicas: 2
selector:
matchLabels:
app: demo-ui
template:
metadata:
labels:
app: demo-ui
spec:
imagePullSecrets:
- name: dockerconfigjson
containers:
- name: front
image: gitea.kube.rocks/kuberocks/demo-ui:latest # {"$imagepolicy": "flux-system:demo-ui"}
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: demo-ui
namespace: kuberocks
spec:
selector:
app: demo-ui
ports:
- name: http
port: 80
```
{{< /highlight >}}
After the push, the demo UI container should be deployed. The very last step is to add a new route to the existing `IngressRoute` for the frontend:
{{< highlight host="demo-kube-flux" file="clusters/demo/kuberocks/deploy-demo.yaml" >}}
```yaml
#...
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
#...
spec:
#...
routes:
- match: Host(`demo.kube.rocks`)
kind: Rule
services:
- name: demo-ui
port: http
- match: Host(`demo.kube.rocks`) && PathPrefix(`/api`)
#...
```
{{< /highlight >}}
Go to `https://demo.kube.rocks` to confirm that the front app can call the API.
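Or from the terminal:

```sh
# The SPA answers on the root, the API behind /api through the same IngressRoute
curl -s "https://demo.kube.rocks/api/Articles?page=1&size=10" | head -c 300
```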
[![Frontend](frontend.png)](frontend.png)
## Final check 🎊🏁🎊
You just made a vast tour of building an on-premise Kubernetes cluster from the ground up, following a GitOps workflow. Congratulations if you got this far!!!
I highly encourage you to buy [this book](https://www.amazon.com/Understanding-Kubernetes-visual-way-sketchnotes/dp/B0BB619188) from [Aurélie Vache](https://twitter.com/aurelievache), it's the best Kubernetes cheat sheet out there.
