---
title: "A beautiful GitOps day - Build your self-hosted Kubernetes cluster"
date: 2023-08-18
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes"]
---
{{< lead >}}
Use a GitOps workflow to build a production-grade on-premise Kubernetes cluster on a cheap VPS provider, with complete CI/CD 🎉
{{< /lead >}}
## The goal 🎯
This guide is mainly intended for developers and SREs who want to build a Kubernetes cluster that meets the following conditions:
1. **On-Premise management** (The Hard Way), so no vendor lock-in to any managed Kubernetes provider (KaaS/CaaS)
2. Hosted on affordable VPS provider (**Hetzner**), with strong **Terraform support**, allowing **GitOps** principles
3. **High Availability** with cloud Load Balancer, resilient storage and DB with replication, allowing automatic upgrades or maintenance without any downtime for production apps
4. Include complete **monitoring**, **logging** and **tracing** stacks
5. Complete **CI/CD pipeline**
6. Budget target of **~€60/month** for the complete cluster with all of the above tools, which can be far less if you don't need HA, CI, or monitoring features
### What you'll learn 📚
* Use Terraform to manage your infrastructure, for both the cloud provider and Kubernetes, following GitOps principles (see the sketch right after this list)
* How to set up an On-Premise resilient Kubernetes cluster, from the ground up, with automatic upgrades and reboots
* Use [K3s](https://k3s.io/) as a lightweight Kubernetes distribution
* Use [Traefik](https://traefik.io/) as ingress controller, combined with [cert-manager](https://cert-manager.io/) for distributed SSL certificates, with access to our cluster secured behind a Hetzner Load Balancer
* Use [Longhorn](https://longhorn.io/) as resilient storage, installed on dedicated storage nodes and volumes, including incremental PVC backups to S3
* Install and configure stateful data components such as **PostgreSQL** and **Redis** clusters on specific node pools via the well-known [Bitnami Helm charts](https://bitnami.com/stacks/helm)
* Test our resilient storage with some No Code apps, such as [n8n](https://n8n.io/) and [NocoDB](https://nocodb.com/), managed by Flux
* Set up a complete monitoring and logging stack with [Prometheus](https://prometheus.io/), [Grafana](https://grafana.com/), and [Loki](https://grafana.com/oss/loki/)
* Set up a complete self-hosted CI pipeline with the lightweight [Gitea](https://gitea.io/) + [Concourse CI](https://concourse-ci.org/) combo
* Test the above CI tools with a sample **.NET API app** backed by the database cluster, with automatic CD using Flux
* Integrate the app to our monitoring stack with [OpenTelemetry](https://opentelemetry.io/), and use [Tempo](https://grafana.com/oss/tempo/) for distributed tracing
* Go further with [SonarQube](https://www.sonarsource.com/products/sonarqube/) for Continuous Inspection on code quality, including automatic code coverage reports
* Do some load testing scenarios with [k6](https://k6.io/), and deploy a frontend SPA sample consuming the .NET API.
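To give a first concrete taste of the Terraform side, here is a minimal sketch of the Hetzner provider bootstrap, assuming the official `hetznercloud/hcloud` provider; the `hcloud_token` variable name and the server values are illustrative, not the series' final code:

```hcl
terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

# Hypothetical variable; pass it via the TF_VAR_hcloud_token environment variable
variable "hcloud_token" {
  type      = string
  sensitive = true
}

provider "hcloud" {
  token = var.hcloud_token
}

# A single illustrative node; the real cluster declares whole node pools
resource "hcloud_server" "worker" {
  name        = "worker-01"
  server_type = "cx21"
  image       = "ubuntu-22.04"
  location    = "nbg1"
}
```

Committed to Git and applied from a pipeline, this file is already GitOps in miniature: the repository becomes the single source of truth for the infrastructure.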
### You probably don't need Kubernetes 🪧
All of this is of course overkill for personal usage, and is only intended for learning purposes or getting a low-cost semi-pro grade K3s cluster.
**Docker Swarm** is probably the best solution for 99% of people who need a simple container orchestration system. Swarm remains an officially supported project, as it's built into the Docker Engine, even if we shouldn't expect any new features.
I wrote a [complete dedicated 2022 guide here]({{< ref "/posts/02-build-your-own-docker-swarm-cluster" >}}) that explains all the steps to get a semi-pro grade Swarm cluster (though not GitOps oriented, using only the Portainer UI).
## Cluster Architecture 🏘️
Here are the node pools that we'll need for a complete self-hosted Kubernetes cluster, where each node pool is scalable independently (see the Terraform sketch after the table):
| Node pool | Description |
| ------------ | ---------------------------------------------------------------------------------------------------------- |
| `controller` | The control plane nodes; use at least 3, or any greater odd number for etcd quorum, for an HA kube API server |
| `worker` | Workers for your production/staging stateless apps |
| `storage` | Dedicated nodes running Longhorn for resilient storage and DBs, in case you don't use managed databases |
| `monitor` | Workers dedicated for monitoring, optional |
| `runner` | Workers dedicated for CI/CD pipelines execution, optional |
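To make "scalable independently" concrete, here is one possible way to declare these pools in Terraform; the variable shape and the naming scheme are my own illustration, building on the hypothetical provider setup sketched earlier:

```hcl
# Illustrative node pool declaration: bump a count to scale one pool
variable "node_pools" {
  type = map(object({
    count       = number
    server_type = string
  }))
  default = {
    controller = { count = 1, server_type = "cx21" }
    worker     = { count = 3, server_type = "cx21" }
    storage    = { count = 2, server_type = "cx21" }
    monitor    = { count = 1, server_type = "cx21" }
    runner     = { count = 1, server_type = "cx21" }
  }
}

locals {
  # Flatten pools into individual nodes, e.g. "worker-01" => "cx21"
  nodes = merge([
    for pool, cfg in var.node_pools : {
      for i in range(cfg.count) :
      "${pool}-${format("%02d", i + 1)}" => cfg.server_type
    }
  ]...)
}

resource "hcloud_server" "node" {
  for_each    = local.nodes
  name        = each.key
  server_type = each.value
  image       = "ubuntu-22.04"
  location    = "nbg1"
}
```

Scaling the `worker` pool then boils down to a one-line Git commit, which is exactly the workflow we're after.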
Here is an HA architecture sample with replicated storage (via Longhorn) and a PostgreSQL DB (controllers, monitoring and runners are excluded for simplicity):
{{< mermaid >}}
flowchart TB
client((Client))
client -- Port 80 + 443 --> lb{LB}
lb -- Port 80 --> worker-01
lb -- Port 80 --> worker-02
lb -- Port 80 --> worker-03
subgraph worker-01
direction TB
traefik-01{Traefik}
app-01([My App replica 1])
traefik-01 --> app-01
end
subgraph worker-02
direction TB
traefik-02{Traefik}
app-02([My App replica 2])
traefik-02 --> app-02
end
subgraph worker-03
direction TB
traefik-03{Traefik}
app-03([My App replica 3])
traefik-03 --> app-03
end
overlay(Overlay network)
worker-01 --> overlay
worker-02 --> overlay
worker-03 --> overlay
overlay --> db-primary
overlay --> db-read
db-primary((Primary SVC))
db-primary -- Port 5432 --> storage-01
db-read((Read SVC))
db-read -- Port 5432 --> storage-02
db-read -- Port 5432 --> storage-03
subgraph storage-01
direction TB
pg-primary([PostgreSQL primary])
longhorn-01[(Longhorn<br>volume)]
pg-primary --> longhorn-01
end
subgraph storage-02
direction TB
pg-replica-01([PostgreSQL replica 1])
longhorn-02[(Longhorn<br>volume)]
pg-replica-01 --> longhorn-02
end
subgraph storage-03
direction TB
pg-replica-02([PostgreSQL replica 2])
longhorn-03[(Longhorn<br>volume)]
pg-replica-02 --> longhorn-03
end
db-streaming(Streaming replication)
storage-01 --> db-streaming
storage-02 --> db-streaming
storage-03 --> db-streaming
{{</ mermaid >}}
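The entry point of this diagram, the Hetzner Load Balancer forwarding ports 80 and 443 to each worker's Traefik, can be declared with the same provider. A rough sketch, reusing the hypothetical `hcloud_server.node` resources from the previous snippet (TLS is passed through as plain TCP, since Traefik terminates it with cert-manager):

```hcl
resource "hcloud_load_balancer" "main" {
  name               = "lb"
  load_balancer_type = "lb11" # smallest Hetzner LB type
  location           = "nbg1"
}

# One TCP service per exposed entry point
resource "hcloud_load_balancer_service" "entrypoints" {
  for_each         = toset(["80", "443"])
  load_balancer_id = hcloud_load_balancer.main.id
  protocol         = "tcp"
  listen_port      = tonumber(each.value)
  destination_port = tonumber(each.value)
}

# Attach every worker node as a target
resource "hcloud_load_balancer_target" "workers" {
  for_each         = { for name, srv in hcloud_server.node : name => srv if startswith(name, "worker") }
  type             = "server"
  load_balancer_id = hcloud_load_balancer.main.id
  server_id        = each.value.id
}
```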
### Cloud provider choice ☁️
As an HA Kubernetes cluster can quickly get expensive, a good cloud provider is an essential part.
After testing many providers, such as DigitalOcean, Vultr, Linode, Civo, OVH, and Scaleway, it seems like **Hetzner** is very well suited **in my opinion**:
* Very competitive price for mid-range performance (only around **€6**/month for 2 vCPU / 4 GB per node)
* No frills, just the basics: VMs, block volumes, load balancers, firewalls, and that's it
* Nice UI + an efficient CLI tool
* Official strong [Terraform support](https://registry.terraform.io/providers/hetznercloud/hcloud/latest), so GitOps ready
Please let me know in the comments below if you have better suggestions!
### Final cost estimate 💰
| Server Name     | Type     | Quantity              | Unit Price (€) |
| --------------- | -------- | --------------------- | -------------- |
| `lb`            | **LB11** | 1                     | 5.39           |
| `controller-0x` | **CX21** | 1 or 3 for HA cluster | 0.5 + 4.85     |
| `worker-0x`     | **CX21** | 2 or 3                | 0.5 + 4.85     |
| `storage-0x`    | **CX21** | 2 for HA database     | 0.5 + 4.85     |
| `monitor-0x`    | **CX21** | 1                     | 0.5 + 4.85     |
| `runner-0x`     | **CX21** | 1                     | 0.5 + 4.85     |
The **€0.50** is for each primary IP.
We will also need some extendable block volumes for our storage nodes. Let's start with **20 GB** each, i.e. **2\*0.88**.
(5.39+**8**\*(0.5+4.85)+**2**\*0.88)\*1.2 = **€59.94** / month (the final \*1.2 accounts for VAT)
We targeted **€60/month** for a minimal working CI/CD cluster, so we're good!
You can also prefer to take **2 larger** CX31 worker nodes (**8 GB** RAM) instead of **3 smaller** ones, which [will optimize resource usage](https://learnk8s.io/kubernetes-node-size), so:
(5.39+**7**\*0.5+**5**\*4.85+**2**\*9.2+**2**\*0.88)\*1.2 = **€63.96** / month
For an HA cluster, you'll need to add 2 more CX21 controllers, so **€72.78** / month (3 small workers option) or **€76.80** / month (2 big workers option).
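If you like keeping the estimate itself under version control, the same arithmetic fits in a few Terraform locals; the prices are the net figures from the table above and will inevitably drift over time:

```hcl
locals {
  vat        = 1.2  # the final *1.2 factor = VAT
  lb         = 5.39 # LB11
  primary_ip = 0.5
  cx21       = 4.85
  volume     = 0.88 # 20 GB block volume

  # 1 controller + 3 workers + 2 storage + 1 monitor + 1 runner = 8 nodes
  monthly = (local.lb + 8 * (local.primary_ip + local.cx21) + 2 * local.volume) * local.vat
}

output "monthly_cost" {
  value = local.monthly # ~59.94
}
```

You can sanity-check it by evaluating `local.monthly` in `terraform console`.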
## Let's party 🎉
Enough talk, [let's go Charles !]({{< ref "/posts/11-a-beautiful-gitops-day-1" >}})