---
title: "A beautiful GitOps day - Build your self-hosted Kubernetes cluster"
date: 2023-08-18
description: "Follow this opinionated guide as starter-kit for your own Kubernetes platform..."
tags: ["kubernetes"]
draft: true
---

{{< lead >}} Be free from AWS/Azure/GCP by building a production-grade on-premise Kubernetes cluster on a cheap VPS provider, fully GitOps-managed and with a complete CI/CD toolchain 🎉 {{< /lead >}}

## The goal 🎯

This guide is mainly intended for developers and SREs who want to build a Kubernetes cluster that meets the following conditions:

1. On-premise management (The Hard Way), so no vendor lock-in to any managed Kubernetes provider (KaaS/CaaS)
2. Hosted on an affordable VPS provider (Hetzner) with strong Terraform support, enabling GitOps principles
3. High availability with a cloud load balancer, resilient storage, and a replicated DB, allowing automatic upgrades or maintenance without any downtime for production apps
4. Complete monitoring, logging, and tracing stacks
5. A complete CI/CD pipeline
6. A budget target of ~€60/month for the complete cluster with all the above tools; it can be far less if you don't need HA, CI, or monitoring features

## What you'll learn 📚

- How to set up an on-premise resilient Kubernetes cluster with Terraform, from the ground up, with automatic upgrades and reboots
- Use Terraform to manage your infrastructure, for both the cloud provider and Kubernetes, following GitOps principles
- Use K3s as a lightweight Kubernetes distribution
- Use Traefik as ingress controller, combined with cert-manager for distributed SSL certificates, and a first secure access to our cluster through a Hetzner load balancer
- Continuous Delivery with Flux, tested with a sample stateless app
- Use Longhorn as resilient storage, installed on a dedicated storage node pool and volumes, including incremental PVC backups to S3
- Install and configure critical StatefulSets such as PostgreSQL and Redis clusters on a specific node pool via the well-known Bitnami Helm charts
- Test our resilient storage with some No Code apps, such as n8n and NocoDB, always managed by Flux
- A complete monitoring and logging stack with Prometheus, Grafana, and Loki
- Mount a complete self-hosted CI pipeline with the lightweight Gitea + Concourse CI combo
- Test the above CI tools with a sample .NET app, with automatic CD using Flux
- Integrate the app into our monitoring stack with OpenTelemetry, and use Tempo for distributed tracing
- Run some load testing scenarios with k6
- Go further with SonarQube for Continuous Inspection of code quality, including automatic code coverage reports

## You probably don't need Kubernetes 🪧

All of this is of course overkill for personal usage, and is only intended for learning purposes or for getting a low-cost, semi-pro grade K3s cluster.

Docker Swarm is probably the best solution for the 99% of people who need a simple container orchestration system. Swarm remains an officially supported project, as it's built into the Docker Engine, even if we shouldn't expect any new features.

I wrote a [complete dedicated 2022 guide here]({{< ref "/posts/02-build-your-own-docker-swarm-cluster" >}}) that explains all the steps needed to build a semi-pro grade Swarm cluster.

## Cluster Architecture 🏘️

Here are the node pools we'll need for a complete self-hosted Kubernetes cluster:

| Node pool     | Description |
| ------------- | ----------- |
| `controllers` | The control plane nodes; use at least 3, or any greater odd number (because of etcd), for an HA kube API server |
| `workers`     | Workers for your production/staging apps; at least 3 for running Longhorn as resilient storage |
| `storages`    | Dedicated nodes for any DB / critical StatefulSet pods; recommended if you won't use managed databases |
| `monitors`    | Workers dedicated to monitoring; optional |
| `runners`     | Workers dedicated to CI/CD pipeline execution; optional |
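Dedicating a pool to specific workloads usually means registering each K3s agent with matching labels and taints. As a minimal sketch (the `pool` label/taint key and the pool names are this guide's own convention, not K3s defaults), the agent flags could be generated like this:

```python
# Sketch: build the k3s agent flags that dedicate a node to a pool.
# The "pool" key is illustrative; --node-label / --node-taint are the
# standard k3s agent flags for registering labels and taints.
POOLS = {
    "workers":  {"labels": {"pool": "workers"}, "taints": []},
    "storages": {"labels": {"pool": "storages"},
                 "taints": ["pool=storages:NoSchedule"]},
    "monitors": {"labels": {"pool": "monitors"},
                 "taints": ["pool=monitors:NoSchedule"]},
    "runners":  {"labels": {"pool": "runners"},
                 "taints": ["pool=runners:NoSchedule"]},
}

def k3s_agent_flags(pool: str) -> str:
    cfg = POOLS[pool]
    flags = [f"--node-label {k}={v}" for k, v in cfg["labels"].items()]
    flags += [f"--node-taint {t}" for t in cfg["taints"]]
    return " ".join(flags)

print(k3s_agent_flags("storages"))
# --node-label pool=storages --node-taint pool=storages:NoSchedule
```

Workloads then target a pool with a matching `nodeSelector` and toleration; we'll wire this up properly in the later chapters.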

Here is an HA architecture sample, with replicated storage (via Longhorn) and DB (PostgreSQL), that we will try to build (controllers, monitoring, and runners are excluded for simplicity):

{{< mermaid >}}
flowchart TB
client((Client))
client -- Port 80 + 443 --> lb{LB}
lb -- Port 80 --> worker-01
lb -- Port 80 --> worker-02
lb -- Port 80 --> worker-03
subgraph worker-01
  direction TB
  traefik-01{Traefik}
  app-01([My App replica 1])
  traefik-01 --> app-01
end
subgraph worker-02
  direction TB
  traefik-02{Traefik}
  app-02([My App replica 2])
  traefik-02 --> app-02
end
subgraph worker-03
  direction TB
  traefik-03{Traefik}
  app-03([My App replica 3])
  traefik-03 --> app-03
end
overlay(Overlay network)
worker-01 --> overlay
worker-02 --> overlay
worker-03 --> overlay
overlay --> db-rw
overlay --> db-ro
db-rw((RW SVC))
db-rw -- Port 5432 --> storage-01
db-ro((RO SVC))
db-ro -- Port 5432 --> storage-01
db-ro -- Port 5432 --> storage-02
subgraph storage-01
  pg-primary([PostgreSQL primary])
  longhorn-01[(Longhorn<br>volume)]
  pg-primary --> longhorn-01
end
subgraph storage-02
  pg-replica([PostgreSQL replica])
  longhorn-02[(Longhorn<br>volume)]
  pg-replica --> longhorn-02
end
db-streaming(Streaming replication)
storage-01 --> db-streaming
storage-02 --> db-streaming
{{< /mermaid >}}

## Cloud provider choice ☁️

As an HA Kubernetes cluster can quickly become expensive, a good cloud provider is an essential part.

After testing many providers, such as DigitalOcean, Vultr, Linode, Civo, OVH, and Scaleway, Hetzner seems, in my opinion, very well suited:

- Very competitive price for mid-range performance (only around $6 per node for 2 CPU / 4 GB)
- No frills, just the basics: VMs, block volumes, load balancers, DNS, firewall, and that's it
- A simple, nice UI + a CLI tool
- Strong official Terraform support, so GitOps ready
- If you use Hetzner DNS, you get cert-manager support via a third-party webhook for the DNS01 challenge

Please let me know in the comments below if you have better suggestions!

## Final cost estimate 💰

| Server Name  | Type | Quantity              | Unit Price |
| ------------ | ---- | --------------------- | ---------- |
| `worker`     | LB1  | 1                     | 5.39       |
| `manager-0x` | CX21 | 1 or 3 for HA cluster | 0.5 + 4.85 |
| `worker-0x`  | CX21 | 2 or 3                | 0.5 + 4.85 |
| `storage-0x` | CX21 | 2 for HA database     | 0.5 + 4.85 |
| `monitor-0x` | CX21 | 1                     | 0.5 + 4.85 |
| `runner-0x`  | CX21 | 1                     | 0.5 + 4.85 |

The 0.5 is for the primary IPs.

We will also need some expandable block volumes for our storage nodes. Let's start with 20 GB each, so 2 × 0.88.

(5.39+8*(0.5+4.85)+2*0.88)*1.2 = €59.94 / month

We targeted €60/month for a minimal working CI/CD cluster, so we are good!

You may also prefer 2 larger CX31 worker nodes (8 GB RAM) instead of 3 smaller ones, which optimizes resource usage, so:

(5.39+7*0.5+5*4.85+2*9.2+2*0.88)*1.2 = €63.96 / month

For an HA cluster, you'll need to add 2 more CX21 controllers, so €72.78/month (3 small workers) or €76.80/month (2 big workers).
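The estimates above can be recomputed with a small sketch (unit prices are the net Hetzner list prices from the table, with the 1.2 factor applying 20% VAT; each node is assumed to need one primary IP):

```python
# Sketch: recompute the monthly cost estimates above.
# Prices in EUR, net; VAT = 1.2 applies 20% tax at the end.
LB1, CX21, CX31 = 5.39, 4.85, 9.20   # load balancer and node types
IP, VOLUME_20GB, VAT = 0.50, 0.88, 1.2

def monthly(lb: int, cx21: int, cx31: int, volumes: int) -> float:
    """Total monthly cost; every node gets a primary IP."""
    nodes = cx21 + cx31
    return round((lb * LB1 + nodes * IP + cx21 * CX21
                  + cx31 * CX31 + volumes * VOLUME_20GB) * VAT, 2)

# 1 controller + 3 workers + 2 storage + 1 monitor + 1 runner
print(monthly(lb=1, cx21=8, cx31=0, volumes=2))   # 59.94
# 2 bigger CX31 workers instead of 3 small CX21 ones
print(monthly(lb=1, cx21=5, cx31=2, volumes=2))   # 63.96
# HA variants: 2 extra CX21 controllers
print(monthly(lb=1, cx21=10, cx31=0, volumes=2))  # 72.78
print(monthly(lb=1, cx21=7, cx31=2, volumes=2))   # 76.8
```

Tweaking the node counts here makes it easy to see how far the bill drops once you strip HA, CI, or monitoring.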

## Let's party 🎉

Enough talk, [let's go Charles !]({{< ref "/posts/11-a-beautiful-gitops-day-1" >}}).