k8s

2023-08-20 15:06:33 +02:00
parent 0426392a90
commit 028849ad3a
1 changed files with 168 additions and 8 deletions
--- a/content/posts/12-build-your-own-kubernetes-cluster-part-3/index.md
+++ b/content/posts/12-build-your-own-kubernetes-cluster-part-3/index.md
@@ -28,13 +28,20 @@ terraform {

 Let's begin with automatic upgrades management.

-### Monitoring stack
+### CRD prerequisites
+
+Before moving on to the next steps, we need to install the monitoring CRDs that many components will rely on to expose their metrics.
+
+```sh
+kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.67.1/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
+kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.67.1/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
+```

 ### Automatic reboot

 When the OS kernel is upgraded, the system needs to be rebooted to apply it. This is a critical operation for a Kubernetes cluster, as it can cause downtime. To avoid this, we'll use [kured](https://github.com/kubereboot/kured), which takes care of cordoning and draining nodes before rebooting them one by one.

-{{< highlight file="reboot.tf" >}}
+{{< highlight file="kured.tf" >}}

 ```tf
 resource "helm_release" "kubereboot" {
@@ -43,6 +50,7 @@ resource "helm_release" "kubereboot" {
  repository = "https://kubereboot.github.io/charts"

  name = "kured"
+  namespace = "kube-system"

  set {
    name  = "configuration.period"
@@ -68,17 +76,169 @@ resource "helm_release" "kubereboot" {

 {{</ highlight >}}

-After applying this, you can check that the daemonset is running on all nodes with `kg ds`.
+After applying this with `terraform apply`, ensure that the `daemonset` is running on all nodes with `kg ds -n kube-system`.

-{{< alert >}}
-`tolerations` will ensure that all tainted nodes will receive the daemonset.
-{{</ alert >}}
+`tolerations` ensures that tainted nodes also receive the daemonset.
+
+`metrics.create` creates a `servicemonitor` custom k8s resource that allows Prometheus to scrape all kured metrics. You can check it with `kg smon -n kube-system -o yaml`. Monitoring will be covered in a future post, but let's be monitoring-ready from the start.
+
+You can test it by running `touch /var/run/reboot-required` on a specific node.
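+
+The `tolerations` and `metrics.create` settings mentioned above are not visible in this excerpt. As a rough sketch, assuming they are passed as chart values through `set` blocks (the value paths below are assumptions to check against the kured chart's `values.yaml`, and the chart version and `configuration.period` are left out for brevity), the release could look like this:
+
+```tf
+# Sketch only: restates the kured release with the assumed value paths
+# for tolerations and metrics. Not a second release to apply alongside
+# the one above.
+resource "helm_release" "kubereboot" {
+  chart      = "kured"
+  repository = "https://kubereboot.github.io/charts"
+
+  name      = "kured"
+  namespace = "kube-system"
+
+  # Tolerate NoSchedule taints so tainted nodes also run the daemonset.
+  set {
+    name  = "tolerations[0].effect"
+    value = "NoSchedule"
+  }
+
+  set {
+    name  = "tolerations[0].operator"
+    value = "Exists"
+  }
+
+  # Ask the chart to create a ServiceMonitor for Prometheus.
+  set {
+    name  = "metrics.create"
+    value = true
+  }
+}
+```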

 ### Automatic K3s upgrade

-kubectl apply -k github.com/rancher/system-upgrade-controller
+Now let's take care of K3s upgrades. We'll use [system-upgrade-controller](https://github.com/rancher/system-upgrade-controller). It will upgrade the K3s binary automatically on all nodes, one by one.

-## HTTP access
+However, as Terraform doesn't offer a proper way to apply a remote multi-document YAML file natively, the simplest way is to sacrifice some GitOps and install system-upgrade-controller manually.
+
+{{< alert >}}
+Don't push yourself to be 100% GitOps everywhere if the remedy adds far more code complexity. Sometimes simply documenting the manual steps in a README is better.
+{{</ alert >}}
+
+```sh
+ka https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
+kg deploy -n system-upgrade
+```
+
+Next, apply the following upgrade plans for servers and agents.
+
+{{< highlight file="plans.tf" >}}
+
+```tf
+resource "kubernetes_manifest" "server_plan" {
+  manifest = {
+    apiVersion = "upgrade.cattle.io/v1"
+    kind       = "Plan"
+    metadata = {
+      name      = "server-plan"
+      namespace = "system-upgrade"
+    }
+    spec = {
+      concurrency = 1
+      cordon      = true
+      nodeSelector = {
+        matchExpressions = [
+          {
+            key      = "node-role.kubernetes.io/control-plane"
+            operator = "Exists"
+          }
+        ]
+      }
+      tolerations = [
+        {
+          operator = "Exists"
+          effect   = "NoSchedule"
+        }
+      ]
+      serviceAccountName = "system-upgrade"
+      upgrade = {
+        image = "rancher/k3s-upgrade"
+      }
+      channel = "https://update.k3s.io/v1-release/channels/stable"
+    }
+  }
+}
+
+resource "kubernetes_manifest" "agent_plan" {
+  manifest = {
+    apiVersion = "upgrade.cattle.io/v1"
+    kind       = "Plan"
+    metadata = {
+      name      = "agent-plan"
+      namespace = "system-upgrade"
+    }
+    spec = {
+      concurrency = 1
+      cordon      = true
+      nodeSelector = {
+        matchExpressions = [
+          {
+            key      = "node-role.kubernetes.io/control-plane"
+            operator = "DoesNotExist"
+          }
+        ]
+      }
+      tolerations = [
+        {
+          operator = "Exists"
+          effect   = "NoSchedule"
+        }
+      ]
+      prepare = {
+        args  = ["prepare", "server-plan"]
+        image = "rancher/k3s-upgrade"
+      }
+      serviceAccountName = "system-upgrade"
+      upgrade = {
+        image = "rancher/k3s-upgrade"
+      }
+      channel = "https://update.k3s.io/v1-release/channels/stable"
+    }
+  }
+}
+```
+
+{{</ highlight >}}
+
+{{< alert >}}
+You may set the same channel as the one used in the previous step for hcloud cluster creation.
+{{</ alert >}}
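+
+One way to keep both plans on the same channel is to define it once and reference it from each plan. A minimal sketch, assuming a hypothetical local value named `k3s_channel` (not part of the original code):
+
+```tf
+locals {
+  # Hypothetical name; the URL is the one already used by both plans.
+  k3s_channel = "https://update.k3s.io/v1-release/channels/stable"
+}
+
+# Each plan's spec would then reference it:
+#   channel = local.k3s_channel
+```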
+
+## External access
+
+Now it's time to expose our cluster to the outside world. We'll use Traefik as the ingress controller and cert-manager for SSL certificate management.
+
+### cert-manager
+
+First, we need to install cert-manager to handle SSL certificates across the cluster. Start by installing its CRDs manually.
+
+```sh
+ka https://github.com/cert-manager/cert-manager/releases/download/v1.12.3/cert-manager.crds.yaml
+```
+
+Then apply the following Terraform code.
+
+{{< highlight file="cert-manager.tf" >}}
+
+```tf
+resource "kubernetes_namespace_v1" "cert_manager" {
+  metadata {
+    name = "cert-manager"
+  }
+}
+
+resource "helm_release" "cert_manager" {
+  chart      = "cert-manager"
+  version    = "v1.12.3"
+  repository = "https://charts.jetstack.io"
+
+  name      = "cert-manager"
+  namespace = kubernetes_namespace_v1.cert_manager.metadata[0].name
+
+  set {
+    name  = "prometheus.servicemonitor.enabled"
+    value = true
+  }
+}
+```
+
+{{</ highlight >}}
+
+{{< alert >}}
+You can use the `installCRDs` option to install CRDs automatically. However, uninstalling cert-manager would then delete all associated resources, including generated certificates. That's why I generally prefer to install CRDs manually.  
+As always, we enable `prometheus.servicemonitor.enabled` to allow Prometheus to scrape cert-manager metrics.
+{{</ alert >}}
+
+Check that all deployments are ready with `kg deploy -n cert-manager`.
+
+#### Wildcard certificate via DNS01
+
+We'll use the [DNS01 challenge](https://cert-manager.io/docs/configuration/acme/dns01/) to get a wildcard certificate for our domain. This is the most convenient way to get a certificate for a domain without having to expose it to the outside world.
+
+{{< alert >}}
+You may use any DNS provider supported by cert-manager. Check the [list of supported providers](https://cert-manager.io/docs/configuration/acme/dns01/#supported-dns01-providers). cert-manager is also highly extensible, so you can add your own provider with some effort if needed. Check the [available contrib webhooks](https://cert-manager.io/docs/configuration/acme/dns01/#webhook).
+{{</ alert >}}
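+
+To give an idea of what a DNS01 setup can look like, here is a minimal sketch of a `ClusterIssuer` declared through `kubernetes_manifest`. It assumes Cloudflare as the DNS provider, purely as an example of a built-in solver. The issuer name, the email address and the `cloudflare-api-token` secret (expected to already exist in the `cert-manager` namespace) are all hypothetical placeholders to adapt to your own provider.
+
+```tf
+# Sketch only: a Let's Encrypt issuer that solves DNS01 challenges.
+# All names, the email and the secret reference are placeholders.
+resource "kubernetes_manifest" "letsencrypt_issuer" {
+  manifest = {
+    apiVersion = "cert-manager.io/v1"
+    kind       = "ClusterIssuer"
+    metadata = {
+      name = "letsencrypt-production"
+    }
+    spec = {
+      acme = {
+        server = "https://acme-v02.api.letsencrypt.org/directory"
+        email  = "admin@example.org"
+        # Secret where cert-manager stores the ACME account key.
+        privateKeySecretRef = {
+          name = "letsencrypt-production"
+        }
+        solvers = [
+          {
+            dns01 = {
+              # Assumed provider; replace with the solver for your DNS.
+              cloudflare = {
+                apiTokenSecretRef = {
+                  name = "cloudflare-api-token"
+                  key  = "api-token"
+                }
+              }
+            }
+          }
+        ]
+      }
+    }
+  }
+}
+```
+
+Because `kubernetes_manifest` validates against the cluster API at plan time, the cert-manager CRDs must already be installed before applying a resource like this one.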
+
+### Traefik

 * Traefik + cert-manager
 * DNS configuration