This commit is contained in:
2023-08-20 15:06:33 +02:00
parent 0426392a90
commit 028849ad3a

View File

@ -28,13 +28,20 @@ terraform {
Let's begin with automatic upgrades management.
### Monitoring stack
### CRD prerequisites
Before we go next steps, we need to install critical monitoring CRDs that will be used by many components for monitoring.
```sh
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.67.1/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl apply --server-side -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.67.1/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
```
### Automatic reboot
When OS kernel is upgraded, the system needs to be rebooted to apply it. This is a critical operation for a Kubernetes cluster as can cause downtime. To avoid this, we'll use [kured](https://github.com/kubereboot/kured) that will take care of cordon & drains before rebooting nodes one by one.
{{< highlight file="reboot.tf" >}}
{{< highlight file="kured.tf" >}}
```tf
resource "helm_release" "kubereboot" {
@ -43,6 +50,7 @@ resource "helm_release" "kubereboot" {
repository = "https://kubereboot.github.io/charts"
name = "kured"
namespace = "kube-system"
set {
name = "configuration.period"
@ -68,17 +76,169 @@ resource "helm_release" "kubereboot" {
{{</ highlight >}}
After applying this, you can check that the daemonset is running on all nodes with `kg ds`.
After applying this with `terraform apply`, ensure that the `daemonset` is running on all nodes with `kg ds -n kube-system`.
{{< alert >}}
`tolerations` will ensure that all tainted nodes will receive the daemonset.
{{</ alert >}}
`tolerations` will ensure all tainted nodes will receive the daemonset.
`metrics.create` will create a `servicemonitor` custom k8s resource that allow Prometheus to scrape all kured metrics. You can check it with `kg smon -n kube-system -o yaml`. The monitoring subject will be covered in a future post, but let's be monitoring ready from the start.
You can test it by exec `touch /var/run/reboot-required` to a specific node.
### Automatic K3s upgrade
kubectl apply -k github.com/rancher/system-upgrade-controller
Now let's take care of K3s upgrade. We'll use [system-upgrade-controller](https://github.com/rancher/system-upgrade-controller). It will take care of upgrading K3s binary automatically on all nodes one by one.
## HTTP access
However, as Terraform doesn't offer a proper way to apply a remote multi-document Yaml file natively, the simplest way is to sacrifice some GitOps by installing system-upgrade-controller manually.
{{< alert >}}
Don't push yourself get fully 100% GitOps everywhere if the remedy give far more code complexity. Sometimes a simple documentation of manual steps in README is better.
{{</ alert >}}
```sh
ka https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
kg deploy -n system-upgrade
```
Next apply the following upgrade plans for servers and agents.
{{< highlight file="plans.tf" >}}
```tf
resource "kubernetes_manifest" "server_plan" {
manifest = {
apiVersion = "upgrade.cattle.io/v1"
kind = "Plan"
metadata = {
name = "server-plan"
namespace = "system-upgrade"
}
spec = {
concurrency = 1
cordon = true
nodeSelector = {
matchExpressions = [
{
key = "node-role.kubernetes.io/control-plane"
operator = "Exists"
}
]
}
tolerations = [
{
operator = "Exists"
effect = "NoSchedule"
}
]
serviceAccountName = "system-upgrade"
upgrade = {
image = "rancher/k3s-upgrade"
}
channel = "https://update.k3s.io/v1-release/channels/stable"
}
}
}
resource "kubernetes_manifest" "agent_plan" {
manifest = {
apiVersion = "upgrade.cattle.io/v1"
kind = "Plan"
metadata = {
name = "agent-plan"
namespace = "system-upgrade"
}
spec = {
concurrency = 1
cordon = true
nodeSelector = {
matchExpressions = [
{
key = "node-role.kubernetes.io/control-plane"
operator = "DoesNotExist"
}
]
}
tolerations = [
{
operator = "Exists"
effect = "NoSchedule"
}
]
prepare = {
args = ["prepare", "server-plan"]
image = "rancher/k3s-upgrade"
}
serviceAccountName = "system-upgrade"
upgrade = {
image = "rancher/k3s-upgrade"
}
channel = "https://update.k3s.io/v1-release/channels/stable"
}
}
}
```
{{</ highlight >}}
{{< alert >}}
You may set the same channel as previous step for hcloud cluster creation.
{{</ alert >}}
## External access
Now it's time to expose our cluster to the outside world. We'll use Traefik as ingress controller and cert-manager for SSL certificates management.
### cert-manager
First we need to install cert-manager for proper distributed SSL management. First install CRDs manually.
```sh
ka https://github.com/cert-manager/cert-manager/releases/download/v1.12.3/cert-manager.crds.yaml
```
Then apply the following Terraform code.
{{< highlight file="cert-manager.tf" >}}
```tf
resource "kubernetes_namespace_v1" "cert_manager" {
metadata {
name = "cert-manager"
}
}
resource "helm_release" "cert_manager" {
chart = "cert-manager"
version = "v1.12.3"
repository = "https://charts.jetstack.io"
name = "cert-manager"
namespace = kubernetes_namespace_v1.cert_manager.metadata[0].name
set {
name = "prometheus.servicemonitor.enabled"
value = true
}
}
```
{{</ highlight >}}
{{< alert >}}
You can use `installCRDs` option to install CRDs automatically. But uninstall cert-manager will delete all associated resources including generated certificates. That's why I generally prefer to install CRDs manually.
As always we enable `prometheus.servicemonitor.enabled` to allow Prometheus to scrape cert-manager metrics.
{{</ alert >}}
All should be ok with `kg deploy -n cert-manager`.
#### Wildcard certificate via DNS01
We'll use [DNS01 challenge](https://cert-manager.io/docs/configuration/acme/dns01/) to get wildcard certificate for our domain. This is the most convenient way to get a certificate for a domain without having to expose it to the outside world.
{{< alert >}}
You may use a DNS provider that is supported by cert-manager. Check the [list of supported providers](https://cert-manager.io/docs/configuration/acme/dns01/#supported-dns01-providers). But cert-manager is highly extensible, and you can easily add your own provider if needed with some efforts. Check [available contrib webhooks](https://cert-manager.io/docs/configuration/acme/dns01/#webhook).
{{</ alert >}}
### Traefik
* Traefik + cert-manager
* DNS configuration