K3s: The Complete Guide for Every Use Case
Master K3s from single-node dev to multi-node production: installation, networking, storage, GitOps, observability, CI/CD, edge deployments, and cluster maintenance.
Kubernetes is the standard for container orchestration at scale, but its operational weight (etcd, the API server, the controller manager, the scheduler, the cloud controller, the CNI plugin) makes it expensive to run anywhere below enterprise scale. A full Kubernetes cluster on a $10 VPS is like running a logistics operation for a single package.
K3s solves this by stripping Kubernetes down to its essential API surface while keeping full compatibility. It ships as a single binary under 100MB, replaces etcd with SQLite for single-node deployments, bundles Flannel as the CNI, Traefik as the ingress controller, and local-path-provisioner for storage. You get a production-grade Kubernetes cluster that starts in under 30 seconds on a 512MB machine.
This guide covers every meaningful use case: local development, single VPS production, multi-node HA clusters, Raspberry Pi and edge devices, homelab setups, GitOps pipelines, observability stacks, and day-two operations. Each section is self-contained: read the ones that match your situation.
How K3s Differs from K8s
Understanding what K3s removed and what it replaced helps you reason about its limitations and strengths before committing to it.
Removed from upstream Kubernetes:
- In-tree cloud provider code (AWS, GCP, Azure specific controllers)
- Alpha-stage features and deprecated APIs
- Most non-essential plugins and add-ons
Replaced with lighter alternatives:
- etcd → SQLite (single node) or embedded etcd (HA cluster)
- Overlay networking → Flannel bundled as the CNI (VXLAN by default, host-gw optional)
- CoreDNS → bundled, same version as upstream
- Ingress → Traefik bundled by default
- Storage → local-path-provisioner bundled
What remains identical to upstream K8s:
- The Kubernetes API: every kubectl command, every manifest, every CRD
- The pod scheduling model
- RBAC, secrets, configmaps, services, ingress
- Helm chart compatibility
- All standard workload types (Deployment, StatefulSet, DaemonSet, Job, CronJob)
The practical implication: any workload that runs on upstream Kubernetes runs on K3s without modification. The difference is in the infrastructure layer, in how the cluster itself is run and managed.
Case 1: Local Development
The most immediate K3s use case is replacing Docker Compose or Minikube for local development. K3s runs on Linux natively and on macOS/Windows via Multipass or Lima.
Linux (Native)
# install K3s as a single-node cluster
curl -sfL https://get.k3s.io | sh -
# the install script starts K3s as a systemd service
# verify it is running
sudo systemctl status k3s
# check the node
sudo kubectl get nodes
# copy kubeconfig for local use
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
chmod 600 ~/.kube/config
K3s stores its kubeconfig at /etc/rancher/k3s/k3s.yaml. The default context is default. Once copied to ~/.kube/config, all standard kubectl commands work without sudo.
macOS / Windows (via Lima)
Lima runs a Linux VM with automatic file sharing and port forwarding, and is lighter than Docker Desktop or Multipass:
# install Lima
brew install lima
# create a K3s instance
limactl start --name k3s template://k3s
# use the K3s kubectl from Lima
limactl shell k3s kubectl get nodes
# or configure local kubectl to use it
limactl shell k3s -- cat /etc/rancher/k3s/k3s.yaml \
| sed "s/127.0.0.1/$(limactl list k3s --format '{{.IP}}')/g" \
> ~/.kube/k3s-lima.yaml
export KUBECONFIG=~/.kube/k3s-lima.yaml
kubectl get nodes
Development Workflow
For local development, the typical workflow is:
# build and load image into K3s without a registry
docker build -t myapp:dev .
# K3s uses containerd, not Docker: import directly
docker save myapp:dev | sudo k3s ctr images import -
# or use a local registry
# run a local registry container
docker run -d -p 5000:5000 --name registry registry:2
# tag and push to local registry
docker tag myapp:dev localhost:5000/myapp:dev
docker push localhost:5000/myapp:dev
# configure K3s to trust the local registry
sudo tee /etc/rancher/k3s/registries.yaml << 'EOF'
mirrors:
  "localhost:5000":
    endpoint:
      - "http://localhost:5000"
EOF
sudo systemctl restart k3s
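With the mirror configured, pod specs can reference the registry directly. A minimal sketch (the myapp name and :dev tag come from the build step above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: localhost:5000/myapp:dev
          # re-pull on every pod restart so a re-pushed :dev tag takes effect
          imagePullPolicy: Always
```

Rebuild, push, then `kubectl rollout restart deployment/myapp` to pick up changes.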
Case 2: Single-Node Production (VPS)
A single K3s node on a $10-20/month VPS handles a surprising amount of production workload: a Go API with PostgreSQL, a Redis cache, a few background workers, and Traefik handling HTTPS, all on 2 vCPUs and 4GB RAM.
Server Requirements
| Workload | Minimum | Recommended |
|---|---|---|
| K3s system + Traefik | 512MB RAM, 1 vCPU | 1GB RAM, 1 vCPU |
| 3-5 small services | 2GB RAM total, 2 vCPU | 4GB RAM, 2 vCPU |
| PostgreSQL + Redis | +1GB RAM | +2GB RAM |
| Monitoring stack | +512MB RAM | +1GB RAM |
Installing K3s on a VPS
# as root on the VPS
# disable Traefik if you want to install it manually with custom config
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable traefik" sh -
# or keep bundled Traefik (fine for most cases)
curl -sfL https://get.k3s.io | sh -
# get the node token (needed for adding agent nodes later)
sudo cat /var/lib/rancher/k3s/server/node-token
Deploying a Production Application
A complete manifest set for a Go API with PostgreSQL:
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
# postgres.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: myapp
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: local-path # K3s default storage class
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: myapp
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:17-alpine
          env:
            - name: POSTGRES_DB
              value: myapp
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: myapp
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
# api.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: yourregistry/myapp-api:2.1.0
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secret
                  key: database-url
            - name: PORT
              value: "8080"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: myapp
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
# ingress.yaml: Traefik IngressRoute
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-ingress
  namespace: myapp
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.yourdomain.com`)
      kind: Rule
      services:
        - name: api
          port: 80
  tls:
    certResolver: letsencrypt
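One caveat: certResolver: letsencrypt refers to a certificate resolver that must exist in Traefik's static configuration, which the bundled chart does not define out of the box. A sketch of enabling it through K3s's HelmChartConfig mechanism, assuming the TLS-ALPN challenge (the email address is a placeholder):

```yaml
# /var/lib/rancher/k3s/server/manifests/traefik-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    additionalArguments:
      - "--certificatesresolvers.letsencrypt.acme.email=you@yourdomain.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
    persistence:
      enabled: true # keep acme.json across pod restarts
```

K3s applies any manifest dropped into the server manifests directory automatically and merges the values into the bundled Traefik chart.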
Secrets Management
Never put secrets in manifests. Use kubectl create secret or Sealed Secrets:
# create secrets imperatively
kubectl create secret generic postgres-secret \
--namespace myapp \
--from-literal=username=myapp \
--from-literal=password=$(openssl rand -base64 32)
kubectl create secret generic api-secret \
--namespace myapp \
--from-literal=database-url="postgres://myapp:password@postgres:5432/myapp?sslmode=disable"
Case 3: Multi-Node HA Cluster
For production workloads that need high availability, K3s supports a multi-server setup with embedded etcd. The minimum HA configuration is three server nodes (for etcd quorum).
Architecture
   +-------------------------------+
   | Load Balancer (nginx/haproxy) |
   |    TCP pass-through :6443     |
   +-------+---------+---------+---+
           |         |         |
      +----v-----+ +-v--------+ +--v-------+
      | Server 1 | | Server 2 | | Server 3 |
      |  (etcd)  | |  (etcd)  | |  (etcd)  |
      +----------+ +----------+ +----------+
           |         |         |
      +----v-----+ +-v--------+ +--v-------+
      | Agent 1  | | Agent 2  | | Agent 3  |
      | (worker) | | (worker) | | (worker) |
      +----------+ +----------+ +----------+
Setting Up the HA Cluster
# on Server 1 β initialize the cluster with embedded etcd
curl -sfL https://get.k3s.io | sh -s - server \
--cluster-init \
--tls-san YOUR_LB_IP \
--tls-san server1.internal \
--disable traefik \
--node-taint CriticalAddonsOnly=true:NoExecute
# get the cluster token
sudo cat /var/lib/rancher/k3s/server/node-token
# β K10abc...::server:xyz...
# on Server 2 and Server 3 β join the cluster
curl -sfL https://get.k3s.io | sh -s - server \
--server https://SERVER1_IP:6443 \
--token K10abc...::server:xyz... \
--tls-san YOUR_LB_IP \
--disable traefik \
--node-taint CriticalAddonsOnly=true:NoExecute
# on each Agent node
curl -sfL https://get.k3s.io | K3S_URL=https://YOUR_LB_IP:6443 \
K3S_TOKEN=K10abc...::server:xyz... sh -
The --node-taint CriticalAddonsOnly=true:NoExecute on server nodes prevents workloads from being scheduled on the control plane nodes, keeping them available for cluster management. All application workloads run on the agent nodes.
The --tls-san YOUR_LB_IP adds the load balancer IP to the TLS certificate so clients connecting through it do not get a certificate error.
Load Balancer Configuration (nginx)
# /etc/nginx/nginx.conf (on the load balancer node)
stream {
    upstream k3s_servers {
        server server1.internal:6443;
        server server2.internal:6443;
        server server3.internal:6443;
    }

    server {
        listen 6443;
        proxy_pass k3s_servers;
        proxy_timeout 10s;
        proxy_connect_timeout 5s;
    }
}
For a simpler setup without a dedicated load balancer node, use kube-vip as an in-cluster load balancer for the control plane:
# deploy kube-vip as a DaemonSet on server nodes
kubectl apply -f https://kube-vip.io/manifests/rbac.yaml
# create the kube-vip DaemonSet manifest
KVVERSION=$(curl -sL https://api.github.com/repos/kube-vip/kube-vip/releases | jq -r ".[0].name")
alias kube-vip="ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:$KVVERSION vip /kube-vip"
kube-vip manifest daemonset \
--interface eth0 \
--address 192.168.1.100 \
--inCluster \
--taint \
--controlplane \
--services \
--arp \
--leaderElection | kubectl apply -f -
Case 4: Raspberry Pi and Edge Devices
K3s was designed with ARM in mind. It runs on Raspberry Pi 4 (4GB), Raspberry Pi 5, and other ARM single-board computers. This makes it the natural choice for edge deployments, home automation hubs, and offline-capable IoT clusters.
Raspberry Pi Setup
# on Raspberry Pi OS (64-bit recommended)
# enable cgroups: required for the container runtime.
# cmdline.txt must remain a single line, so append to the existing line rather
# than adding a new one (the file is /boot/firmware/cmdline.txt on recent
# Raspberry Pi OS releases, /boot/cmdline.txt on older ones)
sudo sed -i '$ s/$/ cgroup_memory=1 cgroup_enable=memory/' /boot/firmware/cmdline.txt
sudo reboot
# install K3s (detects ARM64 automatically)
curl -sfL https://get.k3s.io | sh -
# verify
sudo kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# rpi4-node Ready control-plane,master 30s v1.31.x+k3s1
ARM-specific considerations:
- Use multi-arch container images (--platform linux/arm64 in your build). Single-arch amd64 images will fail with exec format error.
- Build ARM images with Docker Buildx:
# set up buildx for multi-arch
docker buildx create --name multiarch --use
docker buildx inspect --bootstrap
# build and push multi-arch image
docker buildx build \
--platform linux/amd64,linux/arm64 \
--tag yourregistry/myapp:2.1.0 \
--push .
- SD card I/O is slow. Move the K3s data directory to a USB SSD:
# mount the SSD
sudo mkdir -p /mnt/ssd
sudo mount /dev/sda1 /mnt/ssd
# install K3s with custom data directory
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--data-dir /mnt/ssd/k3s" sh -
Offline Edge Deployments
Edge devices often operate in environments with intermittent or no internet connectivity. K3s handles this by design: once the cluster is running and workloads are deployed, it operates independently of any external API.
For air-gapped installation:
# on an internet-connected machine, download K3s artifacts
VERSION=v1.31.0+k3s1
wget https://github.com/k3s-io/k3s/releases/download/$VERSION/k3s-arm64
wget https://github.com/k3s-io/k3s/releases/download/$VERSION/k3s-airgap-images-arm64.tar.zst
wget https://github.com/k3s-io/k3s/releases/download/$VERSION/sha256sum-arm64.txt
# grab the install script as well
curl -sfL https://get.k3s.io -o install.sh
# transfer to the edge device
scp k3s-arm64 pi@edgedevice:/usr/local/bin/k3s
scp k3s-airgap-images-arm64.tar.zst pi@edgedevice:/var/lib/rancher/k3s/agent/images/
scp install.sh pi@edgedevice:~/
# on the edge device
chmod +x /usr/local/bin/k3s
# install using the local binary (INSTALL_K3S_SKIP_DOWNLOAD prevents curl to GitHub)
INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh
Pre-load your application images into the air-gapped node:
# on internet-connected machine
docker pull yourregistry/myapp:2.1.0
docker save yourregistry/myapp:2.1.0 > myapp.tar
# on edge device
sudo k3s ctr images import myapp.tar
Case 5: Homelab
A homelab K3s cluster running on old hardware or a small group of mini-PCs is an excellent way to learn production Kubernetes patterns without cloud costs. The typical homelab stack: K3s + MetalLB + Traefik + cert-manager + Longhorn + a few self-hosted services.
MetalLB: LoadBalancer on Bare Metal
Cloud Kubernetes clusters have a native LoadBalancer service type backed by the cloud provider's load balancer. On bare metal, LoadBalancer services stay in Pending state forever without a tool to handle them. MetalLB assigns real IP addresses from a pool you define.
# install MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.9/config/manifests/metallb-native.yaml
# wait for MetalLB pods
kubectl wait --namespace metallb-system \
--for=condition=ready pod \
--selector=app=metallb \
--timeout=90s
Configure an IP address pool (use IPs from your LAN that are not in the DHCP range):
# metallb-config.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.220 # reserve these in your router
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool
Now LoadBalancer services get real IPs from this pool, reachable from any device on the LAN.
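A quick way to verify the pool: any Service of type LoadBalancer should now receive an external IP. A sketch (the whoami name is a placeholder workload):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: whoami
spec:
  type: LoadBalancer
  selector:
    app: whoami
  ports:
    - port: 80
      targetPort: 8080
```

kubectl get svc whoami should show an EXTERNAL-IP from the 192.168.1.200-220 range.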
Traefik with cert-manager for HTTPS
K3s bundles Traefik, but for homelab use you want cert-manager to handle certificates, either from Let's Encrypt (for a real domain with DNS pointing to your home IP) or from a self-signed CA for internal .home.arpa domains.
# install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.0/cert-manager.yaml
For a self-signed CA (internal services):
# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: homelab-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: homelab-ca
  secretName: homelab-ca-secret
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: homelab-ca-issuer
spec:
  ca:
    secretName: homelab-ca-secret
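Services can then request certificates from the CA issuer. A sketch for a hypothetical internal Grafana host (the resulting Secret is what an IngressRoute's tls.secretName would reference):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: grafana-tls
  namespace: monitoring
spec:
  secretName: grafana-tls # consumed via tls.secretName in an IngressRoute
  dnsNames:
    - grafana.home.arpa
  issuerRef:
    name: homelab-ca-issuer
    kind: ClusterIssuer
```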
For Letβs Encrypt (public domain with DNS-01 challenge via Cloudflare):
# letsencrypt-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: you@yourdomain.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - dns01:
          cloudflare:
            email: you@yourdomain.com
            apiTokenSecretRef:
              name: cloudflare-token
              key: api-token
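With cert-manager issuing into a Secret, a route references the certificate via tls.secretName rather than a Traefik certResolver. A sketch for a hypothetical public host:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: myapp
spec:
  secretName: app-tls
  dnsNames:
    - app.yourdomain.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: app
  namespace: myapp
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.yourdomain.com`)
      kind: Rule
      services:
        - name: app
          port: 80
  tls:
    secretName: app-tls # issued by cert-manager
```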
Longhorn: Distributed Block Storage
The bundled local-path provisioner creates node-local volumes; if the node dies, the data is gone. For a multi-node homelab where you want data to survive node failures, Longhorn provides replicated block storage.
# prerequisites
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.8.0/deploy/prerequisite/longhorn-iscsi-installation.yaml
# install Longhorn
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.8.0/deploy/longhorn.yaml
# watch installation
kubectl get pods --namespace longhorn-system --watch
Make Longhorn the default storage class:
# remove default from local-path
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
# set Longhorn as default
kubectl patch storageclass longhorn -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Longhorn PVCs now replicate data across nodes:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
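The replica count is controlled per storage class. A sketch of a custom class keeping two replicas, assuming the standard Longhorn provisioner parameters:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2replica
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2" # enough redundancy for a 2-3 node homelab
  staleReplicaTimeout: "2880"
```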
Case 6: CI/CD with GitHub Actions
The standard deployment pattern: build and push an image in CI, then update the Kubernetes deployment to use the new image.
Setting Up Access
GitHub Actions needs kubectl access to your K3s cluster. The safest way is a service account with limited RBAC permissions, not the admin kubeconfig.
# deploy-sa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: github-deploy
  namespace: myapp
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deploy-role
  namespace: myapp
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "patch", "list"]
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: github-deploy-binding
  namespace: myapp
subjects:
  - kind: ServiceAccount
    name: github-deploy
    namespace: myapp
roleRef:
  kind: Role
  name: deploy-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Secret
metadata:
  name: github-deploy-token
  namespace: myapp
  annotations:
    kubernetes.io/service-account.name: github-deploy
type: kubernetes.io/service-account-token
Extract the kubeconfig for the service account:
# get the token
TOKEN=$(kubectl get secret github-deploy-token -n myapp -o jsonpath='{.data.token}' | base64 -d)
# get the CA cert
CA=$(kubectl get secret github-deploy-token -n myapp -o jsonpath='{.data.ca\.crt}')
# get the cluster server URL
SERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
# build the kubeconfig
cat << EOF
apiVersion: v1
kind: Config
clusters:
  - cluster:
      certificate-authority-data: $CA
      server: $SERVER
    name: k3s
contexts:
  - context:
      cluster: k3s
      namespace: myapp
      user: github-deploy
    name: k3s
current-context: k3s
users:
  - name: github-deploy
    user:
      token: $TOKEN
EOF
Store this as a GitHub Actions secret (KUBECONFIG_B64, base64-encoded).
The Deployment Workflow
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}/api

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4

      # QEMU and Buildx are required for the multi-arch build below
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Deploy to K3s
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG_B64 }}
        run: |
          echo "$KUBECONFIG_DATA" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          # update the deployment image
          kubectl set image deployment/api \
            api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            --namespace myapp
          # wait for rollout to complete
          kubectl rollout status deployment/api \
            --namespace myapp \
            --timeout=5m

      - name: Verify deployment
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG_B64 }}
        run: |
          echo "$KUBECONFIG_DATA" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          kubectl get pods --namespace myapp
Case 7: GitOps with Flux
GitOps inverts the push-based CI/CD model: instead of CI pushing changes to the cluster, a controller in the cluster pulls from a Git repository and applies changes. The Git repository becomes the single source of truth for cluster state.
Flux is the CNCF-graduated GitOps tool. It watches a Git repo and applies any Kubernetes manifests it finds there.
Installing Flux
# install Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash
# bootstrap Flux onto K3s: this creates a Git repo and commits the Flux manifests
flux bootstrap github \
--owner=yourorg \
--repository=k3s-gitops \
--branch=main \
--path=clusters/production \
--personal
After bootstrap, the k3s-gitops repo exists with Flux's own manifests committed. Any manifests you add under clusters/production/ are automatically applied.
GitRepository and Kustomization
# clusters/production/myapp/source.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/yourorg/myapp
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: ./k8s/production
  prune: true # delete resources removed from Git
  sourceRef:
    kind: GitRepository
    name: myapp
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: api
      namespace: myapp
Image Automation
Flux can also watch a container registry and automatically update the Git repo when a new image is pushed:
# image-policy.yaml
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: myapp-api
  namespace: flux-system
spec:
  image: ghcr.io/yourorg/myapp/api
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: myapp-api
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: myapp-api
  policy:
    semver:
      range: '>=2.0.0 <3.0.0' # only minor and patch updates
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: myapp
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        email: fluxbot@yourorg.com
        name: Flux Bot
      messageTemplate: 'chore: update {{range .Updated.Images}}{{println .}}{{end}}'
    push:
      branch: main
  update:
    path: ./k8s/production
    strategy: Setters
In your Deployment manifest, mark the image field for automation:
containers:
  - name: api
    image: ghcr.io/yourorg/myapp/api:2.1.0 # {"$imagepolicy": "flux-system:myapp-api"}
When a new image matching the semver policy is pushed to the registry, Flux automatically commits a manifest update to Git and applies it to the cluster.
Case 8: Observability Stack
A K3s cluster without observability is a black box. The standard stack (Prometheus for metrics, Grafana for visualization, Loki for logs) fits comfortably on a cluster with 4GB+ RAM.
kube-prometheus-stack (Helm)
# add Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# install the full stack
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.adminPassword=$(openssl rand -base64 24) \
--set prometheus.prometheusSpec.retention=7d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=local-path \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=10Gi \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=local-path \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=2Gi
Exposing Grafana with Traefik
# grafana-ingress.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`grafana.yourdomain.com`)
      kind: Rule
      services:
        - name: monitoring-grafana
          port: 80
  tls:
    certResolver: letsencrypt
Scraping Custom Application Metrics
In your Go application, expose a /metrics endpoint using the prometheus/client_golang library:
// internal/metrics/metrics.go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	HTTPRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total HTTP requests",
		},
		[]string{"method", "path", "status"},
	)

	HTTPRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request duration",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "path"},
	)
)
Tell Prometheus to scrape it with a ServiceMonitor:
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-api
  namespace: myapp
  labels:
    release: monitoring # must match the Helm release label selector
spec:
  selector:
    matchLabels:
      app: api
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
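The endpoint selects the Service port by name, so the api Service must actually name its port http (the Case 2 manifest leaves the port unnamed). A sketch of the adjusted Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: myapp
spec:
  selector:
    app: api
  ports:
    - name: http # matched by the ServiceMonitor's "port: http"
      port: 80
      targetPort: 8080
```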
Loki for Logs
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
--namespace monitoring \
--set promtail.enabled=true \
--set loki.persistence.enabled=true \
--set loki.persistence.storageClassName=local-path \
--set loki.persistence.size=10Gi
Loki collects all pod logs via Promtail (a DaemonSet that tails container logs). In Grafana, add Loki as a data source and use LogQL to query:
{namespace="myapp", app="api"} |= "error" | json | level="error"
Case 9: Networking Deep Dive
K3s ships with Flannel as its CNI and Traefik as its ingress controller. For most use cases these are adequate. When they are not, here are the alternatives.
Replacing Flannel with Cilium
Cilium provides eBPF-based networking with better performance, network policies, and observability. Install K3s without Flannel:
curl -sfL https://get.k3s.io | sh -s - \
--flannel-backend=none \
--disable-network-policy \
--disable traefik
# install Cilium
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
--namespace kube-system \
--set operator.replicas=1 \
--set kubeProxyReplacement=true \
--set k8sServiceHost=$(hostname -I | awk '{print $1}') \
--set k8sServicePort=6443
Network Policies
Even with Flannel, K3s supports NetworkPolicy resources if you need to restrict traffic between namespaces or pods. With Cilium, policies are enforced at the eBPF level (more efficient and more capable):
# deny-all-ingress-by-default.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: myapp
spec:
  podSelector: {} # applies to all pods in the namespace
  policyTypes:
    - Ingress
---
# allow ingress to api from traefik only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-traefik-to-api
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - protocol: TCP
          port: 8080
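Note that the default deny also cuts off api-to-postgres traffic inside the namespace. A companion policy restores that path (labels match the Case 2 manifests):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-postgres
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 5432
```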
Traefik Middleware
Traefik's middleware system allows you to add rate limiting, authentication, CORS headers, and redirects to any route without modifying the application:
# middlewares.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: myapp
spec:
  rateLimit:
    average: 100
    burst: 50
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: https-redirect
  namespace: myapp
spec:
  redirectScheme:
    scheme: https
    permanent: true
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: myapp
spec:
  headers:
    stsSeconds: 31536000
    stsIncludeSubdomains: true
    contentTypeNosniff: true
    frameDeny: true
    browserXssFilter: true
    referrerPolicy: "strict-origin-when-cross-origin"
Apply middleware to an IngressRoute:
routes:
  - match: Host(`api.yourdomain.com`)
    kind: Rule
    middlewares:
      - name: rate-limit
      - name: security-headers
    services:
      - name: api
        port: 80
Day-Two Operations
Upgrading K3s
K3s provides a system-upgrade-controller that handles rolling upgrades across the cluster using a Plan custom resource.
# install the upgrade controller
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
# upgrade-plan.yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: In
        values: ["true"]
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  channel: https://update.k3s.io/v1-release/channels/stable
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent
  namespace: system-upgrade
spec:
  concurrency: 2
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  prepare:
    image: rancher/k3s-upgrade
    args: ["prepare", "k3s-server"]
  channel: https://update.k3s.io/v1-release/channels/stable
The prepare step on agent nodes waits for server nodes to finish upgrading first, ensuring API compatibility is maintained during the rolling upgrade.
Backup and Restore
For a single-node K3s cluster, the SQLite database holds all cluster state:
# backup (K3s must be stopped or snapshot taken)
sudo systemctl stop k3s
sudo cp /var/lib/rancher/k3s/server/db/state.db /backup/k3s-state-$(date +%Y%m%d).db
sudo systemctl start k3s
# automated backup with systemd timer
sudo tee /etc/systemd/system/k3s-backup.service << 'EOF'
[Unit]
Description=K3s SQLite backup
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'cp /var/lib/rancher/k3s/server/db/state.db /backup/k3s-$(date +%Y%m%d-%H%M).db && find /backup -name "k3s-*.db" -mtime +7 -delete'
EOF
sudo tee /etc/systemd/system/k3s-backup.timer << 'EOF'
[Unit]
Description=Run K3s backup every 6 hours
[Timer]
OnCalendar=*-*-* 00,06,12,18:00:00
Persistent=true
[Install]
WantedBy=timers.target
EOF
sudo systemctl enable --now k3s-backup.timer
For HA clusters with embedded etcd, use K3s's built-in snapshot command:
# take a manual snapshot
sudo k3s etcd-snapshot save --name pre-upgrade-snapshot
# list snapshots
sudo k3s etcd-snapshot ls
# restore (cluster must be stopped)
sudo k3s server \
--cluster-reset \
--cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/pre-upgrade-snapshot
Node Maintenance
# drain a node (evict all pods, cordon for scheduling)
kubectl drain node-name \
--ignore-daemonsets \
--delete-emptydir-data \
--grace-period=60
# perform maintenance (OS updates, hardware work)
# ...
# uncordon after maintenance
kubectl uncordon node-name
# remove a node from the cluster permanently
kubectl delete node node-name
# on the node itself, uninstall the K3s agent
/usr/local/bin/k3s-agent-uninstall.sh
Resource Quotas per Namespace
Prevent any single namespace from consuming all cluster resources:
# resource-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: myapp-quota
  namespace: myapp
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "8"
    limits.memory: 8Gi
    pods: "20"
    persistentvolumeclaims: "10"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: myapp-limits
  namespace: myapp
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
Choosing the Right K3s Configuration
Not every cluster needs every component. A decision matrix for the most common choices:
Storage class:
- Single node, data can be rebuilt → local-path (default)
- Multi-node, data must survive node failure → Longhorn
- Performance-critical database → dedicated PV on fast local NVMe, not replicated
Ingress:
- Simple HTTP/HTTPS routing → bundled Traefik (default)
- Advanced traffic management, gRPC, TCP → Traefik with custom configuration
- Enterprise features, WAF → nginx-ingress or Kong
CNI:
- Standard workloads → Flannel (default)
- Network policies, observability, performance → Cilium
- Strict network isolation requirements → Calico
Cluster topology:
- Single developer or small service → 1 server node (SQLite)
- Production with HA → 3 server nodes + N agent nodes (embedded etcd)
- Edge/offline → 1 server node, pre-loaded images, local-path storage
K3s scales from a single Raspberry Pi to a fifty-node bare metal cluster, with the same kubectl API across every configuration. The investment in learning it (the manifests, the Helm charts, the GitOps patterns) transfers across every deployment target without modification.
K3s is not a simplified Kubernetes. It is Kubernetes with the operational weight removed: the same API, the same compatibility, the same ecosystem, running on hardware that full Kubernetes cannot justify.