K3s: The Complete Guide for Every Use Case
Master K3s from single-node dev to multi-node production: installation, networking, storage, GitOps, observability, CI/CD, edge deployments, and cluster maintenance.
Kubernetes is the standard for container orchestration at scale, but its operational weight (etcd, the API server, the controller manager, the scheduler, the cloud controller, the CNI plugin) makes it expensive to run anywhere below enterprise scale. A full Kubernetes cluster on a $10 VPS is like running a logistics operation for a single package.
K3s solves this by stripping Kubernetes down to its essential API surface while keeping full compatibility. It ships as a single binary under 100MB, replaces etcd with SQLite for single-node deployments, bundles Flannel as the CNI, Traefik as the ingress controller, and local-path-provisioner for storage. You get a production-grade Kubernetes cluster that starts in under 30 seconds on a 512MB machine.
This guide covers every meaningful use case: local development, single VPS production, multi-node HA clusters, Raspberry Pi and edge devices, homelab setups, GitOps pipelines, observability stacks, and day-two operations. Each section is self-contained: read the ones that match your situation.
How K3s Differs from K8s
Understanding what K3s removed and what it replaced helps you reason about its limitations and strengths before committing to it.
Removed from upstream Kubernetes:
- In-tree cloud provider code (AWS, GCP, Azure specific controllers)
- Alpha-stage features and deprecated APIs
- Most non-essential plugins and add-ons
Replaced with lighter alternatives:
- etcd → SQLite (single node) or embedded etcd (HA cluster)
- Overlay networking → Flannel bundled as the CNI (VXLAN by default, host-gw optional)
- CoreDNS → bundled, same version as upstream
- Ingress → Traefik bundled by default
- Storage → local-path-provisioner bundled
What remains identical to upstream K8s:
- The Kubernetes API: every kubectl command, every manifest, every CRD
- The pod scheduling model
- RBAC, secrets, configmaps, services, ingress
- Helm chart compatibility
- All standard workload types (Deployment, StatefulSet, DaemonSet, Job, CronJob)
The practical implication: any workload that runs on upstream Kubernetes runs on K3s without modification. The difference is in the infrastructure layer, in how the cluster itself is run and managed.
Case 1: Local Development
The most immediate K3s use case is replacing Docker Compose or Minikube for local development. K3s runs on Linux natively and on macOS/Windows via Multipass or Lima.
Linux (Native)
# install K3s as a single-node cluster
curl -sfL https://get.k3s.io | sh -
# the install script starts K3s as a systemd service
# verify it is running
sudo systemctl status k3s
# check the node
sudo kubectl get nodes
# copy kubeconfig for local use
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
chmod 600 ~/.kube/config
K3s stores its kubeconfig at /etc/rancher/k3s/k3s.yaml. The default context is default. Once copied to ~/.kube/config, all standard kubectl commands work without sudo.
macOS / Windows (via Lima)
Lima runs a Linux VM with automatic file sharing and port forwarding, and is lighter than Docker Desktop or Multipass:
# install Lima
brew install lima
# create a K3s instance
limactl start --name k3s template://k3s
# use the K3s kubectl from Lima
limactl shell k3s kubectl get nodes
# or configure local kubectl to use it
limactl shell k3s -- cat /etc/rancher/k3s/k3s.yaml \
| sed "s/127.0.0.1/$(limactl list k3s --format '{{.IP}}')/g" \
> ~/.kube/k3s-lima.yaml
export KUBECONFIG=~/.kube/k3s-lima.yaml
kubectl get nodes
Development Workflow
For local development, the typical workflow is:
# build and load image into K3s without a registry
docker build -t myapp:dev .
# K3s uses containerd, not Docker: import directly
docker save myapp:dev | sudo k3s ctr images import -
# or use a local registry
# run a local registry container
docker run -d -p 5000:5000 --name registry registry:2
# tag and push to local registry
docker tag myapp:dev localhost:5000/myapp:dev
docker push localhost:5000/myapp:dev
# configure K3s to trust the local registry
sudo tee /etc/rancher/k3s/registries.yaml << 'EOF'
mirrors:
  "localhost:5000":
    endpoint:
      - "http://localhost:5000"
EOF
sudo systemctl restart k3s
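With the mirror configured, pod specs can reference the registry directly. A minimal sketch (the myapp name and :dev tag come from the build step above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: localhost:5000/myapp:dev
          # re-pull on every pod restart so a re-pushed :dev tag takes effect
          imagePullPolicy: Always
```

Rebuild, push, then `kubectl rollout restart deployment/myapp` to pick up changes.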
Case 2: Single-Node Production (VPS)
A single K3s node on a $10-20/month VPS handles a surprising amount of production workload: a Go API with PostgreSQL, a Redis cache, a few background workers, and Traefik handling HTTPS, all on 2 vCPUs and 4GB RAM.
Server Requirements
| Workload | Minimum | Recommended |
|---|---|---|
| K3s system + Traefik | 512MB RAM, 1 vCPU | 1GB RAM, 1 vCPU |
| 3-5 small services | 2GB RAM total, 2 vCPU | 4GB RAM, 2 vCPU |
| PostgreSQL + Redis | +1GB RAM | +2GB RAM |
| Monitoring stack | +512MB RAM | +1GB RAM |
Installing K3s on a VPS
# as root on the VPS
# disable Traefik if you want to install it manually with custom config
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--disable traefik" sh -
# or keep bundled Traefik (fine for most cases)
curl -sfL https://get.k3s.io | sh -
# get the node token (needed for adding agent nodes later)
sudo cat /var/lib/rancher/k3s/server/node-token
Deploying a Production Application
A complete manifest set for a Go API with PostgreSQL:
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
# postgres.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  namespace: myapp
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: local-path # K3s default storage class
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: myapp
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:17-alpine
          env:
            - name: POSTGRES_DB
              value: myapp
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: username
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
          readinessProbe:
            exec:
              command: ["pg_isready", "-U", "$(POSTGRES_USER)"]
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: myapp
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
# api.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: yourregistry/myapp-api:2.1.0
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secret
                  key: database-url
            - name: PORT
              value: "8080"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: myapp
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
# ingress.yaml: Traefik IngressRoute
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: api-ingress
  namespace: myapp
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`api.yourdomain.com`)
      kind: Rule
      services:
        - name: api
          port: 80
  tls:
    certResolver: letsencrypt
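One caveat: certResolver: letsencrypt refers to a certificate resolver that must exist in Traefik's static configuration, which the bundled chart does not define out of the box. A sketch of enabling it through K3s's HelmChartConfig mechanism, assuming the TLS-ALPN challenge (the email address is a placeholder):

```yaml
# /var/lib/rancher/k3s/server/manifests/traefik-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    additionalArguments:
      - "--certificatesresolvers.letsencrypt.acme.email=you@yourdomain.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
    persistence:
      enabled: true # keep acme.json across pod restarts
```

K3s applies any manifest dropped into the server manifests directory automatically and merges the values into the bundled Traefik chart.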
Secrets Management
Never put secrets in manifests. Use kubectl create secret or Sealed Secrets:
# create secrets imperatively
kubectl create secret generic postgres-secret \
--namespace myapp \
--from-literal=username=myapp \
--from-literal=password=$(openssl rand -base64 32)
kubectl create secret generic api-secret \
--namespace myapp \
--from-literal=database-url="postgres://myapp:password@postgres:5432/myapp?sslmode=disable"
Case 3: Multi-Node HA Cluster
For production workloads that need high availability, K3s supports a multi-server setup with embedded etcd. The minimum HA configuration is three server nodes (for etcd quorum).
Architecture
   +-------------------------------+
   | Load Balancer (nginx/haproxy) |
   |    TCP pass-through :6443     |
   +-------+---------+---------+---+
           |         |         |
      +----v-----+ +-v--------+ +--v-------+
      | Server 1 | | Server 2 | | Server 3 |
      |  (etcd)  | |  (etcd)  | |  (etcd)  |
      +----------+ +----------+ +----------+
           |         |         |
      +----v-----+ +-v--------+ +--v-------+
      | Agent 1  | | Agent 2  | | Agent 3  |
      | (worker) | | (worker) | | (worker) |
      +----------+ +----------+ +----------+
Setting Up the HA Cluster
# on Server 1 β initialize the cluster with embedded etcd
curl -sfL https://get.k3s.io | sh -s - server \
--cluster-init \
--tls-san YOUR_LB_IP \
--tls-san server1.internal \
--disable traefik \
--node-taint CriticalAddonsOnly=true:NoExecute
# get the cluster token
sudo cat /var/lib/rancher/k3s/server/node-token
# β K10abc...::server:xyz...
# on Server 2 and Server 3 β join the cluster
curl -sfL https://get.k3s.io | sh -s - server \
--server https://SERVER1_IP:6443 \
--token K10abc...::server:xyz... \
--tls-san YOUR_LB_IP \
--disable traefik \
--node-taint CriticalAddonsOnly=true:NoExecute
# on each Agent node
curl -sfL https://get.k3s.io | K3S_URL=https://YOUR_LB_IP:6443 \
K3S_TOKEN=K10abc...::server:xyz... sh -
The --node-taint CriticalAddonsOnly=true:NoExecute on server nodes prevents workloads from being scheduled on the control plane nodes, keeping them available for cluster management. All application workloads run on the agent nodes.
The --tls-san YOUR_LB_IP adds the load balancer IP to the TLS certificate so clients connecting through it do not get a certificate error.
Load Balancer Configuration (nginx)
# /etc/nginx/nginx.conf (on the load balancer node)
stream {
    upstream k3s_servers {
        server server1.internal:6443;
        server server2.internal:6443;
        server server3.internal:6443;
    }

    server {
        listen 6443;
        proxy_pass k3s_servers;
        proxy_timeout 10s;
        proxy_connect_timeout 5s;
    }
}
For a simpler setup without a dedicated load balancer node, use kube-vip as an in-cluster load balancer for the control plane:
# deploy kube-vip as a DaemonSet on server nodes
kubectl apply -f https://kube-vip.io/manifests/rbac.yaml
# create the kube-vip DaemonSet manifest
KVVERSION=$(curl -sL https://api.github.com/repos/kube-vip/kube-vip/releases | jq -r ".[0].name")
alias kube-vip="ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:$KVVERSION vip /kube-vip"
kube-vip manifest daemonset \
--interface eth0 \
--address 192.168.1.100 \
--inCluster \
--taint \
--controlplane \
--services \
--arp \
--leaderElection | kubectl apply -f -
Case 4: Raspberry Pi and Edge Devices
K3s was designed with ARM in mind. It runs on Raspberry Pi 4 (4GB), Raspberry Pi 5, and other ARM single-board computers. This makes it the natural choice for edge deployments, home automation hubs, and offline-capable IoT clusters.
Raspberry Pi Setup
# on Raspberry Pi OS (64-bit recommended)
# enable cgroups: required for the container runtime.
# cmdline.txt must remain a single line, so append to the existing line rather
# than adding a new one (the file is /boot/firmware/cmdline.txt on recent
# Raspberry Pi OS releases, /boot/cmdline.txt on older ones)
sudo sed -i '$ s/$/ cgroup_memory=1 cgroup_enable=memory/' /boot/firmware/cmdline.txt
sudo reboot
# install K3s (detects ARM64 automatically)
curl -sfL https://get.k3s.io | sh -
# verify
sudo kubectl get nodes
# NAME STATUS ROLES AGE VERSION
# rpi4-node Ready control-plane,master 30s v1.31.x+k3s1
ARM-specific considerations:
- Use multi-arch container images (--platform linux/arm64 in your build). Single-arch amd64 images will fail with exec format error.
- Build ARM images with Docker Buildx:
# set up buildx for multi-arch
docker buildx create --name multiarch --use
docker buildx inspect --bootstrap
# build and push multi-arch image
docker buildx build \
--platform linux/amd64,linux/arm64 \
--tag yourregistry/myapp:2.1.0 \
--push .
- SD card I/O is slow. Move the K3s data directory to a USB SSD:
# mount the SSD
sudo mkdir -p /mnt/ssd
sudo mount /dev/sda1 /mnt/ssd
# install K3s with custom data directory
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--data-dir /mnt/ssd/k3s" sh -
Offline Edge Deployments
Edge devices often operate in environments with intermittent or no internet connectivity. K3s handles this by design: once the cluster is running and workloads are deployed, it operates independently of any external API.
For air-gapped installation:
# on an internet-connected machine, download K3s artifacts
VERSION=v1.31.0+k3s1
wget https://github.com/k3s-io/k3s/releases/download/$VERSION/k3s-arm64
wget https://github.com/k3s-io/k3s/releases/download/$VERSION/k3s-airgap-images-arm64.tar.zst
wget https://github.com/k3s-io/k3s/releases/download/$VERSION/sha256sum-arm64.txt
# grab the install script as well
curl -sfL https://get.k3s.io -o install.sh
# transfer to the edge device
scp k3s-arm64 pi@edgedevice:/usr/local/bin/k3s
scp k3s-airgap-images-arm64.tar.zst pi@edgedevice:/var/lib/rancher/k3s/agent/images/
scp install.sh pi@edgedevice:~/
# on the edge device
chmod +x /usr/local/bin/k3s
# install using the local binary (INSTALL_K3S_SKIP_DOWNLOAD prevents curl to GitHub)
INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh
Pre-load your application images into the air-gapped node:
# on internet-connected machine
docker pull yourregistry/myapp:2.1.0
docker save yourregistry/myapp:2.1.0 > myapp.tar
# on edge device
sudo k3s ctr images import myapp.tar
Case 5: Homelab
A homelab K3s cluster running on old hardware or a small group of mini-PCs is an excellent way to learn production Kubernetes patterns without cloud costs. The typical homelab stack: K3s + MetalLB + Traefik + cert-manager + Longhorn + a few self-hosted services.
MetalLB: LoadBalancer on Bare Metal
Cloud Kubernetes clusters have a native LoadBalancer service type backed by the cloud provider's load balancer. On bare metal, LoadBalancer services stay in Pending state forever without a tool to handle them. MetalLB assigns real IP addresses from a pool you define.
# install MetalLB
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.9/config/manifests/metallb-native.yaml
# wait for MetalLB pods
kubectl wait --namespace metallb-system \
--for=condition=ready pod \
--selector=app=metallb \
--timeout=90s
Configure an IP address pool (use IPs from your LAN that are not in the DHCP range):
# metallb-config.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.220 # reserve these in your router
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool
Now LoadBalancer services get real IPs from this pool, reachable from any device on the LAN.
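A quick way to verify the pool: any Service of type LoadBalancer should now receive an external IP. A sketch (the whoami name is a placeholder workload):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: whoami
spec:
  type: LoadBalancer
  selector:
    app: whoami
  ports:
    - port: 80
      targetPort: 8080
```

kubectl get svc whoami should show an EXTERNAL-IP from the 192.168.1.200-220 range.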
Traefik with cert-manager for HTTPS
K3s bundles Traefik, but for homelab use you want cert-manager to handle certificates, either from Let's Encrypt (for a real domain with DNS pointing to your home IP) or from a self-signed CA for internal .home.arpa domains.
# install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.17.0/cert-manager.yaml
For a self-signed CA (internal services):
# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: homelab-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: homelab-ca
  secretName: homelab-ca-secret
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: homelab-ca-issuer
spec:
  ca:
    secretName: homelab-ca-secret
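Services can then request certificates from the CA issuer. A sketch for a hypothetical internal Grafana host (the resulting Secret is what an IngressRoute's tls.secretName would reference):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: grafana-tls
  namespace: monitoring
spec:
  secretName: grafana-tls # consumed via tls.secretName in an IngressRoute
  dnsNames:
    - grafana.home.arpa
  issuerRef:
    name: homelab-ca-issuer
    kind: ClusterIssuer
```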
For Letβs Encrypt (public domain with DNS-01 challenge via Cloudflare):
# letsencrypt-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: you@yourdomain.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - dns01:
          cloudflare:
            email: you@yourdomain.com
            apiTokenSecretRef:
              name: cloudflare-token
              key: api-token
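With cert-manager issuing into a Secret, a route references the certificate via tls.secretName rather than a Traefik certResolver. A sketch for a hypothetical public host:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: myapp
spec:
  secretName: app-tls
  dnsNames:
    - app.yourdomain.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: app
  namespace: myapp
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`app.yourdomain.com`)
      kind: Rule
      services:
        - name: app
          port: 80
  tls:
    secretName: app-tls # issued by cert-manager
```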
Longhorn: Distributed Block Storage
The bundled local-path provisioner creates node-local volumes; if the node dies, the data is gone. For a multi-node homelab where you want data to survive node failures, Longhorn provides replicated block storage.
# prerequisites
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.8.0/deploy/prerequisite/longhorn-iscsi-installation.yaml
# install Longhorn
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.8.0/deploy/longhorn.yaml
# watch installation
kubectl get pods --namespace longhorn-system --watch
Make Longhorn the default storage class:
# remove default from local-path
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
# set Longhorn as default
kubectl patch storageclass longhorn -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Longhorn PVCs now replicate data across nodes:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
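The replica count is controlled per storage class. A sketch of a custom class keeping two replicas, assuming the standard Longhorn provisioner parameters:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2replica
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "2" # enough redundancy for a 2-3 node homelab
  staleReplicaTimeout: "2880"
```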
Case 6: CI/CD with GitHub Actions
The standard deployment pattern: build and push an image in CI, then update the Kubernetes deployment to use the new image.
Setting Up Access
GitHub Actions needs kubectl access to your K3s cluster. The safest way is a service account with limited RBAC permissions, not the admin kubeconfig.
# deploy-sa.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: github-deploy
  namespace: myapp
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deploy-role
  namespace: myapp
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "patch", "list"]
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: github-deploy-binding
  namespace: myapp
subjects:
  - kind: ServiceAccount
    name: github-deploy
    namespace: myapp
roleRef:
  kind: Role
  name: deploy-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Secret
metadata:
  name: github-deploy-token
  namespace: myapp
  annotations:
    kubernetes.io/service-account.name: github-deploy
type: kubernetes.io/service-account-token
Extract the kubeconfig for the service account:
# get the token
TOKEN=$(kubectl get secret github-deploy-token -n myapp -o jsonpath='{.data.token}' | base64 -d)
# get the CA cert
CA=$(kubectl get secret github-deploy-token -n myapp -o jsonpath='{.data.ca\.crt}')
# get the cluster server URL
SERVER=$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')
# build the kubeconfig
cat << EOF
apiVersion: v1
kind: Config
clusters:
  - cluster:
      certificate-authority-data: $CA
      server: $SERVER
    name: k3s
contexts:
  - context:
      cluster: k3s
      namespace: myapp
      user: github-deploy
    name: k3s
current-context: k3s
users:
  - name: github-deploy
    user:
      token: $TOKEN
EOF
Store this as a GitHub Actions secret (KUBECONFIG_B64, base64-encoded).
The Deployment Workflow
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}/api

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4

      # QEMU and Buildx are required for the multi-arch build below
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Deploy to K3s
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG_B64 }}
        run: |
          echo "$KUBECONFIG_DATA" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          # update the deployment image
          kubectl set image deployment/api \
            api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }} \
            --namespace myapp
          # wait for rollout to complete
          kubectl rollout status deployment/api \
            --namespace myapp \
            --timeout=5m

      - name: Verify deployment
        env:
          KUBECONFIG_DATA: ${{ secrets.KUBECONFIG_B64 }}
        run: |
          echo "$KUBECONFIG_DATA" | base64 -d > /tmp/kubeconfig
          export KUBECONFIG=/tmp/kubeconfig
          kubectl get pods --namespace myapp
Case 7: GitOps with Flux
GitOps inverts the push-based CI/CD model: instead of CI pushing changes to the cluster, a controller in the cluster pulls from a Git repository and applies changes. The Git repository becomes the single source of truth for cluster state.
Flux is the CNCF-graduated GitOps tool. It watches a Git repo and applies any Kubernetes manifests it finds there.
Installing Flux
# install Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash
# bootstrap Flux onto K3s: this creates a Git repo and commits the Flux manifests
flux bootstrap github \
--owner=yourorg \
--repository=k3s-gitops \
--branch=main \
--path=clusters/production \
--personal
After bootstrap, the k3s-gitops repo exists with Flux's own manifests committed. Any manifests you add under clusters/production/ are automatically applied.
GitRepository and Kustomization
# clusters/production/myapp/source.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/yourorg/myapp
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  path: ./k8s/production
  prune: true # delete resources removed from Git
  sourceRef:
    kind: GitRepository
    name: myapp
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: api
      namespace: myapp
Image Automation
Flux can also watch a container registry and automatically update the Git repo when a new image is pushed:
# image-policy.yaml
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: myapp-api
  namespace: flux-system
spec:
  image: ghcr.io/yourorg/myapp/api
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: myapp-api
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: myapp-api
  policy:
    semver:
      range: '>=2.0.0 <3.0.0' # only minor and patch updates
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: myapp
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: myapp
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        email: fluxbot@yourorg.com
        name: Flux Bot
      messageTemplate: 'chore: update {{range .Updated.Images}}{{println .}}{{end}}'
    push:
      branch: main
  update:
    path: ./k8s/production
    strategy: Setters
In your Deployment manifest, mark the image field for automation:
containers:
  - name: api
    image: ghcr.io/yourorg/myapp/api:2.1.0 # {"$imagepolicy": "flux-system:myapp-api"}
When a new image matching the semver policy is pushed to the registry, Flux automatically commits a manifest update to Git and applies it to the cluster.
Case 8: Observability Stack
A K3s cluster without observability is a black box. The standard stack (Prometheus for metrics, Grafana for visualization, Loki for logs) fits comfortably on a cluster with 4GB+ RAM.
kube-prometheus-stack (Helm)
# add Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# install the full stack
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.adminPassword=$(openssl rand -base64 24) \
--set prometheus.prometheusSpec.retention=7d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=local-path \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=10Gi \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=local-path \
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=2Gi
Exposing Grafana with Traefik
# grafana-ingress.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`grafana.yourdomain.com`)
      kind: Rule
      services:
        - name: monitoring-grafana
          port: 80
  tls:
    certResolver: letsencrypt
Scraping Custom Application Metrics
In your Go application, expose a /metrics endpoint using the prometheus/client_golang library:
// internal/metrics/metrics.go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	HTTPRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total HTTP requests",
		},
		[]string{"method", "path", "status"},
	)

	HTTPRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request duration",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "path"},
	)
)
Tell Prometheus to scrape it with a ServiceMonitor:
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-api
  namespace: myapp
  labels:
    release: monitoring # must match the Helm release label selector
spec:
  selector:
    matchLabels:
      app: api
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
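The endpoint selects the Service port by name, so the api Service must actually name its port http (the Case 2 manifest leaves the port unnamed). A sketch of the adjusted Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: myapp
spec:
  selector:
    app: api
  ports:
    - name: http # matched by the ServiceMonitor's "port: http"
      port: 80
      targetPort: 8080
```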
Loki for Logs
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
--namespace monitoring \
--set promtail.enabled=true \
--set loki.persistence.enabled=true \
--set loki.persistence.storageClassName=local-path \
--set loki.persistence.size=10Gi
Loki collects all pod logs via Promtail (a DaemonSet that tails container logs). In Grafana, add Loki as a data source and use LogQL to query:
{namespace="myapp", app="api"} |= "error" | json | level="error"
Case 9: Networking Deep Dive
K3s ships with Flannel as its CNI and Traefik as its ingress controller. For most use cases these are adequate. When they are not, here are the alternatives.
Replacing Flannel with Cilium
Cilium provides eBPF-based networking with better performance, network policies, and observability. Install K3s without Flannel:
curl -sfL https://get.k3s.io | sh -s - \
--flannel-backend=none \
--disable-network-policy \
--disable traefik
# install Cilium
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
--namespace kube-system \
--set operator.replicas=1 \
--set kubeProxyReplacement=true \
--set k8sServiceHost=$(hostname -I | awk '{print $1}') \
--set k8sServicePort=6443
Network Policies
Even with Flannel, K3s supports NetworkPolicy resources if you need to restrict traffic between namespaces or pods. With Cilium, policies are enforced at the eBPF level (more efficient and more capable):
# deny-all-ingress-by-default.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: myapp
spec:
  podSelector: {} # applies to all pods in the namespace
  policyTypes:
    - Ingress
---
# allow ingress to api from traefik only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-traefik-to-api
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik
      ports:
        - protocol: TCP
          port: 8080
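Note that the default deny also cuts off api-to-postgres traffic inside the namespace. A companion policy restores that path (labels match the Case 2 manifests):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-postgres
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 5432
```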
Traefik Middleware
Traefik's middleware system allows you to add rate limiting, authentication, CORS headers, and redirects to any route without modifying the application:
# middlewares.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: rate-limit
  namespace: myapp
spec:
  rateLimit:
    average: 100
    burst: 50
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: https-redirect
  namespace: myapp
spec:
  redirectScheme:
    scheme: https
    permanent: true
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: security-headers
  namespace: myapp
spec:
  headers:
    stsSeconds: 31536000
    stsIncludeSubdomains: true
    contentTypeNosniff: true
    frameDeny: true
    browserXssFilter: true
    referrerPolicy: "strict-origin-when-cross-origin"
Apply middleware to an IngressRoute:
routes:
  - match: Host(`api.yourdomain.com`)
    kind: Rule
    middlewares:
      - name: rate-limit
      - name: security-headers
    services:
      - name: api
        port: 80
Day-Two Operations
Upgrading K3s
K3s provides a system-upgrade-controller that handles rolling upgrades across the cluster using a Plan custom resource.
# install the upgrade controller
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
# upgrade-plan.yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-server
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: In
        values: ["true"]
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  channel: https://update.k3s.io/v1-release/channels/stable
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent
  namespace: system-upgrade
spec:
  concurrency: 2
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  prepare:
    image: rancher/k3s-upgrade
    args: ["prepare", "k3s-server"]
  channel: https://update.k3s.io/v1-release/channels/stable
The prepare step on agent nodes waits for server nodes to finish upgrading first, ensuring API compatibility is maintained during the rolling upgrade.
Backup and Restore
For a single-node K3s cluster, the SQLite database holds all cluster state:
# backup (K3s must be stopped or snapshot taken)
sudo systemctl stop k3s
sudo cp /var/lib/rancher/k3s/server/db/state.db /backup/k3s-state-$(date +%Y%m%d).db
sudo systemctl start k3s
# automated backup with systemd timer
sudo tee /etc/systemd/system/k3s-backup.service << 'EOF'
[Unit]
Description=K3s SQLite backup
[Service]
Type=oneshot
ExecStart=/bin/bash -c 'cp /var/lib/rancher/k3s/server/db/state.db /backup/k3s-$(date +%Y%m%d-%H%M).db && find /backup -name "k3s-*.db" -mtime +7 -delete'
EOF
sudo tee /etc/systemd/system/k3s-backup.timer << 'EOF'
[Unit]
Description=Run K3s backup every 6 hours
[Timer]
OnCalendar=*-*-* 00,06,12,18:00:00
Persistent=true
[Install]
WantedBy=timers.target
EOF
sudo systemctl enable --now k3s-backup.timer
For HA clusters with embedded etcd, use K3s's built-in snapshot command:
# take a manual snapshot
sudo k3s etcd-snapshot save --name pre-upgrade-snapshot
# list snapshots
sudo k3s etcd-snapshot ls
# restore (cluster must be stopped)
sudo k3s server \
--cluster-reset \
--cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/pre-upgrade-snapshot
Node Maintenance
# drain a node (evict all pods, cordon for scheduling)
kubectl drain node-name \
--ignore-daemonsets \
--delete-emptydir-data \
--grace-period=60
# perform maintenance (OS updates, hardware work)
# ...
# uncordon after maintenance
kubectl uncordon node-name
# remove a node from the cluster permanently
kubectl delete node node-name
# on the node itself, uninstall the K3s agent
/usr/local/bin/k3s-agent-uninstall.sh
Resource Quotas per Namespace
Prevent any single namespace from consuming all cluster resources:
# resource-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: myapp-quota
  namespace: myapp
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "8"
    limits.memory: 8Gi
    pods: "20"
    persistentvolumeclaims: "10"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: myapp-limits
  namespace: myapp
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
Choosing the Right K3s Configuration
Not every cluster needs every component. A decision matrix for the most common choices:
Storage class:
- Single node, data can be rebuilt → local-path (default)
- Multi-node, data must survive node failure → Longhorn
- Performance-critical database → dedicated PV on fast local NVMe, not replicated
Ingress:
- Simple HTTP/HTTPS routing → bundled Traefik (default)
- Advanced traffic management, gRPC, TCP → Traefik with custom configuration
- Enterprise features, WAF → nginx-ingress or Kong
CNI:
- Standard workloads → Flannel (default)
- Network policies, observability, performance → Cilium
- Strict network isolation requirements → Calico
Cluster topology:
- Single developer or small service → 1 server node (SQLite)
- Production with HA → 3 server nodes + N agent nodes (embedded etcd)
- Edge/offline → 1 server node, pre-loaded images, local-path storage
K3s scales from a single Raspberry Pi to a fifty-node bare metal cluster, with the same kubectl API across every configuration. The investment in learning it (the manifests, the Helm charts, the GitOps patterns) transfers across every deployment target without modification.
K3s is not a simplified Kubernetes. It is Kubernetes with the operational weight removed: the same API, the same compatibility, the same ecosystem, running on hardware that full Kubernetes cannot justify.