Mastering OpenShift on VMware and Bare Metal: Key Insights

Administering OpenShift on VMware vSphere or Bare Metal is significantly more complex than cloud environments because you are responsible for the “underlay” (the physical or virtual infrastructure) as well as the “overlay” (OpenShift).

In a 2026 interview, expect a focus on automation, connectivity in restricted environments, and hardware lifecycle.


1. Installation & Provisioning (The Foundation)

Q1: Compare IPI vs. UPI in the context of VMware vSphere.
  • IPI (Installer-Provisioned Infrastructure): The installer has the vCenter credentials. It automatically creates the folder, virtual machines, and resource pools. It also handles the VIPs (Virtual IPs) for the API and Ingress via Keepalived.
  • UPI (User-Provisioned Infrastructure): You manually create the VMs, set up the Load Balancers (F5, HAProxy), and configure DNS.
  • Interview Tip: Mention that IPI is preferred for speed and “automated scaling,” but UPI is often mandatory in “Brownfield” environments where the networking team won’t give the installer full control over the VLANs.
Q2: How does OpenShift interact with physical hardware for Bare Metal?

Answer: It uses the Metal3 project and the Bare Metal Operator (BMO).

  • The admin provides the BMC (Baseboard Management Controller) details—like IPMI, iDRAC (Dell), or iLO (HP)—to OpenShift.
  • OpenShift uses these to remotely power on the server, PXE boot it, and install RHCOS (Red Hat Enterprise Linux CoreOS).
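As a sketch, registering a physical server with the Bare Metal Operator looks roughly like this (the host name, MAC, BMC address, and credentials are all placeholders):

```yaml
# Hypothetical BareMetalHost for a server managed via Redfish/iDRAC virtual media
apiVersion: v1
kind: Secret
metadata:
  name: worker-3-bmc-secret
  namespace: openshift-machine-api
type: Opaque
stringData:
  username: root
  password: changeme
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-3
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: "52:54:00:aa:bb:cc"
  bmc:
    address: idrac-virtualmedia://10.0.0.30/redfish/v1/Systems/System.Embedded.1
    credentialsName: worker-3-bmc-secret
```

Once the host registers, Metal3 inspects the hardware and makes it available for the Machine API to provision with RHCOS.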

2. Infrastructure Operations

Q3: What is a “Disconnected” (Air-Gapped) Installation?

Answer: Common in on-prem data centers with high security.

  • The Problem: OpenShift usually pulls images from quay.io.
  • The Solution: You must set up a Local Mirror Registry (like Red Hat Quay or Sonatype Nexus).
  • Process: You use the oc mirror plugin to download all required images to portable media, move that media into the secure zone, and push the images to your local registry. You then configure an ImageContentSourcePolicy so the cluster redirects image pulls to your local registry.
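For illustration, a minimal ImageContentSourcePolicy might look like this (the internal registry hostname is a placeholder; oc mirror generates the real manifest for you):

```yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: mirror-release
spec:
  repositoryDigestMirrors:
    # Pulls for the release payload are redirected to the local mirror
    - source: quay.io/openshift-release-dev/ocp-release
      mirrors:
        - registry.internal.example.com/ocp4/openshift-release-dev
    - source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
      mirrors:
        - registry.internal.example.com/ocp4/openshift-release-dev
```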
Q4: How do you handle storage on VMware vs. Bare Metal?
  • VMware: Use the vSphere CSI Driver. This allows OpenShift to talk to vCenter and dynamically provision .vmdk files as Persistent Volumes (PVs).
  • Bare Metal: You typically use the Local Storage Operator or LVM Storage (LVMS) for fast local SSDs, or OpenShift Data Foundation (ODF), which is based on Ceph. ODF is the de facto standard for on-prem because it provides S3-compatible object, block, and file storage within the cluster itself.

3. High Availability & Networking

Q5: On Bare Metal, how do you handle Load Balancing for the API and Ingress?

Answer: Since there is no “AWS ELB” on-prem, you have two choices:

  1. External: Use a physical appliance like an F5 Big-IP or a pair of HAProxy nodes managed by your team.
  2. Internal (MetalLB): Use the MetalLB Operator. It allows you to assign a range of IPs from your corporate network to the OpenShift Router so it can act like a cloud load balancer.
Q6: What happens if a Master (Control Plane) node dies in a Bare Metal cluster?

Answer:

  • Quorum: You must have 3 masters to maintain etcd quorum. If one dies, the cluster survives. If two die, quorum is lost and the API becomes unavailable.

  • Recovery: On Bare Metal, recovery is manual. You must reinstall the OS, remove the stale etcd member with etcdctl member remove, and let the cluster-etcd-operator scale the new node back into the etcd ring.

4. Advanced Troubleshooting

Q7: A worker node is “NotReady” on VMware. What is your first check?

Answer: Beyond the logs, I check the VMware Tools status and Time Sync.

  • If the ESXi host and the VM have a clock drift (common if NTP is misconfigured), the certificates for the Kubelet will fail to validate, and the node will go NotReady.
  • I would also check the MachineConfigPool (MCP). If the node is stuck in “Updating,” it might be failing to pull an OS image from the internal registry.
Q8: What is “Assisted Installer”?

Answer: It’s the modern way to install OpenShift on-prem. It provides a web-based GUI that generates a “Discovery ISO.” You boot your physical servers with this ISO; they “check in” to the portal, and you can then click “Install” to deploy the whole cluster without writing complex YAML files.


Technical “Buzzwords” for 2026:

  • OVN-Kubernetes: The default network plugin (replaces OpenShift SDN).
  • LVM Storage: Used for high-performance databases on bare metal.
  • Red Hat Advanced Cluster Management (RHACM): If the company has multiple on-prem clusters, they will use this to manage them all from one dashboard.

Debugging etcd is the highest level of OpenShift administration. If etcd is healthy, the cluster is healthy; if etcd is failing, the API will be sluggish or completely unresponsive.

Here is the technical deep-dive on how to diagnose and fix etcd on-premise.


1. Checking the High-Level Status

Before diving into logs, check if the Etcd Operator is happy. If the operator is degraded, it usually means it’s struggling to manage the quorum.

# Check the status of the etcd cluster operator
oc get clusteroperator etcd
# Check the status of the individual etcd pods
oc get pods -n openshift-etcd -l app=etcd

2. Testing Quorum and Health (The etcdctl way)

In OpenShift 4.x, etcd runs as Static Pods on the master nodes. To run diagnostic commands, you must use a helper script or exec into the container.

The “Is it alive?” check:

# Get a list of etcd members and their health
oc rsh -n openshift-etcd etcd-master-0 etcdctl endpoint health --cluster -w table
The Performance check (Disk Latency):

On-premise (especially VMware), Disk I/O latency is the #1 killer of etcd. If your storage is slow, etcd will lose quorum.

# Check the sync duration
oc rsh -n openshift-etcd etcd-master-0 etcdctl check perf

Interview Pro-Tip: Mention that etcd requires fsync latency of less than 10ms. If it’s higher, your underlying VMware datastore or Bare Metal disks are too slow for an enterprise cluster.
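To make the 10ms budget concrete, here is a toy check that flags slow samples; the latency numbers are made up for illustration (on a real node you would gather fsync latencies with a disk benchmark such as fio):

```shell
# Hypothetical fsync latencies in ms; flag any sample over the 10ms etcd budget
printf '%s\n' 1.8 2.4 9.6 3.1 12.7 | awk '
  $1 > 10 { slow++ }
  END { if (slow > 0) print slow " sample(s) over 10ms - disk too slow for etcd";
         else print "disk latency within etcd budget" }'
# prints: 1 sample(s) over 10ms - disk too slow for etcd
```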


3. Investigating Logs

If a pod is crashing, check the logs specifically for “leader” issues or “wal” (Write Ahead Log) errors.

# View the last 100 lines of logs from a specific member
oc logs -n openshift-etcd etcd-master-0 -c etcd --tail=100

What to look for:

  • "lost leader": Indicates network instability between master nodes.
  • "apply entries took too long": Indicates slow disk or high CPU pressure on the master node.
  • "database space exceeded": The 8GB quota has been reached (requires a defrag).

4. Critical Recovery: The “Master Node Replacement”

If a master node (e.g., master-1) hardware fails permanently on Bare Metal, you must follow these steps to restore the cluster health:

  1. Remove the ghost member: tell etcd to stop looking for the dead node.

oc rsh -n openshift-etcd etcd-master-0 etcdctl member list
oc rsh -n openshift-etcd etcd-master-0 etcdctl member remove <dead-member-id>

  2. Clean up the Node object:

oc delete node master-1

  3. Re-provision: Boot the new hardware with the RHCOS ISO. If using IPI, the Machine API may do this for you. With UPI, you must provision the replacement manually.
  4. Approve CSRs: the new master won't join until you approve its certificates:

oc get csr | grep Pending | awk '{print $1}' | xargs oc adm certificate approve

5. Compaction and Defragmentation

Over time, etcd accumulates old revisions of objects. If the database grows too large, the cluster stops accepting writes (Error: mvcc: database space exceeded).

The Fix:

OpenShift normally handles this automatically, but as an admin, you might need to force it:

# Defragment the endpoint
oc rsh -n openshift-etcd etcd-master-0 etcdctl defrag --cluster

The “Final Boss” Interview Question:

“We lost 2 out of 3 master nodes. The API is down. How do you recover?”

The Answer:

  1. Since quorum is lost (a majority of n/2 + 1 members is required), you must perform a single-member restore.
  2. Stop the etcd static pod on the remaining healthy master.
  3. Run the cluster-restore.sh script (shipped with OpenShift) against a previous snapshot backup.
  4. This forces the remaining master to start as a new single-member cluster.
  5. Once the API is back up, you re-join the other two nodes as brand-new members.

Since OpenShift 4.12+, OVN-Kubernetes has become the default network provider, replacing the older OpenShift SDN. For an on-premise administrator, understanding this is vital because it changes how traffic flows from your physical switches into your pods.


1. OVN-Kubernetes Architecture

Unlike the old SDN, which used Open vSwitch (OVS) in a fairly basic way, OVN (Open Virtual Network) brings a distributed logical router and switch to every node.

  • Geneve Encap: OVN uses Geneve (Generic Network Virtualization Encapsulation) instead of VXLAN to tunnel traffic between nodes. It’s more flexible and allows for more metadata.
  • The Gateway: Every node has a “Gateway” that handles traffic entering and exiting the cluster. On-premise, this is where your physical network interface (e.g., eno1 or ens192) meets the virtual world.

2. On-Premise Networking Challenges

Q1: How does OpenShift handle “External” IPs on-prem?

In the cloud, you have a LoadBalancer service. On-prem, you don’t.

The Admin Solution: MetalLB.

As an admin, you configure a MetalLB Operator with an IP address pool from your actual data center VLAN. When a developer creates a Service of type LoadBalancer, MetalLB uses ARP (Layer 2) or BGP (Layer 3) to announce that IP address to your physical routers.
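A minimal Layer 2 configuration might look like this (the pool name and address range are placeholders; your actual range must come from an unused slice of the data center VLAN):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: datacenter-pool
  namespace: metallb-system
spec:
  addresses:
    # Range carved out of the physical VLAN, reserved for LoadBalancer Services
    - 192.168.10.200-192.168.10.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: datacenter-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - datacenter-pool
```

With this in place, a Service of type LoadBalancer gets the next free IP from the pool, and MetalLB answers ARP for it on the local segment.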

Q2: What is the “Ingress VIP” and “API VIP”?

During a VMware/Bare Metal IPI install, you are asked for two IPs:

  1. API VIP: The floating IP used to talk to the control plane (Port 6443).
  2. Ingress VIP: The floating IP for all application traffic (Ports 80/443).

Mechanism: OpenShift uses Keepalived and HAProxy internally to float these IPs between the master nodes (for the API) or worker nodes (for Ingress). If the node holding the IP fails, it floats to another node within seconds.

3. Troubleshooting the Network

If pods can’t talk to each other, follow this “inside-out” path:

Step 1: Check the Cluster Network Operator (CNO)

If the CNO is degraded, the entire network is unstable.

oc get clusteroperator network
Step 2: Verify Pod-to-Pod Connectivity

OpenShift continuously runs its own connectivity checks between nodes and records the results as PodNetworkConnectivityCheck objects:

oc get podnetworkconnectivitycheck -n openshift-network-diagnostics
Step 3: Inspect the OVN Database

Since OVN stores the network state in a database (Northbound and Southbound DBs), you can check if the logical flows are actually created.

# Get the logs of the ovnkube-master
oc logs -n openshift-ovn-kubernetes -l app=ovnkube-master

4. Key Concepts for Interview Scenarios
Scenario: “Applications are slow only when talking to external databases.”
  • Likely Culprit: MTU mismatch.
  • Explanation: Geneve encapsulation adds roughly 100 bytes of overhead to every packet. If your physical network uses the standard 1500-byte MTU but OpenShift also sends 1500-byte packets, they get fragmented, causing a massive performance hit.
  • The Fix: Ensure the cluster MTU is set to 1400 (1500 − 100), or enable Jumbo Frames (9000) on your physical switches.
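The arithmetic is trivial but worth internalizing, since getting it wrong silently degrades every cross-node packet:

```shell
# Cluster MTU must leave room for the Geneve encapsulation overhead
PHYS_MTU=1500   # standard Ethernet MTU on the physical switches
OVERHEAD=100    # approximate Geneve overhead per packet
echo "cluster MTU: $((PHYS_MTU - OVERHEAD))"   # prints: cluster MTU: 1400
```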
Scenario: “How do you isolate traffic between two departments on the same cluster?”
  • The Answer: NetworkPolicies.
  • OVN-Kubernetes supports standard Kubernetes NetworkPolicy objects. By default, all pods can talk to all pods. I would implement a default "deny-all" policy and then explicitly allow traffic only between required microservices.
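As a sketch, the deny-all-then-allow pattern looks like this (the namespace and app labels are hypothetical):

```yaml
# Default deny: no pod in team-a may receive traffic unless a policy allows it
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Explicit allow: only frontend pods may reach backend pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: team-a
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
```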

Summary for Administrator Interview

Feature | OpenShift SDN (Old) | OVN-Kubernetes (New/Standard)
Encapsulation | VXLAN | Geneve
Network Policy | Limited | Fully featured (Egress/Ingress)
Hybrid Cloud | Hard to implement | Designed for it (IPsec support)
Windows Support | No | Yes

Essential OpenShift Q&A: Architecture, Security & Workflow

In an OpenShift interview, the questions typically fall into three categories: Architecture/Concepts, Security (SCCs/RBAC), and Developer Workflow (S2I/Builds).

Here is a curated list of the most common and high-impact questions for 2026.


1. Core Architecture & Concepts

Q1: What is the fundamental difference between OpenShift and Kubernetes?

Answer: While Kubernetes is an open-source orchestration engine, OpenShift is a downstream, enterprise-grade distribution of Kubernetes by Red Hat.

  • The “Plus” Factor: OpenShift includes everything in Kubernetes but adds a built-in container registry, integrated CI/CD pipelines (Tekton), a developer-friendly web console, and enhanced security defaults.
  • Security: By default, OpenShift forbids containers from running as root, whereas vanilla Kubernetes is “open” by default.

Q2: What is an OpenShift “Project” vs. a Kubernetes “Namespace”?

Answer: A Project is simply an abstraction on top of a Kubernetes Namespace.

  • It adds metadata and facilitates Self-Service: users can request projects via the CLI (oc new-project) or Web Console.
  • It automatically applies default Resource Quotas and Limit Ranges to the namespace to prevent a single user from crashing the cluster.

Q3: Explain the role of the Router (HAProxy) in OpenShift.

Answer: In vanilla Kubernetes, you typically install an Ingress Controller (like NGINX). In OpenShift, the Router (based on HAProxy) is a core component. It provides the external entry point for traffic, mapping an external URL (a Route) to an internal Service.


2. Developer & Build Workflow

Q4: What is Source-to-Image (S2I) and why is it used?

Answer: S2I is a toolkit that allows developers to provide only their source code (via a Git URL). OpenShift then:

  1. Detects the language (Java, Python, Node, etc.).
  2. Injects the code into a “Builder Image.”
  3. Assembles the final application image.

Benefit: Developers don't need to know how to write a Dockerfile or manage base images, ensuring consistent security patches at the base layer.

Q5: What is a BuildConfig?

Answer: A BuildConfig is the definition of the entire build process. It specifies:

  • Source: Where the code is (Git).
  • Strategy: How to build it (S2I, Docker, or Pipeline).
  • Output: Where to push the resulting image (internal registry).
  • Triggers: Events that start a build (e.g., a code commit or an update to the base image).
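Putting those four pieces together, a minimal BuildConfig might look like this (the repo URL, image stream names, and tags are placeholders):

```yaml
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: my-app
spec:
  source:                 # where the code lives
    git:
      uri: https://github.com/example/my-app.git
  strategy:               # how to build it (S2I here)
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: nodejs:latest
        namespace: openshift
  output:                 # where the result goes
    to:
      kind: ImageStreamTag
      name: my-app:latest
  triggers:               # what starts a build
    - type: ConfigChange
    - type: ImageChange
```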

3. Security & Operations

Q6: What are Security Context Constraints (SCCs)?

Answer: SCCs are one of the most important security features in OpenShift. They control what actions a pod can perform.

  • Restricted SCC: The default. It prevents pods from running as root and limits access to the host filesystem.
  • Anyuid SCC: Often used when migrating legacy Docker images that must run as a specific user.
  • Privileged SCC: Full access (usually reserved for infra components like logging or monitoring).

Q7: How does OpenShift handle Persistent Storage?

Answer: OpenShift uses the Persistent Volume (PV) and Persistent Volume Claim (PVC) model.

  • An administrator provisions PVs (storage chunks).
  • A developer requests storage via a PVC.
  • OpenShift uses Storage Classes to dynamically provision storage on the fly (e.g., on AWS EBS or VMware vSphere) when a PVC is created.
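For example, a developer's claim might look like this (the storage class name is a placeholder; on vSphere it is typically the class created by the vSphere CSI driver):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce       # single-node read/write, the common case for block storage
  resources:
    requests:
      storage: 10Gi
  storageClassName: thin-csi   # assumed vSphere CSI storage class
```

When this PVC is created, the storage class's provisioner creates a matching PV on the fly and binds the two together.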

4. Scenario-Based “Pro” Question

Q8: “A pod is failing with a CrashLoopBackOff. How do you troubleshoot it in OpenShift?”

Answer: Walk through these 4 steps to show you have hands-on experience:

  1. Check Status: oc get pods to see the status.
  2. Examine Logs: oc logs <pod_name> (use --previous if the container already restarted).
  3. Inspect Events: oc describe pod <pod_name> to look for failed mounts, scheduling issues, or “Back-off” events.
  4. Debug Session: Use oc debug pod/<pod_name> to launch a terminal inside a clone of the failing pod to inspect the filesystem and environment variables.

5. Rapid-Fire Command Cheat Sheet

Task | Command
Login | oc login <api-url>
Create App | oc new-app https://github.com/user/repo
Scale App | oc scale --replicas=3 dc/my-app
Expose Service | oc expose svc/my-service
View Resources | oc get all
Check SCCs | oc get scc

For the Administrator track, the interview will shift away from “how to deploy an app” toward Cluster Health, Lifecycle Management, and Infrastructure Stability.

In OpenShift 4.x (the modern standard), the “Operator-focused” architecture is the star of the show. Here are the heavy-hitting admin questions you should be ready for.


1. The Operator Framework

Q1: What is the “Operator Pattern” and why is it central to OpenShift 4?

Answer: In OpenShift 4, the entire cluster is managed by Operators. An Operator is a custom controller that encodes human operational knowledge into software.

  • The Loop: It constantly monitors the Actual State of a component (like the Internal Registry or Monitoring stack) and compares it to the Desired State. If they differ, the Operator automatically fixes it.
  • Cluster Version Operator (CVO): This is the “Master Operator” that manages the updates of the cluster itself, ensuring all core components are at the correct version.

Q2: How do you perform a Cluster Upgrade in OpenShift 4?

Answer: Upgrades are managed via the Cluster Version Operator (CVO).

  • Process: You typically update the “Channel” (e.g., stable-4.14) and then trigger the upgrade via the console or oc adm upgrade.
  • Mechanism: The CVO orchestrates the update of every operator in the cluster. The Machine Config Operator (MCO) handles the rolling reboot of nodes to update the underlying Red Hat Enterprise Linux CoreOS (RHCOS).

2. Infrastructure & Nodes

Q3: What is the Machine Config Operator (MCO)?

Answer: The MCO is one of the most important components for an admin. It treats the underlying nodes like “cattle, not pets.”

  • It manages the operating system (RHCOS) itself.
  • If you need to change a kernel parameter, add an SSH key, or change an NTP setting across 50 nodes, you create a MachineConfig object. The MCO then applies the change and reboots nodes in a rolling fashion so the cluster stays available.
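As an example of the NTP case, a MachineConfig that drops a chrony config onto every worker might look roughly like this (the NTP server and file contents are placeholders; contents.source is a URL-encoded data URI):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-chrony
  labels:
    # Which MachineConfigPool this applies to
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/chrony.conf
          mode: 420          # octal 0644
          overwrite: true
          contents:
            # URL-encoded "server ntp.example.com iburst"
            source: data:,server%20ntp.example.com%20iburst
```

Applying this triggers the MCO to roll the change out across the worker pool, rebooting one node at a time.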

Q4: Explain the difference between IPI and UPI installation.

Answer:

  • IPI (Installer-Provisioned Infrastructure): Full automation. The OpenShift installer has credentials to your cloud (AWS, Azure, etc.) and creates the VMs, VPCs, and Load Balancers for you.
  • UPI (User-Provisioned Infrastructure): The admin manually creates the infrastructure (VMs, networking, storage). You then run the installer to "bootstrap" OpenShift onto those pre-existing resources. (Common in highly regulated or air-gapped environments.)

3. Storage & Networking

Q5: How do you troubleshoot a Node that is in “NotReady” status?

Answer: I follow a systematic checklist:

  1. Check Node Details: oc describe node <node_name> to look at the “Conditions” section (e.g., MemoryPressure, DiskPressure, or NetworkUnavailable).
  2. Verify Kubelet: SSH into the node (or use oc debug node) and check the kubelet logs: journalctl -u kubelet.
  3. Resource Usage: Check if the node has run out of PIDs or Disk space.
  4. CSRs: If the node was recently added/rebooted, check if there are pending Certificate Signing Requests: oc get csr and approve them if necessary.

Q6: What is the “In-tree” to CSI migration?

Answer: Older versions of OpenShift used storage drivers built directly into the Kubernetes binary (“In-tree”). Modern OpenShift is moving to CSI (Container Storage Interface) drivers. As an admin, this means storage is now handled by standalone operators, allowing for easier updates without upgrading the whole cluster.


4. Security & Etcd

Q7: Why is the etcd backup critical, and how do you perform it?

Answer: etcd is the “brain” of the cluster; it stores every configuration and state. If etcd is lost, the cluster is dead.

  • Backup: You use the cluster-etcd-operator's backup script, run on a master via a debug pod: oc debug node/<master-node> -- chroot /host /usr/local/bin/cluster-backup.sh /home/core/assets/backup.
  • Strategy: Always take a backup before a cluster upgrade.

5. Monitoring & Logging

Q8: What stack does OpenShift use for Cluster Monitoring?

Answer: OpenShift comes with a pre-configured Prometheus and Alertmanager stack, managed by the Cluster Monitoring Operator (older releases also bundled Grafana; its dashboards are now built into the web console).

  • Note: Admins use this to monitor cluster health (CPU/Mem of nodes).
  • User Workload Monitoring: In newer versions, admins can enable “User Workload Monitoring” to allow developers to use the same Prometheus stack for their own applications without interfering with the cluster’s core monitoring.

Summary Checklist for your Interview

[!TIP]

If they ask about a problem you can't solve, always mention "looking at the Operators." In OpenShift 4, if something is broken, check:

oc get clusteroperators

If any operator reports DEGRADED=True, that is your smoking gun.

OCP can run in different environments, such as on-premises (VMware/Bare Metal) or as a managed service (ROSA/ARO).

Linux Network Troubleshooting

Linux network issues can come from a lot of places—misconfiguration, hardware problems, DNS failures, firewall rules, or even upstream network outages. The trick is to diagnose step-by-step instead of guessing. Here’s a practical workflow you can follow.


1. Check Basic Connectivity

Start simple—make sure the system is even connected.

ip a
  • Look for an IP address (not just 127.0.0.1)
  • Check if the interface is UP
ip link
  • Verify your interface (e.g., eth0, wlan0) is not DOWN

If it’s down:

sudo ip link set eth0 up

2. Test Network Reachability

Try pinging known addresses:

ping -c 4 8.8.8.8
  • If this fails → no internet or routing issue

Then test DNS:

ping -c 4 google.com
  • If IP ping works but domain fails → DNS problem
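The two ping tests above form a small decision tree. As a toy illustration (the diagnose helper is hypothetical, not a standard tool), the logic looks like this:

```shell
# Hypothetical triage helper mirroring the decision tree above:
# arg 1 = did "ping 8.8.8.8" succeed (yes/no), arg 2 = did "ping google.com" succeed
diagnose() {
  ip_ok=$1; dns_ok=$2
  if [ "$ip_ok" = no ]; then
    echo "routing or gateway problem"   # raw IP unreachable
  elif [ "$dns_ok" = no ]; then
    echo "DNS problem"                  # IPs work, names do not
  else
    echo "network reachable"
  fi
}

diagnose yes no    # prints: DNS problem
```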

3. Check Routing Table

ip route

Look for a default route:

default via 192.168.1.1 dev eth0

If missing, add it:

sudo ip route add default via <gateway-ip>

4. Verify DNS Configuration

Check:

cat /etc/resolv.conf

You should see something like:

nameserver 8.8.8.8

If not, temporarily add:

sudo nano /etc/resolv.conf

Or if using systemd-resolved:

resolvectl status

5. Check Network Manager / Services

Depending on your system:

Using NetworkManager

nmcli device status
nmcli connection show

Restart:

sudo systemctl restart NetworkManager

Using systemd-networkd

networkctl status
sudo systemctl restart systemd-networkd

6. Inspect Firewall Rules

Firewall can silently block traffic.

sudo iptables -L -n

Or if using UFW:

sudo ufw status

Temporarily disable (for testing):

sudo ufw disable

7. Check Open Ports

ss -tulnp

This shows:

  • Listening services
  • Ports in use

8. Test with traceroute

See where packets stop:

traceroute 8.8.8.8

If it stops early → local network issue
If it goes far but fails → ISP or remote issue


9. Check Logs

Logs often reveal the real problem:

journalctl -u NetworkManager
journalctl -xe
dmesg | grep -i network

10. Wireless-Specific Checks

If using Wi-Fi:

iwconfig

Scan networks:

iw dev wlan0 scan

Reconnect:

nmcli device wifi connect "SSID" password "PASSWORD"

11. Hardware / Driver Issues

Check drivers:

lspci | grep -i network
lsmod | grep <driver>

If missing drivers, you may need to install firmware packages.


Common Problems & Quick Fixes

  • No IP address → DHCP issue → restart network service
  • Can ping IP but not domain → DNS misconfigured
  • Interface down → enable with ip link set
  • Connected but no internet → gateway missing
  • Wi-Fi connected but slow → interference or driver issue

On Ubuntu, networking is usually managed by NetworkManager, so troubleshooting is a bit more standardized. Let’s narrow this down quickly with a focused Ubuntu workflow.


1. Check if NetworkManager is running

sudo systemctl status NetworkManager

If it’s not active:

sudo systemctl restart NetworkManager

2. See connection status

nmcli device status

You’ll see something like:

  • connected → good
  • disconnected → problem
  • unmanaged → config issue

To reconnect:

nmcli connection up <connection-name>

3. Check IP address

ip a

Look for something like:

inet 192.168.x.x

❌ No IP? Try renewing DHCP:

sudo dhclient -v

4. Test connectivity

ping -c 4 8.8.8.8
  • ❌ fails → network/gateway issue
  • ✅ works → move to DNS test:
ping -c 4 google.com
  • ❌ fails → DNS problem

5. Check default gateway

ip route

You should see:

default via 192.168.x.1

If missing:

sudo ip route add default via 192.168.1.1

6. Fix DNS (very common on Ubuntu)

Check:

cat /etc/resolv.conf

If it’s broken or empty, try:

sudo nano /etc/systemd/resolved.conf

Set:

DNS=8.8.8.8 1.1.1.1

Then restart:

sudo systemctl restart systemd-resolved

7. Wi-Fi specific (if applicable)

Check:

nmcli device wifi list

Reconnect:

nmcli device wifi connect "SSID" password "PASSWORD"

If Wi-Fi adapter missing:

lspci | grep -i network

8. Firewall check

Ubuntu often uses UFW:

sudo ufw status

Temporarily disable:

sudo ufw disable

9. Look at logs (very useful)

journalctl -u NetworkManager --no-pager | tail -50

Fast “reset everything” fix (works surprisingly often)

sudo systemctl restart NetworkManager
sudo systemctl restart systemd-resolved
sudo dhclient

Docker Image Optimization: Best Practices & Tips

Here’s the practical best-practice checklist for building Docker images today:

Use a small, trusted base image and pin versions instead of relying on latest. Docker recommends choosing the right base image, keeping it small, and pinning base image versions for better security and repeatability. (Docker Documentation)

Use multi-stage builds so build tools never end up in the final runtime image. This is one of Docker’s main recommendations for producing smaller, cleaner, more secure images. (Docker Documentation)

Keep the build context small with a .dockerignore file. Excluding node_modules, .git, test artifacts, local env files, and temp files speeds builds and reduces accidental leakage into the image. Docker explicitly recommends using .dockerignore. (Docker Documentation)

Design your Dockerfile to maximize cache reuse. Copy dependency files first, install dependencies, then copy the rest of the app. Since Docker images are layer-based, ordering instructions well can make rebuilds much faster. (Docker Documentation)

Do not install unnecessary packages. Keep the image focused on one service, and remove build-only tools from the final stage. Docker also recommends creating ephemeral containers and decoupling applications where possible. (Docker Documentation)

Run the app as a non-root user whenever possible. Docker’s learning materials call out that a production-ready Dockerfile should improve security by running as non-root. (Docker Documentation)

Rebuild images regularly and use fresh base layers, especially for security patches. Docker recommends rebuilding often and using flags like --pull and, when needed, --no-cache for clean rebuilds. Also build and test images in CI. (Docker Documentation)

A solid production pattern looks like this:

# syntax=docker/dockerfile:1
FROM node:22-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
FROM node:22-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:22-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package*.json ./
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]

A matching .dockerignore should usually include:

node_modules
npm-debug.log
.git
.gitignore
Dockerfile*
docker-compose*
.env
coverage
dist
tmp

For most teams, the simplest rule set is:

  1. Small pinned base image
  2. Multi-stage build
  3. .dockerignore
  4. Cache-friendly Dockerfile order
  5. Non-root runtime
  6. Rebuild in CI and scan often (Docker Documentation)

Absolutely — here’s a production-ready Docker image pattern you can reuse for most apps.

Good Dockerfile pattern

# syntax=docker/dockerfile:1
# 1) Install dependencies in a separate stage
FROM node:22-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci
# 2) Build the app
FROM node:22-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# 3) Runtime image
FROM node:22-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production
# Create/use non-root runtime
USER node
# Copy only what is needed at runtime
COPY --chown=node:node --from=deps /app/node_modules ./node_modules
COPY --chown=node:node --from=build /app/dist ./dist
COPY --chown=node:node package*.json ./
EXPOSE 3000
CMD ["node", "dist/server.js"]

Matching .dockerignore

node_modules
npm-debug.log
.git
.gitignore
Dockerfile*
docker-compose*
.env
.env.*
coverage
dist
tmp
.vscode
.idea

Why this is a strong default

Docker’s current guidance recommends:

  • multi-stage builds to keep the final image smaller and cleaner (Docker Documentation)
  • using a .dockerignore file to keep the build context small and avoid sending unnecessary files to the builder (Docker Documentation)
  • structuring the Dockerfile for better cache reuse, like copying dependency manifests before app source (Docker Documentation)
  • running the app as a non-root user in production images (Docker Documentation)
  • avoiding secrets in ARG or ENV; Docker recommends using secret mounts instead because build args and env vars can be exposed in image metadata or the final image (Docker Documentation)

Even better build command

docker build --pull -t myapp:latest .

--pull helps refresh the base image layers so you don’t keep building on stale images, which aligns with Docker’s recommendation to rebuild often and keep base layers fresh. (Docker Documentation)

7 rules to follow every time

  1. Pin the base image (e.g., FROM node:22.14-alpine)
  2. Do not use latest in production
  3. Copy dependency files first:

COPY package*.json ./
RUN npm ci
COPY . .

  4. Only copy runtime artifacts into the final stage
  5. Run as non-root
  6. Keep secrets out of the Dockerfile
  7. Keep one main responsibility per container when possible (Docker Documentation)

Common mistakes

Bad:

COPY . .
RUN npm install

Better:

COPY package*.json ./
RUN npm ci
COPY . .

Bad:

FROM node:latest

Better:

FROM node:22-alpine

Bad:

ENV API_KEY=secret123

Better: pass secrets at runtime or use Docker build secrets. (Docker Documentation)
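As a sketch of the build-secret approach (the secret id "npmrc" and the .npmrc target path are illustrative; adapt them to your package manager):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
# The secret is mounted only for this RUN step; it never lands in an image layer
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci
```

Build it by passing the secret from the host:

docker build --secret id=npmrc,src=$HOME/.npmrc .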

If your app does not need Node at runtime

For frontend apps like React/Vite/Angular/Vue, it is often better to build in Node and serve with Nginx in the final stage, which Docker’s current framework guides also demonstrate for modern frontend apps. (Docker Documentation)

Best-practice summary

Use:

  • small pinned base image
  • multi-stage build
  • .dockerignore
  • cache-friendly layer order
  • non-root runtime
  • no secrets in ARG or ENV
  • regular rebuilds with fresh base layers (Docker Documentation)

Kong – full mini project folder

Here’s a full mini project folder for Kong that you can copy as-is.

It uses Kong Gateway in DB-less mode, so all config lives in one declarative kong.yml file. That mode is a good fit for CI/CD and Git-managed config, but the Admin API is effectively read-only for config changes in this setup. (Kong Docs)

Folder structure

kong-mini-project/
├── app/
│ ├── package.json
│ └── server.js
├── kong/
│ └── kong.yml
├── .dockerignore
├── Dockerfile
└── compose.yml

1) app/package.json

{
  "name": "kong-mini-project",
  "version": "1.0.0",
  "description": "Node app behind Kong Gateway",
  "main": "server.js",
  "scripts": {
    "start": "node server.js"
  },
  "license": "MIT"
}

2) app/server.js

const http = require("http");

const PORT = process.env.PORT || 3000;

const server = http.createServer((req, res) => {
  if (req.url === "/healthz") {
    res.writeHead(200, { "Content-Type": "application/json" });
    return res.end(JSON.stringify({ ok: true }));
  }

  const body = {
    ok: true,
    message: "Hello from app behind Kong",
    method: req.method,
    url: req.url,
    host: req.headers.host,
    time: new Date().toISOString()
  };

  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify(body, null, 2));
});

server.listen(PORT, () => {
  console.log(`Server listening on ${PORT}`);
});

3) Dockerfile

FROM node:20-alpine
WORKDIR /app
COPY app/package.json ./
RUN npm install --omit=dev
COPY app/server.js ./
ENV PORT=3000
EXPOSE 3000
CMD ["npm", "start"]

4) .dockerignore

node_modules
npm-debug.log
.git
.github

5) kong/kong.yml

This is the heart of the project. It defines:

  • one upstream Service
  • one public Route
  • a key-auth plugin
  • a rate-limiting plugin
  • one Consumer with an API key

Kong’s declarative config format supports entities like Services, Routes, Consumers, and Plugins in DB-less mode. The Key Auth plugin can require API keys, and the Rate Limiting plugin can throttle requests by time window such as per minute. When authentication is present, rate limiting uses the authenticated Consumer identity. (Kong Docs)

_format_version: "3.0"

services:
  - name: app-service
    url: http://app:3000
    routes:
      - name: app-route
        paths:
          - /api
        protocols:
          - http
          - https

plugins:
  - name: key-auth
    service: app-service
    config:
      key_names:
        - apikey
  - name: rate-limiting
    service: app-service
    config:
      minute: 5
      policy: local

consumers:
  - username: demo-client
    keyauth_credentials:
      - key: super-secret-demo-key

A note on policy: local: that works well for a single local node, but Kong notes that plugins needing shared database state do not fully function in DB-less mode, so this is best for learning or single-node setups rather than clustered distributed quotas. (Kong Docs)
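To see why local counters diverge across nodes, here is a minimal fixed-window counter sketch. This is an illustration of the idea behind policy: local, not Kong's implementation; names like createLimiter are invented for the example:

```javascript
// Minimal fixed-window rate limiter. Each node keeps its own
// in-memory counters, so two nodes would each allow the full
// quota independently — the limitation noted above.
function createLimiter(limitPerMinute) {
  const counters = new Map(); // consumer -> { windowStart, count }
  return function allow(consumer, now = Date.now()) {
    const windowStart = Math.floor(now / 60000) * 60000;
    const entry = counters.get(consumer);
    if (!entry || entry.windowStart !== windowStart) {
      counters.set(consumer, { windowStart, count: 1 });
      return true;
    }
    if (entry.count >= limitPerMinute) return false; // would be a 429
    entry.count += 1;
    return true;
  };
}

// Sixth request inside the same minute is rejected, matching `minute: 5`.
const allow = createLimiter(5);
const t0 = Date.parse("2024-01-01T00:00:00Z");
const results = Array.from({ length: 6 }, (_, i) => allow("demo-client", t0 + i * 1000));
console.log(results); // [true, true, true, true, true, false]
```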

6) compose.yml

Kong’s Docker docs support running Kong with Docker Compose, and the read-only Docker Compose guide for DB-less mode uses KONG_DATABASE=off plus KONG_DECLARATIVE_CONFIG pointing to the config file. (Kong Docs)

services:
  kong:
    image: kong:3.10
    environment:
      KONG_DATABASE: "off"
      KONG_DECLARATIVE_CONFIG: /kong/declarative/kong.yml
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: 0.0.0.0:8001
    ports:
      - "8000:8000" # public proxy
      - "8001:8001" # admin api (read-only for config in DB-less mode)
    volumes:
      - ./kong/kong.yml:/kong/declarative/kong.yml:ro
  app:
    build:
      context: .
      dockerfile: Dockerfile

7) Run it

docker compose up -d --build

Then test it.

Without an API key, access should fail because the route is protected by the Key Auth plugin. (Kong Docs)

curl -i http://localhost:8000/api

With the API key in a header, it should succeed. Kong’s Key Auth plugin supports reading keys from headers, query parameters, or request body, depending on config. (Kong Docs)

curl -i \
-H "apikey: super-secret-demo-key" \
http://localhost:8000/api

You can also use a query string:

curl -i "http://localhost:8000/api?apikey=super-secret-demo-key"
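Conceptually, the gateway's key lookup is simple: check each configured key name against the request headers and the query string. A rough illustration of that lookup (not Kong's code; findApiKey is an invented name):

```javascript
// Illustrative key-auth lookup: find an API key by configured
// names in request headers or the query string, matching the
// `key_names: [apikey]` config above.
function findApiKey(req, keyNames = ["apikey"]) {
  // Base URL is a placeholder so relative paths parse.
  const url = new URL(req.url, "http://placeholder");
  for (const name of keyNames) {
    // Node lower-cases incoming header names.
    const fromHeader = req.headers[name.toLowerCase()];
    if (fromHeader) return fromHeader;
    const fromQuery = url.searchParams.get(name);
    if (fromQuery) return fromQuery;
  }
  return null; // the gateway would answer 401 Unauthorized
}

// Header form
console.log(findApiKey({ url: "/api", headers: { apikey: "super-secret-demo-key" } }));
// Query-string form
console.log(findApiKey({ url: "/api?apikey=super-secret-demo-key", headers: {} }));
```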

8) Test rate limiting

The plugin is set to 5 requests per minute, so the sixth quick request should return 429. Kong’s rate-limiting plugin supports time windows including seconds, minutes, hours, days, months, and years. (Kong Docs)

for i in {1..6}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "apikey: super-secret-demo-key" \
    http://localhost:8000/api
done

9) Useful checks

See running containers:

docker compose ps

Follow Kong logs:

docker compose logs -f kong

Follow app logs:

docker compose logs -f app

Read the service list from the Admin API:

curl http://localhost:8001/services

In DB-less mode, that Admin API is useful for inspection, but Kong’s docs say you cannot use it for normal write-based configuration management because the declarative file is the source of truth. (Kong Docs)

10) What makes this different from Traefik

With Traefik, the main workflow was “discover containers and route traffic to them.” With Kong, the model is “define Services and Routes, then attach policy plugins like auth and rate limiting.” Kong’s docs emphasize entities such as Services, Routes, Consumers, Upstreams, and Plugins as the core gateway model. (Kong Docs)

So in practice:

  • Traefik is great for app routing and reverse proxying.
  • Kong is better when you want API-specific control like identity, quotas, and policy.

11) Resume line

Built a containerized API behind Kong Gateway in DB-less mode using declarative configuration, API key authentication, and per-consumer rate limiting.

12) Best next upgrade

The strongest next step is to add JWT auth or request transformation, because those show off Kong as an API gateway rather than just a reverse proxy. Kong’s plugin ecosystem is one of its main strengths. (Kong Docs)

KONG

Kong (often called Kong API Gateway) is a tool that sits in front of your APIs and manages all incoming requests—kind of like a smart gatekeeper for APIs.


Simple explanation

Instead of clients calling your backend services directly, they go through Kong first:

Client → Kong → Your APIs

Kong decides:

  • where the request goes
  • whether it’s allowed
  • how it should be handled

🔧 What Kong actually does

1. Routing (like Traefik, but API-focused)

  • Routes requests to the correct backend service
  • Supports paths, hosts, headers, etc.

Example:

/users → user-service
/orders → order-service
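That path-to-service mapping is essentially longest-prefix matching. A toy version to make it concrete (illustrative only; routes and matchService are invented names):

```javascript
// Toy path-prefix router: pick the service whose prefix matches
// the request path, preferring the longest match.
const routes = [
  { prefix: "/users", service: "user-service" },
  { prefix: "/orders", service: "order-service" },
];

function matchService(path) {
  const hit = routes
    .filter((r) => path === r.prefix || path.startsWith(r.prefix + "/"))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
  return hit ? hit.service : null;
}

console.log(matchService("/users/42"));  // "user-service"
console.log(matchService("/orders"));    // "order-service"
console.log(matchService("/unknown"));   // null
```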

2. Authentication & Security

  • API keys
  • OAuth2 / JWT
  • Rate limiting (prevent abuse)

3. Plugins (this is Kong’s superpower)

Kong uses plugins to add features like:

  • logging
  • caching
  • transformations
  • analytics

4. Load balancing

  • Distributes traffic across multiple service instances

5. Observability

  • Logs requests
  • Tracks usage
  • Helps debug API issues

Kong vs Traefik

Feature    | Kong                | Traefik
Focus      | APIs                | General web traffic
Plugins    | Very powerful       | More limited
Auth       | Built-in, strong    | Basic
Use case   | Microservices APIs  | Containers & routing

Quick takeaway:

  • Traefik → routing + infrastructure
  • Kong → API management + security

Where Kong fits in a system

Frontend / Mobile App → Kong → Microservices (Node, Python, etc.) → Database

Example use case

Imagine you’re building an app with:

  • user service
  • payment service
  • order service

Kong can:

  • route requests to each service
  • require authentication
  • limit requests per user
  • log all API calls

In DevOps terms

Kong is part of:

  • API Gateway layer
  • Often used with:
    • Kubernetes
    • Docker

In one sentence

Kong is an API gateway that controls, secures, and manages traffic to your backend services.


Here’s a working Kong Docker example you can compare directly with Traefik.

The cleanest starter setup is Kong Gateway in DB-less mode. In this mode, Kong runs without a database and reads its routes/services/plugins from a single declarative YAML file, which Kong documents as a supported deployment mode and a good fit for automation and CI/CD. (Kong Docs)

What you’ll build

Client → Kong → Your app

Kong will:

  • listen on port 8000 for proxied API traffic
  • expose an Admin API on port 8001 for local management/testing
  • route /api to your Node app
  • optionally apply plugins like rate limiting or key auth later

Kong’s Docker docs show Compose-based installs, and Kong’s gateway overview describes it as sitting in front of upstream services to control, analyze, and route requests. (Kong Docs)


Project structure

kong-starter/
├── app/
│   ├── package.json
│   └── server.js
├── kong/
│   └── kong.yml
├── Dockerfile
└── compose.yml

1) app/package.json

{
  "name": "kong-starter",
  "version": "1.0.0",
  "main": "server.js",
  "scripts": {
    "start": "node server.js"
  }
}



2) app/server.js

const http = require("http");
const PORT = process.env.PORT || 3000;

const server = http.createServer((req, res) => {
  const body = {
    ok: true,
    message: "Hello from app behind Kong",
    method: req.method,
    url: req.url,
    host: req.headers.host,
    time: new Date().toISOString()
  };
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify(body, null, 2));
});

server.listen(PORT, () => {
  console.log(`Server listening on ${PORT}`);
});

3) Dockerfile

FROM node:20-alpine
WORKDIR /app
COPY app/package.json ./
RUN npm install --omit=dev
COPY app/server.js ./
ENV PORT=3000
EXPOSE 3000
CMD ["npm", "start"]

4) kong/kong.yml

This is the declarative config Kong loads in DB-less mode.

_format_version: "3.0"

services:
  - name: app-service
    url: http://app:3000
    routes:
      - name: app-route
        paths:
          - /api

This tells Kong:

  • there is an upstream service at http://app:3000
  • requests hitting /api should be proxied there

Kong’s DB-less docs explain that entities are configured through a declarative YAML or JSON file when database=off. (Kong Docs)


5) compose.yml

services:
  kong:
    image: kong:3.10
    environment:
      KONG_DATABASE: "off"
      KONG_DECLARATIVE_CONFIG: /kong/declarative/kong.yml
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: 0.0.0.0:8001
    ports:
      - "8000:8000" # proxy
      - "8001:8001" # admin api
    volumes:
      - ./kong/kong.yml:/kong/declarative/kong.yml:ro
  app:
    build:
      context: .
      dockerfile: Dockerfile

Kong’s Docker install docs support Docker Compose installs, and Kong’s read-only/DB-less docs show using database=off with a declarative config file passed into the container. (Kong Docs)


6) Run it

docker compose up -d --build

Then test it:

curl http://localhost:8000/api

You should get JSON back from your Node app.

You can also inspect Kong locally through the Admin API:

curl http://localhost:8001/services

One important note: in DB-less mode, Kong documents that you cannot use the Admin API to write configuration the normal way, because config comes from the declarative file instead. (Kong Docs)


7) Add rate limiting

One of Kong’s main strengths is plugins. Kong’s overview emphasizes its plugin-based approach for implementing API traffic policies. (Kong Docs)

Update kong/kong.yml like this:

_format_version: "3.0"

services:
  - name: app-service
    url: http://app:3000
    routes:
      - name: app-route
        paths:
          - /api

plugins:
  - name: rate-limiting
    config:
      minute: 5
      policy: local

Then reload the stack:

docker compose up -d

Now Kong will rate-limit requests through the gateway.


8) Kong vs Traefik in this exact setup

Traefik version

You used labels on the app container:

- "traefik.http.routers.app.rule=Host(`app.localhost`)"

Traefik discovers Docker containers automatically and builds routing from labels. That is the core of its Docker provider model.

Kong version

You define a service and route in kong.yml:

services:
  - name: app-service
    url: http://app:3000
    routes:
      - paths:
          - /api

So the practical difference is:

  • Traefik feels more infrastructure-native and auto-discovery-driven
  • Kong feels more API-platform-driven, with explicit services, routes, and plugins

Kong’s docs center services, routes, plugins, and deployment modes as the main model for managing API traffic. (Kong Docs)


9) When to use which

Use Traefik when you want:

  • simple reverse proxying
  • automatic Docker/Kubernetes discovery
  • quick app routing
  • built-in HTTPS for web apps

Use Kong when you want:

  • API gateway features
  • auth, rate limiting, transformations, analytics
  • a plugin-heavy API management layer
  • more explicit API governance

That’s an inference from how each product is documented: Traefik emphasizes reverse proxying and dynamic service discovery, while Kong emphasizes API traffic policies through plugins and gateway entities. (Kong Docs)


10) The easiest mental model

  • Traefik = “send traffic to my containers”
  • Kong = “manage and secure my APIs”

11) Resume-worthy project line

Built a containerized API service behind Kong Gateway in DB-less mode using declarative configuration for routing and traffic policy management.


Here’s the same Kong project, but now with API key auth + rate limiting — which is where Kong starts to feel very different from Traefik.

Kong’s Key Authentication plugin can require clients to send an API key in a header, query string, or request body, and Kong’s Rate Limiting plugin can throttle requests by time window. In DB-less mode, you define all of that declaratively in the config file Kong loads at startup. (Kong Docs)

What this version does

Requests to your app will:

  • go through Kong on http://localhost:8000
  • require an API key
  • be limited to 5 requests per minute
  • route to your Node app on /api

In Kong’s rate-limiting docs, if there is an auth layer, the plugin uses the authenticated Consumer for identifying clients; otherwise it falls back to client IP. (Kong Docs)

Updated kong/kong.yml

_format_version: "3.0"

services:
  - name: app-service
    url: http://app:3000
    routes:
      - name: app-route
        paths:
          - /api

plugins:
  - name: key-auth
    service: app-service
    config:
      key_names:
        - apikey
  - name: rate-limiting
    service: app-service
    config:
      minute: 5
      policy: local

consumers:
  - username: demo-client
    keyauth_credentials:
      - key: super-secret-demo-key

Why this works:

  • key-auth protects the service with API key authentication. (Kong Docs)
  • key_names: [apikey] tells Kong to look for the API key under that name. Kong documents that keys can be supplied in headers, query params, or request body. (Kong Docs)
  • rate-limiting enforces request quotas over periods like seconds, minutes, hours, and more. (Kong Docs)
  • policy: local stores counters in-memory on the node; Kong notes this has minimal performance impact but is less accurate across multiple nodes. (Kong Docs)
  • consumers plus keyauth_credentials gives the client an identity and an API key in DB-less declarative config. That fits Kong’s DB-less model where config is the source of truth. (Kong Docs)

compose.yml

You can keep the same Compose file structure as before:

services:
  kong:
    image: kong:3.10
    environment:
      KONG_DATABASE: "off"
      KONG_DECLARATIVE_CONFIG: /kong/declarative/kong.yml
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: 0.0.0.0:8001
    ports:
      - "8000:8000"
      - "8001:8001"
    volumes:
      - ./kong/kong.yml:/kong/declarative/kong.yml:ro
  app:
    build:
      context: .
      dockerfile: Dockerfile

Kong’s Docker install docs support Compose installs, and DB-less deployments use KONG_DATABASE=off plus a declarative config file path. (Kong Docs)

Start it

docker compose up -d --build

Test without an API key

curl -i http://localhost:8000/api

This should fail because the route is protected by key-auth. Kong’s Key Auth plugin requires a valid key for access. (Kong Docs)

Test with the API key

Send the key in the apikey header:

curl -i \
-H "apikey: super-secret-demo-key" \
http://localhost:8000/api

That should succeed.

You can also pass the key as a query string because Kong’s Key Auth plugin supports query string auth too. (Kong Docs)

curl -i "http://localhost:8000/api?apikey=super-secret-demo-key"

Test the rate limit

Run this several times quickly:

for i in {1..6}; do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -H "apikey: super-secret-demo-key" \
    http://localhost:8000/api
done

You should see the first few succeed and then a 429 once you exceed the per-minute limit. Kong’s rate-limiting plugin is designed to cap requests over configured windows like minute: 5. (Kong Docs)

Why this is more “API gateway” than reverse proxy

With Traefik, the main idea was: “route traffic to the right service.” With this Kong setup, the gateway is also enforcing who can call the API and how often they can call it. Kong’s docs frame plugins like Key Auth and Rate Limiting as first-class traffic policy features for services and routes. (Kong Docs)

A practical mental model

  • Traefik: “Send requests to the right app.”
  • Kong: “Control access to the API, then send requests to the app.”

That is an inference from their documented feature emphasis: Traefik centers dynamic routing and service discovery, while Kong centers API traffic policy through gateway entities and plugins. (Kong Docs)

Good next upgrades

The next Kong features that are most worth learning are:

  • JWT auth
  • request/response transformation
  • ACLs by consumer group
  • logging plugins
  • declarative config managed from Git

Those all build naturally on Kong’s plugin model and DB-less configuration workflow. (Kong Docs)

to build a project (code + config) – production ready

Here’s the production version of the starter project: real domain, automatic HTTPS, HTTP→HTTPS redirect, and a secured Traefik dashboard.

This uses Traefik’s Docker provider with labels for routing, a Let’s Encrypt certificate resolver for TLS, and the dashboard in secure mode rather than api.insecure=true. Traefik’s docs recommend securing the dashboard and show Docker Compose setups for HTTPS with ACME. (Traefik Docs)

Before you start

You need:

  • a Linux server with Docker and Docker Compose
  • a domain or subdomain pointing to that server
  • ports 80 and 443 open to the internet

For the HTTP-01 challenge, Traefik’s ACME guide requires the app to be reachable publicly and the domain to point to the Traefik instance. (Traefik Docs)


Recommended structure

devops-starter/
├── app/
│   ├── package.json
│   └── server.js
├── letsencrypt/
│   └── acme.json
├── .github/
│   └── workflows/
│       └── publish.yml
├── .env
├── Dockerfile
└── compose.yml

1) app/package.json

{
  "name": "devops-starter",
  "version": "1.0.0",
  "description": "Node app behind Traefik with HTTPS",
  "main": "server.js",
  "scripts": {
    "start": "node server.js"
  },
  "license": "MIT"
}

2) app/server.js

const http = require("http");
const PORT = process.env.PORT || 3000;

const server = http.createServer((req, res) => {
  if (req.url === "/healthz") {
    res.writeHead(200, { "Content-Type": "application/json" });
    return res.end(JSON.stringify({ ok: true }));
  }
  const body = {
    ok: true,
    message: "Hello from production",
    method: req.method,
    url: req.url,
    hostname: req.headers.host,
    time: new Date().toISOString()
  };
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify(body, null, 2));
});

server.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

3) Dockerfile

FROM node:20-alpine

WORKDIR /app

COPY app/package.json ./
RUN npm install --omit=dev

COPY app/server.js ./

ENV PORT=3000
EXPOSE 3000

CMD ["npm", "start"]


4) .env

Replace these with your real values:

DOMAIN=app.yourdomain.com
TRAEFIK_DASHBOARD_HOST=traefik.yourdomain.com
LETSENCRYPT_EMAIL=you@example.com

# Generate this with: htpasswd -nb admin 'your-strong-password'
# Then double the $ signs when putting it here for docker labels
TRAEFIK_BASIC_AUTH=admin:$$apr1$$replace$$with-real-hash

Traefik’s BasicAuth middleware supports htpasswd-style hashes, and its docs note that when using Docker labels, dollar signs need escaping. (Traefik Docs)


5) Create the certificate storage file

Run this once on the server:

mkdir -p letsencrypt
touch letsencrypt/acme.json
chmod 600 letsencrypt/acme.json

Traefik’s ACME examples store certificates in acme.json, and the file should be writable by Traefik while remaining protected. (Traefik Docs)


6) compose.yml

services:
  traefik:
    image: traefik:v3.4
    restart: unless-stopped
    command:
      - "--api.dashboard=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"

      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"

      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"

      - "--certificatesresolvers.le.acme.email=${LETSENCRYPT_EMAIL}"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.le.acme.httpchallenge=true"
      - "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"

      - "--accesslog=true"
      - "--log.level=INFO"

    ports:
      - "80:80"
      - "443:443"

    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "./letsencrypt:/letsencrypt"

    labels:
      - "traefik.enable=true"

      # Secure dashboard
      - "traefik.http.routers.dashboard.rule=Host(`${TRAEFIK_DASHBOARD_HOST}`)"
      - "traefik.http.routers.dashboard.entrypoints=websecure"
      - "traefik.http.routers.dashboard.tls=true"
      - "traefik.http.routers.dashboard.tls.certresolver=le"
      - "traefik.http.routers.dashboard.service=api@internal"
      - "traefik.http.routers.dashboard.middlewares=dashboard-auth"
      - "traefik.http.middlewares.dashboard-auth.basicauth.users=${TRAEFIK_BASIC_AUTH}"

  app:
    build:
      context: .
      dockerfile: Dockerfile
    restart: unless-stopped
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app.rule=Host(`${DOMAIN}`)"
      - "traefik.http.routers.app.entrypoints=websecure"
      - "traefik.http.routers.app.tls=true"
      - "traefik.http.routers.app.tls.certresolver=le"

      # Tell Traefik which internal port the app listens on
      - "traefik.http.services.app.loadbalancer.server.port=3000"

Why these labels and flags matter:

  • Traefik uses Docker labels as dynamic config when Docker is the provider. (Traefik Docs)
  • entrypoints.web and entrypoints.websecure define listeners on ports 80 and 443. (Traefik Docs)
  • The web entrypoint redirects all traffic to websecure, which is the standard Traefik redirect pattern. (Traefik Docs)
  • tls.certresolver=le tells the router to request and renew certificates through the Let’s Encrypt resolver you defined. (Traefik Docs)
  • The dashboard can be exposed securely through api@internal and protected with BasicAuth instead of insecure mode. (Traefik Docs)

7) DNS records

Create DNS records like:

  • A app.yourdomain.com -> your_server_ip
  • A traefik.yourdomain.com -> your_server_ip

If you use IPv6, add AAAA records too. The names used in your router Host(...) rules must resolve to the server running Traefik for ACME issuance to work. (Traefik Docs)


8) First deploy

From the project folder on your server:

docker compose up -d --build

Then open:

  • https://app.yourdomain.com
  • https://traefik.yourdomain.com

On first startup, Traefik should obtain certificates automatically via Let’s Encrypt as requests arrive for matching routers using the resolver. (Traefik Docs)

Useful commands:

docker compose logs -f traefik
docker compose logs -f app
docker compose ps


9) Generate the dashboard password hash

If htpasswd is installed:

htpasswd -nb admin 'your-strong-password'

Put the result in .env as TRAEFIK_BASIC_AUTH=..., but replace every $ with $$ so Docker Compose does not treat them as variable substitutions. Traefik’s BasicAuth docs explicitly mention escaping dollar signs in Docker label contexts. (Traefik Docs)
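The escaping step is mechanical: every $ in the htpasswd hash is doubled so Compose passes it through literally. A tiny helper, purely illustrative (escapeForCompose is an invented name):

```javascript
// Double every "$" so Docker Compose does not treat the htpasswd
// hash as a variable substitution in labels or .env values.
function escapeForCompose(hash) {
  // In a JS replacement string, "$$" emits one literal "$",
  // so "$$$$" emits the doubled "$$" we want.
  return hash.replace(/\$/g, "$$$$");
}

console.log(escapeForCompose("admin:$apr1$abc123$xyz"));
// admin:$$apr1$$abc123$$xyz
```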


10) Publish the image from GitHub Actions

If you want Actions to build and push your app image to GHCR, use this workflow.

.github/workflows/publish.yml

name: Build and publish image

on:
  push:
    branches: ["main"]

env:
  IMAGE_NAME: ghcr.io/${{ github.repository_owner }}/devops-starter

jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Set up Buildx
        uses: docker/setup-buildx-action@v3
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.IMAGE_NAME }}
          tags: |
            type=raw,value=latest
            type=sha
      - name: Build and push
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

GitHub’s docs recommend docker/build-push-action for building and publishing images, and GHCR uses the Container registry at ghcr.io. Workflows can authenticate with GITHUB_TOKEN when package permissions are configured appropriately. (GitHub Docs)

If you switch to pulling the published image on the server, replace the app service in Compose with:

  app:
    image: ghcr.io/YOUR_GITHUB_USERNAME/devops-starter:latest
    restart: unless-stopped
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app.rule=Host(`${DOMAIN}`)"
      - "traefik.http.routers.app.entrypoints=websecure"
      - "traefik.http.routers.app.tls=true"
      - "traefik.http.routers.app.tls.certresolver=le"
      - "traefik.http.services.app.loadbalancer.server.port=3000"


11) What makes this “production enough” for a first real project

This version is much closer to a real deployment because it has:

  • automatic HTTPS
  • secure dashboard access
  • HTTP→HTTPS redirect
  • restart policy
  • access logs
  • a health endpoint
  • optional CI image publishing

Those pieces line up with Traefik’s Docker standalone guidance and dashboard/ACME docs. (Traefik Docs)


12) Common failure points

If it does not work, the usual causes are:

  • DNS not pointing at the server
  • ports 80/443 blocked by firewall or cloud security group
  • invalid TRAEFIK_BASIC_AUTH hash formatting
  • acme.json missing or wrong permissions
  • router hostnames not matching the browser request

Those are the most common things that prevent Traefik from issuing certs or matching routers in Docker setups. (Traefik Docs)


13) Resume line for this project

Deployed a containerized Node.js service behind Traefik with automatic Let’s Encrypt TLS, secure reverse-proxy routing, and GitHub Actions image publishing to GHCR.

That is solid, real DevOps experience.

step-by-step to build a project (code + config)

here’s a copyable starter project you can build end to end.

It gives you:

  • a tiny Node app in Docker
  • Traefik in front of it
  • hostname-based routing on localhost
  • a GitHub Actions workflow that builds the image
  • a path to deploy the same stack to a server later

This matches Traefik’s current Docker provider pattern, where Traefik watches Docker and picks up routing config from container labels. (Traefik Labs Documentation)


Project structure

devops-starter/
├── app/
│   ├── package.json
│   └── server.js
├── .github/
│   └── workflows/
│       └── docker.yml
├── .dockerignore
├── Dockerfile
└── compose.yml

1) app/package.json

{
  "name": "devops-starter",
  "version": "1.0.0",
  "description": "Simple Node app behind Traefik",
  "main": "server.js",
  "scripts": {
    "start": "node server.js"
  },
  "license": "MIT"
}

2) app/server.js

const http = require("http");
const PORT = process.env.PORT || 3000;

const server = http.createServer((req, res) => {
  const body = {
    ok: true,
    message: "Hello from the app behind Traefik",
    method: req.method,
    url: req.url,
    hostname: req.headers.host,
    time: new Date().toISOString()
  };
  res.writeHead(200, { "Content-Type": "application/json" });
  res.end(JSON.stringify(body, null, 2));
});

server.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

3) Dockerfile

This uses an official Node image and a fixed major version tag, which is in line with GitHub’s Dockerfile guidance. (GitHub Docs)

FROM node:20-alpine
WORKDIR /app
COPY app/package.json ./
RUN npm install --omit=dev
COPY app/server.js ./
ENV PORT=3000
EXPOSE 3000
CMD ["npm", "start"]

4) .dockerignore

node_modules
npm-debug.log
.git
.github
Dockerfile
compose.yml

5) compose.yml

This follows the same core idea as Traefik’s Docker Compose examples: enable the Docker provider, disable exposing containers by default, define an HTTP entrypoint, and add labels to the app container so Traefik creates the router automatically. (Traefik Labs Documentation)

services:
  traefik:
    image: traefik:v3.0
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entryPoints.web.address=:80"
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
  app:
    build:
      context: .
      dockerfile: Dockerfile
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app.rule=Host(`app.localhost`)"
      - "traefik.http.routers.app.entrypoints=web"

A couple of notes:

  • api.insecure=true is fine for learning locally, but not for a public server. Traefik’s dashboard docs treat this as something to secure for real deployments. (Traefik Labs Documentation)
  • Because both services are in the same Compose stack, Docker networking handles connectivity between Traefik and the app. That is the same pattern used in Docker and Traefik quick-start examples. (Traefik Labs Documentation)

6) .github/workflows/docker.yml

GitHub’s docs show Docker builds in Actions using actions/checkout and docker/build-push-action. This workflow keeps it simple: it builds on every push to main, and you can later extend it to push to Docker Hub or GHCR. (GitHub Docs)

name: Build Docker image

on:
  push:
    branches: ["main"]
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repo
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build image
        uses: docker/build-push-action@v6
        with:
          context: .
          file: ./Dockerfile
          push: false
          tags: devops-starter:latest

7) Run it locally

From the project root:

docker compose up -d --build

Then open:

  • http://app.localhost
  • http://localhost:8080 for the Traefik dashboard

Traefik’s Docker quick-start uses the same localhost-style host rule pattern, and the dashboard is commonly exposed on port 8080 in the getting-started setup. (Traefik Labs Documentation)

To stop:

docker compose down

To view logs:

docker compose logs -f

8) What’s happening

When you visit http://app.localhost:

  1. your browser sends a request to port 80
  2. Traefik receives it
  3. Traefik checks Docker-discovered labels
  4. the router rule Host(`app.localhost`) matches
  5. Traefik forwards the request to the app container

That “dynamic config from Docker labels” model is a central part of Traefik’s configuration overview and Docker provider docs. (Traefik Labs Documentation)
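To make the label model concrete, here is a rough sketch of turning traefik.http.routers.<name>.rule labels into a routing table. This is a simplification of what the Docker provider does, handling only plain Host(...) rules; buildRouters and route are invented names:

```javascript
// Simplified view of Traefik's Docker provider: read router rules
// from container labels, then match an incoming Host header.
const containers = [
  { name: "app", labels: { "traefik.enable": "true",
      "traefik.http.routers.app.rule": "Host(`app.localhost`)" } },
  { name: "whoami", labels: { "traefik.enable": "true",
      "traefik.http.routers.whoami.rule": "Host(`whoami.localhost`)" } },
];

function buildRouters(containers) {
  const routers = [];
  for (const c of containers) {
    // exposedbydefault=false means only labeled containers are routed.
    if (c.labels["traefik.enable"] !== "true") continue;
    for (const [key, value] of Object.entries(c.labels)) {
      const router = key.match(/^traefik\.http\.routers\.([^.]+)\.rule$/);
      const host = value.match(/^Host\(`([^`]+)`\)$/);
      if (router && host) routers.push({ router: router[1], host: host[1], target: c.name });
    }
  }
  return routers;
}

function route(hostHeader, routers) {
  const hit = routers.find((r) => r.host === hostHeader);
  return hit ? hit.target : null; // no match -> Traefik returns 404
}

const routers = buildRouters(containers);
console.log(route("app.localhost", routers));    // "app"
console.log(route("whoami.localhost", routers)); // "whoami"
```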


9) Make it feel more real

Add a second app to prove routing works.

Update compose.yml like this:

services:
  traefik:
    image: traefik:v3.0
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entryPoints.web.address=:80"
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
  app:
    build:
      context: .
      dockerfile: Dockerfile
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app.rule=Host(`app.localhost`)"
      - "traefik.http.routers.app.entrypoints=web"
  whoami:
    image: traefik/whoami
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
      - "traefik.http.routers.whoami.entrypoints=web"

Then:

  • http://app.localhost → your app
  • http://whoami.localhost → sample Traefik test service

That mirrors Traefik’s own examples for exposing services with Docker labels. (Traefik Labs Documentation)


10) How to deploy this later

For a simple first deployment:

  • get a Linux VM
  • install Docker and Docker Compose
  • copy this project to the server
  • point a domain at the server IP
  • swap localhost routing for your real domain
  • add HTTPS with Traefik + Let’s Encrypt

Traefik documents Docker standalone setup, HTTPS entrypoints, and ACME/Let’s Encrypt support as part of its normal production path. (Traefik Labs Documentation)

Your production router label would look more like:

- "traefik.http.routers.app.rule=Host(`app.yourdomain.com`)"

11) Resume-worthy version of this project

Once this is live, you can honestly describe it like this:

Built and deployed a containerized Node.js service using Docker and Traefik with hostname-based routing and automated image builds via GitHub Actions.

That is a real DevOps project, not tutorial-only practice.


12) Best next upgrades

After this works, do these in order:

  1. add /healthz endpoint
  2. add a test job to GitHub Actions
  3. push built images to GHCR or Docker Hub
  4. deploy on a small cloud VM
  5. add HTTPS with Let’s Encrypt
  6. add Prometheus/Grafana later

GitHub’s Actions docs already provide the build-and-publish direction if you want to turn your build-only workflow into a registry-pushing workflow. (GitHub Docs)
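For step 3, a hedged sketch of what that registry-pushing workflow could look like for GHCR, using the documented docker/login-action and docker/build-push-action (the workflow name, branch, and tag are placeholders):

```yaml
name: Build and Push
on:
  push:
    branches: [main]
jobs:
  build-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # required to push to GHCR
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository }}:latest
```

The built-in GITHUB_TOKEN is enough for pushing to your own repository's GHCR namespace, so no extra secrets are needed for this variant.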


13) The shortest possible checklist

Create files → run:

docker compose up -d --build

Visit:

http://app.localhost
http://localhost:8080

Push to GitHub → Actions builds the image automatically.


The 2026 Guide to DevOps Careers


DevOps isn’t just a job title anymore—it’s a core engineering mindset that companies rely on to ship software faster, safer, and at scale. If you’re thinking about getting into it (or leveling up), here’s a clear, realistic guide to where things stand in 2026.


What DevOps Actually Means (Now)

DevOps sits at the intersection of:

  • Software development
  • Infrastructure / cloud
  • Automation
  • Reliability & monitoring

In practice, you’re:

  • Building CI/CD pipelines
  • Managing cloud infrastructure
  • Improving deployment speed & reliability
  • Fixing production issues
  • Automating everything repetitive

Common DevOps Roles (2026)

DevOps Engineer

  • Focus: CI/CD, automation, infrastructure
  • Tools: GitHub Actions, Jenkins, Terraform
  • Entry → Mid-level role

Cloud Engineer

  • Focus: Cloud platforms, networking, scalability
  • Platforms: AWS, Google Cloud Platform, Microsoft Azure
  • Heavy on infrastructure + cost optimization

Site Reliability Engineer (SRE)

  • Focus: uptime, performance, incident response
  • Origin: Google
  • More coding + systems thinking than typical DevOps

Platform Engineer (fastest-growing)

  • Focus: building internal developer platforms
  • Tools: Kubernetes, Backstage
  • Think: “DevOps as a product”

Core Skills You Need

1. Linux & Networking

  • SSH, processes, file systems
  • HTTP, DNS, load balancing

2. Containers & Orchestration

  • Docker → package apps
  • Kubernetes → run them at scale

3. CI/CD Pipelines

  • Automate build → test → deploy
  • Tools: GitLab CI, CircleCI

4. Infrastructure as Code (IaC)

  • Manage infra like code
  • Tools: Terraform, Ansible

5. Observability

  • Logs, metrics, tracing
  • Tools: Prometheus, Grafana

6. Networking & Routing (where Traefik fits)

  • Reverse proxies like Traefik or NGINX
  • TLS, domains, load balancing

Learning Roadmap (Beginner → Job Ready)

Stage 1: Foundations (2–4 weeks)

  • Linux basics
  • Git + GitHub
  • Basic networking (HTTP, DNS)

Stage 2: Build Stuff (1–2 months)

  • Learn Docker
  • Deploy a simple app locally
  • Add Traefik or NGINX

Stage 3: Cloud + Automation (2–3 months)

  • Use AWS or similar
  • Write basic Terraform
  • Create CI/CD pipeline

Stage 4: Production-Level Skills

  • Learn Kubernetes
  • Add monitoring (Prometheus + Grafana)
  • Practice debugging failures

Salaries (2026 rough ranges)

(varies by country & experience)

  • Entry: $70k–$100k
  • Mid: $100k–$150k
  • Senior: $150k–$220k+
  • Platform/SRE at big tech: even higher

What’s Changed in 2026

  • Platform Engineering > DevOps titles
  • More focus on developer experience (DX)
  • AI is assisting pipelines, but you still need fundamentals
  • Kubernetes is still dominant—but simplified tools are growing

What Actually Gets You Hired

Not certificates—projects:

  • Deploy a full app with:
    • Docker
    • CI/CD
    • HTTPS (Traefik or NGINX)
  • Show logs + monitoring
  • Break things and fix them

Example Project (highly recommended)

Build this:

  • App (Node/Python)
  • Containerized with Docker
  • Routed via Traefik
  • Deployed on AWS
  • Automated with CI/CD
  • HTTPS enabled

That one project alone can outperform most resumes.


Final Reality Check

DevOps is:

  • Less about tools
  • More about systems thinking + automation mindset

If you like:

  • solving messy problems
  • understanding how systems connect
  • building things that just work

—you’ll probably enjoy it.


Here's a realistic 30-day DevOps plan that ends with a real project you can show on your resume. No fluff, just what actually builds skill.


30-Day DevOps Roadmap (Hands-on)

Final Goal (what you’ll build)

By Day 30, you’ll have:

  • A containerized app using Docker
  • Routed with Traefik
  • CI/CD pipeline via GitHub Actions
  • Deployed on AWS (or similar)
  • HTTPS enabled
  • Public URL to show employers

Week 1 — Foundations (Linux + Networking)

Learn:

  • Basic Linux commands: cd, ls, grep, chmod
  • Processes & ports
  • What HTTP, DNS, and IP actually do

Do:

  • Install Linux (or use WSL)
  • Run a simple web server:
python3 -m http.server 8000

Outcome:

You understand how apps run and are accessed via ports.


Week 2 — Containers (Docker)

Learn:

  • What containers are
  • Images vs containers
  • Ports & volumes

Do:

Build your first container:

FROM node:18
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "app.js"]

Run it:

docker build -t myapp .
docker run -p 3000:3000 myapp

Outcome:

You can package and run any app.


Week 3 — Reverse Proxy + Routing (Traefik)

Learn:

  • What a reverse proxy does
  • Routing by domain
  • Basic HTTPS concepts

Do:

Use Traefik with Docker Compose:

  • Route:
    • app.localhost → your app
  • Add second service to understand routing

Outcome:

You understand how real production traffic is routed.


Week 4 — CI/CD + Cloud Deployment

Learn:

  • CI/CD basics (build → test → deploy)
  • Cloud basics (VMs, ports, SSH)

Do:

1. CI/CD with GitHub Actions

Create .github/workflows/deploy.yml:

name: Deploy
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: docker build -t myapp .

2. Deploy to AWS

  • Launch EC2 instance
  • SSH in
  • Install Docker
  • Run your app + Traefik setup

3. Add HTTPS

  • Use Traefik + Let’s Encrypt
  • Point your domain to your server

Outcome:

Your app is LIVE on the internet.


Your Final Project Structure

project/
├── app/
├── Dockerfile
├── docker-compose.yml
├── traefik/
└── .github/workflows/

What to Put on Your Resume

Instead of “learning DevOps,” say:

Deployed a containerized application using Docker and Traefik with automated CI/CD via GitHub Actions on AWS, including HTTPS and reverse proxy routing.

That’s real experience.


Common Mistakes (avoid these)

  • ❌ Only watching tutorials
  • ❌ Skipping networking basics
  • ❌ Not deploying publicly
  • ❌ Over-focusing on Kubernetes too early

If You Want to Go Further (Day 31+)

Next steps:

  • Learn Kubernetes
  • Add monitoring with Prometheus + Grafana
  • Use Terraform

Final Advice

Don’t aim to “learn DevOps.”

Aim to:

Build → Break → Fix → Repeat

That loop is what actually makes you good.



Traefik

Traefik is an open-source reverse proxy and load balancer designed for modern cloud-native applications—especially those running in containers.


What that actually means

Think of Traefik as a smart traffic controller sitting in front of your apps:

  • It receives incoming requests (like someone visiting your website)
  • Then routes them to the correct service (e.g., your API, frontend, or another container)
  • It can also balance traffic across multiple instances of the same service
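That "traffic controller" role boils down to picking a backend from the request's Host header. A toy sketch of the idea (illustrative only, not Traefik's code; the backend addresses are made up):

```javascript
// toy host-based routing table: the core idea a reverse proxy implements
// (illustrative only; these backend addresses are made up)
const routes = {
  "app.localhost": "http://127.0.0.1:3000",
  "whoami.localhost": "http://127.0.0.1:3001",
};

// choose a backend from the request's Host header
function selectBackend(hostHeader) {
  const host = (hostHeader || "").split(":")[0]; // drop any port suffix
  return routes[host] || null; // null -> no route; a proxy would return 404
}
```

Traefik builds and updates a table like this automatically from container labels instead of requiring you to maintain it by hand.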

Key features

Automatic service discovery

Traefik integrates directly with tools like:

  • Docker
  • Kubernetes

It automatically detects new containers/services and routes traffic to them—no manual config needed.


Built-in HTTPS (SSL/TLS)

  • Automatically generates and renews certificates using Let’s Encrypt
  • Handles HTTPS setup for you (no manual certificate management)

Load balancing

  • Distributes requests across multiple instances
  • Helps keep your app fast and available
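The default strategy here is round-robin: each request goes to the next instance in turn. A sketch of that mechanic (illustrative only, not Traefik's implementation):

```javascript
// round-robin backend selection: the core mechanic behind load balancing
// (illustrative sketch of the idea, not Traefik's actual implementation)
function makeRoundRobin(backends) {
  let i = 0;
  return function next() {
    const backend = backends[i % backends.length];
    i += 1;
    return backend;
  };
}
```

With Traefik you never write this loop yourself: run several replicas of one service and it rotates requests across them.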

Dynamic configuration

  • Updates routes in real time when services start/stop
  • No restarts required

Dashboard & monitoring

  • Web UI shows routes, services, and traffic
  • Useful for debugging and observability

Simple example (Docker)

If you run a container with labels like:

labels:
  - "traefik.http.routers.myapp.rule=Host(`myapp.local`)"

Traefik will:

  • Detect the container
  • Create a route for myapp.local
  • Start sending traffic there automatically

When people use Traefik

  • Microservices architectures
  • Docker or Kubernetes setups
  • Hosting multiple apps on one server
  • Replacing tools like:
    • NGINX
    • HAProxy

In one sentence

Traefik is a modern, automatic reverse proxy that makes routing traffic to containerized apps simple and dynamic.


Here’s a small working Docker Compose example you can run locally. It follows Traefik’s current Docker quick-start pattern: Traefik listens to Docker, exposes port 80 for app traffic and 8080 for the dashboard, and routes a sample whoami container using labels. (Traefik Docs)

version: "3.9"
services:
  traefik:
    image: traefik:v3.0
    container_name: traefik
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entryPoints.web.address=:80"
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
  whoami:
    image: traefik/whoami
    container_name: whoami
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.localhost`)"
      - "traefik.http.routers.whoami.entrypoints=web"

Run it with:

docker compose up -d

Then open:

  • http://whoami.localhost → sample app
  • http://localhost:8080 → Traefik dashboard

That hostname rule is the key idea: Traefik reads the Docker labels and creates a router so requests for whoami.localhost go to the whoami container. Traefik hot-reloads this dynamic routing config from Docker without restarting. (Traefik Docs)

How to read the important lines:

  • --providers.docker=true tells Traefik to watch Docker for containers/services. (Traefik Docs)
  • --providers.docker.exposedbydefault=false means only containers with traefik.enable=true get exposed. (Traefik Docs)
  • --entryPoints.web.address=:80 creates an HTTP entrypoint on port 80. (Traefik Docs)
  • traefik.http.routers.whoami.rule=Host(`whoami.localhost`) matches incoming requests by hostname. (Traefik Docs)

A more realistic example is routing two apps:

version: "3.9"
services:
  traefik:
    image: traefik:v3.0
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entryPoints.web.address=:80"
    ports:
      - "80:80"
      - "8080:8080"
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
  app1:
    image: traefik/whoami
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app1.rule=Host(`app1.localhost`)"
      - "traefik.http.routers.app1.entrypoints=web"
  app2:
    image: traefik/whoami
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.app2.rule=Host(`app2.localhost`)"
      - "traefik.http.routers.app2.entrypoints=web"

Then:

  • http://app1.localhost → app1
  • http://app2.localhost → app2

That is basically the Traefik workflow: define a service, add labels, and Traefik discovers it automatically. The official docs also note that when both containers are in the same Compose file, Docker’s default network is enough for Traefik to reach them. (Traefik Docs)

A couple of useful notes:

  • The dashboard setting shown here is insecure and meant for local learning, not production. (Traefik Docs)
  • For production, people usually add TLS/HTTPS and often Let’s Encrypt certificate automation through Traefik’s config. The routing and certificates are part of Traefik’s dynamic config model. (Traefik Docs)


Here’s a real HTTPS + Let’s Encrypt Docker Compose setup for Traefik.

It uses the HTTP-01 challenge, which means your server must be publicly reachable on ports 80 and 443, and your domain’s DNS must point at that server. Traefik’s docs also note that certificate resolvers are defined in static config, the router must have TLS enabled, and the router references the resolver by name. (Traefik Docs)

version: "3.9"
services:
  traefik:
    image: traefik:v3.4
    container_name: traefik
    command:
      - "--api.dashboard=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      # entrypoints
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      # redirect http -> https
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      # lets encrypt
      - "--certificatesresolvers.le.acme.email=you@example.com"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.le.acme.httpchallenge=true"
      - "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080" # optional dashboard
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
      - "./letsencrypt:/letsencrypt"
  app:
    image: traefik/whoami
    container_name: whoami
    labels:
      - "traefik.enable=true"
      # router
      - "traefik.http.routers.app.rule=Host(`whoami.yourdomain.com`)"
      - "traefik.http.routers.app.entrypoints=websecure"
      - "traefik.http.routers.app.tls=true"
      - "traefik.http.routers.app.tls.certresolver=le"

Then run:

mkdir -p letsencrypt
touch letsencrypt/acme.json
chmod 600 letsencrypt/acme.json
docker compose up -d

Then open:

  • https://whoami.yourdomain.com
  • http://whoami.yourdomain.com will redirect to HTTPS

Why this works:

  • providers.docker=true makes Traefik watch Docker containers and labels. (Traefik Docs)
  • entrypoints.web and entrypoints.websecure create listeners on ports 80 and 443. (Traefik Docs)
  • certificatesresolvers.le.acme.* configures Let’s Encrypt and stores certs in acme.json. (Traefik Docs)
  • tls.certresolver=le tells that router to request and renew a cert through the le resolver. (Traefik Docs)

A couple of important gotchas:

  • Replace you@example.com with your real email.
  • Replace whoami.yourdomain.com with a real domain/subdomain you control.
  • That hostname needs an A or AAAA record pointing to your server. Traefik’s docs say ACME domains must point to Traefik. (Traefik Docs)
  • Port 8080 is just for the dashboard. Don’t leave it openly exposed on the internet unless you secure it. (Traefik Docs)

For Kubernetes, the usual Traefik-native route is an IngressRoute or standard Ingress; Traefik also documents a working Kubernetes + Let’s Encrypt example with CRDs. (Traefik Docs)