Exercise
This exercise simulates an API server failure and demonstrates how to troubleshoot it.
- Check the current status of all control plane components
- Simulate an API server failure by introducing a configuration error:
  - SSH to the control plane node
  - Edit `/etc/kubernetes/manifests/kube-apiserver.yaml` (you need `sudo` for this)
  - Change the `--etcd-servers` parameter to an invalid value such as `https://127.0.0.1:2380` (wrong port)
- Wait 1-2 minutes and check what happens to the cluster
- Find the issue and fix the configuration
- Verify the API server is working again
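The manifest edit in the steps above can also be sketched as a single `sed` substitution. The snippet below operates on a throwaway temporary copy so it runs anywhere; on a real control plane node the target would be `/etc/kubernetes/manifests/kube-apiserver.yaml` and would need `sudo`.

```shell
# Sketch of the failure injection, on a throwaway copy of the relevant line.
# On a real node you would target /etc/kubernetes/manifests/kube-apiserver.yaml.
manifest=$(mktemp)
printf '%s\n' '    - --etcd-servers=https://127.0.0.1:2379' > "$manifest"

# Swap the etcd client port (2379) for the peer port (2380) to break the API server
sed -i 's|https://127.0.0.1:2379|https://127.0.0.1:2380|' "$manifest"
grep -- '--etcd-servers' "$manifest"
```

Because the kubelet watches the static pod manifest directory, saving the real file is enough to restart the API server pod with the broken flag.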
Solution
- Check the current status of all control plane components

```shell
kubectl get pods -n kube-system
kubectl get nodes
kubectl get componentstatuses  # deprecated, but still useful
```

- Simulate the API server failure (on the control plane node), following the instructions above
- Wait and observe the failure

Any kubectl command should now time out with an error message:

```shell
$ kubectl get nodes
Get "https://192.168.64.17:6443/api/v1/nodes?limit=500": net/http: TLS handshake timeout - error from a previous attempt: read tcp 192.168.64.17:43892->192.168.64.17:6443: read: connection reset by peer
```

- Find the issue and fix the configuration
First, check the API server logs via the kubelet on the control plane node:

```shell
sudo journalctl -u kubelet | grep apiserver
```

You'll get information similar to:

```
Aug 06 11:37:28 controlplane kubelet[28526]: E0806 11:37:28.906487 28526 pod_workers.go:1301] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 40s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(985738be376c918fcccfba9135af2a6f)\"" pod="kube-system/kube-apiserver-controlplane" podUID="985738be376c918fcccfba9135af2a6f"
```

This tells us the kubelet cannot start the API server container, but at this stage we don't know why.
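For scripting, the offending pod can be pulled straight out of such a kubelet message with `grep`. In this sketch the message is pasted in as a shortened string (an assumption, so the snippet is self-contained and runs anywhere):

```shell
# A shortened sample of the kubelet error (assumption: pasted from journalctl output)
msg='E0806 11:37:28.906487 28526 pod_workers.go:1301] "Error syncing pod, skipping" err="CrashLoopBackOff" pod="kube-system/kube-apiserver-controlplane" podUID="985738be376c918fcccfba9135af2a6f"'

# Extract the pod="namespace/name" field
echo "$msg" | grep -oE 'pod="[^"]+"'
```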
Let's check the log of the API server container in /var/log/containers (sudo is required). First, locate the corresponding log file:

```shell
$ cd /var/log/containers
$ ls -al | grep kube-apiserver
```

Next, check this file for errors:

```shell
$ cat kube-apiserver-controlplane_kube-system_kube-apiserver-6171a92ceb23ba2828f84bf3bf875d39ca71272c804606ce52202cbb021b5023.log
2025-08-06T11:40:13.180718092+02:00 stderr F W0806 09:40:13.180666 1 logging.go:55] [core] [Channel #2 SubChannel #4]grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2380", ServerName: "127.0.0.1:2380", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2380: connect: connection refused"
```

The logs show the API server cannot connect to etcd. The port number is wrong: etcd serves client requests on 2379, while 2380 is its peer port.
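To confirm exactly which endpoint is being refused, the address can be grepped out of the log line. Here the line is pasted in as a string (an assumption for self-containment); on a real node you would pipe the log file into the same `grep`:

```shell
# Sample gRPC error from the container log, pasted in so this runs anywhere
logline='grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:2380", ServerName: "127.0.0.1:2380", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:2380: connect: connection refused"'

# Pull out the unique host:port endpoints mentioned in the line
echo "$logline" | grep -oE '[0-9]+(\.[0-9]+){3}:[0-9]+' | sort -u
```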
Then, fix the configuration by changing the port from 2380 back to 2379:

```shell
sudo vim /etc/kubernetes/manifests/kube-apiserver.yaml
# Change back to: --etcd-servers=https://127.0.0.1:2379
```

- Verify the API server is working
After a few tens of seconds, the API server is back to normal:

```shell
$ kubectl get nodes
NAME           STATUS   ROLES           AGE     VERSION
controlplane   Ready    control-plane   5d21h   v1.32.7
worker1        Ready    <none>          5d21h   v1.32.7
worker2        Ready    <none>          5d21h   v1.32.7
```

Below are the key troubleshooting commands we used:
```shell
# Check component status
kubectl get pods -n kube-system
kubectl get nodes

# Check static pod manifests
ls -la /etc/kubernetes/manifests/

# Check API server logs via kubelet (on the control plane node)
sudo journalctl -u kubelet | grep apiserver

# Follow kubelet logs
sudo journalctl -u kubelet -f

# Check control plane container logs in /var/log/containers
sudo cat /var/log/containers/kube-apiserver...
```
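Rather than re-running kubectl by hand while waiting for the recovery, a small polling loop works. In this sketch `probe` is a stand-in for a real check such as `kubectl get --raw /readyz` (an assumption here, since the real call needs a live cluster):

```shell
# Poll until the API server answers, for up to ~60 seconds.
# probe() is a stand-in for a real check like: kubectl get --raw /readyz
probe() { true; }

for i in $(seq 1 30); do
  if probe >/dev/null 2>&1; then
    echo "API server is healthy"
    break
  fi
  sleep 2
done
```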