TL;DR
Solution for Kubernetes certificate & key (PKI) expiration issues 🙏
Warning: If you are running an HA cluster, these commands must be executed on all the control-plane nodes.
Kubernetes v1.14 and before
- Tested against v1.14
// Backup current certs and keys
$ cd /etc/kubernetes/pki/
$ mkdir -p ~/tmp/BACKUP_etc_kubernetes_pki/etcd/
$ sudo mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} ~/tmp/BACKUP_etc_kubernetes_pki/.
$ sudo mv {etcd/healthcheck-client.crt,etcd/healthcheck-client.key,etcd/peer.crt,etcd/peer.key,etcd/server.crt,etcd/server.key} ~/tmp/BACKUP_etc_kubernetes_pki/etcd/.
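// ** Optional sanity check (not part of the original procedure): confirm the backup directory now holds the files you just moved
$ ls -l ~/tmp/BACKUP_etc_kubernetes_pki/ ~/tmp/BACKUP_etc_kubernetes_pki/etcd/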
// Generate new certs and keys
// ** If you originally used any flags or a --config file with the `kubeadm init` command during
// ** cluster setup, make sure you use the same here. Otherwise, the default spec will be used.
// ** Available flags: https://v1-14.docs.kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-certs
// ** Run ONE of the two commands below (an illustrative kubeadm_config.yaml sketch follows the output)
$ sudo kubeadm init phase certs all
$ sudo kubeadm init phase certs all --config /<k8_specs_directory>/kubeadm_config.yaml
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Using existing etcd/ca certificate authority
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [<apiserver_advertise_host> localhost] and IPs [<apiserver_advertise_ip> 127.0.0.1 ::1]
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [<apiserver_advertise_host> localhost] and IPs [<apiserver_advertise_ip> 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Using existing ca certificate authority
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [<apiserver_advertise_host> kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [<k8_subnet_and_host_ips>]
[certs] Using the existing "sa" key
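// ** If you are unsure what kubeadm_config.yaml should look like, below is a minimal illustrative
// ** sketch for the v1.14 config API (kubeadm.k8s.io/v1beta1). Every value (version, endpoint,
// ** SANs, pod subnet) is a placeholder – use the values from your original `kubeadm init` setup.
$ cat <<EOF | sudo tee /<k8_specs_directory>/kubeadm_config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.1
controlPlaneEndpoint: "<apiserver_advertise_ip>:6443"
apiServer:
  certSANs:
  - "<apiserver_advertise_host>"
  - "<apiserver_advertise_ip>"
networking:
  podSubnet: "<pod_subnet_cidr>"
EOF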
// Backup current configs
$ cd /etc/kubernetes/
$ mkdir -p ~/tmp/BACKUP_etc_kubernetes
$ sudo mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} ~/tmp/BACKUP_etc_kubernetes/.
// Generate new configs
// ** If you originally used any flags or a --config file with the `kubeadm init` command during
// ** cluster setup, make sure you use the same here. Otherwise, the default spec will be used.
// ** Available flags: https://v1-14.docs.kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-kubeconfig
// ** Run ONE of the two commands below
$ sudo kubeadm init phase kubeconfig all
$ sudo kubeadm init phase kubeconfig all --config /<k8_specs_directory>/kubeadm_config.yaml
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
// Reboot the node for the changes to take effect
$ sudo reboot
// After reboot
$ mkdir -p ~/tmp/BACKUP_home_.kube/
$ cp -r ~/.kube/* ~/tmp/BACKUP_home_.kube/.
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
// `kubectl` commands should now work
$ kubectl get pods -o wide
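If `kubectl` responds, the renewal worked. As an optional final check (not part of the original procedure), you can also confirm that the kubelet is active again and that the control-plane pods in kube-system have come back up:
// Optional post-reboot checks
$ systemctl is-active kubelet
$ kubectl get pods -n kube-system -o wide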
Kubernetes v1.15 and later
- Tested against v1.15
// Check certs expiration
$ sudo kubeadm alpha certs check-expiration
// Renew certs & keys manually. See: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#manual-certificate-renewal
$ sudo kubeadm alpha certs renew
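Note that `kubeadm alpha certs renew` also accepts the name of a single certificate (for example `apiserver`) if you do not want to renew everything at once. Keep in mind that the control-plane components only read their certificates at startup, so they need to be restarted after a renewal; one common approach, shown below purely as a sketch (not the only way), is to briefly move the static pod manifests away so that the kubelet tears the pods down and recreates them:
// Renew a single certificate instead of all of them
$ sudo kubeadm alpha certs renew apiserver
// One way to restart the control-plane static pods so they pick up the renewed certs
$ sudo mkdir -p /tmp/k8s-manifests-tmp
$ sudo mv /etc/kubernetes/manifests/*.yaml /tmp/k8s-manifests-tmp/
// wait ~20s for the kubelet to stop the pods, then restore the manifests
$ sudo mv /tmp/k8s-manifests-tmp/*.yaml /etc/kubernetes/manifests/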
Demystifying the issue
If you would like to dig into each error message and learn the root cause behind it, please read on.
Inspecting the issue symptoms
(01) kubectl
fails with the below error.
$ kubectl get pods -o wide
The connection to the server <apiserver_advertise_ip>:6443 was refused - did you specify the right host or port?
(02) kubelet
service fails to start and keeps restarting.
$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Mon 2020-06-01 08:51:47 +0530; 3s ago
Docs: https://kubernetes.io/docs/
Process: 14027 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
Main PID: 14027 (code=exited, status=255)
(03) kubelet
daemon throws the below errors to the system log (you can check this either via /var/log/messages or simply by using the journalctl command).
$ journalctl | grep kubelet
Jun 01 08:42:53 <node_name> systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 01 08:42:54 <node_name> kubelet[3653]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 01 08:42:54 <node_name> kubelet[3653]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 01 08:42:54 <node_name> kubelet[3653]: I0601 08:42:54.224801 3653 server.go:417] Version: v1.14.1
Jun 01 08:42:54 <node_name> kubelet[3653]: I0601 08:42:54.226118 3653 plugins.go:103] No cloud provider specified.
Jun 01 08:42:54 <node_name> kubelet[3653]: I0601 08:42:54.226152 3653 server.go:754] Client rotation is on, will bootstrap in background
Jun 01 08:42:54 <node_name> kubelet[3653]: E0601 08:42:54.232397 3653 bootstrap.go:264] Part of the existing bootstrap client certificate is expired: 2020-04-11 02:01:22 +0000 UTC
Jun 01 08:42:54 <node_name> kubelet[3653]: F0601 08:42:54.234118 3653 server.go:265] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jun 01 08:42:54 <node_name> systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Jun 01 08:42:54 <node_name> systemd[1]: Unit kubelet.service entered failed state.
Jun 01 08:42:54 <node_name> systemd[1]: kubelet.service failed.
Jun 01 08:43:04 <node_name> systemd[1]: kubelet.service holdoff time over, scheduling restart.
Jun 01 08:43:04 <node_name> systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
If your Kubernetes instance shows all of the above symptoms, you need to renew the certificates and keys used by the Kubernetes services immediately (like we did in the above solution).
Experience sharing
If you use any sort of PKI (public key infrastructure), those certs and keys are going to expire someday and you will have to renew them before the expiration date. This applies to the Kubernetes PKI as well.
If you fail to renew them on time, the above issue will occur and you will definitely have a hard time.
Imagine being paralysed without access to `kubectl` commands on Production. That should be the worst nightmare of any Kubernetes administrator.
In fact, looking at the community discussions, it is clear that most users have faced this issue on Production servers or similar long-term, fixed environments 😱
Exploring the best practice
Until Kubernetes v1.14, the only way to know the cert expiration dates was by using openssl or a similar tool, as below.
// Check the expiry dates of certs used by kubernetes services
$ openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text
$ openssl x509 -in /etc/kubernetes/pki/apiserver-kubelet-client.crt -noout -text
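If you only care about the expiry dates rather than the full certificate dump, a small loop over the PKI directory (just a convenience sketch using the standard `-enddate` flag) keeps the output readable:
// Print only the expiry date of every cert under the kubeadm PKI directory
$ for crt in /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt; do echo "== $crt"; sudo openssl x509 -in "$crt" -noout -enddate; done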
Considering the gravity of this issue and the many complaints from users, Kubernetes v1.15 introduced improved Certificate Management with kubeadm.
This came with 3 notable improvements.
- After v1.15, kubeadm can show you cert expiration dates with below command.
$ kubeadm alpha certs check-expiration
- After v1.15, kubeadm can automatically renew certificates and keys for you during a kubeadm upgrade.
- After v1.15, kubeadm can manually renew certificates and keys with below command.
// Renew certs & keys manually. See: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#manual-certificate-renewal
$ kubeadm alpha certs renew all
Normally the certificates are issued with a 1-year validity and, as a best practice, you are highly advised to perform a kubeadm upgrade to a more recent version once the new version is generally available. With that, your certs and keys get renewed automatically and your cluster stays current while receiving the latest stable features of kubeadm. However, if you run a closed Production environment (where you often have no privilege to upgrade kubeadm versions), you are advised to keep track of the certificate expiration dates and perform a manual renewal when the expiry dates are close.
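A lightweight way to keep track is to schedule the expiration check itself. The cron entry below is only a sketch (the schedule, log path and file name are made up for illustration); it logs the kubeadm expiration report once a month so an upcoming expiry gets noticed well in advance:
// Hypothetical /etc/cron.d/k8s-cert-check entry – logs the report on the 1st of every month at 09:00
0 9 1 * * root /usr/bin/kubeadm alpha certs check-expiration >> /var/log/k8s-cert-expiration.log 2>&1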
Possible Issues
- Flannel & CoreDNS failure with
x509: certificate is valid for <subnet_ip_1>, <node_ip>, not <subnet_ip_2>
error : The root cause for this error is using a wrongclusterConfiguration
spec with invalid IPs. Read here for more details.
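When you hit that x509 error, a quick way to see exactly which DNS names and IPs your current apiserver certificate covers (and compare them against the IPs in your ClusterConfiguration) is to inspect its Subject Alternative Name field:
// List the DNS names and IPs the current apiserver cert is actually valid for
$ sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'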
| ✅ Tested OS's | RHEL 7+, CentOS 7+, Ubuntu 18.04+, Debian 8+ |
| --- | --- |
| ✅ Tested Gear | Cloud (AWS EC2), On-Prem (Bare Metal) |