TL;DR

Solution for Kubernetes certificate & key (PKI) expiration issues 🙏

Warning: If you are running an HA cluster, these commands must be executed on all the control-plane nodes.

Kubernetes v1.14 and before

  • Tested against v1.14
// Backup current certs and keys
$ cd /etc/kubernetes/pki/
$ mkdir -p ~/tmp/BACKUP_etc_kubernetes_pki/etcd/
$ sudo mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} ~/tmp/BACKUP_etc_kubernetes_pki/.
$ sudo mv {etcd/healthcheck-client.crt,etcd/healthcheck-client.key,etcd/peer.crt,etcd/peer.key,etcd/server.crt,etcd/server.key} ~/tmp/BACKUP_etc_kubernetes_pki/etcd/.
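// ** Note: the cluster CA (ca.crt, ca.key), the etcd CA (etcd/ca.crt, etcd/ca.key) and the service-account
// ** keys (sa.key, sa.pub) are deliberately left in place, so kubeadm will re-sign the new certs with the
// ** existing CAs (hence the "Using existing ... certificate authority" lines in the output below).
// ** Optional sanity check of what remains:
$ sudo ls /etc/kubernetes/pki/ /etc/kubernetes/pki/etcd/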

// Generate new certs and keys 
// ** If you originally passed any flags or a config file to the `kubeadm init` command during cluster setup,
// ** make sure you use the same ones here. Otherwise, the default spec will be used.
// ** Available flags: https://v1-14.docs.kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-certs
// ** Run ONE of the following two commands (without or with a config file):
$ sudo kubeadm init phase certs all
$ sudo kubeadm init phase certs all --config /<k8_specs_directory>/kubeadm_config.yaml
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Using existing etcd/ca certificate authority
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [<apiserver_advertise_host> localhost] and IPs [<apiserver_advertise_ip> 127.0.0.1 ::1]
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [<apiserver_advertise_host> localhost] and IPs [<apiserver_advertise_ip> 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Using existing ca certificate authority
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [<apiserver_advertise_host> kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [<k8_subnet_and_host_ips>]
[certs] Using the existing "sa" key
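
// ** If you maintain a kubeadm config file (the /<k8_specs_directory>/kubeadm_config.yaml path above is only
// ** a placeholder), it would typically look roughly like the sketch below. The field values here are
// ** assumptions and must match whatever you originally passed to `kubeadm init`.
$ cat /<k8_specs_directory>/kubeadm_config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.1
controlPlaneEndpoint: "<apiserver_advertise_ip>:6443"
apiServer:
  certSANs:
  - "<apiserver_advertise_host>"
  - "<apiserver_advertise_ip>"
networking:
  podSubnet: "<pod_subnet_cidr>"
  serviceSubnet: "<service_subnet_cidr>"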

// Backup current configs
$ cd /etc/kubernetes/
$ mkdir -p ~/tmp/BACKUP_etc_kubernetes
$ sudo mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} ~/tmp/BACKUP_etc_kubernetes/.

// Generate new configs
// ** If you originally passed any flags or a config file to the `kubeadm init` command during cluster setup,
// ** make sure you use the same ones here. Otherwise, the default spec will be used.
// ** Available flags: https://v1-14.docs.kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init-phase/#cmd-phase-kubeconfig
// ** Run ONE of the following two commands (without or with a config file):
$ sudo kubeadm init phase kubeconfig all
$ sudo kubeadm init phase kubeconfig all --config /<k8_specs_directory>/kubeadm_config.yaml
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
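
// ** Optional sanity check: the regenerated kubeconfigs embed freshly signed client certs. One way to
// ** confirm the new expiry date (an illustrative one-liner using standard tools):
$ sudo grep 'client-certificate-data' /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d | openssl x509 -noout -enddate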

// Reboot the node for changes to take effect
$ sudo reboot

// After reboot
$ mkdir -p ~/tmp/BACKUP_home_.kube/
$ cp -r ~/.kube/* ~/tmp/BACKUP_home_.kube/.
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
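// Since the file was copied with sudo it may be owned by root, so hand it back to your user (standard kubeadm advice)
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config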

// `kubectl` commands should now work
$ kubectl get pods -o wide

Kubernetes v1.15 and later

  • Tested against v1.15
// Check certs expiration
$ sudo kubeadm alpha certs check-expiration

// Renew certs & keys manually. See: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#manual-certificate-renewal
$ sudo kubeadm alpha certs renew all
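
// After renewal, the control-plane components (kube-apiserver, kube-controller-manager, kube-scheduler, etcd)
// must be restarted so they pick up the new certs. A node reboot (as in the v1.14 steps above) works; a
// lighter-weight sketch is to bounce the static pods by moving their manifests out and back:
$ sudo mkdir -p /tmp/k8s-manifests && sudo sh -c 'mv /etc/kubernetes/manifests/*.yaml /tmp/k8s-manifests/'
$ sleep 30 && sudo sh -c 'mv /tmp/k8s-manifests/*.yaml /etc/kubernetes/manifests/'
// If admin.conf was re-issued as well, refresh your local copy:
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config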

Demystifying the issue

If you would like to dig into each error message and learn the root cause behind it, please read on.

Inspecting the issue symptoms

(01) kubectl fails with the below error.

$ kubectl get pods -o wide
The connection to the server <apiserver_advertise_ip>:6443 was refused - did you specify the right host or port?
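
// Optional quick check that the API server really is down (and not just a kubeconfig problem):
// if the control plane has failed to start, nothing will be listening on the secure port.
$ sudo ss -tlnp | grep 6443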

(02) The kubelet service fails to run properly.

$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Mon 2020-06-01 08:51:47 +0530; 3s ago
     Docs: https://kubernetes.io/docs/
  Process: 14027 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
 Main PID: 14027 (code=exited, status=255)

(03) The kubelet daemon writes the below errors to the system log (you can check this either via /var/log/messages or simply with the journalctl command).

$ journalctl | grep kubelet
Jun 01 08:42:53 <node_name> systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 01 08:42:54 <node_name> kubelet[3653]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 01 08:42:54 <node_name> kubelet[3653]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 01 08:42:54 <node_name> kubelet[3653]: I0601 08:42:54.224801    3653 server.go:417] Version: v1.14.1
Jun 01 08:42:54 <node_name> kubelet[3653]: I0601 08:42:54.226118    3653 plugins.go:103] No cloud provider specified.
Jun 01 08:42:54 <node_name> kubelet[3653]: I0601 08:42:54.226152    3653 server.go:754] Client rotation is on, will bootstrap in background
Jun 01 08:42:54 <node_name> kubelet[3653]: E0601 08:42:54.232397    3653 bootstrap.go:264] Part of the existing bootstrap client certificate is expired: 2020-04-11 02:01:22 +0000 UTC
Jun 01 08:42:54 <node_name> kubelet[3653]: F0601 08:42:54.234118    3653 server.go:265] failed to run Kubelet: unable to load bootstrap kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory
Jun 01 08:42:54 <node_name> systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Jun 01 08:42:54 <node_name> systemd[1]: Unit kubelet.service entered failed state.
Jun 01 08:42:54 <node_name> systemd[1]: kubelet.service failed.
Jun 01 08:43:04 <node_name> systemd[1]: kubelet.service holdoff time over, scheduling restart.
Jun 01 08:43:04 <node_name> systemd[1]: Stopped kubelet: The Kubernetes Node Agent.

If your Kubernetes instance shows all of the above symptoms, you need to renew the certificates and keys used by the Kubernetes services immediately (like we did in the solution above). The "Part of the existing bootstrap client certificate is expired" line in the kubelet log is the clearest giveaway.

Experience sharing

If you use any sort of PKI (public key infrastructure), the certs and keys will expire someday, and you will have to renew them before the expiration date. This applies to the Kubernetes PKI as well.

If you fail to renew them on time, the issue above will occur and you will have a hard time. Imagine being paralysed without working kubectl access in production; that should be the worst nightmare of any Kubernetes administrator. In fact, looking at the community discussions, it is clear that most users have hit this issue on production servers or similar long-lived, fixed environments 😱

Exploring the best practice

Until Kubernetes v1.14, the only way to know the cert expiration dates was to use openssl or a similar tool, as shown below.

// Check the expiry dates of certs used by kubernetes services
$ openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text
$ openssl x509 -in /etc/kubernetes/pki/apiserver-kubelet-client.crt -noout -text
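
// A convenience loop (just a sketch) to print only the expiry date of every cert under /etc/kubernetes/pki
$ for c in $(sudo find /etc/kubernetes/pki -type f -name '*.crt'); do echo -n "$c: "; sudo openssl x509 -in "$c" -noout -enddate; done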

Considering the gravity of this issue and the number of user complaints, Kubernetes v1.15 introduced improved Certificate Management with kubeadm.

This brought three notable improvements.

  • From v1.15 onwards, kubeadm can show you the cert expiration dates with the below command.
$ sudo kubeadm alpha certs check-expiration
  • From v1.15 onwards, kubeadm automatically renews certificates and keys for you during a kubeadm upgrade.
  • From v1.15 onwards, kubeadm can renew certificates and keys manually with the below command.
// Renew certs & keys manually. See: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/#manual-certificate-renewal
$ sudo kubeadm alpha certs renew all

Normally the certificates are issued for 1 year, and as a best practice you are highly advised to run a kubeadm upgrade to a more recent version once it becomes generally available. With that, your certs and keys are renewed automatically and your cluster stays current while receiving the latest stable features of kubeadm. However, if you run a closed production environment (where you often have no privilege to upgrade kubeadm), you should keep track of the certificate expiration dates and perform a manual renewal well before they are reached.
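
For reference, the upgrade path that triggers the automatic renewal looks roughly like the below (the target version is only a placeholder; always follow the official upgrade guide for your exact version).

// Review the available versions, then upgrade the first control-plane node
$ sudo kubeadm upgrade plan
$ sudo kubeadm upgrade apply v1.15.x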

Possible Issues

  • Flannel & CoreDNS failure with an x509: certificate is valid for <subnet_ip_1>, <node_ip>, not <subnet_ip_2> error: the root cause of this error is a wrong ClusterConfiguration spec with invalid IPs (a quick way to inspect the cert's SANs is shown below). Read here for more details.
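
One quick way to see which names and IPs the apiserver cert was actually issued for, so you can compare against the error message:

// Inspect the Subject Alternative Names on the apiserver cert
$ sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'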

✅ Tested OSes: RHEL 7+, CentOS 7+, Ubuntu 18.04+, Debian 8+
✅ Tested Gear: Cloud (AWS EC2), On-Prem (Bare Metal)

👉 Any questions? Please comment below.

