Updating Kubernetes master node and worker node config when an IP address changes

I have a test Kubernetes cluster running with a CentOS 7 master node and 4 CentOS 7 worker nodes, under VMware ESXi. Each of the VMs gets its IP address from DHCP, and since I hadn't booted these VMs for a while, they all picked up new IP addresses when I recently started them up. As a result the cluster would not start, and all the .kube/config files were referring to the old, now incorrect IP addresses. Note to self: this is a good reason to use DNS names for the nodes in your cluster instead of IP addresses, especially IP addresses that can change.
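As a minimal sketch of that idea (the hostnames and addresses below are made up for illustration): give each node a stable name, make the names resolvable on every node (DNS, DHCP reservations, or plain /etc/hosts entries), and use the names rather than raw IPs when building the cluster. Then if an address changes you only update the name resolution, not the cluster config.

#example /etc/hosts entries on every node - hypothetical names/addresses, adjust for your network
cat <<EOF | sudo tee -a /etc/hosts
192.168.1.85 k8s-master
192.168.1.86 k8s-worker1
EOF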

Anyway, to restore my cluster to a working state, I reinitialized the master node and then joined the workers to the new master.

First on the master:

sudo kubeadm reset
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

#take a copy of the kubeadm join command to run on the workers

#copy kube config for local kubectl
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
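#per the kubeadm init output you'll likely also need these two steps (the mkdir before
#the cp above, the chown after it), otherwise kubectl may complain about permissions:
mkdir -p $HOME/.kube
sudo chown $(id -u):$(id -g) $HOME/.kube/config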

#apply networking overlay
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.8.0/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.8.0/Documentation/kube-flannel-rbac.yml

#for each of the worker nodes, scp the config file to each node for local kubectl use
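#(assuming the ~/.kube directory doesn't already exist on the worker, create it first, e.g.:)
ssh kev@192.168.1.86 'mkdir -p ~/.kube'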
scp /etc/kubernetes/admin.conf kev@192.168.1.86:~/.kube/config

On each of the worker nodes:

sudo kubeadm reset

#then run the kubeadm join command shown from the master when you ran kubeadm init
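#the join command printed by kubeadm init looks something like this
#(token, hash and master address below are placeholders):
sudo kubeadm join --token <token> <master-ip>:6443 --discovery-token-ca-cert-hash sha256:<hash>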

Revisiting Docker and Kubernetes installation on CentOS7 (take 3)

I tried a while back to get a Kubernetes cluster up and running on CentOS 7, and captured my experience in this post here. At one point I did have it up and running, but after a reboot of my master and worker nodes I ran into an issue with some of the pods not starting, and decided to shelve this project for a while to work on something else.

Based on tips from a colleague who had recently worked through a similar setup, the main difference in his approach compared to my steps was that he didn't do a vanilla install of Docker with 'sudo yum install docker', but instead installed Docker from Docker's own repo for CentOS.

Retracing my prior steps, the Kubernetes install instructions here tell you to do a 'sudo yum install docker', but the steps on the Docker site for CentOS here walk you through installing from Docker's own repo. I followed those steps on a clean CentOS 7 install, and then continued with the Kubernetes setup.
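For reference, the Docker CE install from Docker's CentOS repo boils down to something like this (check the Docker docs for the current steps):

sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce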

Started the Docker service with:

sudo systemctl start docker.service

Next, instead of opening the required ports, since this is just a homelab setup I just disabled the firewall (following instructions from here):

sudo systemctl disable firewalld

And then stopped the currently running service:

sudo systemctl stop firewalld
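If you'd rather keep firewalld running, opening the ports kubeadm needs should also work; for the master, per the install docs of the time, that's roughly the following (workers need 10250 plus the NodePort range 30000-32767):

sudo firewall-cmd --permanent --add-port=6443/tcp
sudo firewall-cmd --permanent --add-port=2379-2380/tcp
sudo firewall-cmd --permanent --add-port=10250-10252/tcp
sudo firewall-cmd --reload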

Next, picking up with the Kubernetes instructions to install kubelet, kubeadm etc.

Disable selinux:

sudo setenforce 0

From my previous steps, editing /etc/selinux/config and setting:

SELINUX=disabled
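If you want to make that edit non-interactively, a sed one-liner along these lines should do it (assuming the line currently reads SELINUX=enforcing):

sudo sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config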

CentOS 7 specific config for iptables (having disabled the firewall on CentOS 7 this might not be relevant, but adding it anyway):

cat <<EOF >  /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system

Disable swap:

swapoff -a

Also, edit /etc/fstab to remove (or comment out) the swap line, and reboot.
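If you'd rather do that non-interactively, a sed one-liner like this comments out any fstab line containing ' swap ' (this assumes spaces around the word swap; check the file afterwards):

sudo sed -i '/ swap / s/^/#/' /etc/fstab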

Next, following the install instructions to add the repo file, and then installing with:

sudo yum install -y kubelet kubeadm kubectl
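For reference, the repo file from the Kubernetes install docs at the time looked roughly like this (check the current docs before using it verbatim):

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF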

Enabling the services:

sudo systemctl enable kubelet && sudo systemctl start kubelet

Starting the node init:

sudo kubeadm init

And realized we hadn’t addressed the cgroups config issue to get kubelet and docker using the same driver:

Dec 17 18:17:08 unknown000C2954104F kubelet[16450]: error: failed to run Kubelet: failed to create kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "

I have a post on addressing this from when I installed OpenShift Origin; follow the same steps there to reconfigure.
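If you don't want to dig that post out, the gist is to get the Docker daemon onto the systemd cgroup driver; one way to do that (a sketch, not necessarily word for word what that post does) is adding the option to the ExecStart line in /usr/lib/systemd/system/docker.service, then reloading and restarting Docker:

ExecStart=/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd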

Kubeadm init, add the networking overlay (I installed Weave), and I think we’re up:

[kev@unknown000C2954104F /]$ kubectl get nodes
NAME                  STATUS    ROLES     AGE       VERSION
unknown000c2954104f   Ready     master    31m       v1.9.0

Checking the pods though, the dns pod was stuck restarting and not coming up clean. I found this ticket for exactly the issue I was seeing. The resolution was to switch back to cgroupfs for both Docker and Kubernetes.

I did this by backing out the addition previously made to /usr/lib/systemd/system/docker.service, and then adding a new file, /etc/docker/daemon.json, and pasting in:

{
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}

Next, edit /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and replace:

Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"

with

Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"

Restart Docker:

sudo systemctl daemon-reload

sudo systemctl restart docker
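Since the kubelet drop-in changed too, I'd expect kubelet to need a restart as well:

sudo systemctl restart kubelet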

Check we’re back on cgroupfs:

sudo docker info |grep -i cgroup

Cgroup Driver: cgroupfs

And now check the nodes:

$ kubectl get nodes
NAME                  STATUS    ROLES     AGE       VERSION
unknown000c2954104f   Ready     master    1d        v1.9.0
unknown000c29c2b640   Ready     <none>    1d        v1.9.0

And the pods:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                          READY     STATUS    RESTARTS   AGE
kube-system   etcd-unknown000c2954104f                      1/1       Running   3          1d
kube-system   kube-apiserver-unknown000c2954104f            1/1       Running   4          1d
kube-system   kube-controller-manager-unknown000c2954104f   1/1       Running   3          1d
kube-system   kube-dns-6f4fd4bdf-jtk82                      3/3       Running   123        1d
kube-system   kube-proxy-9b6tr                              1/1       Running   3          1d
kube-system   kube-proxy-n5tkx                              1/1       Running   1          1d
kube-system   kube-scheduler-unknown000c2954104f            1/1       Running   3          1d
kube-system   weave-net-f29k9                               2/2       Running   9          1d
kube-system   weave-net-lljgc

Now we're looking good! Next up, let's deploy something and check it works.
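As a quick smoke test (not part of the steps above, just a sketch), something like this should schedule a couple of nginx pods and expose them on a NodePort:

kubectl run nginx --image=nginx --replicas=2
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get pods -o wide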

Allowing user on CentOS to run docker command without sudo

Out of the box for a Docker install on CentOS 7, you have to sudo the docker command to interact with Docker. Per the post-install steps here, create a docker group and add your user to that group:

sudo groupadd docker

sudo usermod -aG docker youruser

Logging off and back on again, you should now be able to run the docker command without sudo.

On CentOS 7 this still didn’t work for me. Following this post, it appeared that docker.sock was owned by root and in the group root:

$ ls -l /var/run/docker.sock

srw-rw---- 1 root root 0 Oct 21 15:42 /var/run/docker.sock

Changing the group ownership:

$ sudo chown root:docker /var/run/docker.sock

$ ls -l /var/run/docker.sock

srw-rw---- 1 root docker 0 Oct 21 15:42 /var/run/docker.sock

After logging back on, I can now run docker commands without sudo.
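A quick check that it's working, without sudo:

docker run hello-world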

Deploying the Kubernetes Dashboard to a kubeadm created cluster

Per the installation steps here, to deploy the dashboard:

kubectl --kubeconfig ./admin.conf apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

Then start the local proxy:

./kubectl --kubeconfig ./admin.conf proxy

Accessing it via http://localhost:8001/ui gives this error:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "no endpoints available for service \"kubernetes-dashboard\"",
  "reason": "ServiceUnavailable",
  "code": 503
}

Checking what’s currently running with:

./kubectl --kubeconfig ./admin.conf get pods --all-namespaces

Looks like the dashboard app is not happy:

kube-system   kubernetes-dashboard-747c4f7cf-p8blw          0/1       CrashLoopBackOff   22         1h

Checking the logs for the dashboard:

./kubectl --kubeconfig ./admin.conf logs kubernetes-dashboard-747c4f7cf-p8blw --namespace=kube-system

2017/10/19 03:35:51 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: getsockopt: no route to host

OK.

I set up my master node using the flannel overlay. I don't know if this makes any difference or not, but I noticed this article using kubeadm used Weave Net instead. Not knowing how to move forward (and after browsing many posts and tickets about issues with kubeadm and the Dashboard), and knowing at least that kubeadm + Weave Net works for installing the dashboard, I tried this approach instead.
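For reference, the Weave Net overlay is applied with a one-liner along these lines, taken from the Weave docs of the time (check their docs for the current form):

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"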

After re-initializing and then adding weave-net, my pods are all started:

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                          READY     STATUS    RESTARTS   AGE
kube-system   etcd-unknown000c2960f639                      1/1       Running   0          11m
kube-system   kube-apiserver-unknown000c2960f639            1/1       Running   0          11m
kube-system   kube-controller-manager-unknown000c2960f639   1/1       Running   0          11m
kube-system   kube-dns-545bc4bfd4-nhrw7                     3/3       Running   0          12m
kube-system   kube-proxy-cgn45                              1/1       Running   0          4m
kube-system   kube-proxy-dh6jm                              1/1       Running   0          12m
kube-system   kube-proxy-spxm5                              1/1       Running   0          5m
kube-system   kube-scheduler-unknown000c2960f639            1/1       Running   0          11m
kube-system   weave-net-gs8nh                               2/2       Running   0          5m
kube-system   weave-net-jkkql                               2/2       Running   0          4m
kube-system   weave-net-xb4hx                               2/2       Running   0          10m

Trying to add the dashboard once more:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

… and, o. m. g. :

$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                          READY     STATUS    RESTARTS   AGE
kube-system   etcd-unknown000c2960f639                      1/1       Running   0          37m
kube-system   kube-apiserver-unknown000c2960f639            1/1       Running   0          37m
kube-system   kube-controller-manager-unknown000c2960f639   1/1       Running   0          37m
kube-system   kube-dns-545bc4bfd4-nhrw7                     3/3       Running   0          38m
kube-system   kube-proxy-cgn45                              1/1       Running   0          30m
kube-system   kube-proxy-dh6jm                              1/1       Running   0          38m
kube-system   kube-proxy-spxm5                              1/1       Running   0          31m
kube-system   kube-scheduler-unknown000c2960f639            1/1       Running   0          37m
kube-system   kubernetes-dashboard-747c4f7cf-jgmgt          1/1       Running   0          4m
kube-system   weave-net-gs8nh                               2/2       Running   0          31m
kube-system   weave-net-jkkql                               2/2       Running   0          30m
kube-system   weave-net-xb4hx                               2/2       Running   0          36m

Starting kubectl proxy and hitting localhost:8001/ui now gives me:

Error: 'malformed HTTP response "\x15\x03\x01\x00\x02\x02"'
Trying to reach: 'http://10.32.0.3:8443/'

Reading here, trying the master node directly:

https://192.168.1.80:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

gives a different error:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "services \"https:kubernetes-dashboard:\" is forbidden: User \"system:anonymous\" cannot get services/proxy in the namespace \"kube-system\"",
  "reason": "Forbidden",
  "details": {
    "name": "https:kubernetes-dashboard:",
    "kind": "services"
  },
  "code": 403
}

… but reading further ahead, it seems accessing via the /ui URL does not work correctly; you need to use the URL given in the docs here, which is:

http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

and now I get an authentication page.

Time to read ahead on the authentication approaches.

List available tokens with:

kubectl -n kube-system get secret

Using the same token as per the docs (although at this point I honestly have no idea what the difference in permissions is between the different tokens):

./kubectl --kubeconfig admin.conf -n kube-system describe secret replicaset-controller-token-7tzd5

And then pasting the token value into the authentication dialog gets me logged on! There are some errors about this token not having access to some features, but at this point I'm just glad I've managed to get this deployed and working!
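If you want a token with full access rather than borrowing one of the existing secrets, the dashboard docs describe creating a dedicated service account bound to cluster-admin; roughly something like this (the dashboard-admin name here is my own choice):

kubectl -n kube-system create serviceaccount dashboard-admin
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep dashboard-admin | awk '{print $1}')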

If you're interested in the specific versions I'm using, this is deployed to CentOS 7, and the Kubernetes version is:

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:27:35Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.1", GitCommit:"f38e43b221d08850172a9a4ea785a86a3ffa3b3a", GitTreeState:"clean", BuildDate:"2017-10-11T23:16:41Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}