Setting up a Kubernetes cluster across 2 virtualized CentOS nodes

tl;dr: I tried installing Kubernetes from scratch on Fedora Atomic hosts, but couldn't get it working. I captured the steps I went through up until the point where I got stuck, but thinking there had to be an easier way, I found kubeadm and successfully used that to get a cluster up and running on CentOS instead.

If you're interested in the steps from my first, failed attempt, they're below; if you just want the steps (and issues and fixes) to get kubeadm set up on CentOS 7, skip down to Attempt 2.

(Failed) Attempt 1: Setting up Kubernetes from scratch on Fedora Atomic

I followed the instructions here, but instead of 4 Atomic hosts I created 2 Fedora Atomic VMs on Proxmox. On the first of the VMs, I started a local Docker Registry container:

sudo docker create -p 5000:5000 \
-v /var/lib/local-registry:/var/lib/registry \
-e REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY=/var/lib/registry \
-e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
--name=local-registry registry:2

Next, I followed the steps to change the SELinux context on the directory that Docker created for the registry's persistent storage, and created a systemd unit file so the registry starts as a service at boot.
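For reference, those two steps look roughly like this. This is a sketch based on the guide: the svirt_sandbox_file_t context and the local-registry.service unit name are what the guide uses, so adjust if yours differ.

# relabel the registry's storage directory so the container can write to it
sudo chcon -Rvt svirt_sandbox_file_t /var/lib/local-registry

# /etc/systemd/system/local-registry.service
[Unit]
Description=Local Docker registry cache
Requires=docker.service
After=docker.service

[Service]
Restart=on-failure
RestartSec=10
ExecStart=/usr/bin/docker start -a local-registry
ExecStop=-/usr/bin/docker stop -t 2 local-registry

[Install]
WantedBy=multi-user.target

Then a sudo systemctl daemon-reload and sudo systemctl enable local-registry has it start at boot.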

Next up, edit the /etc/etcd/etcd.conf file to set the ports to listen on:

ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379,http://0.0.0.0:4001"
ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379,http://0.0.0.0:4001"

The first line replaced this original line:

ETCD_LISTEN_CLIENT_URLS="http://localhost:2379"

and the second replaced:

ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379"

After following the steps to generate the keys, I had to create the /etc/kubernetes/certs directory and then copy the generated files from under ./pki/. Apart from ca.crt, the other two files were named server.crt and server.key, so I renamed them to match the instructions when copying them to the destination:

sudo mkdir -p /etc/kubernetes/certs
sudo cp pki/ca.crt /etc/kubernetes/certs
sudo cp pki/issued/server.crt /etc/kubernetes/certs/kubernetes-master.crt
sudo cp pki/private/server.key /etc/kubernetes/certs/kubernetes-master.key

I followed the steps up to the point of starting the Kubernetes services, and then got an error:

$ sudo systemctl start etcd kube-apiserver kube-controller-manager kube-scheduler

Job for kube-apiserver.service failed because the control process exited with error code.

See "systemctl  status kube-apiserver.service" and "journalctl  -xe" for details.

Using journalctl -xe to take a look at the logs, there were a *lot* of these:

reflector.go:201] k8s.io/kubernetes/pkg/client/informers/informers_generated/externalversions//factory.go:70: Failed to list *v1.ReplicationController: Get http://192.168.1.76:8080/api/v1/replicationcontrollers?resourceVersion=0: dial tcp 192.168.1.76:8080: getsockopt: connection refused

This implies the API server is not running.
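To dig into why the API server itself wasn't coming up, the place to look is its own service status and journal, for example:

# the failing unit's own status and recent log entries
sudo systemctl status kube-apiserver -l
sudo journalctl -u kube-apiserver --no-pager | tail -50

The connection-refused errors from the other components are just a symptom; the real failure reason should be in the kube-apiserver log itself.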

This is the point where I started reading around for help, but after walking through all the manual steps to get this far and still not having a working cluster, I started to wonder whether there's a better/quicker way to set up Kubernetes. It turns out there is: kubeadm.

Successful attempt 2: Setting up a Kubernetes cluster using kubeadm on CentOS 7

Compared to the manual installation and configuration steps, kubeadm looks very attractive as it does a lot of the work for you.

Following the instructions here, the first issue I ran into was this step, which didn't work for me on CentOS 7:

sudo sysctl net.bridge.bridge-nf-call-iptables=1

sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory
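In hindsight, that /proc entry typically only exists once the br_netfilter kernel module has been loaded, so loading the module first would most likely have made the sysctl succeed (an assumption on my part; I didn't go back and verify it on this install):

# load the bridge netfilter module, which creates the /proc/sys/net/bridge entries
modprobe br_netfilter
sysctl net.bridge.bridge-nf-call-iptables=1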

Not knowing that at the time, I skipped this step and continued on. Next, install Docker:

yum install -y docker

Enable at boot:

systemctl enable docker && systemctl start docker

Install kubelet, kubeadm and kubectl (this section straight from the docs):

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
        https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF
setenforce 0
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet && systemctl start kubelet

Next, use kubeadm to create a cluster. Reading ahead in the instructions, to initialize the networking overlay with, for example, flannel, you also need to pass an option to init, so:

kubeadm init --pod-network-cidr=10.244.0.0/16

This gave me some warnings, because it turned out I'd missed the step above to enable and start kubelet:

[preflight] Running pre-flight checks
[preflight] WARNING: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] WARNING: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[preflight] WARNING: Running with swap on is not supported. Please disable swap or set kubelet's --fail-swap-on flag to false.

One more try:

systemctl enable kubelet && systemctl start kubelet

then again:

kubeadm init --pod-network-cidr=10.244.0.0/16

It seems you can’t run init more than once:

[preflight] Some fatal errors occurred:
 /etc/kubernetes/manifests is not empty
 /var/lib/kubelet is not empty
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`

Per the ‘tear down’ instructions,

kubeadm reset

Then one more time:

kubeadm init --pod-network-cidr=10.244.0.0/16

Next error:

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.

Checking 'systemctl status kubelet' showed it failed on startup because swap was still enabled:

kubelet[14829]: error: failed to run Kubelet: Running with swap on is not supported, please disable swap! or set --fail-swap

To disable swap, per the answers to this question:

swapoff -a

I then edited /etc/fstab to remove the /dev/mapper/centos-swap line, and rebooted.
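If you'd rather not edit the file by hand, commenting the swap line out with sed works too (a sketch; adjust the pattern if your swap device isn't the default /dev/mapper/centos-swap):

# comment out the swap entry so it isn't re-enabled on the next boot
sed -i '/centos-swap/ s/^/#/' /etc/fstab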

Once more:

kubeadm reset
kubeadm init --pod-network-cidr=10.244.0.0/16

To address the warning about opening ports:

[preflight] WARNING: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly

I opened up the ports with new rules:

sudo firewall-cmd --zone=public --add-port=6443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=10250/tcp --permanent
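Note that rules added with --permanent only end up in the saved configuration; they don't apply to the running firewall until it's reloaded:

sudo firewall-cmd --reload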

At this point the cluster started initializing, and I got this message for several minutes while it was downloading/initializing:

[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.

I waited for several minutes, but it didn't seem to be making any progress. Looking in 'journalctl -xeu kubelet', there were many 'connection refused' errors repeating over and over:

192.168.1.79:6443/api/v1/nodes: dial tcp 192.168.1.79:6443: getsockopt: connection refused
192.168.1.79:6443/api/v1/services?resourceVersion=0: dial tcp 192.168.1.79:6443: getsockopt: connection refused
192.168.1.79:6443/api/v1/nodes?fieldSelector=metadata.name%3Dunknown0a7dd195e7cb&resourceVersion=0: dial tcp 192.168.1.79:6443: getsockopt: connection refused
192.168.1.79:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dunknown0a7dd195e7cb&resourceVersion=0: dial tcp 192.168.1.79:6443: getsockopt: connection refused
n0a7dd195e7cb.14ebbc48a9463585: dial tcp 192.168.1.79:6443: getsockopt: connection refused' (may retry after sleeping)

Searching for anything similar, I came up with these issues, which have similar errors and symptoms:

https://github.com/kubernetes/kubernetes/issues/33729

https://github.com/kubernetes/kubernetes/issues/43815

https://github.com/kubernetes/kubeadm/issues/353

https://github.com/kubernetes/kubernetes/issues/45787

The issue that helped me, though, was this one, where one of the steps to recreate the problem includes this comment:

# edit /etc/selinux/config and set SELINUX=disabled

The kubeadm install steps do state:

“Disabling SELinux by running setenforce 0 is required to allow containers to access the host filesystem”

but in the steps it only mentions running setenforce 0, which isn't quite the same as editing /etc/selinux/config: setenforce 0 only switches SELinux to permissive mode until the next reboot, while the config file controls the mode at boot. I edited the file and set SELINUX=disabled (the one-liner version of that edit is sketched below), rebooted, fired up kubeadm init again, and this time we're in luck!
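A sketch of that edit as a one-liner, assuming the file still has the stock SELINUX=enforcing line:

sudo sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config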

[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests"
[init] This often takes around a minute; or longer if the control plane images have to be pulled.
[apiclient] All control plane components are healthy after 43.002301 seconds
[uploadconfig] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[markmaster] Will mark node unknown0a7dd195e7cb as master by adding a label and a taint
[markmaster] Master unknown0a7dd195e7cb tainted and labelled with key/value: node-role.kubernetes.io/master=""
[bootstraptoken] Using token: xyz
[bootstraptoken] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: kube-dns
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run (as a regular user):

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token xyz --discovery-token-ca-cert-hash sha256:abc

Success!

Following the next steps in the instructions, I set up the config for my regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
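A quick sanity check at this point to confirm kubectl is talking to the new API server:

kubectl cluster-info
kubectl get nodes

(The master node will likely show NotReady until the pod network is applied in the next step.)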

Then, continuing with the steps to add the networking config, for flannel:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.8.0/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.8.0/Documentation/kube-flannel-rbac.yml

Checking the running pods to confirm we're up and running (note the two dashes in --all-namespaces):

[kev@unknown0A7DD195E7CB ~]$ kubectl get pods --all-namespaces

NAMESPACE     NAME                                          READY     STATUS             RESTARTS   AGE
kube-system   etcd-unknown0a7dd195e7cb                      1/1       Running            0          2h
kube-system   kube-apiserver-unknown0a7dd195e7cb            1/1       Running            0          2h
kube-system   kube-controller-manager-unknown0a7dd195e7cb   1/1       Running            0          2h
kube-system   kube-dns-545bc4bfd4-wfs5s                     3/3       Running            0          2h
kube-system   kube-flannel-ds-qtfwh                         1/2       CrashLoopBackOff   4          3m
kube-system   kube-proxy-ggvqx                              1/1       Running            0          2h
kube-system   kube-scheduler-unknown0a7dd195e7cb            1/1       Running            0          2h

Now we're looking good: everything is Running apart from the kube-flannel pod, which is still in CrashLoopBackOff.
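If the flannel pod doesn't sort itself out after a few restarts, the usual way to dig in is to look at its events and logs, something like this (the pod name comes from the listing above, and the kube-flannel container name is an assumption based on the flannel manifest):

kubectl -n kube-system describe pod kube-flannel-ds-qtfwh
kubectl -n kube-system logs kube-flannel-ds-qtfwh -c kube-flannel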

Last step: adding an additional node and joining it to the cluster. On the second VM, repeat all of the steps above up to, but not including, kubeadm init, and instead run the kubeadm join command, passing the token and hash values from the init output.
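On the second VM that join looks something like this (a sketch: the 192.168.1.79:6443 endpoint is an assumption based on the API server address that showed up in the kubelet logs earlier, and xyz/abc stand in for the real token and hash from the init output):

kubeadm join --token xyz 192.168.1.79:6443 --discovery-token-ca-cert-hash sha256:abc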

Back on the master node, running:

[kev@unknown0A7DD195E7CB ~]$ kubectl get nodes

NAME                  STATUS     ROLES     AGE       VERSION
unknown0a7dd195e7cb   Ready      master    3h        v1.8.0
unknown121e8862fff9   NotReady   <none>    48s       v1.8.0

A few more seconds later:

[kev@unknown0A7DD195E7CB ~]$ kubectl get nodes

NAME                  STATUS    ROLES     AGE       VERSION
unknown0a7dd195e7cb   Ready     master    3h        v1.8.0
unknown121e8862fff9   Ready     <none>    51s       v1.8.0

A 2-node Kubernetes cluster, one master and one worker, ready for running some containers!