Environment

OS: CentOS 7.1.1503
Kernel: 3.10.0-229.el7.x86_64
Kubernetes: 1.7.0
Docker: 17.06.0-ce
Etcd: 3.1.9
Calico: 2.3

k8s-master: walker-1.novalocal(172.16.6.47)
k8s-node: walker-2.novalocal(172.16.6.249)

Preflight

Prepare the yum repos

$ mkdir /etc/yum.repos.d/backup && mv /etc/yum.repos.d/*.repo /etc/yum.repos.d/backup
# use the Aliyun mirrors
$ wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
$ wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
$ cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
EOF
$ yum clean all
$ yum repolist

Install kubeadm

Run on both the master and the node:

$ yum install kubeadm kubelet docker-ce -y
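
The packages start nothing on their own, so presumably you also want Docker running and both services enabled before going further; a minimal sketch:

$ systemctl enable docker kubelet
$ systemctl start docker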

Once kubelet is installed, compare the cgroup drivers that Docker and kubelet are configured to use:

[root@walker-1 ~]# docker info | grep -i cgroup
Cgroup Driver: cgroupfs
[root@walker-1 ~]# grep -i cgroup /etc/systemd/system/kubelet.service.d/10-kubeadm.conf 
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_SYSTEM_PODS_ARGS $KUBELET_NETWORK_ARGS $KUBELET_DNS_ARGS $KUBELET_AUTHZ_ARGS $KUBELET_CADVISOR_ARGS $KUBELET_CGROUP_ARGS $KUBELET_EXTRA_ARGS

docker-ce 17.06 defaults to cgroupfs, while the kubelet package defaults to systemd. If the two disagree, kubelet cannot drive Docker, so the kubelet setting has to be changed to match (the grep output above already shows the corrected value).
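
If your drop-in still says systemd, one way to bring the two in line is to rewrite the kubelet argument and restart; a sketch, using the same path as the grep above:

$ sed -i 's/--cgroup-driver=systemd/--cgroup-driver=cgroupfs/' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
$ systemctl daemon-reload
$ systemctl restart kubelet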

Download images

Because of the GFW, these registries are hard to reach from inside China. The images can instead be located and pulled through Aliyun's Docker mirrors; the exact versions required can be read out of each component's YAML manifest.
The images needed are listed below (a pull-and-retag sketch follows the list):

# k8s 
gcr.io/google_containers/kube-proxy-amd64:v1.7.0
gcr.io/google_containers/kube-apiserver-amd64:v1.7.0
gcr.io/google_containers/kube-controller-manager-amd64:v1.7.0
gcr.io/google_containers/kube-scheduler-amd64:v1.7.0
gcr.io/google_containers/pause-amd64:3.0

# dns 
gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.4
gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.4
gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.4

# calico
quay.io/calico/node:v1.3.0
quay.io/calico/cni:v1.9.1
quay.io/calico/kube-policy-controller:v0.6.0

# dashboard

# heapster
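
Since gcr.io and quay.io are blocked, one workable approach is to pull each image from a mirror and retag it to the name kubeadm expects. A sketch, assuming the registry.cn-hangzhou.aliyuncs.com/google_containers mirror carries these tags:

$ MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
$ for img in kube-proxy-amd64:v1.7.0 kube-apiserver-amd64:v1.7.0 \
      kube-controller-manager-amd64:v1.7.0 kube-scheduler-amd64:v1.7.0 pause-amd64:3.0; do
      docker pull $MIRROR/$img && docker tag $MIRROR/$img gcr.io/google_containers/$img
  done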

Prepare Calico

Run on the master:

$ test -d /etc/kubernetes/manifests/ || mkdir -p /etc/kubernetes/manifests/
$ cd /etc/kubernetes/manifests
$ wget http://docs.projectcalico.org/v2.3/getting-started/kubernetes/installation/hosted/kubeadm/1.6/calico.yaml
$ curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v1.3.0/calicoctl
$ chmod +x calicoctl && cp -v calicoctl /usr/bin/
$ echo "export ETCD_ENDPOINTS=http://walker-1:2379" >> /etc/profile && source /etc/profile

To use an external etcd instead, delete the etcd-related DaemonSet and Service from calico.yaml and fill in the external etcd's endpoints (see the sketch after the error message below).
With calico.yaml in place here, kubeadm initialization brings the Calico network up as part of the bootstrap. If this step is missed, kube-dns cannot obtain an IP and fails to be created with:

"message":"cannot join network of a non running container"

kubeadm init

Run on the master:

[root@walker-1 k8s_imgs]# kubeadm init --skip-preflight-checks 
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[init] Using Kubernetes version: v1.7.1
[init] Using Authorization modes: [Node RBAC]
[preflight] Skipping pre-flight checks
[certificates] Using the existing CA certificate and key.
[certificates] Using the existing API Server certificate and key.
[certificates] Using the existing API Server kubelet client certificate and key.
[certificates] Using the existing service account token signing key.
[certificates] Using the existing front-proxy CA certificate and key.
[certificates] Using the existing front-proxy client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Using existing up-to-date KubeConfig file: "/etc/kubernetes/scheduler.conf"
[kubeconfig] Using existing up-to-date KubeConfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing up-to-date KubeConfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing up-to-date KubeConfig file: "/etc/kubernetes/controller-manager.conf"
[apiclient] Created API client, waiting for the control plane to become ready
[apiclient] All control plane components are healthy after 434.505352 seconds
[token] Using token: f0945d.dbfb07f1d8952edf
[apiconfig] Created RBAC rules
[addons] Applied essential addon: kube-proxy
[addons] Applied essential addon: kube-dns

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run (as a regular user):

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  http://kubernetes.io/docs/admin/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join --token f0945d.dbfb07f1d8952edf 172.16.6.47:6443

More kubeadm options are documented at:
https://kubernetes.io/docs/admin/kubeadm/

Use kubectl get po --namespace=kube-system to check how the pods are starting up:

[root@walker-1 kubernetes]# kubectl get pod --namespace=kube-system -o wide
NAME                                         READY     STATUS    RESTARTS   AGE       IP                NODE
calico-policy-controller-3271399580-hxp2d    1/1       Running   12         3d        172.16.6.47       walker-1.novalocal
kube-apiserver-walker-1.novalocal            1/1       Running   1          3h        172.16.6.47       walker-1.novalocal
kube-controller-manager-walker-1.novalocal   1/1       Running   0          3h        172.16.6.47       walker-1.novalocal
kube-dns-2425271678-2xwq4                    3/3       Running   31         2d        192.168.187.206   walker-1.novalocal
kube-proxy-m7r85                             1/1       Running   0          2h        172.16.6.47       walker-1.novalocal
kube-scheduler-walker-1.novalocal            1/1       Running   0          3h        172.16.6.47       walker-1.novalocal

Note: for kubectl to work smoothly, point it at the admin kubeconfig via an environment variable:

$ echo "export  KUBECONFIG=/etc/kubernetes/admin.conf" >> /etc/profile && source /etc/profile

Note: to allow pods to be scheduled on the master, run:

$ kubectl taint nodes --all node-role.kubernetes.io/master-

Join the node

First copy the kube-proxy, Calico, and other images from the master over to the node.
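
One way to ship them without a reachable registry is docker save piped into docker load over ssh; a sketch (extend the image list as needed):

$ docker save gcr.io/google_containers/kube-proxy-amd64:v1.7.0 \
      quay.io/calico/node:v1.3.0 quay.io/calico/cni:v1.9.1 | ssh walker-2 docker load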

Run on the node:

[root@walker-2 k8s_imgs]# kubeadm join --token f0945d.dbfb07f1d8952edf 172.16.6.47:6443
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Running pre-flight checks
[preflight] WARNING: docker version is greater than the most recently validated version. Docker version: 17.06.0-ce. Max validated version: 1.12
[preflight] WARNING: hostname "" could not be reached
[preflight] WARNING: hostname "" lookup : no such host
[preflight] Some fatal errors occurred:
    hostname "" a DNS-1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
    /proc/sys/net/bridge/bridge-nf-call-iptables contents are not set to 1
[preflight] If you know what you are doing, you can skip pre-flight checks with `--skip-preflight-checks`
[root@walker-2 k8s_imgs]# kubeadm join --token f0945d.dbfb07f1d8952edf 172.16.6.47:6443 --skip-preflight-checks
[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.
[preflight] Skipping pre-flight checks
[discovery] Trying to connect to API Server "172.16.6.47:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://172.16.6.47:6443"
[discovery] Cluster info signature and contents are valid, will use API Server "https://172.16.6.47:6443"
[discovery] Successfully established connection with API Server "172.16.6.47:6443"
[bootstrap] Detected server version: v1.7.0
[bootstrap] The server supports the Certificates API (certificates.k8s.io/v1beta1)
[csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
[csr] Received signed certificate from the API server, generating KubeConfig...
[kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"

Node join complete:
* Certificate signing request sent to master and response
  received.
* Kubelet informed of new secure connection details.

Run 'kubectl get nodes' on the master to see this machine join.
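
Instead of falling back to --skip-preflight-checks, the two fatal errors in the first attempt could also have been fixed directly; a sketch:

$ hostnamectl set-hostname walker-2.novalocal
$ echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables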

For kubectl to work normally on the node, copy /etc/kubernetes/admin.conf over from the master:

$ scp walker-1:/etc/kubernetes/admin.conf /etc/kubernetes
$ echo "export  KUBECONFIG=/etc/kubernetes/admin.conf" >> /etc/profile && source /etc/profile

Once installation completes, kubectl get po can be used to inspect the pods:

[root@walker-2 walker]# kubectl get nodes
NAME                 STATUS    AGE       VERSION
walker-1.novalocal   Ready     3d        v1.7.0
walker-2.novalocal   Ready     41m       v1.7.1
[root@walker-2 walker]# kubectl get po -o wide
NAME                                READY     STATUS    RESTARTS   AGE       IP                NODE
nginx-deployment-2059996365-06z60   1/1       Running   2          2d        192.168.187.207   walker-1.novalocal
nginx-deployment-2059996365-lvdcp   1/1       Running   0          3m        192.168.135.64    walker-2.novalocal

To verify network connectivity, curl a pod IP from the node, e.g.: curl 192.168.187.207
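
The nginx pods listed above came from an ordinary deployment; something like the following reproduces the check (name, image, and replica count are illustrative):

$ kubectl run nginx-deployment --image=nginx --replicas=2 --port=80
$ curl 192.168.187.207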

After installation completes, /var/log/messages on the node keeps printing:

eviction manager: no observation found for eviction signal allocatableNodeFs.available

No explanation for this yet; there is a related issue open on GitHub, still being watched:

https://github.com/kubernetes/kubernetes/issues/48703

Troubleshooting

With docker 17.06.0-ce, setting a memory limit on a pod can make the pod fail to start due to an old kernel. Related issue threads:

https://stackoverflow.com/questions/45056968/hyperledger-fabric-1-0-on-centos-error-endorsing-chaincode

https://github.com/moby/moby/issues/34046

https://github.com/docker/for-linux/issues/43

kube-dns ships with memory limits by default, and the 3.10.0-229.el7.x86_64 kernel hits exactly this problem. Upgrading the kernel to 4.4.76-1.el7.elrepo.x86_64 via elrepo resolved it.
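
A sketch of that upgrade path (the release RPM URL and the kernel-lt package follow elrepo's usual conventions; verify against elrepo.org):

$ rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
$ yum install -y https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
$ yum --enablerepo=elrepo-kernel install -y kernel-lt
$ grub2-set-default 0 && reboot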
