Environment Preparation

To disable SELinux, run the following command:
 setenforce 0

But this just disables it temporarily (until the next reboot). To disable it permanently, edit the
/etc/selinux/config file and change the SELINUX=enforcing line to SELINUX=permissive.
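
One way to make that change without opening an editor is a sed one-liner (a minimal sketch, assuming the file still contains the default SELINUX=enforcing line):

# sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config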

DISABLING THE FIREWALL
# systemctl disable firewalld && systemctl stop firewalld
ADDING THE KUBERNETES YUM REPO

/etc/yum.repos.d/kubernetes.repo

[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
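
The same file can be created in one step with a shell heredoc (a sketch that simply writes the contents shown above):

cat <<'EOF' > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF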



INSTALLING DOCKER, KUBELET, KUBEADM, KUBECTL AND KUBERNETES-CNI
yum install -y docker kubelet kubeadm kubectl kubernetes-cni
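
If you want the installed packages to match the v1.14.1 images used later in this guide, the versions can be pinned instead (an optional variant; it assumes those package versions are still available in the repo):

# yum install -y docker kubelet-1.14.1 kubeadm-1.14.1 kubectl-1.14.1 kubernetes-cni
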
Then manually enable and start the docker and kubelet services:
# systemctl enable docker && systemctl restart docker
# systemctl enable kubelet && systemctl restart kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /usr/lib/systemd/system/kubelet.service.
ENABLING THE NET.BRIDGE.BRIDGE-NF-CALL-IPTABLES KERNEL OPTION
# sysctl -w net.bridge.bridge-nf-call-iptables=1

But since this setting isn't preserved across restarts, we need to make it permanent by adding a file under /etc/sysctl.d/ like this:

# echo "net.bridge.bridge-nf-call-iptables=1" > /etc/sysctl.d/k8s.conf
DISABLING SWAP
swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab
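
To verify that swap is really off, swapon -s should print nothing and free -m should report 0 for swap:

# swapon -s
# free -m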

Installing the Master Node

SUBSTITUTING THE IMAGES

List the required images:

#kubeadm config images list

k8s.gcr.io/kube-apiserver:v1.14.1
k8s.gcr.io/kube-controller-manager:v1.14.1
k8s.gcr.io/kube-scheduler:v1.14.1
k8s.gcr.io/kube-proxy:v1.14.1
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.3.10
k8s.gcr.io/coredns:1.3.1

Because the images cannot be pulled from k8s.gcr.io directly, they have to be pulled from a different registry:

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.14.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.14.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.14.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.14.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.3.10
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.3.1

Re-tag them with the k8s.gcr.io names:

docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.14.1 k8s.gcr.io/kube-apiserver:v1.14.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.14.1 k8s.gcr.io/kube-controller-manager:v1.14.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.14.1 k8s.gcr.io/kube-scheduler:v1.14.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.14.1 k8s.gcr.io/kube-proxy:v1.14.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.1 k8s.gcr.io/pause:3.1
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.3.10 k8s.gcr.io/etcd:3.3.10
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.3.1 k8s.gcr.io/coredns:1.3.1
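
Since the pull and tag commands differ only in the image name and tag, the whole substitution can also be scripted (a minimal sketch using the same image list and mirror registry as above):

MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
for img in kube-apiserver:v1.14.1 kube-controller-manager:v1.14.1 \
           kube-scheduler:v1.14.1 kube-proxy:v1.14.1 \
           pause:3.1 etcd:3.3.10 coredns:1.3.1; do
    docker pull ${MIRROR}/${img}                      # pull from the reachable mirror
    docker tag ${MIRROR}/${img} k8s.gcr.io/${img}     # re-tag so kubeadm finds it locally
done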
RUNNING KUBEADM INIT TO INITIALIZE THE MASTER
 kubeadm init

If the following error appears:

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.
Here is one example how you may list all Kubernetes containers running in docker:
	- 'docker ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

Add NO_PROXY to the environment variables:

export NO_PROXY=localhost,127.0.0.1,10.96.0.0/12,10.0.0.0/12,192.168.99.0/24,192.168.39.0/24,*.xx.com,10.239.47.*,*.sh.xx.com

Then reset:

# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "sr531" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
W0430 17:32:18.126981   69994 reset.go:158] [reset] failed to remove etcd member: error syncing endpoints with etc: etcdclient: no available endpoints
.Please manually remove this etcd member using etcdctl
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

Run init again:

# kubeadm init
I0430 17:32:34.844544   70410 version.go:96] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get https://dl.k8s.io/release/stable-1.txt: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0430 17:32:34.844648   70410 version.go:97] falling back to the local client version: v1.14.1
[init] Using Kubernetes version: v1.14.1
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [sr531 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.2.131]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [sr531 localhost] and IPs [10.0.2.131 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [sr531 localhost] and IPs [10.0.2.131 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 17.503804 seconds
[upload-config] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.14" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --experimental-upload-certs
[mark-control-plane] Marking the node sr531 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node sr531 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: jn1ten.2rtj7xwusw6j1g79
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.0.2.131:6443 --token jn1ten.2rtj7xwusw6j1g79 \
    --discovery-token-ca-cert-hash sha256:69949981f2fea5b5bba23c68382378834938770c39ebb3c4016d10d2d99db6c9 

The master node has been configured successfully.

RUNNING KUBECTL ON THE MASTER

Set the KUBECONFIG environment variable:

# export KUBECONFIG=/etc/kubernetes/admin.conf
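
This export only lasts for the current shell session; to keep it across logins, you could, for example, append it to root's shell profile:

# echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> /root/.bash_profile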
LISTING THE PODS
# kubectl get po -n kube-system
NAME                            READY   STATUS    RESTARTS   AGE
coredns-fb8b8dccf-2jj46         0/1     Pending   0          26m
coredns-fb8b8dccf-p55w6         0/1     Pending   0          26m
etcd-sr531                      1/1     Running   0          25m
kube-apiserver-sr531            1/1     Running   0          25m
kube-controller-manager-sr531   1/1     Running   0          25m
kube-proxy-kh57b                1/1     Running   0          26m
kube-proxy-mb8cr                0/1     Evicted   0          12m
kube-scheduler-sr531            1/1     Running   0          25m

Installing the Worker Nodes

All of the setup steps performed on the master before kubeadm init are also required on the worker nodes (except the image substitution, which the workers do not need).

On the master node, check the token lifetime; the Time To Live is only 24 hours:

kubeadm token list
TOKEN                     TTL       EXPIRES                     USAGES                   DESCRIPTION                                                EXTRA GROUPS
jn1ten.2rtj7xwusw6j1g79   23h       2019-05-01T17:32:59+08:00   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:bootstrappers:kubeadm:default-node-token

If the token has expired, it needs to be regenerated; the openssl command below recomputes the CA certificate hash used by kubeadm join:

# kubeadm token create
7mud12.hceqth71zbn9jp9v
# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
69949981f2fea5b5bba23c68382378834938770c39ebb3c4016d10d2d99db6c9
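
Alternatively, kubeadm can print a complete join command (a new token together with the CA certificate hash) in one step:

# kubeadm token create --print-join-command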

Run the following on the worker node:

kubeadm join 10.0.2.131:6443 --token 7mud12.hceqth71zbn9jp9v \
    --discovery-token-ca-cert-hash sha256:69949981f2fea5b5bba23c68382378834938770c39ebb3c4016d10d2d99db6c9

This command comes from the output of kubeadm init on the master node.

If kubeadm join hangs, the proxy exclusions need to be configured in the environment variables:

export NO_PROXY=localhost,127.0.0.1,10.0.2.131,10.96.0.0/12,10.0.0.0/12,192.168.99.0/16,192.168.39.0/24,*.xx.com,10.239.47.*,*.sh.xx.com

If the following error appears:

# kubeadm join 10.0.2.131:6443 --token 3vhusy.qhdg1crbnlr8nd0e \
>     --discovery-token-ca-cert-hash sha256:5adf3991d1238fb3e95fb0cb62337808c07db86b9441de6aa26d46765e4b712a 
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized

then the token has expired and needs to be regenerated.

# kubeadm join 10.0.2.131:6443 --token jn1ten.2rtj7xwusw6j1g79 \
>     --discovery-token-ca-cert-hash sha256:69949981f2fea5b5bba23c68382378834938770c39ebb3c4016d10d2d99db6c9
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

This output indicates that the worker node has joined the cluster successfully.

# kubectl get node
NAME    STATUS     ROLES    AGE   VERSION
sr531   NotReady   master   30m   v1.14.1
sr532   NotReady   <none>   29m   v1.14.1

According to this, the Kubelet is not fully ready, because the container network (CNI) plugin isn’t ready, which is understandable, because we haven’t deployed the CNI plugin yet.
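
To confirm this from the cluster itself, kubectl describe node shows each node's conditions; on a node without a CNI plugin, the Ready condition typically reports that the network plugin is not ready (the exact wording varies by version):

# kubectl describe node sr532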

SETTING UP THE CONTAINER NETWORK

Run the following on the master node:

# kubectl apply -f https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 |tr -d '\n')
serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.extensions/weave-net created

Then:

[root@sr531 ~]# kubectl get node
NAME    STATUS     ROLES    AGE   VERSION
sr531   NotReady   master   42m   v1.14.1
sr532   NotReady   <none>   41m   v1.14.1
[root@sr531 ~]# kubectl get node
NAME    STATUS     ROLES    AGE   VERSION
sr531   Ready      master   43m   v1.14.1
sr532   NotReady   <none>   42m   v1.14.1
List all pods:

kubectl get pods --all-namespaces

Wait for the corresponding infrastructure pods to be created.
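
Instead of re-running the command by hand, you can watch the kube-system pods until they are all Running (press Ctrl-C to stop watching):

# kubectl get pods -n kube-system -w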

[root@sr531 ~]#  kubectl get nodes
NAME    STATUS     ROLES    AGE    VERSION
sr531   Ready      master   15d    v1.14.1
sr533   NotReady   <none>   103s   v1.14.1
sr535   Ready      <none>   12d    v1.14.1

sr533 has just joined and shows "NotReady"; it is actually waiting for its kube-proxy and weave-net pods to be created successfully.

#  kubectl get pod  -n kube-system
NAME                            READY   STATUS              RESTARTS   AGE
coredns-fb8b8dccf-2jj46         1/1     Running             0          15d
coredns-fb8b8dccf-p55w6         1/1     Running             0          15d
etcd-sr531                      1/1     Running             0          15d
kube-apiserver-sr531            1/1     Running             0          15d
kube-controller-manager-sr531   1/1     Running             0          15d
kube-proxy-kh57b                1/1     Running             0          15d
kube-proxy-td8d7                1/1     Running             0          12d
kube-proxy-xpzsj                0/1     ContainerCreating   0          2m
kube-scheduler-sr531            1/1     Running             0          15d
weave-net-5dl6w                 2/2     Running             1          12d
weave-net-8zzwh                 0/2     ContainerCreating   0          2m
weave-net-nlmrg                 2/2     Running             0          15d

Once the kube-proxy and weave-net pods have been created successfully, sr533 shows "Ready":

#  kubectl get  node
NAME    STATUS   ROLES    AGE     VERSION
sr531   Ready    master   15d     v1.14.1
sr533   Ready    <none>   3m58s   v1.14.1
sr535   Ready    <none>   12d     v1.14.1
