Deploying the Prometheus Operator Monitoring System on Kubernetes
Copyright notice: this is an original article by the blogger; please credit the source when reposting. https://blog.csdn.net/networken/article/details/85620793
Introduction to Prometheus Operator
Component overview:
1. MetricServer: aggregates cluster resource-usage data for consumers inside the Kubernetes cluster, such as kubectl, the HPA, and the scheduler.
2. Prometheus Operator: deploys and manages the monitoring and alerting stack (Prometheus, Alertmanager, and their configuration) on Kubernetes.
3. NodeExporter: exposes key metrics about the state of each node.
4. KubeStateMetrics: collects data about the resource objects in the Kubernetes cluster, which alerting rules can be defined against.
5. Prometheus: pulls metrics from the apiserver, scheduler, controller-manager, kubelet, and other components over HTTP.
6. Grafana: a platform for visualizing statistics and monitoring data.
7. Alertmanager: delivers alerts, e.g. by SMS or email.
Preparing the Deployment Environment
A working Kubernetes cluster is required.
Prometheus Operator GitHub repository:
https://github.com/coreos/prometheus-operator
Path to all of the Prometheus Operator yaml manifests:
https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/manifests
Clone the prometheus-operator repository locally:
git clone https://github.com/coreos/prometheus-operator.git
Copy the yaml manifests to a working directory:
cp -R prometheus-operator/contrib/kube-prometheus/manifests/ $HOME && cd $HOME/manifests
Apply all the yaml manifests in one step:
[centos@k8s-master manifests]$ kubectl apply -f .
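Note: on the first apply, resources backed by the newly created CRDs (Prometheus, ServiceMonitor, Alertmanager, and so on) may be rejected with "no matches for kind" errors because the CustomResourceDefinitions have not finished registering yet. If that happens, simply run the apply again:
[centos@k8s-master manifests]$ kubectl apply -f .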
Check the status of all the pods. Some pods may fail to start because their images could not be pulled:
[centos@k8s-master manifests]$ kubectl get all -n monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/grafana-6689854d5-xtj6c 1/1 Running 0 25m 10.244.1.219 k8s-node1 <none> <none>
pod/kube-state-metrics-86bc74fd4c-9pzj7 0/4 ContainerCreating 0 25m <none> k8s-node1 <none> <none>
pod/node-exporter-5992x 0/2 ErrImagePull 0 24m 192.168.92.56 k8s-master <none> <none>
pod/node-exporter-9mnpg 0/2 ErrImagePull 0 24m 192.168.92.58 k8s-node2 <none> <none>
pod/node-exporter-xzgsv 0/2 ContainerCreating 0 24m 192.168.92.57 k8s-node1 <none> <none>
pod/prometheus-adapter-5cc8b5d556-n9nvw 0/1 ContainerCreating 0 25m <none> k8s-node1 <none> <none>
pod/prometheus-operator-5cfb7f4c54-bzc29 0/1 ContainerCreating 0 25m <none> k8s-node1 <none> <none>
Find out which images failed to pull:
[centos@k8s-master manifests]$ kubectl describe pod node-exporter-5992x -n monitoring
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30m default-scheduler Successfully assigned monitoring/node-exporter-5992x to k8s-master
Warning Failed 24m kubelet, k8s-master Failed to pull image "quay.io/prometheus/node-exporter:v0.16.0": rpc error: code = Unknown desc = context canceled
Warning Failed 24m kubelet, k8s-master Error: ErrImagePull
Normal Pulling 24m kubelet, k8s-master pulling image "quay.io/coreos/kube-rbac-proxy:v0.4.0"
Normal Pulling 19m (x2 over 30m) kubelet, k8s-master pulling image "quay.io/prometheus/node-exporter:v0.16.0"
Warning Failed 19m kubelet, k8s-master Failed to pull image "quay.io/coreos/kube-rbac-proxy:v0.4.0": rpc error: code = Unknown desc = net/http: TLS handshake timeout
Warning Failed 19m kubelet, k8s-master Error: ErrImagePull
From the Events we can see that two images failed to pull on the k8s-master node:
quay.io/coreos/kube-rbac-proxy:v0.4.0
quay.io/prometheus/node-exporter:v0.16.0
Pulling the Images from Aliyun
Log in to the k8s-master node and pull the images manually, or find the images in the Aliyun or Docker Hub registries, pull those to the local nodes, and re-tag them to the original names.
The full list of images to pull:
#node-exporter-daemonset.yaml
quay.io/prometheus/node-exporter:v0.16.0
quay.io/coreos/kube-rbac-proxy:v0.4.0
#kube-state-metrics-deployment.yaml
quay.io/coreos/kube-state-metrics:v1.4.0
quay.io/coreos/addon-resizer:1.0
#0prometheus-operator-deployment.yaml
quay.io/coreos/configmap-reload:v0.0.1
quay.io/coreos/prometheus-config-reloader:v0.26.0
quay.io/coreos/prometheus-operator:v0.26.0
#alertmanager-alertmanager.yaml
quay.io/prometheus/alertmanager:v0.15.3
#prometheus-adapter-deployment.yaml
quay.io/coreos/k8s-prometheus-adapter-amd64:v0.4.1
#prometheus-prometheus.yaml
quay.io/prometheus/prometheus:v2.5.0
#grafana-deployment.yaml
grafana/grafana:5.2.4
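For reference, a list like the one above can be regenerated from the manifests with a rough one-liner. This is only a sketch: it assumes every image reference sits on an image:/baseImage: line, and the Prometheus/Alertmanager custom resources keep the tag in a separate version: field, so the output may need minor cleanup:
[centos@k8s-master manifests]$ grep -rh 'image:' *.yaml | awk '{print $NF}' | sort -u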
The images above have already been pushed to an Aliyun repository. Prepare the image-list file imagepath.txt in the $HOME directory; for other image versions, please search for mirrors yourself.
cat $HOME/imagepath.txt
quay.io/prometheus/node-exporter:v0.16.0
quay.io/coreos/kube-rbac-proxy:v0.4.0
......
Run the following script to pull every image in the list file imagepath.txt onto all nodes:
wget -O- https://raw.githubusercontent.com/zhwill/LinuxShell/master/pull-aliyun-images.sh | sh
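If you would rather not pipe a remote script into sh, the logic is essentially a pull-and-retag loop over the list file. A minimal sketch of the idea follows; the mirror registry path is a placeholder, not necessarily the repository the script actually uses:
#!/bin/bash
# Pull each image from a mirror registry, then re-tag it back to its original name.
MIRROR=registry.cn-hangzhou.aliyuncs.com/example   # placeholder namespace -- substitute your own
while read -r image; do
  name=${image##*/}                     # e.g. node-exporter:v0.16.0
  docker pull "$MIRROR/$name"           # pull from the mirror
  docker tag  "$MIRROR/$name" "$image"  # restore the original image name
  docker rmi  "$MIRROR/$name"           # optional: drop the mirror tag
done < "$HOME/imagepath.txt"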
Check the status of all the pods again:
[centos@k8s-master manifests]$ kubectl get pod -n monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 2/2 Running 2 19h 10.244.1.227 k8s-node1 <none> <none>
alertmanager-main-1 2/2 Running 2 18h 10.244.2.200 k8s-node2 <none> <none>
alertmanager-main-2 2/2 Running 2 17h 10.244.1.230 k8s-node1 <none> <none>
grafana-6689854d5-xtj6c 1/1 Running 1 19h 10.244.1.233 k8s-node1 <none> <none>
kube-state-metrics-75fd9687fc-dmmlw 4/4 Running 4 19h 10.244.2.205 k8s-node2 <none> <none>
node-exporter-5992x 2/2 Running 2 19h 192.168.92.56 k8s-master <none> <none>
node-exporter-9mnpg 2/2 Running 2 19h 192.168.92.58 k8s-node2 <none> <none>
node-exporter-xzgsv 2/2 Running 2 19h 192.168.92.57 k8s-node1 <none> <none>
prometheus-adapter-5cc8b5d556-n9nvw 1/1 Running 1 19h 10.244.1.234 k8s-node1 <none> <none>
prometheus-k8s-0 3/3 Running 11 19h 10.244.1.235 k8s-node1 <none> <none>
prometheus-k8s-1 3/3 Running 5 17h 10.244.2.207 k8s-node2 <none> <none>
prometheus-operator-5cfb7f4c54-bzc29 1/1 Running 1 19h 10.244.1.229 k8s-node1 <none> <none>
[centos@k8s-master manifests]$
All pods showing Running means the deployment succeeded.
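You can also list the Services the stack created; by default they are ClusterIP-only, which is why the next section switches the web-facing ones to NodePort:
[centos@k8s-master manifests]$ kubectl get service -n monitoring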
Configuring NodePort
Edit grafana-service.yaml to expose grafana as a NodePort service:
[centos@k8s-master manifests]$ vim grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort   # added line
  ports:
  - name: http
    port: 3000
    targetPort: http
    nodePort: 30100   # added line
  selector:
    app: grafana
Edit prometheus-service.yaml and change it to NodePort as well:
[centos@k8s-master manifests]$ vim prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9090
    targetPort: web
    nodePort: 30200
  selector:
    app: prometheus
    prometheus: k8s
Edit alertmanager-service.yaml and change it to NodePort:
[centos@k8s-master manifests]$ vim alertmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: web
    port: 9093
    targetPort: web
    nodePort: 30300
  selector:
    alertmanager: main
    app: alertmanager
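After editing the three files, re-apply them so the Service changes take effect:
[centos@k8s-master manifests]$ kubectl apply -f grafana-service.yaml -f prometheus-service.yaml -f alertmanager-service.yaml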
Accessing Prometheus
Prometheus is exposed on NodePort 30200; open http://192.168.92.56:30200 in a browser.
Visiting http://192.168.92.56:30200/targets shows that Prometheus has successfully connected to the k8s apiserver.
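You can also probe Prometheus from the shell; Prometheus 2.x exposes a /-/healthy endpoint for exactly this purpose (adjust the node IP to your environment):
[centos@k8s-master ~]$ curl http://192.168.92.56:30200/-/healthy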
View the service-discovery page.
Prometheus' own metrics.
The Prometheus web UI supports basic queries, for example the CPU usage of every pod in the K8S cluster:
sum by (pod_name)( rate(container_cpu_usage_seconds_total{image!="", pod_name!=""}[1m] ) )
If the query above returns data, Prometheus is scraping container metrics correctly (container_cpu_usage_seconds_total is exposed by the kubelet's cAdvisor endpoint). Next we can move on to the grafana component for a friendlier web UI over the same data.
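To check node-exporter itself, a per-node CPU query works as well (this assumes node_exporter v0.16+, which renamed node_cpu to node_cpu_seconds_total):
sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[1m]))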
Accessing Grafana
Check the port exposed by the grafana service:
[centos@k8s-master ~]$ kubectl get service -n monitoring | grep grafana
grafana NodePort 10.107.56.143 <none> 3000:30100/TCP 20h
[centos@k8s-master ~]$
As shown above, grafana is exposed on port 30100; open http://192.168.92.56:30100 in a browser.
The default username and password are admin/admin. Change the password and log in.
Adding a Data Source
Grafana already has the Prometheus data source configured by default. Grafana supports many time-series data sources, each with its own query editor; officially supported sources include Prometheus, Graphite, InfluxDB, Elasticsearch, and others.
Importing dashboards:
You can import a dashboard online by simply entering its template ID (e.g. 315), or download the corresponding JSON template file and import it locally. Template download links:
https://grafana.com/dashboards/315
https://grafana.com/dashboards/8919
After importing a dashboard you can see the corresponding monitoring data. Click HOME to pick one; Grafana also ships with a series of predefined dashboards:
Viewing Cluster Monitoring Information
Another dashboard template can monitor:
the overall health of the Kubernetes cluster
resource usage across the entire cluster
the status of the Kubernetes control-plane components
resource usage of individual nodes
the running state of Deployments
the running state of Pods
These dashboards show the health of everything from the cluster down to individual Pods, helping users operate Kubernetes more effectively.