第14关k8s架构师课程之业务Prometheus监控实战二
使用prometheus来监控ingress-nginx
大家好,我是博哥爱运维。我们前面部署过ingress-nginx,这个是整个K8s上所有服务的流量入口组件很关键,因此把它的metrics指标收集到prometheus来做好相关监控至关重要,因为前面ingress-nginx服务是以daemonset形式部署的,并且映射了自己的端口到宿主机上,那么我可以直接用pod运行NODE上的IP来看下metrics
curl 10.0.1.201:10254/metrics
创建 servicemonitor配置让prometheus能发现ingress-nginx的metrics
# vim servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
app: ingress-nginx
name: nginx-ingress-scraping
namespace: ingress-nginx
spec:
endpoints:
- interval: 30s
path: /metrics
port: metrics
jobLabel: app
namespaceSelector:
matchNames:
- ingress-nginx
selector:
matchLabels:
app: ingress-nginx
创建它
# kubectl apply -f servicemonitor.yaml
servicemonitor.monitoring.coreos.com/nginx-ingress-scraping created
# kubectl -n ingress-nginx get servicemonitors.monitoring.coreos.com
NAME AGE
nginx-ingress-scraping 8s
指标一直没收集上来,看看proemtheus服务的日志,发现报错如下:
# kubectl -n monitoring logs prometheus-k8s-0 -c prometheus |grep error
level=error ts=2020-12-13T09:52:35.565Z caller=klog.go:96 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:426: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"ingress-nginx\""
需要修改prometheus的clusterrole
# kubectl edit clusterrole prometheus-k8s
#------ 原始的rules -------
rules:
- apiGroups:
- ""
resources:
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
#---------------------------
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus-k8s
rules:
- apiGroups:
- ""
resources:
- nodes
- services
- endpoints
- pods
- nodes/proxy
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- configmaps
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
再到prometheus UI上看下,发现已经有了
ingress-nginx/nginx-ingress-scraping/0 (1/1 up)
使用Prometheus来监控二进制部署的ETCD集群
作为K8s所有资源存储的关键服务ETCD,我们也有必要把它给监控起来,正好借这个机会,完整的演示一次利用Prometheus来监控非K8s集群服务的步骤
在前面部署K8s集群的时候,我们是用二进制的方式部署的ETCD集群,并且利用自签证书来配置访问ETCD,正如前面所说,现在关键的服务基本都会留有指标metrics接口支持prometheus的监控,利用下面命令,我们可以看到ETCD都暴露出了哪些监控指标出来
# curl --cacert /etc/kubernetes/ssl/ca.pem --cert /etc/etcd/ssl/etcd.pem --key /etc/etcd/ssl/etcd-key.pem https://10.0.1.201:2379/metrics
新版证书目录在/etc/kubeasz/clusters/test/ssl/ 代替/etc/etcd/ssl/
上面查看没问题后,接下来我们开始进行配置使ETCD能被prometheus发现并监控
# 首先把ETCD的证书创建为secret
kubectl -n monitoring create secret generic etcd-certs --from-file=/etc/etcd/ssl/etcd.pem --from-file=/etc/etcd/ssl/etcd-key.pem --from-file=/etc/kubernetes/ssl/ca.pem
#新版证书目录在/etc/kubeasz/clusters/test/ssl/ 代替/etc/etcd/ssl/
# 接着在prometheus里面引用这个secrets
kubectl -n monitoring edit prometheus k8s
spec:
...
secrets:
- etcd-certs
# 保存退出后,prometheus会自动重启服务pod以加载这个secret配置,过一会,我们进pod来查看下是不是已经加载到ETCD的证书了
# kubectl -n monitoring exec -it prometheus-k8s-0 -c prometheus -- sh
/prometheus $ ls /etc/prometheus/secrets/etcd-certs/
ca.pem etcd-key.pem etcd.pem
接下来准备创建service、endpoints以及ServiceMonitor的yaml配置
注意替换下面的NODE节点IP为实际ETCD所在NODE内网IP
# vim prometheus-etcd.yaml
apiVersion: v1
kind: Service
metadata:
name: etcd-k8s
namespace: monitoring
labels:
k8s-app: etcd
spec:
type: ClusterIP
clusterIP: None
ports:
- name: api
port: 2379
protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
name: etcd-k8s
namespace: monitoring
labels:
k8s-app: etcd
subsets:
- addresses:
- ip: 10.0.1.201
- ip: 10.0.1.202
- ip: 10.0.1.203
ports:
- name: api
port: 2379
protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: etcd-k8s
namespace: monitoring
labels:
k8s-app: etcd-k8s
spec:
jobLabel: k8s-app
endpoints:
- port: api
interval: 30s
scheme: https
tlsConfig:
caFile: /etc/prometheus/secrets/etcd-certs/ca.pem
certFile: /etc/prometheus/secrets/etcd-certs/etcd.pem
keyFile: /etc/prometheus/secrets/etcd-certs/etcd-key.pem
#use insecureSkipVerify only if you cannot use a Subject Alternative Name
insecureSkipVerify: true
selector:
matchLabels:
k8s-app: etcd
namespaceSelector:
matchNames:
- monitoring
开始创建上面的资源
# kubectl apply -f prometheus-etcd.yaml
service/etcd-k8s created
endpoints/etcd-k8s created
servicemonitor.monitoring.coreos.com/etcd-k8s created
过一会,就可以在prometheus UI上面看到ETCD集群被监控了
monitoring/etcd-k8s/0 (3/3 up)
接下来我们用grafana来展示被监控的ETCD指标
1. 在grafana官网模板中心搜索etcd,下载这个json格式的模板文件
https://grafana.com/dashboards/3070
2.然后打开自己先部署的grafana首页,
点击左边菜单栏四个小正方形方块HOME --- Manage
再点击右边 Import dashboard ---
点击Upload .json File 按钮,上传上面下载好的json文件 etcd_rev3.json,
然后在prometheus选择数据来源
点击Import,即可显示etcd集群的图形监控信息