Managing Prometheus with the Operator

Creating a Prometheus Instance

Once the Prometheus Operator is installed in the cluster, deploying a Prometheus server becomes a matter of declaring a Prometheus resource. As shown below, we create a Prometheus instance in the monitoring namespace:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: inst
  namespace: monitoring
spec:
  resources:
    requests:
      memory: 400Mi
```
Save the above to prometheus-inst.yaml and create it with kubectl:
```shell
$ kubectl create -f prometheus-inst.yaml
prometheus.monitoring.coreos.com/inst created
```
Now, listing the StatefulSet resources in the monitoring namespace shows the Prometheus instance that the Prometheus Operator automatically created via a StatefulSet:
```shell
$ kubectl -n monitoring get statefulsets
NAME              DESIRED   CURRENT   AGE
prometheus-inst   1         1         1m
```
List the Pods:
```shell
$ kubectl -n monitoring get pods
NAME                                   READY     STATUS    RESTARTS   AGE
prometheus-inst-0                      3/3       Running   1          1m
prometheus-operator-6db8dbb7dd-2hz55   1/1       Running   0          45m
```
Access the Prometheus instance via port-forward:
```shell
$ kubectl -n monitoring port-forward statefulsets/prometheus-inst 9090:9090
```
Open http://localhost:9090 locally to access the Prometheus instance created by the Prometheus Operator. Inspecting its configuration shows that, so far, the Operator has created an instance containing only basic configuration:

Managing Monitoring Configuration with ServiceMonitor

Changing monitoring configuration is one of the most common operational tasks with Prometheus. To manage Prometheus configuration automatically, the Prometheus Operator uses the custom resource type ServiceMonitor to describe monitoring targets.
First, deploy an example application in the cluster: save the following to example-app.yaml and create it with the kubectl command-line tool:
```yaml
kind: Service
apiVersion: v1
metadata:
  name: example-app
  labels:
    app: example-app
spec:
  selector:
    app: example-app
  ports:
  - name: web
    port: 8080
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: fabxc/instrumented_app
        ports:
        - name: web
          containerPort: 8080
```
The example application creates 3 Pod replicas via a Deployment and exposes them through a Service.
```shell
$ kubectl get pods
NAME                        READY     STATUS    RESTARTS   AGE
example-app-94c8bc8-l27vx   2/2       Running   0          1m
example-app-94c8bc8-lcsrm   2/2       Running   0          1m
example-app-94c8bc8-n6wp5   2/2       Running   0          1m
```
Again use port-forward locally to access one of the Pods:
```shell
$ kubectl port-forward deployments/example-app 8080:8080
```
Visiting http://localhost:8080/metrics locally, the example application returns sample data like the following:
```text
# TYPE codelab_api_http_requests_in_progress gauge
codelab_api_http_requests_in_progress 3
# HELP codelab_api_request_duration_seconds A histogram of the API HTTP request durations in seconds.
# TYPE codelab_api_request_duration_seconds histogram
codelab_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0001"} 0
```
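This exposition format is plain text and deliberately simple to parse. As a minimal illustrative sketch (not the official Prometheus client library), the following Python snippet extracts the metric name, labels, and value from sample lines like the ones above:

```python
import re

# Matches one exposition-format sample line: metric_name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)$'
)

def parse_sample(line):
    """Parse a non-comment sample line into (name, labels, value)."""
    m = SAMPLE_RE.match(line.strip())
    if m is None:
        return None
    labels = dict(re.findall(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"',
                             m.group('labels') or ''))
    return m.group('name'), labels, float(m.group('value'))

text = """# TYPE codelab_api_http_requests_in_progress gauge
codelab_api_http_requests_in_progress 3
codelab_api_request_duration_seconds_bucket{method="GET",path="/api/bar",status="200",le="0.0001"} 0"""

samples = []
for line in text.splitlines():
    if line.startswith('#'):   # skip HELP/TYPE comment lines
        continue
    parsed = parse_sample(line)
    if parsed is not None:
        samples.append(parsed)
```

In practice you would rely on a Prometheus client library rather than hand-rolled parsing; the sketch only illustrates the line structure that Prometheus scrapes from /metrics.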
With native Prometheus configuration, scraping an application deployed on Kubernetes means defining a separate job in the Prometheus configuration file and using kubernetes_sd to handle service discovery. With the Prometheus Operator, we can instead declare a ServiceMonitor object directly, as shown below:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web
```
The labels under selector choose the Service to be monitored (and hence its backing Pods), while endpoints specifies the port named web. By default, a ServiceMonitor and its monitoring targets must be in the same namespace. In this example, Prometheus is deployed in the monitoring namespace, so to associate with the example-app objects in the default namespace, the ServiceMonitor uses namespaceSelector to select targets across namespaces. Save the above to example-app-service-monitor.yaml and create it with kubectl:
```shell
$ kubectl create -f example-app-service-monitor.yaml
servicemonitor.monitoring.coreos.com/example-app created
```
If you want the ServiceMonitor to match labeled targets in any namespace, define it as follows:
```yaml
spec:
  namespaceSelector:
    any: true
```
If the monitored target has BasicAuth authentication enabled, you can define basicAuth in the endpoints configuration of the ServiceMonitor object, as shown below:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    team: frontend
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - basicAuth:
      password:
        name: basic-auth
        key: password
      username:
        name: basic-auth
        key: user
    port: web
```
Here basicAuth references a Secret object named basic-auth; the user must store the credentials in that Secret manually:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: basic-auth
data:
  password: dG9vcg== # base64-encoded password
  user: YWRtaW4=     # base64-encoded username
type: Opaque
```
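The values under data must be base64-encoded. Assuming the username admin and password toor (the plaintext behind the encodings above), you can verify them with a few lines of Python:

```python
import base64

# Kubernetes Secret `data` values are base64-encoded strings.
user = base64.b64encode(b"admin").decode()
password = base64.b64encode(b"toor").decode()
print(user)      # YWRtaW4=
print(password)  # dG9vcg==

# Decoding recovers the original credentials:
assert base64.b64decode("YWRtaW4=") == b"admin"
assert base64.b64decode("dG9vcg==") == b"toor"
```

In practice, `kubectl create secret generic basic-auth --from-literal=user=admin --from-literal=password=toor` performs this encoding for you.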

Associating Prometheus with ServiceMonitors

The association between Prometheus and ServiceMonitors is defined with serviceMonitorSelector: the Prometheus resource selects, by label, the ServiceMonitor objects it should watch. Modify the Prometheus definition in prometheus-inst.yaml as follows:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: inst
  namespace: monitoring
spec:
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
```
Apply the Prometheus change to the cluster:
```shell
$ kubectl -n monitoring apply -f prometheus-inst.yaml
```
Now, inspecting the Prometheus configuration again, we find that the configuration file automatically contains a job named monitoring/example-app/0:
```yaml
global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    prometheus: monitoring/inst
    prometheus_replica: prometheus-inst-0
alerting:
  alert_relabel_configs:
  - separator: ;
    regex: prometheus_replica
    replacement: $1
    action: labeldrop
rule_files:
- /etc/prometheus/rules/prometheus-inst-rulefiles-0/*.yaml
scrape_configs:
- job_name: monitoring/example-app/0
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app]
    separator: ;
    regex: example-app
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: web
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: web
    action: replace
```
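The two keep rules in the generated job are what narrow the discovered endpoints down to the example application: a target survives only if its source-label values, joined by the separator, fully match the regex. A minimal Python sketch of this keep semantics (illustrative only, not the actual Prometheus implementation):

```python
import re

def keep(targets, source_labels, regex, separator=';'):
    """Drop any target whose joined source-label values do not fully match regex."""
    pattern = re.compile(regex)
    return [t for t in targets
            if pattern.fullmatch(separator.join(t.get(l, '') for l in source_labels))]

# Two hypothetical discovered endpoint targets with their meta labels:
discovered = [
    {'__meta_kubernetes_service_label_app': 'example-app',
     '__meta_kubernetes_endpoint_port_name': 'web'},
    {'__meta_kubernetes_service_label_app': 'other-app',
     '__meta_kubernetes_endpoint_port_name': 'web'},
]

# Apply the two keep rules from the generated job in sequence:
step1 = keep(discovered, ['__meta_kubernetes_service_label_app'], 'example-app')
step2 = keep(step1, ['__meta_kubernetes_endpoint_port_name'], 'web')
# Only the example-app/web endpoint remains.
```

Note that Prometheus anchors relabel regexes, which is why fullmatch is used here; the subsequent replace rules then copy meta labels such as the namespace and service name into the target's final label set.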
However, a careful reader may notice that although the job configuration exists, Prometheus' targets page does not contain any monitoring targets. Checking the logs of the Prometheus Pod shows the following:
```text
level=error ts=2018-12-15T12:52:48.452108433Z caller=main.go:240 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:300: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:default\" cannot list endpoints in the namespace \"default\""
```

Custom ServiceAccount

The Prometheus instance created above uses the default ServiceAccount of the monitoring namespace, which has no permission to read any resources in the default namespace.
To fix this, we create a ServiceAccount named prometheus in the monitoring namespace and grant it the necessary cluster access permissions.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
```
Save the above to prometheus-rbac.yaml and create the resources with kubectl:
```shell
$ kubectl -n monitoring create -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
```
After the ServiceAccount is created, modify prometheus-inst.yaml to add serviceAccountName as shown below:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: inst
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
```
Apply the Prometheus change to the cluster:
```shell
$ kubectl -n monitoring apply -f prometheus-inst.yaml
prometheus.monitoring.coreos.com/inst configured
```
Once the Prometheus Operator has applied the configuration change, checking Prometheus again shows that it can now scrape the example application's metrics normally.