导航:这里主要是列出一个prometheus一些系统的学习过程,最后按照章节顺序查看,由于写作该文档经历了不同时期,所以在文中有时出现

的云环境不统一,但是学习具体使用方法即可,在最后的篇章,有一个完整的腾讯云的实战案例。

  1.什么是prometheus?

  2.Prometheus安装

  3.Prometheus的Exporter详解

  4.Prometheus的PromQL

  5.Prometheus告警处理

  6.Prometheus的集群与高可用

  7.Prometheus服务发现

  8.kube-state-metrics 和 metrics-server

  9.监控kubernetes集群的方式

  10.prometheus operator

  11.Prometheus实战之联邦+高可用+持久

  12.Prometheus实战之配置汇总

  13.Grafana简单用法

  14.Grafana SQL汇总

  15.prometheus SQL汇总

  参考:

  https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

  https://yunlzheng.gitbook.io/prometheus-book/part-iii-prometheus-shi-zhan/readmd/use-prometheus-monitor-kubernetes

  https://www.bookstack.cn/read/prometheus_practice/introduction-README.md

  https://www.kancloud.cn/huyipow/prometheus/521184

  https://www.qikqiak.com/k8s-book/docs/

 

  有了上述几章的学习之后,这里将总结一下之前所有内容;正好,在我司架构中正好是多数据中心,多项目,且k8s环境,宿主机环境共存,这里正好将学习的知识用在实战中.

  以下可能不会贴出所有配置,但是会将重要的展示出来.

 

  基础概念:

  高可用: 2台汇总所有数据的prometheus 做负载均衡,这样即使有一台down机也不会影响任务,这2台机器汇总所有联邦prometheus的数据.

  持久:2台汇总所有数据的prometheus 数据保存在本地这样不利于迁移和更长之间的持久,所以2台prometheus的数据保存至远端数据库.

  联邦:简单来说就是zabbix-proxy的概念,这些prometheus不需要去持久化数据,只要能采集数据,主prometheus需要数据时能采集到即可.

  这样,多数据中心,多集群,多环境就可以使用这种架构解决.

 

1.环境信息

角色 部署方式 信息 云环境 作用
Prometheus 主(总)/alertmanager01 宿主 10.1.1.10 腾讯云 汇总prometheus
Prometheus 备(总)/alertmanager02 宿主 10.1.1.5 腾讯云 汇总prometheus
联邦 Prometheus K8s K8s 阿里云 Lcm k8s环境
联邦 Prometheus 宿主 阿里云内网 阿里云 罪恶王冠游戏

 

软件对应版本
软件名称 软件版本
主prometheus 2.13.1
联邦子prometheus 2.13.1
Node-exporter 0.18.1
blackbox_exporter 0.16.0
Consul 1.6.1
Metrics-server 0.3.5
Kube-state-metrics 1.8.0

这里我们由小变大来配置这个项目.

本章不讲解任何prometsql和alert告警规则的方法,只讲解集群的配置方式。

 

2.子联邦配置

2.1 K8s集群联邦点

  因为子联邦节点是监控k8s集群的,为了方便,肯定是部署在prometheus里,下面看看deploy的配置文件。

  这里不讲解获取数据来源工具组件等一些授权以及安装,默认当作已安装完成的情况,需要的去查看kube-state-metrics和metrics-server章节。

apiVersion: v1
kind: "Service"
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    name: prometheus
spec:
  ports:
  - name: prometheus
    protocol: TCP
    port: 9090
    targetPort: 9090
    nodePort: 30946
  selector:
    app: prometheus
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: prometheus
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.3.0
        env:
        - name: ver
          value: "15"
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--log.level=debug"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: prometheus-config
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config

  当然也不能少了RBAC的授权。

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  #node发现模式的授权资源,不然通过kubelet自带的发现模式不授权这个资源,会在prometheus爆出403错误
  - nodes/metrics
  - nodes/proxy
  - services
  - endpoints
  - pods
  - namespaces
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics","/api/*"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring

上面的deploy挂载了一个卷,就是配置文件,如下的配置文件,没有告警条目,没有持久化存储,因为他们不需要,只要总prometheus来向它收集数据就可以了.它也是一个prometheus,只是做的工作比较少罢了.

这个prometheus包含了以下采集任务:

  • kubernetes-kubelet
  • kubernetes-cadvisor
  • kubernetes-pods
  • kubernetes-apiservers
  • kubernetes-services
  • kubernetes-ingresses
  • kubernetes-service-endpoints

基本涵盖了k8s 的大部分的key,所以k8s内联邦prometheus角色需要注意这么几点就可以.

下面的配置文件如下,可能(cn-lcm-prod)项目标识不太一样,这里可以忽略,改成自己对应的项目即可

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval:     15s 
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
        - targets: ['localhost:9090']
     
      - job_name: 'cn-lcm-prod-kubernetes-kubelet'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
       
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics 

      - job_name: 'cn-web-prod-kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node

        relabel_configs:
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
 
      - job_name: 'cn-lcm-prod-kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name


      - job_name: 'cn-lcm-prod-kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
        - target_label: __address__
          replacement: kubernetes.default.svc:443

      - job_name: 'cn-lcm-prod-kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - role: service
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__address__]
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox-exporter.monitoring.svc.cluster.local:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          target_label: kubernetes_name

      - job_name: 'cn-lcm-prod-kubernetes-ingresses'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
        - role: ingress
        relabel_configs:
        - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
          regex: (.+);(.+);(.+)
          replacement: ${1}://${2}${3}
          target_label: __param_target
        - target_label: __address__
          replacement: blackbox-exporter.monitoring.svc.cluster.local:9115
        - source_labels: [__param_target]
          target_label: instance
        - action: labelmap
          regex: __meta_kubernetes_ingress_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_ingress_name]
          target_label: kubernetes_name

      - job_name: 'cn-lcm-prod-kubernetes-service-endpoints'  
 

        scrape_interval: 10s
        scrape_timeout:  10s
        #这个job配置不太一样,采集时间是10秒,因为使用全局配置的15秒,会出现拉取数据闪断的情况,所以,这里单独配置成10秒

        kubernetes_sd_configs:  
        - role: endpoints  
        relabel_configs:  
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]  
          action: keep  
          regex: true  
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]  
          action: replace  
          target_label: __scheme__  
          regex: (https?)  
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]  
          action: replace  
          target_label: __metrics_path__  
          regex: (.+)  
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]  
          action: replace  
          target_label: __address__  
          regex: ([^:]+)(?::\d+)?;(\d+)  
          replacement: $1:$2  
        - action: labelmap  
          regex: __meta_kubernetes_service_label_(.+)  
        - source_labels: [__meta_kubernetes_namespace]  
          action: replace  
          target_label: kubernetes_namespace  
        - source_labels: [__meta_kubernetes_service_name]  
          action: replace  
          target_label: kubernetes_name
prometheus-config.yaml

相关文章:

  • 2022-12-23
  • 2021-11-27
  • 2022-01-10
  • 2021-07-24
  • 2021-09-19
  • 2022-12-23
  • 2021-06-20
猜你喜欢
  • 2022-12-23
  • 2023-04-07
  • 2022-02-06
  • 2021-11-18
  • 2021-09-11
  • 2021-04-27
  • 2021-04-13
相关资源
相似解决方案