【Question Title】:Helm / kube-prometheus-stack: Can I create rules for exporters in values.yaml?
【Posted】:2021-12-10 14:10:39
【Question】:

I want to be able to specify all of my rules for prometheus-blackbox-exporter, so I added them to rules-mine.yaml and deployed with

helm upgrade --install -n monitoring blackbox -f values.yaml -f rules-mine.yaml .

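I cannot see any rules listed at http://localhost:9090/rules, and nothing seems to be evaluated, since no alerts fire. I need to do everything the IaC way and deploy it through terraform in an automated fashion.

  • Is it possible to add rules to an exporter this way?
  • If so, can anyone see what is wrong with the file below?
  • If not, how can I efficiently add rules for many exporters?

The rules-mine.yaml file contains: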

prometheusRule:
  enabled:  true
  namespace: monitoring
  additionalLabels:
    team: foxtrot_blackbox
    environment: production
    cluster: cluster
    namespace: namespace_x
  namespace: "monitoring"

  rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox probe failed (instance {{`{{`}} $labels.instance {{`}}`}})
      description: "Probe failed\n  VALUE = {{`{{`}} $value {{`}}`}}"

  - alert: BlackboxSlowProbe
    expr: avg_over_time(probe_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Blackbox slow probe (instance {{`{{`}} $labels.instance {{`}}`}})
      description: "Blackbox probe took more than 1s to complete\n  VALUE = {{`{{`}} $value {{`}}`}}"

Thanks for your help.

【Comments】:

    Tags: prometheus kubernetes-helm prometheus-alertmanager prometheus-operator kube-prometheus-stack


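    【Answer 1】:

    Are you sure you don't have a typo in the label name "environment"? That will certainly not match what you expect, unless you actually labeled your sources that way.

    Best

    【Comments】:

    • I don't think this typo has anything to do with the question.
    【Answer 2】:

    The best approach I have found is to add the exporter rules to the kube-prometheus-stack values.yaml file (I actually created a separate rules.yaml file) and pass it to helm: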

    • helm upgrade --install -n monitoring prometheus --create-namespace -f values-mine.yaml -f rules-mine.yaml prometheus-community/kube-prometheus-stack

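    All of the rules were then picked up as I wanted, so this seems like a decent solution. I would still prefer to keep them grouped with the exporters, though; if I find a way to do that, I will post again.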

    additionalPrometheusRulesMap:
      prometheus.rules:
        groups:
        - name: company.prometheus.rules
          rules:
          - alert: PrometheusNotificationsBacklog
            expr: min_over_time(prometheus_notifications_queue_length[10m]) > 0
            for: 0m
            labels:
              severity: warning
            annotations:
              summary: Prometheus notifications backlog (instance {{ $labels.instance }})
              description: "The Prometheus notification queue has not been empty for 10 minutes\nVALUE = {{ $value }}"
              dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: ${wiki_url}/{{ $labels.alertname }}
    
      company.blackbox.rules:
        groups:
        - name: company.blackbox.rules
          rules:
          - alert: BlackboxProbeFailed
            expr: probe_success == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: Blackbox probe failed (instance {{ $labels.instance }})
              description: "Probe failed\nVALUE = {{ $value }}"
              dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: ${wiki_url}/{{ $labels.alertname }}
    
          - alert: BlackboxSlowProbe
            expr: avg_over_time(probe_duration_seconds[1m]) > 1
            for: 3m
            labels:
              severity: warning
            annotations:
              summary: Blackbox slow probe (instance {{ $labels.instance }})
              description: "Blackbox probe took more than 1s to complete\nVALUE = {{ $value }}"
              dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: ${wiki_url}/{{ $labels.alertname }}
    
    # etc....
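
To confirm that the rule groups were actually loaded after the upgrade, it may help to query the Prometheus HTTP API instead of checking the /rules page by eye. The following is a minimal sketch, assuming Prometheus is reachable at http://localhost:9090 (for example through a port-forward); the function names `alert_names` and `fetch_rules` are made up for this illustration.

```python
import json
import urllib.request


def alert_names(payload):
    """Return (group name, alert name) pairs from a /api/v1/rules response.

    Recording rules are skipped; only rules of type "alerting" are listed.
    """
    pairs = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting":
                pairs.append((group["name"], rule["name"]))
    return pairs


def fetch_rules(base_url="http://localhost:9090"):
    """Fetch the loaded rule groups. The base URL is an assumption."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print one line per loaded alerting rule, e.g. to diff against rules-mine.yaml.
    for group, alert in alert_names(fetch_rules()):
        print(f"{group}: {alert}")
```

If a group such as company.blackbox.rules does not appear in the output, the rules most likely never reached Prometheus, which points at the Helm values rather than at the alert expressions.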
    

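    【Comments】:

      【Answer 3】:

      A colleague found out that this is entirely possible. The problem appears to be related to the quoting used in the original implementation. The following is in use, so I am posting it here in the hope that it is useful to others.

      In summary,

      • {{`{{`}} $labels.instance {{`}}`}} == bad
      • {{`{{$labels.instance}}`}} == good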
      prometheusRule:
        enabled: true
        additionalLabels:
          client: ${client_id}
          cluster: ${cluster}
          environment: ${environment}
          grafana: ${grafana_url}
      
        rules:
          - alert: BlackboxProbeFailed
            expr: probe_success == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: Blackbox probe failed for {{`{{$labels.instance}}`}}
              description: Probe failed VALUE = {{`{{$value}}`}}
              dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
              runbook_url: ${wiki_url}/BlackboxProbeFailed
      
          - alert: BlackboxSlowProbe
            expr: avg_over_time(probe_duration_seconds[1m]) > 1
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: Blackbox slow probe for {{`{{$labels.instance}}`}}
              description: Blackbox probe took more than 1s to complete VALUE = {{`{{$value|humanizeDuration}}`}}
              dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
              runbook_url: ${wiki_url}/BlackboxSlowProbe
      

      Please ignore any missing variables etc.

      【Comments】:
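
For readers unsure what the backtick form does: Helm renders templates with Go's text/template, where an action that contains only a backtick-quoted raw string emits that string verbatim. That is how the Prometheus placeholder survives Helm's own templating pass, assuming the chart runs these values through its template engine (which the need for escaping suggests). The sketch below is a simplified Python emulation of just that one case, not Helm itself; `render_raw_strings` is a hypothetical helper name.

```python
import re

# Matches a template action whose whole body is a backtick raw string,
# e.g. {{`text`}}. Real Go templates support far more than this.
RAW_STRING_ACTION = re.compile(r"\{\{\s*`([^`]*)`\s*\}\}")


def render_raw_strings(text):
    """Replace each {{`...`}} action with the literal text between the backticks."""
    return RAW_STRING_ACTION.sub(lambda m: m.group(1), text)


if __name__ == "__main__":
    line = "summary: Blackbox probe failed for {{`{{$labels.instance}}`}}"
    # Shows the string that is left for Prometheus to expand at alert time.
    print(render_raw_strings(line))
```

The rendered value is what ends up in the PrometheusRule object, so it should contain plain Prometheus template syntax with no backticks left.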
