【Posted】: 2026-01-04 21:45:01
【Problem description】:
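I have configured Prometheus Alertmanager on an Ubuntu server to monitor multiple Azure VMs. Currently, alerts for all VM instances are sent to a default email group. I need alerts to be routed as follows:
- To Team A (user1, user2, user3) and the default group if server A (identified by job name) goes down.
- To Team B (User1, User2) and the default group if server B goes down.
I tried several combinations of the routing configuration shown below in alertmanager.yml, but it does not work as expected.
I would appreciate it if someone could explain the logic behind sending group-specific alert notifications in Alertmanager.
Thanks for your time!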
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  receiver: 'default-receiver'
  routes:
    - match:
        alertname: A_down
      receiver: TeamA
    - match:
        alertname: B_down
      receiver: TeamB
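Alertmanager routes match on the labels carried by each alert, and `alertname` is taken from the `alert:` field of the rule, so a route on `alertname: A_down` only matches if a rule with that exact name exists. The following is a minimal sketch of one way to get the routing described above, not a tested configuration. It assumes the two servers are scraped under the hypothetical job names `server_a` and `server_b`, and that the team addresses and the `global` SMTP values are placeholders. Each team receiver lists the default group address as an extra `email_configs` entry, so a matching alert reaches both the team and the default group, while everything else falls through to `default-receiver`.

```yaml
global:
  # Placeholder SMTP settings, shared by every email_configs entry below.
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'default@example.com'
  smtp_auth_username: 'default@example.com'
  smtp_auth_password: 'password'

route:
  # Fallback receiver for alerts that match none of the child routes.
  receiver: 'default-receiver'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  routes:
    # Child routes are evaluated in order; the first match wins.
    - match:
        job: server_a          # hypothetical job name for server A
      receiver: 'TeamA'
    - match:
        job: server_b          # hypothetical job name for server B
      receiver: 'TeamB'

receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'alertgroups@example.com'
  - name: 'TeamA'
    email_configs:
      - to: 'user1@example.com'        # hypothetical team addresses
      - to: 'user2@example.com'
      - to: 'user3@example.com'
      - to: 'alertgroups@example.com'  # default group also notified
  - name: 'TeamB'
    email_configs:
      - to: 'user1@example.com'
      - to: 'user2@example.com'
      - to: 'alertgroups@example.com'  # default group also notified
```

A file like this can be checked for syntax errors with `amtool check-config alertmanager.yml` before reloading Alertmanager.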
My current alertmanager.yml file:
global:
  resolve_timeout: 1m
route:
  receiver: 'email-notifications'
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: alertgroups@example.com
        from: default@example.com
        smarthost: smtp.gmail.com:587
        auth_username: default@example.com
        auth_identity: default@example.com
        auth_password: password
        send_resolved: true
alertrule.yml file:
groups:
  - name: alert.rules
    rules:
      - alert: InstanceDown
        # Condition for alerting
        expr: up == 0
        for: 1m
        # Annotation - additional informational labels to store more information
        annotations:
          title: 'Instance {{ $labels.instance }} down'
          description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'
        # Labels - additional labels to be attached to the alert
        labels:
          severity: 'critical'
      - alert: HostOutOfMemory
        # Condition for alerting
        expr: node_memory_MemAvailable / node_memory_MemTotal * 100 < 80
        for: 5m
        # Annotation - additional informational labels to store more information
        annotations:
          title: 'Host out of memory (instance {{ $labels.instance }})'
          description: 'Node memory is filling up (< 25% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}'
        # Labels - additional labels to be attached to the alert
        labels:
          severity: 'warning'
      - alert: HostHighCpuLoad
        # Condition for alerting
        expr: (sum by (instance) (irate(node_cpu{job="node_exporter_metrics",mode="idle"}[5m]))) > 80
        for: 5m
        # Annotation - additional informational labels to store more information
        annotations:
          title: 'Host high CPU load (instance {{ $labels.instance }})'
          description: 'CPU load is > 30%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}'
        # Labels - additional labels to be attached to the alert
        labels:
          severity: 'warning'
      - alert: HostOutOfDiskSpace
        # Condition for alerting
        expr: (node_filesystem_avail{mountpoint="/"} * 100) / node_filesystem_size{mountpoint="/"} < 70
        for: 5m
        # Annotation - additional informational labels to store more information
        annotations:
          title: 'Host out of disk space (instance {{ $labels.instance }})'
          description: 'Disk is almost full (< 50% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}'
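The `job` label on the `up` series comes from the `job_name` of the scrape configuration, and any extra labels set on a target are attached to its series and therefore to alerts such as `InstanceDown`. As a sketch only, assuming hypothetical job names, target addresses, and a made-up `team` label (none of which appear in the files above), a prometheus.yml fragment along these lines would give Alertmanager routes something team-specific to match on:

```yaml
scrape_configs:
  - job_name: 'server_a'                        # hypothetical job name for server A
    static_configs:
      - targets: ['server-a.example.com:9100']  # hypothetical node_exporter address
        labels:
          team: 'team-a'                        # made-up label, copied onto every series from this target
  - job_name: 'server_b'                        # hypothetical job name for server B
    static_configs:
      - targets: ['server-b.example.com:9100']
        labels:
          team: 'team-b'
```

With labels like these in place, a route could match on `job: server_a` or `team: team-a` rather than on the alert name. Rules that aggregate with `sum by (instance)`, such as `HostHighCpuLoad` above, keep only the `instance` label, so `job` or `team` would not be present on those alerts unless added to the `by` clause.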
【Discussion】:
Tags: prometheus monitoring prometheus-alertmanager