Prometheus Monitoring and Alerting Rules in Practice

YBB

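Overview and Core Value

Prometheus provides powerful time-series monitoring and rule evaluation. With recording rules, alerting rules, and Alertmanager, you can build reliable monitoring and alerting for service availability, latency, and error rate.

Key Rule Examples

Recording Rules (pre-computed aggregations)

groups:
  - name: recording.rules
    rules:
      # p95 request latency per job, estimated from histogram buckets
      - record: job:http_request_duration_seconds:p95
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job))
      # Fraction of requests per job that returned a 5xx status
      - record: job:http_error_rate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) by (job) / sum(rate(http_requests_total[5m])) by (job)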
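For Prometheus to evaluate a rule group, its file has to be listed under `rule_files` in the main configuration. The fragment below is a minimal sketch of that wiring. The file paths and the interval values are assumptions made for illustration and do not come from the original setup.

```yaml
# prometheus.yml (fragment)
global:
  scrape_interval: 15s        # assumed value
  evaluation_interval: 30s    # default interval at which rule groups are evaluated

rule_files:
  - rules/recording.rules.yml   # hypothetical path holding the recording.rules group
  - rules/alerts.rules.yml      # hypothetical path holding the alerting rules
```

Once the group is loaded, dashboards can query `job:http_error_rate` directly instead of recomputing the ratio on every refresh. Running `promtool check rules` against a rule file catches syntax errors before a reload.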

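Alert Rules (error rate and availability)

groups:
  - name: alerts.rules
    rules:
      # Error rate above 5%, sustained for 10 minutes
      - alert: HighErrorRate
        expr: job:http_error_rate > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: High error rate
          description: "{{ $labels.job }} error rate has exceeded 5% for 10 minutes"
      # Any 5xx, 401, or 403 responses, sustained for 5 minutes
      - alert: ServiceUnavailable
        expr: sum(rate(http_requests_total{status=~"5..|4(0[13])"}[5m])) by (job) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Service availability alert
          description: "{{ $labels.job }} is showing signs of unavailability and needs investigation"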
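The effect of `for` can be checked offline with `promtool test rules`. The sketch below feeds synthetic counters in which half of all requests fail. It then asserts that the recorded error rate is 0.5 and that `HighErrorRate` is not yet firing five minutes in, because the condition has not held for the full 10 minutes. The file names and the `api` job are hypothetical.

```yaml
# error_rate_test.yml (hypothetical name)
# Run with: promtool test rules error_rate_test.yml
rule_files:
  - recording.rules.yml   # assumed to contain the recording.rules group
  - alerts.rules.yml      # assumed to contain the alerts.rules group

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Both counters grow by 10 per minute, so 5xx is half of all traffic.
      - series: 'http_requests_total{job="api", status="500"}'
        values: '0+10x30'
      - series: 'http_requests_total{job="api", status="200"}'
        values: '0+10x30'

    promql_expr_test:
      # The recorded ratio should be errors / total = 0.5.
      - expr: job:http_error_rate
        eval_time: 10m
        exp_samples:
          - labels: 'job:http_error_rate{job="api"}'
            value: 0.5

    alert_rule_test:
      # At 5m the condition is true but has not lasted 10m,
      # so the alert is still pending and nothing is firing.
      - eval_time: 5m
        alertname: HighErrorRate
        exp_alerts: []
```

To assert the firing case as well, add a later `eval_time` with `exp_labels` and `exp_annotations` filled in. promtool compares annotations verbatim, so they must match the rule file exactly.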

Alertmanager Routing

route:
  receiver: default          # required: the root route must name a receiver
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
receivers:
  - name: default
    webhook_configs:
      - url: http://ops.example.com/alerts
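Because the alert rules already carry a `severity` label, the routing tree can branch on it. The following is one possible extension, not the original configuration. The `oncall` receiver, its URL, the shorter repeat interval, and the inhibit rule are all assumptions added for illustration.

```yaml
route:
  receiver: default              # fallback for anything not matched below
  group_by: ['alertname', 'job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 2h
  routes:
    # Critical alerts go to a separate receiver and repeat more often.
    - matchers:
        - 'severity="critical"'
      receiver: oncall           # hypothetical receiver
      repeat_interval: 30m       # assumed value

receivers:
  - name: default
    webhook_configs:
      - url: http://ops.example.com/alerts
  - name: oncall
    webhook_configs:
      - url: http://ops.example.com/oncall   # hypothetical endpoint

inhibit_rules:
  # While a critical alert fires for a job, mute that job's warnings.
  - source_matchers: ['severity="critical"']
    target_matchers: ['severity="warning"']
    equal: ['job']
```

Running `amtool check-config` on the file validates the routing tree before Alertmanager is reloaded.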

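Parameters and Validation

Environment: `Prometheus v2.47+`, `Alertmanager v0.27+`.

Validation points:

- Recording rules reduce query cost, and dashboards respond faster.
- An error rate above 5% sustained for 10m triggers the alert, and routing delivers it.
- The P95 latency curve is available, and its trend matches what the business observes.

Best Practices

- Use recording rules to pre-compute and cache complex aggregations.
- Set `for` on alerts to avoid false positives from transient spikes.
- Align thresholds and windows with the service's SLI/SLO definitions.

Conclusion

Recording and alerting rules, combined with Alertmanager routing, give you a stable, reliable monitoring and alerting system whose metrics and thresholds can be verified and audited.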

Category: Observability
Tags: Prometheus, alerting rules, Alertmanager, Recording Rules, service availability, latency, error rate
Published: 2026-02-13 01:56:29
Link: https://www.ybb.press/observability/4780.html
