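Overview

Goal: archive Prometheus data from multiple clusters to object storage through Thanos, for long-term retention and cross-cluster queries.

Components: Thanos Sidecar, Store Gateway, Compactor, Query/Query Frontend, and object storage (S3).

Core Setup in Practice

Object storage configuration (`objstore.yml`):

    type: S3
    config:
      bucket: "thanos-data"
      endpoint: "s3.amazonaws.com"
      region: "us-east-1"
      access_key: "${S3_ACCESS_KEY}"
      secret_key: "${S3_SECRET_KEY}"
      insecure: false

The `${S3_ACCESS_KEY}` and `${S3_SECRET_KEY}` values are placeholders for credentials supplied at deploy time.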
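The placeholders in `objstore.yml` have to be filled in by something before the file is usable. A minimal sketch of one way to do that, assuming the file is read literally and the credentials live in environment variables (the function name `render_objstore` is hypothetical, not part of Thanos):

```python
from string import Template
from typing import Mapping

# Template mirrors the objstore.yml from the article; only the two
# ${...} placeholders are substituted.
OBJSTORE_TEMPLATE = """\
type: S3
config:
  bucket: "thanos-data"
  endpoint: "s3.amazonaws.com"
  region: "us-east-1"
  access_key: "${S3_ACCESS_KEY}"
  secret_key: "${S3_SECRET_KEY}"
  insecure: false
"""


def render_objstore(template: str, env: Mapping[str, str]) -> str:
    """Fill ${VAR} placeholders from env.

    Raises KeyError when a variable is missing, so a half-rendered
    config never reaches disk. Values are inserted verbatim, so they
    must not contain double quotes.
    """
    return Template(template).substitute(env)
```

Calling `render_objstore(OBJSTORE_TEMPLATE, os.environ)` and writing the result to `/etc/thanos/objstore.yml` would produce the file that the later commands point at.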
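Prometheus + Thanos Sidecar:

    prometheus \
      --storage.tsdb.path=/prometheus \
      --storage.tsdb.min-block-duration=2h \
      --storage.tsdb.max-block-duration=2h \
      --web.enable-admin-api

    thanos sidecar \
      --http-address=:10902 \
      --grpc-address=:10901 \
      --prometheus.url=http://prometheus:9090 \
      --tsdb.path=/prometheus \
      --objstore.config-file=/etc/thanos/objstore.yml

Setting the minimum and maximum block duration to the same value (2h) disables local compaction in Prometheus, which the Sidecar expects when it uploads blocks to object storage.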
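Store Gateway and Query:

    thanos store \
      --http-address=:10906 \
      --grpc-address=:10905 \
      --objstore.config-file=/etc/thanos/objstore.yml

    thanos query \
      --http-address=:10904 \
      --grpc-address=:10903 \
      --store=sidecar1:10901 \
      --store=store-gw:10905

Newer Thanos releases deprecate `--store` in favor of `--endpoint`; check `thanos query --help` for the version in use.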
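Compactor (compaction and downsampling):

    thanos compact \
      --objstore.config-file=/etc/thanos/objstore.yml \
      --http-address=:10908 \
      --retention.resolution-raw=90d \
      --retention.resolution-5m=365d \
      --retention.resolution-1h=730d

As written, the command performs its compaction work and exits; add `--wait` to keep it running as a long-lived service. Run only one Compactor against a given bucket.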
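To see what the three retention flags mean for queries, the sketch below lists which resolutions still exist for data of a given age. It is a simplification under stated assumptions: it only compares the age against the retention windows and ignores the delay before downsampled blocks are first created. The name `resolutions_available` is hypothetical.

```python
from typing import Dict, List

# Retention per resolution in days, copied from the compactor flags.
RETENTION_DAYS: Dict[str, int] = {"raw": 90, "5m": 365, "1h": 730}


def resolutions_available(
    age_days: int, retention: Dict[str, int] = RETENTION_DAYS
) -> List[str]:
    """Return the resolutions whose retention still covers data that is
    age_days old, in the order given by the retention mapping."""
    return [res for res, days in retention.items() if age_days <= days]
```

With these settings, a query over data from 200 days ago can only be answered from the 5m and 1h downsampled blocks, and anything older than 730 days is gone.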
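Example remote write and query:

`prometheus.yml`:

    remote_write:
      - url: http://thanos-receive:19291/api/v1/receive

    curl -G "http://thanos-query:10904/api/v1/query" \
      --data-urlencode "query=rate(http_requests_total[5m])"

Remote write targets a Thanos Receive instance, a push-based alternative to the Sidecar that is not among the components listed above; if it is used, its gRPC address must also be registered with Query. The query is sent with `-G --data-urlencode` because curl otherwise treats the square brackets in `[5m]` as a URL glob pattern.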
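The same instant query can be issued from a script. The following is a minimal sketch using only the Python standard library; the host name `thanos-query:10904` comes from the article, while the helper names are hypothetical. It passes `dedup=true`, a parameter of the Thanos Query HTTP API, on the assumption that duplicate series from replicated Prometheus instances should be merged.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str, dedup: bool = True) -> str:
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    params = {"query": promql, "dedup": "true" if dedup else "false"}
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run the query and return the result list from the JSON response."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]
```

For example, `instant_query("http://thanos-query:10904", "rate(http_requests_total[5m])")` returns one entry per series, each carrying the labels that identify its source cluster.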
Multi-cluster label alignment:

- Use `external_labels` to set cluster/region labels that identify the source of each series.
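A labeling convention is easier to keep when it is checked mechanically. The sketch below assumes the `external_labels` of each Prometheus instance have already been loaded into plain dictionaries (for example from each `prometheus.yml`); the instance names and the function `check_external_labels` are hypothetical. It flags instances that lack a required label and instances whose label sets collide, since identical external labels make sources indistinguishable.

```python
from typing import Dict, List, Sequence


def check_external_labels(
    instances: Dict[str, Dict[str, str]],
    required: Sequence[str] = ("cluster", "region"),
) -> List[str]:
    """Return human-readable problems; an empty list means all good."""
    problems: List[str] = []
    seen: Dict[tuple, str] = {}
    for name, labels in instances.items():
        # Every instance must carry all required labels with a value.
        missing = [key for key in required if not labels.get(key)]
        if missing:
            problems.append(f"{name}: missing {', '.join(missing)}")
        # The full label set must be unique across instances.
        fingerprint = tuple(sorted(labels.items()))
        if fingerprint in seen:
            problems.append(f"{name}: same external_labels as {seen[fingerprint]}")
        else:
            seen[fingerprint] = name
    return problems
```

Running it over all clusters before rollout catches the unlabeled or duplicated source early, when it is still a config fix rather than a data cleanup.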
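Verification and Monitoring

Component health: check `/metrics` and the web UI of each component to confirm that sidecar, store, query, and compact are running normally, and that the contents of the object storage bucket keep growing.

Query results: cross-cluster queries should return merged results; confirm that downsampling takes effect on long time ranges.

Cost control: set sensible retention and compaction; monitor S3 cost and request volume.

Common Pitfalls

- `external_labels` not configured, so data sources get mixed up; adopt a uniform labeling convention.
- Insufficient permissions for Compactor or Store cause errors; grant the object storage read and write permissions they need.
- Ignoring downsampling and retention policies drives cost up; tune them to actual query needs.

Conclusion

Thanos gives multi-cluster Prometheus data unified storage and querying. Combined with sensible retention and downsampling, it keeps observability intact while holding cost under control.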
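For the component health check described under verification, a minimal sketch follows. It probes the `/-/ready` HTTP endpoint that Thanos components expose. The ports are taken from the commands in this article; the host names, in particular `thanos-compact`, are assumptions and need to match the actual deployment.

```python
from typing import Dict
from urllib.request import urlopen

# HTTP addresses: ports come from the article's commands,
# host names are assumptions.
COMPONENTS: Dict[str, str] = {
    "sidecar": "http://sidecar1:10902",
    "store": "http://store-gw:10906",
    "query": "http://thanos-query:10904",
    "compact": "http://thanos-compact:10908",
}


def readiness_urls(components: Dict[str, str]) -> Dict[str, str]:
    """Map each component name to its readiness probe URL."""
    return {name: f"{base.rstrip('/')}/-/ready" for name, base in components.items()}


def is_ready(url: str, timeout: float = 5.0) -> bool:
    """True when the probe answers HTTP 200; network errors count as not ready."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and timeouts all derive from OSError
        return False
```

Looping over `readiness_urls(COMPONENTS)` and calling `is_ready` on each URL gives a quick pass/fail view before looking at `/metrics` in detail.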
