IoT Edge Clusters: Further Configuration of Kubernetes Events Alert Notifications

Goals

Previous article:

Implementing Alert Notifications Based on Kubernetes Events for IoT Edge Clusters

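Alert recovery notifications: assessed and found to be infeasible.

Reason: an alert and its recovery are separate, entirely unrelated events. The alert is a Warning-level event and the recovery is a Normal-level event, so enabling recovery notifications would mean sending every Normal event, and the volume would be overwhelming. Moreover, without considerable experience and patience, it is impossible to tell which Normal event corresponds to the recovery of which alert.

Repeated alerting while an issue remains unresolved: available by default, no extra configuration needed.

Show resource names in the alert content, such as node and Pod names.

Allow specific nodes and workloads to be silenced, with the ability to adjust this dynamically.

For example, node worker-1 in cluster 001 undergoes planned maintenance: monitoring is suspended for the duration and resumed once the maintenance is complete.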

Configuration

Showing Resource Names in the Alert Content

A few typical kinds of events:

    apiVersion: v1
    count: 101557
    eventTime: null
    firstTimestamp: "2022-04-08T03:50:47Z"
    involvedObject:
      apiVersion: v1
      fieldPath: spec.containers{prometheus}
      kind: Pod
      name: prometheus-rancher-monitoring-prometheus-0
      namespace: cattle-monitoring-system
    kind: Event
    lastTimestamp: "2022-04-14T11:39:19Z"
    message: 'Readiness probe failed: Get "http://10.42.0.87:9090/-/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)'
    metadata:
      creationTimestamp: "2022-04-08T03:51:17Z"
      name: prometheus-rancher-monitoring-prometheus-0.16e3cf53f0793344
      namespace: cattle-monitoring-system
    reason: Unhealthy
    reportingComponent: ""
    reportingInstance: ""
    source:
      component: kubelet
      host: master-1
    type: Warning

    apiVersion: v1
    count: 116
    eventTime: null
    firstTimestamp: "2022-04-13T02:43:26Z"
    involvedObject:
      apiVersion: v1
      fieldPath: spec.containers{grafana}
      kind: Pod
      name: rancher-monitoring-grafana-57777cc795-2b2x5
      namespace: cattle-monitoring-system
    kind: Event
    lastTimestamp: "2022-04-14T11:18:56Z"
    message: 'Readiness probe failed: Get "http://10.42.0.90:3000/api/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)'
    metadata:
      creationTimestamp: "2022-04-14T11:18:57Z"
      name: rancher-monitoring-grafana-57777cc795-2b2x5.16e5548dd2523a13
      namespace: cattle-monitoring-system
    reason: Unhealthy
    reportingComponent: ""
    reportingInstance: ""
    source:
      component: kubelet
      host: master-1
    type: Warning

    apiVersion: v1
    count: 20958
    eventTime: null
    firstTimestamp: "2022-04-11T10:34:51Z"
    involvedObject:
      apiVersion: v1
      fieldPath: spec.containers{lb-port-1883}
      kind: Pod
      name: svclb-emqx-dt22t
      namespace: emqx
    kind: Event
    lastTimestamp: "2022-04-14T11:39:48Z"
    message: Back-off restarting failed container
    metadata:
      creationTimestamp: "2022-04-11T10:34:51Z"
      name: svclb-emqx-dt22t.16e4d11e2b9efd27
      namespace: emqx
    reason: BackOff
    reportingComponent: ""
    reportingInstance: ""
    source:
      component: kubelet
      host: worker-1
    type: Warning

    apiVersion: v1
    count: 21069
    eventTime: null
    firstTimestamp: "2022-04-11T10:34:48Z"
    involvedObject:
      apiVersion: v1
      fieldPath: spec.containers{lb-port-80}
      kind: Pod
      name: svclb-traefik-r5p8t
      namespace: kube-system
    kind: Event
    lastTimestamp: "2022-04-14T11:44:59Z"
    message: Back-off restarting failed container
    metadata:
      creationTimestamp: "2022-04-11T10:34:48Z"
      name: svclb-traefik-r5p8t.16e4d11daf0b79ce
      namespace: kube-system
    reason: BackOff
    reportingComponent: ""
    reportingInstance: ""
    source:
      component: kubelet
      host: worker-1
    type: Warning

    {
      "metadata": {
        "name": "event-exporter-79544df9f7-xj4t5.16e5c540dc32614f",
        "namespace": "monitoring",
        "uid": "baf2f642-2383-4e22-87e0-456b6c3eaf4e",
        "resourceVersion": "14043444",
        "creationTimestamp": "2022-04-14T13:08:40Z"
      },
      "reason": "Pulled",
      "message": "Container image \"ghcr.io/opsgenie/kubernetes-event-exporter:v0.11\" already present on machine",
      "source": {
        "component": "kubelet",
        "host": "worker-2"
      },
      "firstTimestamp": "2022-04-14T13:08:40Z",
      "lastTimestamp": "2022-04-14T13:08:40Z",
      "count": 1,
      "type": "Normal",
      "eventTime": null,
      "reportingComponent": "",
      "reportingInstance": "",
      "involvedObject": {
        "kind": "Pod",
        "namespace": "monitoring",
        "name": "event-exporter-79544df9f7-xj4t5",
        "uid": "b77d3e13-fa9e-484b-8a5a-d1afc9edec75",
        "apiVersion": "v1",
        "resourceVersion": "14043435",
        "fieldPath": "spec.containers{event-exporter}",
        "labels": {
          "app": "event-exporter",
          "pod-template-hash": "79544df9f7",
          "version": "v1"
        }
      }
    }

We can add more fields to the alert message, including:

Node: {{ .Source.Host }}
Pod: {{ .InvolvedObject.Name }}

Putting this together, the modified event-exporter-cfg YAML is as follows:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: event-exporter-cfg
      namespace: monitoring
      resourceVersion: "5779968"
    data:
      config.yaml: |
        logLevel: error
        logFormat: json
        route:
          routes:
            - match:
                - receiver: "dump"
            - drop:
                - type: "Normal"
              match:
                - receiver: "feishu"
        receivers:
          - name: "dump"
            stdout: {}
          - name: "feishu"
            webhook:
              endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/…"
              headers:
                Content-Type: application/json
              layout:
                msg_type: interactive
                card:
                  config:
                    wide_screen_mode: true
                    enable_forward: true
                  header:
                    title:
                      tag: plain_text
                      content: xxx test K3S cluster alert
                    template: red
                  elements:
                    - tag: div
                      text:
                        tag: lark_md
                        content: "**EventID:** {{ .UID }}\n**EventNamespace:** {{ .InvolvedObject.Namespace }}\n**EventName:** {{ .InvolvedObject.Name }}\n**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}\n**EventComponent:** {{ .Source.Component }}\n**EventHost:** {{ .Source.Host }}\n**EventLabels:** {{ toJson .InvolvedObject.Labels}}\n**EventAnnotations:** {{ toJson .InvolvedObject.Annotations}}"
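To preview roughly what the card text will look like before wiring up the webhook, the template can be approximated offline. The sketch below is a Python stand-in for the template in the ConfigMap above, not the exporter's own rendering code. The names render_card_text, lookup and FIELDS are hypothetical, the sample event is abridged from the svclb-emqx BackOff event listed earlier, and the EventID and EventAnnotations fields are left out for brevity.

```python
import json

# (label, path into the event dict) pairs mirroring the template fields in
# the ConfigMap. Paths use the JSON key names of a Kubernetes Event.
# Hypothetical helper table, not part of the exporter.
FIELDS = [
    ("EventNamespace", ("involvedObject", "namespace")),
    ("EventName", ("involvedObject", "name")),
    ("EventType", ("type",)),
    ("EventKind", ("involvedObject", "kind")),
    ("EventReason", ("reason",)),
    ("EventTime", ("lastTimestamp",)),
    ("EventMessage", ("message",)),
    ("EventComponent", ("source", "component")),
    ("EventHost", ("source", "host")),
]


def lookup(event, path):
    """Walk a nested dict; return an empty string when a key is missing."""
    value = event
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return value


def render_card_text(event):
    """Build the lark_md text, one '**Label:** value' line per field."""
    lines = [f"**{label}:** {lookup(event, path)}" for label, path in FIELDS]
    labels = lookup(event, ("involvedObject", "labels")) or {}
    lines.append(f"**EventLabels:** {json.dumps(labels)}")
    return "\n".join(lines)


# Abridged from the svclb-emqx BackOff event shown in this article.
sample_event = {
    "type": "Warning",
    "reason": "BackOff",
    "message": "Back-off restarting failed container",
    "lastTimestamp": "2022-04-14T11:39:48Z",
    "source": {"component": "kubelet", "host": "worker-1"},
    "involvedObject": {
        "kind": "Pod",
        "namespace": "emqx",
        "name": "svclb-emqx-dt22t",
    },
}

print(render_card_text(sample_event))
```

With the node and Pod name on their own lines, an on-call reader can see at a glance which host and which workload raised the event.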

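Silencing Specific Nodes and Workloads

For example, node worker-1 in cluster 001 undergoes planned maintenance: monitoring is suspended for the duration and resumed once the maintenance is complete.

Continue modifying the event-exporter-cfg YAML as follows: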

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: event-exporter-cfg
      namespace: monitoring
    data:
      config.yaml: |
        logLevel: error
        logFormat: json
        route:
          routes:
            - match:
                - receiver: "dump"
            - drop:
                - type: "Normal"
                - source:
                    host: "worker-1"
                - namespace: "cattle-monitoring-system"
                - name: "*emqx*"
                - kind: "Pod|Deployment|ReplicaSet"
                - labels:
                    version: "dev"
              match:
                - receiver: "feishu"
        receivers:
          - name: "dump"
            stdout: {}
          - name: "feishu"
            webhook:
              endpoint: "https://open.feishu.cn/open-apis/bot/v2/hook/…"
              headers:
                Content-Type: application/json
              layout:
                msg_type: interactive
                card:
                  config:
                    wide_screen_mode: true
                    enable_forward: true
                  header:
                    title:
                      tag: plain_text
                      content: xxx test K3S cluster alert
                    template: red
                  elements:
                    - tag: div
                      text:
                        tag: lark_md
                        content: "**EventID:** {{ .UID }}\n**EventNamespace:** {{ .InvolvedObject.Namespace }}\n**EventName:** {{ .InvolvedObject.Name }}\n**EventType:** {{ .Type }}\n**EventKind:** {{ .InvolvedObject.Kind }}\n**EventReason:** {{ .Reason }}\n**EventTime:** {{ .LastTimestamp }}\n**EventMessage:** {{ .Message }}\n**EventComponent:** {{ .Source.Component }}\n**EventHost:** {{ .Source.Host }}\n**EventLabels:** {{ toJson .InvolvedObject.Labels}}\n**EventAnnotations:** {{ toJson .InvolvedObject.Annotations}}"

The default drop rule is - type: "Normal", meaning that Normal-level events do not trigger alerts.

Now add the following rules:

    - source:
        host: "worker-1"
    - namespace: "cattle-monitoring-system"
    - name: "*emqx*"
    - kind: "Pod|Deployment|ReplicaSet"
    - labels:
        version: "dev"

host: "worker-1": do not alert on node worker-1;
namespace: "cattle-monitoring-system": do not alert on the cattle-monitoring-system namespace;
name: "*emqx*": do not alert on objects whose name (usually a Pod name) contains emqx;
kind: "Pod|Deployment|ReplicaSet": do not alert on Pods, Deployments, or ReplicaSets (that is, ignore application- and component-level alerts);
version: "dev": do not alert on objects carrying the label version: "dev" (useful for silencing the alerts of a specific application).
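The drop rules above can be read as a filter: an event is suppressed as soon as any one rule in the list matches it. The sketch below models that reading in Python so the effect of a rule list can be checked against sample events. It illustrates the behaviour described here and is not the exporter's implementation. The names rule_matches and is_dropped and the flattened event fields are hypothetical, and treating each value as a regular expression matched against the whole field is an assumption suggested by the Pod|Deployment|ReplicaSet alternation. For that reason the name pattern is written as .*emqx.* in the sketch, because a leading * is not valid in Python's re syntax.

```python
import re


def rule_matches(rule, event):
    """Return True when every field named in the rule matches the event.

    Assumption: each rule value is a regular expression that must match
    the whole field value. "labels" maps label name -> pattern.
    """
    for field, pattern in rule.items():
        if field == "labels":
            labels = event.get("labels", {})
            for key, label_pattern in pattern.items():
                if key not in labels:
                    return False
                if not re.fullmatch(label_pattern, labels[key]):
                    return False
        elif not re.fullmatch(pattern, event.get(field, "")):
            return False
    return True


def is_dropped(rules, event):
    """An event is dropped if any single rule matches it."""
    return any(rule_matches(rule, event) for rule in rules)


# Flattened views of two events; values come from the samples in this article.
emqx_backoff = {
    "type": "Warning", "kind": "Pod", "namespace": "emqx",
    "name": "svclb-emqx-dt22t", "host": "worker-1",
}
traefik_backoff = {
    "type": "Warning", "kind": "Pod", "namespace": "kube-system",
    "name": "svclb-traefik-r5p8t", "host": "worker-1",
}

default_rules = [{"type": "Normal"}]
# During planned maintenance, add a rule for the node being serviced.
maintenance_rules = default_rules + [{"host": "worker-1"}]

# With only the default rule, the Warning event still raises an alert.
print(is_dropped(default_rules, traefik_backoff))
# With the maintenance rule in place, the same event is silenced.
print(is_dropped(maintenance_rules, traefik_backoff))
# A name rule silences only objects whose name contains "emqx".
print(is_dropped(default_rules + [{"name": ".*emqx.*"}], emqx_backoff))
```

Removing the worker-1 entry from the list once maintenance ends restores alerting for that node, which is the dynamic adjustment described in the goals.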

Final Result

As shown in the figure below:
