How to Use Grafana Loki Alerting Rules and Send Alerts Through Alertmanager

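We send the alerts produced by Loki's alerting rules to Alertmanager. Alertmanager handles silencing, deduplication, and grouping, and routes each alert to the right receiver, such as email or LINE Notify.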

The main steps for setting up alerts and notifications are:

1. Set up Alertmanager
2. Configure Loki to talk to Alertmanager
3. Create alerting rules in Loki

Set up Alertmanager
For instructions on installing Alertmanager, refer to this article.

Edit the alertmanager.yml configuration file:

sudo vi /opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager.yml

Add a receiver named team-infra-mails that sends alerts by email.

global:
  smtp_smarthost: 'your_smtp_ip:your_port'
  smtp_from: 'your_from_mail_address'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'team-infra-mails'

receivers:
  - name: 'team-infra-mails'
    email_configs:
      - to: 'your_to_mail_address'
        send_resolved: true

# Inhibition rules allow to mute a set of alerts given that another alert is firing.
# We use this to mute any warning-level notifications if the same alert is already critical.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    # Apply inhibition if the alertname is the same.
    # CAUTION:
    #   If all label names listed in `equal` are missing
    #   from both the source and target alerts,
    #   the inhibition rule will apply!
    equal: ['alertname', 'dev', 'instance']
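The CAUTION note in inhibit_rules is easy to misread, so here is a small Python model of how the rule above decides whether a warning alert is muted. It is an illustration written for this article, not Alertmanager's source code, and it assumes (as the comment in the config says) that a label missing from an alert compares as an empty value. The DiskFull, db1, and db2 labels are hypothetical.

```python
def matches(labels: dict, required: dict) -> bool:
    """True if every required label is present with exactly the required value."""
    return all(labels.get(key) == value for key, value in required.items())


def is_inhibited(target: dict, firing_sources: list, rule: dict) -> bool:
    """Return True if `target` would be muted by any firing source alert under `rule`."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing_sources:
        if not matches(source, rule["source_match"]):
            continue
        # Labels in `equal` must agree; a label missing on both sides counts as equal
        # (both treated as ""), which is exactly the CAUTION case in the config.
        if all(source.get(name, "") == target.get(name, "") for name in rule["equal"]):
            return True
    return False


# Same rule as the inhibit_rules entry in alertmanager.yml.
RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}

# Hypothetical alerts for illustration.
critical = {"alertname": "DiskFull", "severity": "critical", "instance": "db1"}
warning = {"alertname": "DiskFull", "severity": "warning", "instance": "db1"}
other_host = {"alertname": "DiskFull", "severity": "warning", "instance": "db2"}

print(is_inhibited(warning, [critical], RULE))     # True: same alertname/dev/instance
print(is_inhibited(other_host, [critical], RULE))  # False: instance differs
```

The last case worth noticing is two alerts that carry none of alertname, dev, or instance: every compared value is empty on both sides, so any critical alert would mute any warning alert.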

Remember to restart the Alertmanager service:

sudo service alertmanager restart
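Once Alertmanager is back up, you can check the email route without waiting for Loki by posting a hand-made alert to Alertmanager's v2 HTTP API (POST /api/v2/alerts). The sketch below uses only the Python standard library; the manual-test alert name and the http://localhost:9093 address are assumptions, so adjust them to your setup.

```python
import json
import urllib.request
from datetime import datetime, timezone

ALERTMANAGER_URL = "http://localhost:9093"  # assumption: Alertmanager's default port on this host


def build_test_alert(alertname: str = "manual-test", severity: str = "critical") -> list:
    """Build a v2 API payload: a JSON list of alerts with labels, annotations, and startsAt."""
    return [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"summary": "Test alert sent by hand to check email routing."},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]


def post_alerts(alerts: list, base_url: str = ALERTMANAGER_URL) -> int:
    """POST the alerts to /api/v2/alerts and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling post_alerts(build_test_alert()) on the Alertmanager host should return 200, and with group_wait: 30s the email should arrive about half a minute later. Because no endsAt is set, Alertmanager should mark the alert resolved after its resolve_timeout (5m by default), which also exercises send_resolved.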

Configure Loki to talk to Alertmanager
Edit Loki's configuration file:

sudo vi /opt/loki/loki-local-config.yaml

Set rules_directory to the folder where you keep your alerting rules:

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

Note that chunks_directory and rules_directory under filesystem both point to /tmp, so their contents are lost when the machine reboots. If you need to keep that data, change these paths.

Set alertmanager_url to the server where Alertmanager is installed:

ruler:
  alertmanager_url: http://localhost:9093

Remember to restart the Loki service:

sudo service loki restart

Create a folder named fake under /tmp/loki/rules:

sudo mkdir /tmp/loki/rules/fake

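Why create a fake folder?
Loki supports multi-tenancy, and in single-tenant mode fake is the default tenant ID, so Loki looks for rules in a folder with that name. If you enable multi-tenant mode, keep each tenant's rules in a separate folder named after its tenant ID.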
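If you script the deployment, the layout to produce is <rules_directory>/<tenant ID>/<rule file>. A minimal Python helper for it might look like this; the function name is hypothetical, and fake is used as the default only because it is Loki's tenant ID when multi-tenancy is off.

```python
from pathlib import Path

DEFAULT_TENANT = "fake"  # Loki's tenant ID in single-tenant mode


def rule_file_path(rules_dir: str, filename: str, tenant: str = DEFAULT_TENANT) -> Path:
    """Return <rules_dir>/<tenant>/<filename>, creating the tenant folder if needed."""
    tenant_dir = Path(rules_dir) / tenant
    tenant_dir.mkdir(parents=True, exist_ok=True)
    return tenant_dir / filename
```

For example, rule_file_path("/tmp/loki/rules", "mssql-ddl-alert.yml") returns /tmp/loki/rules/fake/mssql-ddl-alert.yml, and passing tenant="team-a" would place the file under a team-a folder instead.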

Create alerting rules
As a demonstration, we raise an alert whenever a CREATE, ALTER, or DROP statement is executed against a database or table.

sudo vi /tmp/loki/rules/fake/mssql-ddl-alert.yml

The file contents are as follows:

groups:
  - name: mssql-object-created
    rules:
      - alert: mssql-object-created
        expr: |
          count_over_time({computer=~"your_mssql_server", source="MSSQLSERVER", eventID="33205"}
            | pattern `<_>event_time:<event_time>\n<_>`
            | pattern `<_>action_id:<action_id>\n<_>`
            | label_format action_id=`{{.action_id | trim | replace "CR" "CREATE" | replace "AL" "ALTER" | replace "DR" "DROP"}}`
            | action_id ="CREATE"
            | pattern `<_>class_type:<class_type>\n<_>`
            | label_format class_type=`{{.class_type | trim | replace "DB" "DATABASE" | replace "U" "TABLE" | replace "V" "VIEW" | replace "P" "STORED PROCEDURE"}}`
            | pattern `<_>database_name:<database_name>\n<_>`
            | database_name !~`(tempdb)`
            | pattern `<_>object_name:<object_name>\n<_>`
            | pattern `<_>schema_name:<schema_name>\n<_>`
            | pattern `<_>server_instance_name:<server_instance_name>\n<_>`
            | pattern `<_>server_principal_name:<server_principal_name>\n<_>`
            | pattern `<_>statement:<statement>\nadditional_information<_>`
            | label_format statement=`{{.statement | replace "\\r\\n" " " | replace "\\r" " " | replace "\\n" " " | replace "\u005c\u005c" "\u005c" | replace "[" "" | replace "]" ""}}` [1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Host: {{ $labels.computer }}\nMessage: {{ $labels.object_name }} has been created.\nStatement: {{ $labels.statement }}\n"
  - name: mssql-object-altered
    rules:
      - alert: mssql-object-altered
        expr: |
          count_over_time({computer=~"your_mssql_server", source="MSSQLSERVER", eventID="33205"}
            | pattern `<_>event_time:<event_time>\n<_>`
            | pattern `<_>action_id:<action_id>\n<_>`
            | label_format action_id=`{{.action_id | trim | replace "CR" "CREATE" | replace "AL" "ALTER" | replace "DR" "DROP"}}`
            | action_id ="ALTER"
            | pattern `<_>class_type:<class_type>\n<_>`
            | label_format class_type=`{{.class_type | trim | replace "DB" "DATABASE" | replace "U" "TABLE" | replace "V" "VIEW" | replace "P" "STORED PROCEDURE"}}`
            | pattern `<_>database_name:<database_name>\n<_>`
            | database_name !~`(tempdb)`
            | pattern `<_>object_name:<object_name>\n<_>`
            | pattern `<_>schema_name:<schema_name>\n<_>`
            | pattern `<_>server_instance_name:<server_instance_name>\n<_>`
            | pattern `<_>server_principal_name:<server_principal_name>\n<_>`
            | pattern `<_>statement:<statement>\nadditional_information<_>`
            | label_format statement=`{{.statement | replace "\\r\\n" " " | replace "\\r" " " | replace "\\n" " " | replace "\u005c\u005c" "\u005c" | replace "[" "" | replace "]" ""}}` [1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Host: {{ $labels.computer }}\nMessage: {{ $labels.object_name }} has been altered.\nStatement: {{ $labels.statement }}\n"
  - name: mssql-object-dropped
    rules:
      - alert: mssql-object-dropped
        expr: |
          count_over_time({computer=~"your_mssql_server", source="MSSQLSERVER", eventID="33205"}
            | pattern `<_>event_time:<event_time>\n<_>`
            | pattern `<_>action_id:<action_id>\n<_>`
            | label_format action_id=`{{.action_id | trim | replace "CR" "CREATE" | replace "AL" "ALTER" | replace "DR" "DROP"}}`
            | action_id ="DROP"
            | pattern `<_>class_type:<class_type>\n<_>`
            | label_format class_type=`{{.class_type | trim | replace "DB" "DATABASE" | replace "U" "TABLE" | replace "V" "VIEW" | replace "P" "STORED PROCEDURE"}}`
            | pattern `<_>database_name:<database_name>\n<_>`
            | database_name !~`(tempdb)`
            | pattern `<_>object_name:<object_name>\n<_>`
            | pattern `<_>schema_name:<schema_name>\n<_>`
            | pattern `<_>server_instance_name:<server_instance_name>\n<_>`
            | pattern `<_>server_principal_name:<server_principal_name>\n<_>`
            | pattern `<_>statement:<statement>\nadditional_information<_>`
            | label_format statement=`{{.statement | replace "\\r\\n" " " | replace "\\r" " " | replace "\\n" " " | replace "\u005c\u005c" "\u005c" | replace "[" "" | replace "]" ""}}` [1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Host: {{ $labels.computer }}\nMessage: {{ $labels.object_name }} has been dropped.\nStatement: {{ $labels.statement }}\n"
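To see what the LogQL pipeline above does to a single audit event, the following Python sketch replays it step by step: parse the name:value fields, expand the action_id and class_type codes with the same chained replacements, skip tempdb, clean up the statement, and build an English version of the summary. It is an approximation for reading the rule, not Loki's parser, and the sample message is hypothetical; the exact text depends on how the event with eventID="33205" is shipped to Loki.

```python
from typing import Optional

# Same replacement chains as the label_format templates in the rule file.
ACTION_CODES = [("CR", "CREATE"), ("AL", "ALTER"), ("DR", "DROP")]
CLASS_CODES = [("DB", "DATABASE"), ("U", "TABLE"), ("V", "VIEW"), ("P", "STORED PROCEDURE")]
STATEMENT_FIXES = [("\\r\\n", " "), ("\\r", " "), ("\\n", " "), ("\\\\", "\\"), ("[", ""), ("]", "")]
PAST_TENSE = {"CREATE": "created", "ALTER": "altered", "DROP": "dropped"}


def chain_replace(value: str, pairs) -> str:
    """Apply replacements in order, like chained `replace` calls in label_format."""
    for old, new in pairs:
        value = value.replace(old, new)
    return value


def parse_fields(message: str) -> dict:
    """Split each `name:value` line at its first colon (stand-in for the pattern stages)."""
    fields = {}
    for line in message.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name] = value
    return fields


def evaluate(message: str, computer: str, wanted_action: str) -> Optional[dict]:
    """Return the labels one rule would produce for `message`, or None if filtered out."""
    fields = parse_fields(message)
    action = chain_replace(fields.get("action_id", "").strip(), ACTION_CODES)
    if action != wanted_action:                  # | action_id ="CREATE" (or ALTER / DROP)
        return None
    if fields.get("database_name") == "tempdb":  # roughly | database_name !~`(tempdb)`
        return None
    labels = {
        "computer": computer,
        "class_type": chain_replace(fields.get("class_type", "").strip(), CLASS_CODES),
        "object_name": fields.get("object_name", ""),
        "statement": chain_replace(fields.get("statement", ""), STATEMENT_FIXES),
    }
    labels["summary"] = (
        f"Host: {computer}\n"
        f"Message: {labels['object_name']} has been {PAST_TENSE[action]}.\n"
        f"Statement: {labels['statement']}\n"
    )
    return labels


# Hypothetical audit message; escaped line breaks appear as literal \r\n text.
SAMPLE = (
    "event_time:2023-11-18 10:00:00\n"
    "action_id:CR  \n"
    "class_type:V \n"
    "database_name:Database_1\n"
    "object_name:View_1118\n"
    "schema_name:dbo\n"
    "statement:CREATE VIEW [dbo].[View_1118]\\r\\n  AS\\r\\n  SELECT *\\r\\n  FROM [dbo].[Table_1]\n"
    "additional_information:\n"
)

print(evaluate(SAMPLE, "sql01", "CREATE")["summary"])
```

Only the CREATE rule matches this sample; evaluating it for ALTER or DROP returns None, which is why the three rules never fire for the same event.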

Use the following T-SQL script to trigger the alerting rules:

USE [Database_1]
GO
CREATE VIEW [dbo].[View_1118]
  AS
  SELECT *
  FROM [dbo].[Table_1]
GO
ALTER VIEW [dbo].[View_1118]
  AS
  SELECT *
  FROM [dbo].[Table_2]
GO
DROP VIEW [dbo].[View_1118]
GO

Check whether Alertmanager received the alerts.
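Besides the Alertmanager web UI, you can list active alerts through its v2 API (GET /api/v2/alerts) and keep only the DDL rules from this article. This Python sketch assumes Alertmanager listens on http://localhost:9093 and matches on the mssql-object- prefix shared by the rule names.

```python
import json
import urllib.request

ALERTMANAGER_URL = "http://localhost:9093"  # assumption: adjust to your server


def fetch_alerts(base_url: str = ALERTMANAGER_URL) -> list:
    """GET /api/v2/alerts and return the decoded JSON list of alerts."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v2/alerts", timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def mssql_ddl_alerts(alerts: list, prefix: str = "mssql-object-") -> list:
    """Return (alertname, object_name) pairs for alerts whose name starts with `prefix`."""
    result = []
    for alert in alerts:
        labels = alert.get("labels", {})
        name = labels.get("alertname", "")
        if name.startswith(prefix):
            result.append((name, labels.get("object_name")))
    return result
```

Running print(mssql_ddl_alerts(fetch_alerts())) shortly after the T-SQL script should list the View_1118 alerts while they are still active.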

Checking the mail server confirms that Alertmanager did send the alerts by email.

If send_resolved is set, Alertmanager also sends a notification when an alert is resolved:

receivers:
  - name: 'team-infra-mails'
    email_configs:
      - to: 'your_to_mail_address'
        send_resolved: true

Using LINE Notify
Our company still collaborates mainly through LINE, so let's push the alerts to LINE Notify as well.

For how to apply for a LINE Notify access token, refer to this article.

Unfortunately, Alertmanager does not currently provide a LINE Notify receiver, as its receiver configuration shows:

# The unique name of the receiver.
name: <string>

# Configurations for several notification integrations.
email_configs:
  [ - <email_config>, ... ]
opsgenie_configs:
  [ - <opsgenie_config>, ... ]
pagerduty_configs:
  [ - <pagerduty_config>, ... ]
pushover_configs:
  [ - <pushover_config>, ... ]
slack_configs:
  [ - <slack_config>, ... ]
sns_configs:
  [ - <sns_config>, ... ]
victorops_configs:
  [ - <victorops_config>, ... ]
webhook_configs:
  [ - <webhook_config>, ... ]
wechat_configs:
  [ - <wechat_config>, ... ]
telegram_configs:
  [ - <telegram_config>, ... ]
webex_configs:
  [ - <webex_config>, ... ]

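Fortunately, we can still connect to LINE Notify through a webhook.
Thanks to a fellow developer in Bangkok, Thailand, who has already done the groundwork: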
https://github.com/be99inner/line-notify-gateway/

The only pity is that the message is hard-coded in app.py:

def firing_alert(request):
    if request.json['status'] == 'firing':
        icon = "⛔⛔⛔ 
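One way around the hard-coded text is to build the LINE message from each alert's summary annotation. The following is a minimal standard-library sketch of such a gateway, not the code from be99inner/line-notify-gateway. It assumes Alertmanager's webhook JSON (a top-level status plus an alerts list whose entries carry status, labels, and annotations) and LINE Notify's https://notify-api.line.me/api/notify endpoint, which takes a form field named message and a Bearer access token. The LINE_NOTIFY_TOKEN environment variable and port 8080 are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

LINE_NOTIFY_API = "https://notify-api.line.me/api/notify"
ICONS = {"firing": "⛔", "resolved": "✅"}


def build_message(payload: dict) -> str:
    """Turn an Alertmanager webhook payload into one LINE message, one block per alert."""
    blocks = []
    for alert in payload.get("alerts", []):
        status = alert.get("status", payload.get("status", ""))
        # Prefer the rule's summary annotation; fall back to the alert name.
        fallback = alert.get("labels", {}).get("alertname", "")
        summary = alert.get("annotations", {}).get("summary", fallback)
        blocks.append(f"{ICONS.get(status, '')} [{status.upper()}]\n{summary}")
    return "\n".join(blocks)


def send_line_notify(message: str, token: str) -> int:
    """POST the message as form data to LINE Notify and return the HTTP status code."""
    data = urllib.parse.urlencode({"message": message}).encode("utf-8")
    request = urllib.request.Request(
        LINE_NOTIFY_API, data=data, headers={"Authorization": f"Bearer {token}"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # If LINE Notify fails, the exception drops the connection and Alertmanager retries.
        send_line_notify(build_message(payload), os.environ["LINE_NOTIFY_TOKEN"])
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for Alertmanager webhook calls on all interfaces."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Start it with serve() and point a receiver's webhook_configs url at it, for example http://<gateway-host>:8080/. Because each alert carries its own status, firing and resolved notifications get different icons, and changing the wording only means editing the summary annotation in the Loki rule file.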

About the author: site editor
