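We send the alerts fired by Loki's alerting rules to Alertmanager. Alertmanager handles silencing, deduplication, and grouping, and routes each alert to the right receiver, such as email or LINE Notify.
The main steps for setting up alerts and notifications are:
1. Set up Alertmanager
2. Configure Loki to talk to Alertmanager
3. Create alerting rules in Loki

Set up Alertmanager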
For how to install Alertmanager, see this article.
Edit the alertmanager.yml configuration file:
sudo vi /opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager.yml
Add a receiver named team-infra-mails that sends alerts by email.
global:
  smtp_smarthost: 'your_smtp_ip:your_port'
  smtp_from: 'your_from_mail_address'
  smtp_require_tls: false

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'team-infra-mails'

receivers:
  - name: 'team-infra-mails'
    email_configs:
      - to: 'your_to_mail_address'
        send_resolved: true

# Inhibition rules allow to mute a set of alerts given that another alert is firing.
# We use this to mute any warning-level notifications if the same alert is already critical.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    # Apply inhibition if the alertname is the same.
    # CAUTION:
    #   If all label names listed in `equal` are missing
    #   from both the source and target alerts,
    #   the inhibition rule will apply!
    equal: ['alertname', 'dev', 'instance']
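The CAUTION in the inhibit_rules comments is easy to misread, so here is a simplified Python model of how one inhibit rule decides whether a warning is muted. It is only a sketch: alerts are assumed to be plain dicts of labels, and it ignores details of the real Alertmanager implementation (for example, that an empty label value counts as absent).

```python
# Simplified model of one Alertmanager inhibit rule (illustration only).
# Alerts are assumed to be plain dicts of label name -> value.


def matches(labels: dict, required: dict) -> bool:
    """True if every required label is present with exactly that value."""
    return all(labels.get(k) == v for k, v in required.items())


def is_inhibited(target: dict, source: dict,
                 source_match: dict, target_match: dict,
                 equal: list) -> bool:
    """Source mutes target when both matchers hold and every label listed
    in `equal` has the same value on both alerts. A label missing from
    both compares as None == None, which is why the rule still applies
    when all `equal` labels are absent (the CAUTION in the config)."""
    if not matches(source, source_match) or not matches(target, target_match):
        return False
    return all(source.get(name) == target.get(name) for name in equal)


# The rule from alertmanager.yml above, expressed as keyword arguments.
rule = dict(source_match={"severity": "critical"},
            target_match={"severity": "warning"},
            equal=["alertname", "dev", "instance"])

critical = {"alertname": "DiskFull", "severity": "critical", "instance": "db1"}
warning = {"alertname": "DiskFull", "severity": "warning", "instance": "db1"}
other_host = {"alertname": "DiskFull", "severity": "warning", "instance": "db2"}

print(is_inhibited(warning, critical, **rule))     # True: "dev" is missing on both
print(is_inhibited(other_host, critical, **rule))  # False: "instance" differs
```

Note that the `dev` label does not exist on either alert, yet the warning is still muted; that is exactly the behavior the CAUTION comment warns about.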
Remember to restart the Alertmanager service:
sudo service alertmanager restart
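Before wiring up Loki, it can be worth checking that the email receiver works on its own. The sketch below builds a synthetic alert for Alertmanager's v2 API (`POST /api/v2/alerts`, which takes a JSON list of alerts). The URL `http://localhost:9093` and the label values are assumptions for illustration.

```python
# Build and post a synthetic alert to Alertmanager's v2 API to smoke-test
# the team-infra-mails receiver. URL and labels are assumed values.
import json
import urllib.request
from datetime import datetime, timedelta, timezone

ALERTMANAGER_URL = "http://localhost:9093"  # assumption: default listen address


def build_test_alert(alertname: str = "smtp-smoke-test",
                     minutes: int = 5) -> list:
    """Return the JSON body for POST /api/v2/alerts (a list of alerts)."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": alertname, "severity": "warning"},
        "annotations": {"summary": "Test alert sent by hand"},
        "startsAt": now.isoformat(),
        # After endsAt passes, the alert resolves (and send_resolved kicks in).
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(alerts: list, base_url: str = ALERTMANAGER_URL) -> int:
    """POST the alerts and return the HTTP status code (200 on success)."""
    req = urllib.request.Request(
        f"{base_url}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Calling `post_alerts(build_test_alert())` on the Alertmanager host should return 200. Because `group_wait` is 30s in the config above, the email can take about that long to arrive.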
Configure Loki to talk to Alertmanager
Edit Loki's configuration file:
sudo vi /opt/loki/loki-local-config.yaml
Set rules_directory to the folder where you keep your alerting rules:
common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
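Note that chunks_directory and rules_directory under filesystem both live under /tmp, so their contents are lost after a reboot. Change these paths if the data needs to persist.
Set alertmanager_url to the address of the server where Alertmanager is installed: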
ruler:
  alertmanager_url: http://localhost:9093
Remember to restart the Loki service:
sudo service loki restart
Create a folder named fake under /tmp/loki/rules:
sudo mkdir /tmp/loki/rules/fake
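Why create a fake folder?
Loki supports multi-tenancy and stores rules per tenant. In single-tenant mode (auth_enabled: false), the default tenant ID is fake. If you enable multi-tenancy, put each tenant's rules in a folder named after its tenant ID.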
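The resulting layout is `<rules_directory>/<tenant ID>/<rule file>`. A tiny hypothetical helper makes the mapping explicit; the function name and the `team-a` tenant are illustrative only.

```python
# Hypothetical helper: where a rule file goes for a given tenant.
from pathlib import PurePosixPath
from typing import Optional

DEFAULT_TENANT = "fake"  # tenant ID Loki uses when auth_enabled is false


def rule_file_path(rules_directory: str, filename: str,
                   tenant_id: Optional[str] = None) -> PurePosixPath:
    """Return <rules_directory>/<tenant>/<filename>; None means single-tenant."""
    return PurePosixPath(rules_directory) / (tenant_id or DEFAULT_TENANT) / filename


print(rule_file_path("/tmp/loki/rules", "mssql-ddl-alert.yml"))
# /tmp/loki/rules/fake/mssql-ddl-alert.yml
```

With multi-tenancy enabled, `rule_file_path("/tmp/loki/rules", "mssql-ddl-alert.yml", "team-a")` would point at a `team-a` folder instead.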
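Create alerting rules
For the demo, we fire an alert whenever a CREATE, ALTER, or DROP is executed against a database object, such as a database or a table.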
sudo vi /tmp/loki/rules/fake/mssql-ddl-alert.yml
The file contents are as follows:
groups:
  - name: mssql-object-created
    rules:
      - alert: mssql-object-created
        expr: |
          count_over_time({computer=~"your_mssql_server", source="MSSQLSERVER", eventID="33205"}
            | pattern `<_>event_time:<event_time>\n<_>`
            | pattern `<_>action_id:<action_id>\n<_>`
            | label_format action_id=`{{.action_id | trim | replace "CR" "CREATE" | replace "AL" "ALTER" | replace "DR" "DROP"}}`
            | action_id="CREATE"
            | pattern `<_>class_type:<class_type>\n<_>`
            | label_format class_type=`{{.class_type | trim | replace "DB" "DATABASE" | replace "U" "TABLE" | replace "V" "VIEW" | replace "P" "STORED PROCEDURE"}}`
            | pattern `<_>database_name:<database_name>\n<_>`
            | database_name !~ `(tempdb)`
            | pattern `<_>object_name:<object_name>\n<_>`
            | pattern `<_>schema_name:<schema_name>\n<_>`
            | pattern `<_>server_instance_name:<server_instance_name>\n<_>`
            | pattern `<_>server_principal_name:<server_principal_name>\n<_>`
            | pattern `<_>statement:<statement>\nadditional_information<_>`
            | label_format statement=`{{.statement | replace "\\r\\n" " " | replace "\\r" " " | replace "\\n" " " | replace "\u005c\u005c" "\u005c" | replace "[" "" | replace "]" ""}}`
          [1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Host: {{ $labels.computer }}\nMessage: {{ $labels.object_name }} has been created.\nStatement: {{ $labels.statement }}\n"
  - name: mssql-object-altered
    rules:
      - alert: mssql-object-altered
        expr: |
          count_over_time({computer=~"your_mssql_server", source="MSSQLSERVER", eventID="33205"}
            | pattern `<_>event_time:<event_time>\n<_>`
            | pattern `<_>action_id:<action_id>\n<_>`
            | label_format action_id=`{{.action_id | trim | replace "CR" "CREATE" | replace "AL" "ALTER" | replace "DR" "DROP"}}`
            | action_id="ALTER"
            | pattern `<_>class_type:<class_type>\n<_>`
            | label_format class_type=`{{.class_type | trim | replace "DB" "DATABASE" | replace "U" "TABLE" | replace "V" "VIEW" | replace "P" "STORED PROCEDURE"}}`
            | pattern `<_>database_name:<database_name>\n<_>`
            | database_name !~ `(tempdb)`
            | pattern `<_>object_name:<object_name>\n<_>`
            | pattern `<_>schema_name:<schema_name>\n<_>`
            | pattern `<_>server_instance_name:<server_instance_name>\n<_>`
            | pattern `<_>server_principal_name:<server_principal_name>\n<_>`
            | pattern `<_>statement:<statement>\nadditional_information<_>`
            | label_format statement=`{{.statement | replace "\\r\\n" " " | replace "\\r" " " | replace "\\n" " " | replace "\u005c\u005c" "\u005c" | replace "[" "" | replace "]" ""}}`
          [1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Host: {{ $labels.computer }}\nMessage: {{ $labels.object_name }} has been altered.\nStatement: {{ $labels.statement }}\n"
  - name: mssql-object-dropped
    rules:
      - alert: mssql-object-dropped
        expr: |
          count_over_time({computer=~"your_mssql_server", source="MSSQLSERVER", eventID="33205"}
            | pattern `<_>event_time:<event_time>\n<_>`
            | pattern `<_>action_id:<action_id>\n<_>`
            | label_format action_id=`{{.action_id | trim | replace "CR" "CREATE" | replace "AL" "ALTER" | replace "DR" "DROP"}}`
            | action_id="DROP"
            | pattern `<_>class_type:<class_type>\n<_>`
            | label_format class_type=`{{.class_type | trim | replace "DB" "DATABASE" | replace "U" "TABLE" | replace "V" "VIEW" | replace "P" "STORED PROCEDURE"}}`
            | pattern `<_>database_name:<database_name>\n<_>`
            | database_name !~ `(tempdb)`
            | pattern `<_>object_name:<object_name>\n<_>`
            | pattern `<_>schema_name:<schema_name>\n<_>`
            | pattern `<_>server_instance_name:<server_instance_name>\n<_>`
            | pattern `<_>server_principal_name:<server_principal_name>\n<_>`
            | pattern `<_>statement:<statement>\nadditional_information<_>`
            | label_format statement=`{{.statement | replace "\\r\\n" " " | replace "\\r" " " | replace "\\n" " " | replace "\u005c\u005c" "\u005c" | replace "[" "" | replace "]" ""}}`
          [1m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "Host: {{ $labels.computer }}\nMessage: {{ $labels.object_name }} has been dropped.\nStatement: {{ $labels.statement }}\n"
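To see what the pattern stages and the `label_format ... | replace` chains do to a single audit event, here is a small Python emulation outside Loki. The sample message is hypothetical. It assumes newlines reach Loki escaped as the two characters backslash + `n`, which is what the literal `\n` in the patterns matches. Only a subset of the pipeline is modeled: no `\r\n` cleanup and no backslash unescaping.

```python
# Emulate a subset of the LogQL pipeline above on one hypothetical message.
from typing import List, Optional, Tuple

# Hypothetical audit message; each "\n" is a literal backslash followed by "n".
SAMPLE = (r"event_time:2023-11-18 10:00:00\n"
          r"action_id:CR  \n"
          r"class_type:V \n"
          r"database_name:Database_1\n"
          r"object_name:View_1118\n"
          r"statement:CREATE VIEW [dbo].[View_1118]\n"
          r"additional_information:")

ACTION_CODES = [("CR", "CREATE"), ("AL", "ALTER"), ("DR", "DROP")]
CLASS_CODES = [("DB", "DATABASE"), ("U", "TABLE"), ("V", "VIEW"),
               ("P", "STORED PROCEDURE")]
BRACKETS = [("[", ""), ("]", "")]


def extract(line: str, field: str, terminator: str = r"\n") -> Optional[str]:
    """Mimic pattern `<_>field:<field>\\n<_>`: the text after the first
    'field:' up to the next terminator, or None if either is missing."""
    start = line.find(field + ":")
    if start == -1:
        return None
    start += len(field) + 1
    end = line.find(terminator, start)
    return None if end == -1 else line[start:end]


def replace_chain(value: str, pairs: List[Tuple[str, str]]) -> str:
    """Apply `| replace "old" "new"` steps in order, each replacing all matches."""
    for old, new in pairs:
        value = value.replace(old, new)
    return value


# .strip() plays the role of the template's `trim`.
action = replace_chain(extract(SAMPLE, "action_id").strip(), ACTION_CODES)
class_type = replace_chain(extract(SAMPLE, "class_type").strip(), CLASS_CODES)
statement = replace_chain(
    extract(SAMPLE, "statement", r"\nadditional_information"), BRACKETS)
print(action, class_type, statement)  # CREATE VIEW CREATE VIEW dbo.View_1118
```

Because the replacements run in sequence, the chain only works if no expanded word contains a later search key. That holds for the codes listed (DATABASE, TABLE, and VIEW contain no U, V, or P). An unlisted code that contains one of those letters would be partly rewritten, though; for example, a hypothetical `UX` would come out as `TABLEX`.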
Run the following T-SQL script to trigger the alerting rules:
USE [Database_1]
GO
CREATE VIEW [dbo].[View_1118] AS SELECT * FROM [dbo].[Table_1]
GO
ALTER VIEW [dbo].[View_1118] AS SELECT * FROM [dbo].[Table_2]
GO
DROP VIEW [dbo].[View_1118]
GO
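If no alert shows up, a useful first check is whether the audit events (eventID 33205) reached Loki at all. The sketch below builds a request for Loki's `/loki/api/v1/query_range` endpoint using the same stream selector as the rules. The Loki address `http://localhost:3100` is an assumption.

```python
# Build (and optionally run) a Loki query_range request for recent audit events.
import json
import time
import urllib.parse
import urllib.request

LOKI_URL = "http://localhost:3100"  # assumption: Loki's default HTTP port

QUERY = '{computer=~"your_mssql_server", source="MSSQLSERVER", eventID="33205"}'


def build_query_url(query: str, minutes: int = 15, limit: int = 20,
                    base_url: str = LOKI_URL) -> str:
    """URL for the last `minutes` of matching lines, newest first."""
    end_ns = time.time_ns()
    start_ns = end_ns - minutes * 60 * 1_000_000_000  # start/end in nanoseconds
    params = urllib.parse.urlencode({
        "query": query, "start": start_ns, "end": end_ns,
        "limit": limit, "direction": "backward",
    })
    return f"{base_url}/loki/api/v1/query_range?{params}"


def fetch_lines(url: str) -> list:
    """Run the query and flatten the "streams" result into raw log lines."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    return [line for stream in body["data"]["result"]
            for _ts, line in stream["values"]]
```

For example, `fetch_lines(build_query_url(QUERY, minutes=5))` right after running the script should return the CREATE, ALTER, and DROP events.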
Check whether Alertmanager has received the alerts.
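Besides the web UI, you can check the alerts from a script. The sketch below reads `GET /api/v2/alerts`, which returns a JSON list of alerts with `labels` and a `status.state`, and keeps only the three DDL alerts. The base URL and the `mssql-object-` prefix filter are assumptions based on the rules above.

```python
# List the DDL alerts Alertmanager currently holds (sketch; URL assumed).
import json
import urllib.request


def fetch_alerts(base_url: str = "http://localhost:9093") -> list:
    """Return the raw alert list from Alertmanager's v2 API."""
    with urllib.request.urlopen(f"{base_url}/api/v2/alerts", timeout=10) as resp:
        return json.load(resp)


def ddl_alerts(alerts: list, prefix: str = "mssql-object-") -> list:
    """(alertname, object_name, state) for alerts whose name starts with prefix."""
    return [(a["labels"].get("alertname"),
             a["labels"].get("object_name"),
             a.get("status", {}).get("state"))
            for a in alerts
            if a["labels"].get("alertname", "").startswith(prefix)]
```

Usage: `for row in ddl_alerts(fetch_alerts()): print(row)`. A state of `suppressed` would mean a silence or an inhibit rule is muting the alert.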
Checking the mail server confirms that Alertmanager did send the alerts by email.
If send_resolved is set, Alertmanager also sends a notification when an alert is resolved:
receivers:
  - name: 'team-infra-mails'
    email_configs:
      - to: 'your_to_mail_address'
        send_resolved: true
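Using LINE Notify
Our company still collaborates mainly over LINE, so let's push the alerts to LINE Notify as well.
For how to apply for a LINE Notify access token, see this article.
Unfortunately, Alertmanager's receivers do not currently support LINE Notify natively. The receiver configuration only offers these integrations: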
# The unique name of the receiver.
name: <string>

# Configurations for several notification integrations.
email_configs:
  [ - <email_config>, ... ]
opsgenie_configs:
  [ - <opsgenie_config>, ... ]
pagerduty_configs:
  [ - <pagerduty_config>, ... ]
pushover_configs:
  [ - <pushover_config>, ... ]
slack_configs:
  [ - <slack_config>, ... ]
sns_configs:
  [ - <sns_config>, ... ]
victorops_configs:
  [ - <victorops_config>, ... ]
webhook_configs:
  [ - <webhook_config>, ... ]
wechat_configs:
  [ - <wechat_config>, ... ]
telegram_configs:
  [ - <telegram_config>, ... ]
webex_configs:
  [ - <webex_config>, ... ]
Fortunately, LINE Notify can still be reached through a webhook receiver.
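The idea behind the webhook approach fits in a short standard-library sketch. Alertmanager POSTs a JSON payload (`status`, `groupLabels`, and an `alerts` list with `status` and `annotations`) to a small bridge. The bridge turns it into text and posts that to LINE Notify (`POST https://notify-api.line.me/api/notify` with a Bearer token and a `message` form field). This sketch is independent of any existing gateway project. The `LINE_NOTIFY_TOKEN` environment variable and port 5000 are assumptions. Unlike a hard-coded message, the text here comes from each alert's `summary` annotation.

```python
# Minimal Alertmanager-webhook -> LINE Notify bridge (sketch, stdlib only).
import json
import os
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

LINE_NOTIFY_API = "https://notify-api.line.me/api/notify"


def format_message(payload: dict) -> str:
    """Build one LINE message from an Alertmanager webhook payload."""
    header = (f"[{payload.get('status', 'unknown').upper()}] "
              f"{payload.get('groupLabels', {}).get('alertname', '')}")
    lines = [header]
    for alert in payload.get("alerts", []):
        summary = alert.get("annotations", {}).get("summary", "")
        # Each alert is tagged firing/resolved; trailing newlines are trimmed.
        lines.append(f"({alert.get('status', '')}) {summary}".rstrip())
    return "\n".join(lines)


def send_line_notify(message: str, token: str) -> int:
    """POST the message as a form field; urllib adds the form content type."""
    data = urllib.parse.urlencode({"message": message}).encode("utf-8")
    req = urllib.request.Request(
        LINE_NOTIFY_API, data=data, method="POST",
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        send_line_notify(format_message(payload), os.environ["LINE_NOTIFY_TOKEN"])
        self.send_response(200)
        self.end_headers()


def serve(port: int = 5000) -> None:
    """Block forever, handling webhook POSTs on the given port."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

To use it, run `serve()` and add a receiver whose `webhook_configs` entry has `url: 'http://<bridge host>:5000/'` and `send_resolved: true`. Resolved notifications then arrive tagged `[RESOLVED]`. Error handling (a missing token, LINE API failures) is left out to keep the sketch short.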
Thanks to a developer in Bangkok, Thailand, who has already done the groundwork for us:
https://github.com/be99inner/line-notify-gateway/
One drawback is that the message text is hard-coded in app.py:
def firing_alert(request):
    if request.json['status'] == 'firing':
        icon = "⛔⛔⛔