Monitoring Processes with Prometheus

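process-exporter is mainly used for process monitoring: for a given service it reports, for example, the number of processes, the CPU time consumed, memory usage, and other resources.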

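1. Using process-exporter

1.1 Download process-exporter

process-exporter GitHub repository: https://github.com/ncabatoff/process-exporter
process-exporter downloads: the Releases page of the same repository

process-exporter can be started either with command-line flags or with a configuration file.

1.2 Configure process-exporter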

vim /usr/local/process-exporter/process-exporter.yaml   # config file; the path must match -config.path in the unit file

process_names:
#  - name: "{{.Comm}}"
#    cmdline:
#    - '.+'
  - name: "{{.Matches}}"
    cmdline:
    - 'nginx'   # unique identifier of the process
  - name: "{{.Matches}}"
    cmdline:
    - '/opt/atlassian/confluence/bin/tomcat-juli.jar'
  - name: "{{.Matches}}"
    cmdline:
    - 'vsftpd'
  - name: "{{.Matches}}"
    cmdline:
    - 'redis-server'

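In this example:

cmdline: regular expressions matched against the command line of a process, so each entry should uniquely identify the process you want. The command line can be looked up with ps -ef. If no matching process exists, no metrics are collected for that entry.

For example:

> ps -ef | grep redis
redis 4287 4127 0 Oct31 ? 00:58:12 redis-server *:6379

With name set to {{.Matches}}, the group name is built from the matched text: groupname="map[:redis]" means the keyword "redis" was matched.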
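Before adding a cmdline pattern to the config, it can help to check which processes it would select. The following is a minimal sketch, not part of process-exporter: `match_cmdline` is a hypothetical helper that reads `ps -ef`-style lines on stdin, assumes the command line starts at column 8, and uses awk regular expressions, which are close to but not identical to the Go regular expressions process-exporter uses.

```shell
#!/bin/sh
# match_cmdline PATTERN
# Hypothetical helper: print the command line of every ps -ef style
# input line whose command (column 8 onward) matches PATTERN.
match_cmdline() {
  awk -v re="$1" '{
    cmd = $8
    for (i = 9; i <= NF; i++) cmd = cmd " " $i   # rejoin the command line
    if (cmd ~ re) print cmd                      # keep only matching commands
  }'
}
```

On a live host this could be used as `ps -ef | match_cmdline 'redis-server'`. If the output lists more processes than intended, the pattern is not specific enough to serve as a unique identifier.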

1.3 Write the systemd unit file

vim /usr/lib/systemd/system/process_exporter.service

[Unit]
Description=Prometheus exporter for process metrics, written in Go with pluggable metric collectors.
Documentation=https://github.com/ncabatoff/process-exporter
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/usr/local/process-exporter
ExecStart=/usr/local/process-exporter/process-exporter -config.path=/usr/local/process-exporter/process-exporter.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target

1.4 Start process-exporter

systemctl daemon-reload
systemctl start process_exporter
systemctl enable process_exporter

Verify the metrics

curl http://localhost:9256/metrics

# Sample output
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="prometheus",quantile="0.5"} 2988
http_response_size_bytes{handler="prometheus",quantile="0.9"} 2996
http_response_size_bytes{handler="prometheus",quantile="0.99"} 3006
http_response_size_bytes_sum{handler="prometheus"} 1.34205181e+08
http_response_size_bytes_count{handler="prometheus"} 45188
# HELP namedprocess_namegroup_context_switches_total Context switches
# TYPE namedprocess_namegroup_context_switches_total counter
namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary",groupname="map[:bladebit]"} 7.7977455e+07
namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary",groupname="map[:pw_python.py]"} 2.02666e+06
namedprocess_namegroup_context_switches_total{ctxswitchtype="voluntary",groupname="map[:bladebit]"} 3.335109e+06
namedprocess_namegroup_context_switches_total{ctxswitchtype="voluntary",groupname="map[:pw_python.py]"} 8.22652233e+08
# HELP namedprocess_namegroup_cpu_system_seconds_total Cpu system usage in seconds
# TYPE namedprocess_namegroup_cpu_system_seconds_total counter
namedprocess_namegroup_cpu_system_seconds_total{groupname="map[:bladebit]"} 94275.01000000017
namedprocess_namegroup_cpu_system_seconds_total{groupname="map[:pw_python.py]"} 64818.93000000004
# HELP namedprocess_namegroup_cpu_user_seconds_total Cpu user usage in seconds
# TYPE namedprocess_namegroup_cpu_user_seconds_total counter
namedprocess_namegroup_cpu_user_seconds_total{groupname="map[:bladebit]"} 2.42621264299998e+07
namedprocess_namegroup_cpu_user_seconds_total{groupname="map[:pw_python.py]"} 85.29000000000613
# HELP namedprocess_namegroup_major_page_faults_total Major page faults
# TYPE namedprocess_namegroup_major_page_faults_total counter
namedprocess_namegroup_major_page_faults_total{groupname="map[:bladebit]"} 18261
namedprocess_namegroup_major_page_faults_total{groupname="map[:pw_python.py]"} 1236
# HELP namedprocess_namegroup_memory_bytes number of bytes of memory in use
# TYPE namedprocess_namegroup_memory_bytes gauge
namedprocess_namegroup_memory_bytes{groupname="map[:bladebit]",memtype="resident"} 4.46810939392e+11
namedprocess_namegroup_memory_bytes{groupname="map[:bladebit]",memtype="swapped"} 0
namedprocess_namegroup_memory_bytes{groupname="map[:bladebit]",memtype="virtual"} 4.47847292928e+11
namedprocess_namegroup_memory_bytes{groupname="map[:pw_python.py]",memtype="resident"} 1.2959744e+07
namedprocess_namegroup_memory_bytes{groupname="map[:pw_python.py]",memtype="swapped"} 0
namedprocess_namegroup_memory_bytes{groupname="map[:pw_python.py]",memtype="virtual"} 2.4733696e+08
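The raw output is long, so for a quick check it may be easier to pull out one value per process group. The sketch below is only an illustration: `resident_memory` is a hypothetical helper that reads the metrics text on stdin and prints the group name and resident memory in bytes, assuming the label layout shown in the sample output (groupname first, then memtype).

```shell
#!/bin/sh
# resident_memory
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print "<groupname> <bytes>" for every resident-memory sample.
resident_memory() {
  grep '^namedprocess_namegroup_memory_bytes{' |
    grep 'memtype="resident"' |
    sed -E 's/.*groupname="([^"]*)".*\} *([^ ]+)$/\1 \2/'
}
```

Against a running exporter this could be used as `curl -s http://localhost:9256/metrics | resident_memory`. An empty result suggests that no configured process group matched a running process.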

2. Prometheus configuration

Add or modify the scrape configuration

- job_name: 'dev_prometheus'
  scrape_interval: 10s
  honor_labels: true
  metrics_path: '/metrics'
  static_configs:
  - targets: ['127.0.0.1:9090', '127.0.0.1:9100']
    labels: {cluster: 'dev', type: 'basic', env: 'dev', job: 'prometheus', export: 'prometheus'}
  - targets: ['127.0.0.1:9256']
    labels: {cluster: 'dev', type: 'process', env: 'dev', job: 'prometheus', export: 'process_exporter'}

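Reload the Prometheus configuration (the /-/reload endpoint is only available when Prometheus is started with --web.enable-lifecycle)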

curl -X POST http://127.0.0.1:9090/-/reload

3. Grafana dashboard

The dashboard for process-exporter is: https://grafana.com/grafana/dashboards/249

[Figure: the resulting Grafana dashboard]

4. Common alerting rules

Process count

alert: Process alert
expr: sum(namedprocess_namegroup_states) by (cluster, job, instance) > 500
for: 20s
labels:
  severity: warning
annotations:
  value: The server currently has {{ $value }} processes, which exceeds the alert threshold

Zombie process count

alert: Process alert
expr: sum by(cluster, job, instance, groupname) (namedprocess_namegroup_states{state="Zombie"}) > 0
for: 1m
labels:
  severity: warning
annotations:
  value: There are currently {{ $value }} zombie processes

Process restart

alert: Process restart alert
expr: ceil(time() - max by(cluster, job, instance, groupname) (namedprocess_namegroup_oldest_start_time_seconds)) < 60
for: 25s
labels:
  label: alert_once
  severity: warning
annotations:
  value: The process restarted {{ $value }} seconds ago
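The restart rule boils down to one subtraction: the current time minus the start time of the oldest process in the group, compared against 60 seconds. A minimal sketch of the same arithmetic follows. `restarted_recently` is a hypothetical helper written for illustration, and the timestamps are Unix seconds, as in namedprocess_namegroup_oldest_start_time_seconds.

```shell
#!/bin/sh
# restarted_recently START NOW [THRESHOLD]
# Hypothetical helper mirroring the alert expression: succeeds when the
# process age (NOW - START) is below THRESHOLD seconds (default 60).
restarted_recently() {
  start=$1
  now=$2
  threshold=${3:-60}
  [ $((now - start)) -lt "$threshold" ]
}
```

For example, `restarted_recently 1700000000 "$(date +%s)"` succeeds only if a process started at that timestamp is less than a minute old. Because the rule also has `for: 25s`, the alert only fires if the group's oldest process stays younger than 60 seconds across 25 seconds of evaluations.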

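Process exit

alert: Process exit alert
# The expression in the source is truncated; this form assumes it compared the change in namedprocess_namegroup_num_procs over the window.
expr: ceil(delta(namedprocess_namegroup_num_procs{groupname=~"^map.*"}[10d])) < 0
for: 55s
labels:
  severity: warning
annotations:
  value: Process {{ $labels.export }} has exited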

5. Batch deployment with Ansible

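This setup uses Consul for service registration and discovery; background material on Consul is easy to find online.

5.1 Consul registration script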

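#!/bin/bash
# Usage: consul-register.sh <service_name> <instance_id> <ip> <port>
service_name=$1
instance_id=$2
ip=$3
port=$4

# Register the service with the Consul agent, including an HTTP health check every 5s
curl -X PUT -d '{"id": "'"$instance_id"'","name": "'"$service_name"'","address": "'"$ip"'","port": '"$port"',"tags": ["'"$service_name"'"],"checks": [{"http": "http://'"$ip"':'"$port"'","interval": "5s"}]}' http://10.1.8.202:8500/v1/agent/service/register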
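The nested quoting in the curl payload is easy to get wrong, so it can be worth printing the JSON before sending it. The sketch below is a hypothetical dry-run helper, not part of the original script: `build_payload` takes the same four arguments and prints the registration body without contacting Consul. The sample values in the usage note are made up.

```shell
#!/bin/sh
# build_payload SERVICE_NAME INSTANCE_ID IP PORT
# Hypothetical dry-run helper: print the Consul registration JSON
# that the register script would send, without calling curl.
build_payload() {
  service_name=$1
  instance_id=$2
  ip=$3
  port=$4
  printf '{"id": "%s", "name": "%s", "address": "%s", "port": %s, "tags": ["%s"], "checks": [{"http": "http://%s:%s", "interval": "5s"}]}\n' \
    "$instance_id" "$service_name" "$ip" "$port" "$service_name" "$ip" "$port"
}
```

Running `build_payload process-exporter h-node1 10.0.0.5 9256` prints a single JSON object. Once the output looks right, it could be sent with `curl -X PUT -d "$(build_payload ...)"` to the same register endpoint.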

5.2 Ansible playbook

[root@openvpn process]# cat playbook.yml
- hosts: Harvester
  remote_user: root
  gather_facts: no
  tasks:
    - name: Push the exporter package
      unarchive: src=process-exporter.tar.gz dest=/usr/local/
    - name: Rename the directory
      shell: |
        cd /usr/local/
        if [ ! -d process-exporter ]; then
            mv process-exporter-0.4.0.linux-amd64 process-exporter
        fi
    - name: Look up the host name
      shell: echo "h-`hostname`"
      register: name_host
    - name: Push the systemd unit file
      copy: src=process_exporter.service dest=/usr/lib/systemd/system
    - name: Start the service
      systemd: name=process_exporter state=started enabled=yes
    - name: Push the registration script
      copy: src=consul-register.sh dest=/usr/local/process-exporter
    - name: Register this node
      # The script takes four arguments: service_name instance_id ip port.
      # The service name "process-exporter" is an assumed value; the instance id reuses the registered host name.
      shell: /bin/sh /usr/local/process-exporter/consul-register.sh process-exporter {{ name_host.stdout }} {{ inventory_hostname }} 9256
