Nagios multi-server monitoring: server and client configuration

Posted by 张映 on 2012-04-21

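Nagios can monitor many servers from a single machine, and the clients can be Linux or Windows servers. My current setup is as follows: the Nagios server runs on the same machine as the mail server, and every other server has the Nagios client installed. This lets me see the status of all servers in the server's web interface, with email alerts as well.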

Server IP address: 192.168.1.122

Client IP address: 192.168.1.120

1. Install nagios-plugins-nrpe on the server

[root@localhost nagios]# ll /usr/lib64/nagios/plugins/ |grep nrpe
-rwxr-xr-x  1 root root    21840 10月 25 2010 check_nrpe
# If nothing like the line above is shown, check_nrpe is not installed.

# Install check_nrpe
[root@localhost nagios]# yum install nagios-plugins-nrpe

Only the software needed for multi-server monitoring is covered here. To install Nagios from scratch, see the previous few posts on this blog.

2. Install nrpe on the client

[root@localhost nagios]# yum install nagios-plugins nagios nrpe fcgi-devel fcgi nagios-plugins-all

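At first I thought installing nrpe alone on the client would be enough, but it is not. This differs from munin, and I was stuck in a habitual way of thinking. The client does not need a web environment.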

3. Server configuration

[root@localhost objects]# cp /etc/nagios/objects/localhost.cfg /etc/nagios/objects/192.168.1.120.cfg
[root@localhost objects]# vim /etc/nagios/objects/192.168.1.120.cfg    # config file 1 to edit

define host{
        use                     linux-server
        host_name               ads2               ; changed from localhost to a name that identifies this machine
        alias                   ads2
        address                 192.168.1.120      ; IP of the monitored server; may be a LAN or a public address
        }

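# Comment out the entire define hostgroup block. With it left in, the reload fails,
# most likely because localhost.cfg already defines the linux-servers hostgroup
# and members must list host_name values (ads2 here), not IP addresses.
#define hostgroup{
#        hostgroup_name  linux-servers
#         alias           Linux Servers
#        members         192.168.1.120
#        }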

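# The file contains many define service blocks; check_disk is used as the example here.
define service{
        use                             generic-service        ; changed from local-service to generic-service
        host_name                       ads2                   ; changed from localhost to our custom name ads2
        service_description             Root Partition
        check_command                   check_nrpe!check_disk!20%!10%!/
        }

# In the block above, check_local_disk was changed to check_nrpe!check_disk. Commands without "local" in the name follow the same pattern, for example check_ping becomes check_nrpe!check_ping.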
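
The renaming rule above can be sketched as a small shell helper. `to_nrpe_command` is a hypothetical name, not part of Nagios. The sketch assumes the check_nrpe command definition forwards only $ARG1$ (as in the commands.cfg entry in this post), so anything after the first `!` in the original value is dropped; the thresholds then come from nrpe.cfg on the client.

```shell
# Hypothetical helper: map an original check_command value to its NRPE form.
to_nrpe_command() {
    # $1 is a value such as "check_local_disk!20%!10%!/" or "check_ping!100.0,20%!500.0,60%"
    nrpe_name="${1%%!*}"                    # keep only the command name, drop the arguments
    nrpe_name="${nrpe_name#check_local_}"   # strip the "check_local_" prefix if present
    nrpe_name="${nrpe_name#check_}"         # otherwise strip a plain "check_" prefix
    printf 'check_nrpe!check_%s\n' "$nrpe_name"
}

# Usage:
#   to_nrpe_command 'check_local_disk!20%!10%!/'
#   to_nrpe_command 'check_ping!100.0,20%!500.0,60%'
```

Each name produced this way still needs a matching command[...] line in nrpe.cfg on the client.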

[root@localhost objects]# vim /etc/nagios/objects/commands.cfg    # config file 2 to edit
# Add the following:
define command{
 command_name    check_nrpe
 command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
 }

[root@localhost objects]# vim /etc/nagios/nagios.cfg     # config file 3 to edit

# Add this line:
cfg_file=/etc/nagios/objects/192.168.1.120.cfg

That completes the server-side configuration. Reload Nagios: /etc/init.d/nagios reload
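
Since a bad object definition (such as the hostgroup block mentioned earlier) makes the reload fail, it may help to run Nagios's built-in configuration check (`nagios -v`) first. The following is a minimal sketch, not the author's procedure: `safe_reload` and the three variables are hypothetical names, and the paths are assumed to match the ones used in this post.

```shell
# Assumed locations; override them if your installation differs.
NAGIOS_BIN="${NAGIOS_BIN:-nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

# Hypothetical helper: reload only if the configuration check passes.
safe_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        "$NAGIOS_INIT" reload
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}

# Usage (as root on the server):
#   safe_reload
```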

4. Client configuration

[root@localhost ~]# vim /etc/nagios/nrpe.cfg
# Around line 79: add the server's IP; it may be a LAN or a public address
allowed_hosts=192.168.1.122,127.0.0.1

command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/mapper/VolGroup-lv_root
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_procs]=/usr/lib64/nagios/plugins/check_procs -w 150 -c 200
command[check_http]=/usr/lib64/nagios/plugins/check_http -H 127.0.0.1 -w 5 -c 10
command[check_ping]=/usr/lib64/nagios/plugins/check_ping -H 127.0.0.1 -w 3000.0,80% -c 5000.0,100% -p 5
command[check_ssh]=/usr/lib64/nagios/plugins/check_ssh -4 127.0.0.1
command[check_swap]=/usr/lib64/nagios/plugins/check_swap  -w 30% -c 10%

The following errors came up during configuration:

1,NRPE: Command 'check_http' not defined
2,NRPE: Command 'check_ssh' not defined
3,NRPE: Command 'check_ping' not defined
4,NRPE: Command 'check_disk' not defined
5,NRPE: Command 'check_swap' not defined
6,NRPE: Command 'check_procs' not defined

Solution:

Add the matching command[] entry to /etc/nagios/nrpe.cfg on the client. For example, to fix "Command 'check_http' not defined", add a command[check_http] line.
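
To spot these "not defined" errors before Nagios reports them, one option is to compare the command names the server will request against the command[] entries in nrpe.cfg. This is a minimal sketch; `list_nrpe_commands` and `missing_nrpe_commands` are hypothetical helper names, and only uncommented lines of the form command[name]=... are recognized.

```shell
# Print the command names defined in an nrpe.cfg-style file, one per line.
list_nrpe_commands() {
    # $1: path to the config file; commented-out lines do not match
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Print each requested name that has no command[...] entry in the file.
missing_nrpe_commands() {
    nrpe_cfg="$1"
    shift
    for name in "$@"; do
        list_nrpe_commands "$nrpe_cfg" | grep -qxF "$name" || echo "$name"
    done
}

# Usage (on the client):
#   missing_nrpe_commands /etc/nagios/nrpe.cfg check_disk check_swap check_http
```

Any name it prints needs a command[...] line added, followed by an nrpe reload.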

Below are my tests from the command line on the server. Once these pass on the command line, the Nagios web interface stops reporting errors.

[root@localhost objects]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.120 -c check_load
OK - load average: 1.04, 0.31, 0.10|load1=1.040;15.000;30.000;0; load5=0.310;10.000;25.000;0; load15=0.100;5.000;20.000;0;

[root@localhost ~]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.120 -c check_ping
PING OK - Packet loss = 0%, RTA = 0.04 ms|rta=0.035000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0

[root@localhost ~]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.120 -c check_disk
DISK OK - free space: / 41756 MB (87% inode=96%);| /=6080MB;40316;45356;0;50396

[root@localhost ~]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.120 -c check_ssh
SSH OK - OpenSSH_5.3 (protocol 2.0)

[root@localhost ~]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.120 -c check_swap
SWAP OK - 100% free (10047 MB out of 10047 MB) |swap=10047MB;3014;1004;0;10047

[root@localhost ~]# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.120 -c check_procs
PROCS CRITICAL: 285 processes

Client configuration is simple. After editing, reload nrpe: /etc/init.d/nrpe reload

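If you use iptables or a similar firewall, open port 5666. You can confirm the port with netstat, or look it up in the configuration file (server_port in nrpe.cfg).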
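
As a rough sketch of the firewall step, the helpers below allow the server's IP through iptables and check netstat output for a listener on the NRPE port. Both function names are hypothetical, the rule is only one possible way to open the port, and the IP is the server address used in this post.

```shell
# Hypothetical helper: accept NRPE connections from the Nagios server only (run as root on the client).
allow_nrpe_from_server() {
    iptables -I INPUT -p tcp -s 192.168.1.122 --dport 5666 -j ACCEPT
}

# Succeeds if stdin (the output of `netstat -tnl`) shows a listener on port $1.
port_is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")      # local address, e.g. 0.0.0.0:5666 or :::5666
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Usage (on the client):
#   netstat -tnl | port_is_listening 5666 && echo "nrpe is listening"
```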

[Figure: Nagios monitoring multiple servers]

Please credit the source when reposting
Author: 海底苍鹰
URL: http://blog.51yip.com/server/1394.html

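文雨 commented (2012-04-23 07:52):

You could also try zabbix and ossim.

kuka commented (2012-08-16 10:00):

A small question:
In section 4 (client configuration), adding check_ssh, check_http and check_ping seems pointless.

If, as the article describes, client.cfg on the monitoring side specifies check_nrpe!check_ssh (or http/ping), the actual flow is:

check_nrpe on the monitoring side --> nrpe on the monitored side --> check_ssh (http/ping)

At that point the check already targets 127.0.0.1, so it cannot guarantee that the result is also correct when the service is accessed from outside, can it?

张映 commented (2012-08-16 10:19):

Without them, errors are reported.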
