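keepalived implements server hot standby in software using VRRP (Virtual Router Redundancy Protocol). Typically two Linux servers form a hot-standby group (master-backup). At any given time only one server in the group, the master, provides service. The master also holds a shared virtual IP address (VIP), which exists only on the master and is the address clients connect to. If keepalived detects that the master is down or that its service has failed, the backup server automatically takes over the VIP and becomes the master, and keepalived removes the failed master from the group. When the original master recovers, it rejoins the group automatically and, by default, preempts to become master again. This is how failover is achieved.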
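Because the VIP lives only on the current master, a node's role can be read from its interface addresses. The following is a minimal sketch, not part of keepalived itself: it assumes `ip -o -4 addr show` style output, and the VIP 192.0.2.100 and the function name `has_vip` are hypothetical.

```shell
#!/bin/bash
# has_vip VIP: reads `ip -o -4 addr show` style lines on stdin and
# succeeds if VIP is assigned to any interface.
has_vip() {
    local vip="$1"
    # Fixed-string match on "inet <vip>/" so that 192.0.2.10 does not
    # match a line carrying 192.0.2.100.
    grep -qF "inet ${vip}/"
}

# Usage on a live node (192.0.2.100 is a hypothetical VIP):
#   ip -o -4 addr show | has_vip 192.0.2.100 && echo "this node holds the VIP"
```

Running the check on both nodes during a failover test shows the VIP moving from one to the other.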
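Layer 4: when working at layer 4, keepalived decides whether a server has failed from the state of a TCP port, for example by checking MySQL port 3306. If the check fails, the server is removed from the hot-standby group.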
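The idea behind a layer 4 check can be approximated by hand. This is a rough sketch only, assuming bash (for its `/dev/tcp` feature) and the coreutils `timeout` command. The function name `tcp_check` is hypothetical, and keepalived's own check is implemented internally rather than like this.

```shell
#!/bin/bash
# tcp_check HOST PORT [TIMEOUT]: succeed if a TCP connection can be opened
# within TIMEOUT seconds (default 3).
tcp_check() {
    local host="$1" port="$2" wait="${3:-3}"
    # /dev/tcp/HOST/PORT is handled by bash itself; timeout bounds the attempt.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

if tcp_check 127.0.0.1 3306 3; then
    echo "port 3306 is up"
else
    echo "port 3306 is down"
fi
```

A successful connect only proves that something is listening on the port. It says nothing about whether MySQL can actually answer queries.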
Example:

    ! Configuration File for keepalived
    global_defs {
        notification_email {
            example@163.com
        }
        notification_email_from example@example.com
        smtp_server
        smtp_connect_timeout 30
        router_id MYSQL_HA
    }
    vrrp_instance VI_1 {
        state BACKUP
        interface eth1
        virtual_router_id 50
        nopreempt                # when the master goes down the backup takes over; a recovered master does not take the VIP back
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 123
        }
        virtual_ipaddress {      # virtual IP address
        }
    }
    virtual_server 3306 {
        delay_loop 6
        # lb_algo rr
        # lb_kind NAT
        persistence_timeout 50
        protocol TCP
        real_server 3306 {       # monitor port 3306 on the local host
            weight 1
            notify_down /etc/keepalived/kill_keepalived.sh   # run this script when port 3306 is detected as down (the VIP moves only once keepalived is stopped)
            TCP_CHECK {          # health check method, chosen to suit the service (HTTP_GET|SSL_GET|TCP_CHECK|SMTP_CHECK|MISC_CHECK)
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 3
            }
        }
    }
Example:

    ! Configuration File for keepalived
    global_defs {
        notification_email {
            example@163.com
        }
        notification_email_from example@example.com
        smtp_server
        smtp_connect_timeout 30
        router_id MYSQL_HA
    }
    vrrp_script check_nginx {
        script /etc/keepalived/check_nginx.sh   # check script
        interval 2                              # seconds between runs
    }
    vrrp_instance VI_1 {
        state BACKUP
        interface eth1
        virtual_router_id 50
        nopreempt                # when the master goes down the backup takes over; a recovered master does not take the VIP back
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 123
        }
        virtual_ipaddress {      # virtual IP address
        }
        track_script {           # reference the script inside the instance
            check_nginx
        }
    }
The script contents are as follows:

    # cat /etc/keepalived/check_nginx.sh
    #!/bin/bash
    # Count the sockets owned by nginx
    Count1=$(netstat -antp | grep -v grep | grep nginx | wc -l)
    if [ "$Count1" -eq 0 ]; then
        # nginx is not running: try to start it once
        /usr/local/nginx/sbin/nginx
        sleep 2
        Count2=$(netstat -antp | grep -v grep | grep nginx | wc -l)
        if [ "$Count2" -eq 0 ]; then
            # the restart failed: stop keepalived so the VIP moves to the backup
            service keepalived stop
        else
            exit 0
        fi
    else
        exit 0
    fi
4.1 HTTP service status check
    HTTP_GET {                   # or SSL_GET
        url {
            path /index.html     # URL to check; several url blocks may be listed
            digest 24326582a86bee478bac72d5af25089e   # expected digest (checksum) of the page
            # to obtain the digest: genhash -s IP -p 80 -u /index.html
            status_code 200      # expected HTTP status code
        }
        connect_port 80          # port to connect to
        connect_timeout 3        # connection timeout
        nb_get_retry 3           # number of retries
        delay_before_retry 2     # delay between retries
    }
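The `digest` value is a 32-character hex string, which is the shape of an MD5 sum. Assuming genhash hashes the response body with MD5 (an assumption worth verifying against the genhash man page for your version), a page's digest can be cross-checked by hand. The function name `body_digest` is hypothetical.

```shell
#!/bin/bash
# body_digest: print the MD5 hex digest of whatever arrives on stdin.
body_digest() {
    md5sum | awk '{print $1}'
}

# Example with a fixed body. On a real server the body would come from
# something like: curl -s http://IP/index.html | body_digest
printf 'hello' | body_digest
# prints 5d41402abc4b2a76b9719d911017c592
```

If the page content is dynamic the digest changes on every request, in which case checking only `status_code` is the safer choice.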
4.2 TCP port status check (works for almost any service that listens on a TCP port)
    TCP_CHECK {
        connect_port 80          # port to check; defaults to the port given for real_server
        connect_timeout 5
        nb_get_retry 3
        delay_before_retry 3
    }
4.3 Mail server SMTP check
    SMTP_CHECK {                 # health check for an SMTP mail server
        host {
            connect_ip
            connect_port
        }
        connect_timeout 5
        retry 2
        delay_before_retry 3
        helo_name "mail.domain.com"
    }
4.4 Checking real_server status with a user-defined script
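    MISC_CHECK {
        misc_path /script.sh     # path to the external program or script
        misc_timeout 3           # script execution timeout
        !misc_dynamic            # commented out: the weight is not adjusted dynamically; if enabled, the real_server weight is adjusted according to the script's exit status
    }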
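When `misc_dynamic` is enabled, the script's exit status carries the new weight. The sketch below assumes the convention described in the keepalived.conf man page: 0 means healthy with the weight unchanged, 1 means failed, and 2 to 255 means healthy with the weight set to the status minus 2. Both function names and the load-based policy are hypothetical.

```shell
#!/bin/bash
# weight_exit_code WEIGHT: map a desired weight (0-253) to a MISC_CHECK
# exit status under the assumed misc_dynamic convention (status = weight + 2).
weight_exit_code() {
    local weight="$1"
    if [ "$weight" -lt 0 ] || [ "$weight" -gt 253 ]; then
        echo 1                   # out of range: report the check as failed
    else
        echo $((weight + 2))
    fi
}

# load_to_weight LOAD: hypothetical policy that lowers the weight as the
# integer part of the 1-minute load average rises, never going below 1.
load_to_weight() {
    local load_int="$1"
    local weight=$((10 - load_int))
    if [ "$weight" -lt 1 ]; then
        weight=1
    fi
    echo "$weight"
}
```

A check script built on these would end with `exit "$(weight_exit_code "$(load_to_weight 3)")"`, which exits with status 9 and so requests weight 7.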
The default e-mail notification template in the main configuration is as follows:

    global_defs                       # Block id
    {
        notification_email            # To:
        {
            admin@example1.com
            ...
        }
        # From: from address that will be in header
        notification_email_from admin@example.com
        smtp_server                   # IP
        smtp_connect_timeout 30       # integer, seconds
        router_id my_hostname         # string identifying the machine,
                                      # (doesn't have to be hostname).
        enable_traps                  # enable SNMP traps
    }
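5.1 Instance state notifications
a) notify_master: executed when the node becomes master
b) notify_backup: executed when the node becomes backup
c) notify_fault: executed when the node enters the fault state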
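5.2 Virtual server check notifications
a) notify_up: executed when the health check finds the server up
b) notify_down: executed when the health check finds the server down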
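Instead of one script per state, a single script can branch on the state name. The sketch below assumes the argument order that keepalived's generic `notify` option uses according to the keepalived.conf man page (type, instance name, state). The function name `describe_transition` is hypothetical.

```shell
#!/bin/bash
# describe_transition TYPE NAME STATE: print a message for a state change.
# TYPE is "INSTANCE" or "GROUP", STATE is MASTER, BACKUP or FAULT.
describe_transition() {
    local type="$1" name="$2" state="$3"
    case "$state" in
        MASTER) echo "$type $name changed to master" ;;
        BACKUP) echo "$type $name changed to backup" ;;
        FAULT)  echo "$type $name changed to fault" ;;
        *)      echo "$type $name unknown state: $state"; return 1 ;;
    esac
}

# When invoked with arguments, pass them straight through.
if [ "$#" -ge 3 ]; then
    describe_transition "$1" "$2" "$3"
fi
```

The printed message can be piped to `mail` in the same way as the per-state scripts do.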
Example:

    ! Configuration File for keepalived
    global_defs {
        notification_email {
            example@163.com
        }
        notification_email_from example@example.com
        smtp_server
        smtp_connect_timeout 30
        router_id MYSQL_HA
    }
    vrrp_instance VI_1 {
        state BACKUP
        interface eth1
        virtual_router_id 50
        nopreempt                # when the master goes down the backup takes over; a recovered master does not take the VIP back
        priority 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass 123
        }
        virtual_ipaddress {
        }
        notify_master /etc/keepalived/to_master.sh
        notify_backup /etc/keepalived/to_backup.sh
        notify_fault /etc/keepalived/to_fault.sh
    }
    virtual_server 3306 {
        delay_loop 6
        persistence_timeout 50
        protocol TCP
        real_server 3306 {
            weight 1
            notify_up /etc/keepalived/mysql_up.sh
            notify_down /etc/keepalived/mysql_down.sh
            TCP_CHECK {
                connect_timeout 3
                nb_get_retry 3
                delay_before_retry 3
            }
        }
    }
1) This script runs when the server becomes master

    # cat to_master.sh
    #!/bin/bash
    Date=$(date +"%F %T")
    IP=$(ifconfig eth0 | grep "inet addr" | cut -d ":" -f2 | awk '{print $1}')
    Mail="baojingtongzhi@163.com"
    echo "$Date $IP change to master." | mail -s "Master-Backup Change Status" "$Mail"
2) This script runs when the server becomes backup

    # cat to_backup.sh
    #!/bin/bash
    Date=$(date +"%F %T")
    IP=$(ifconfig eth0 | grep "inet addr" | cut -d ":" -f2 | awk '{print $1}')
    Mail="baojingtongzhi@163.com"
    echo "$Date $IP change to backup." | mail -s "Master-Backup Change Status" "$Mail"
3) This script runs when the server enters the fault state

    # cat to_fault.sh
    #!/bin/bash
    Date=$(date +"%F %T")
    IP=$(ifconfig eth0 | grep "inet addr" | cut -d ":" -f2 | awk '{print $1}')
    Mail="baojingtongzhi@163.com"
    echo "$Date $IP change to fault." | mail -s "Master-Backup Change Status" "$Mail"
4) This script runs when TCP port 3306 is detected as unavailable; it kills keepalived to trigger the switchover

    # cat mysql_down.sh
    #!/bin/bash
    Date=$(date +"%F %T")
    IP=$(ifconfig eth0 | grep "inet addr" | cut -d ":" -f2 | awk '{print $1}')
    Mail="baojingtongzhi@163.com"
    pkill keepalived
    echo "$Date $IP The mysql service failed, keepalived killed." | mail -s "Master-Backup MySQL Monitor" "$Mail"
5) This script runs when TCP port 3306 is detected as available

    # cat mysql_up.sh
    #!/bin/bash
    Date=$(date +"%F %T")
    IP=$(ifconfig eth0 | grep "inet addr" | cut -d ":" -f2 | awk '{print $1}')
    Mail="baojingtongzhi@163.com"
    echo "$Date $IP The mysql service has recovered." | mail -s "Master-Backup MySQL Monitor" "$Mail"