|ID||Project||Category||View Status||Date Submitted||Last Update|
|0017405||CentOS-8||keepalived||public||2020-05-27 11:34||2020-12-02 14:56|
|Summary||0017405: keepalived stops working after a while|
|Description||Sometimes keepalived daemons stop participating in the group and start using a lot of CPU. This results in good cluster members losing the floating IP and less healthy (broken) members receiving it.|
Looks like this upstream issue: https://github.com/acassen/keepalived/issues/1364
|Steps To Reproduce||1. Set up keepalived with frequent check scripts. A vrrp_script checking for a process with "killall -0" every second is a common pattern.|
2. After some hours/days/weeks keepalived reports something like 'Track script prcheck_nginx is already running, expect idle - skipping run'.
3. After that, nothing more is logged. The daemon uses 100% of CPU and no longer advertises its state.
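For reference, the setup in step 1 might look roughly like the keepalived.conf fragment below. This is a sketch under assumptions, not the reporter's actual configuration. Only the script name prcheck_nginx (taken from the log message in step 2) and the one-second "killall -0" check come from this report. The checked process name, instance name, interface, router ID, priority and address are placeholders.

```
# Hypothetical keepalived.conf fragment. Values marked "placeholder" are not from this report.
vrrp_script prcheck_nginx {
    script "/usr/bin/killall -0 nginx"   # exits 0 if an nginx process exists (process name assumed)
    interval 1                           # run the check every second
}

vrrp_instance VI_1 {                     # placeholder instance name
    state BACKUP
    interface eth0                       # placeholder
    virtual_router_id 51                 # placeholder
    priority 100                         # placeholder
    virtual_ipaddress {
        192.0.2.10/24                    # placeholder floating IP (documentation range)
    }
    track_script {
        prcheck_nginx                    # ties the instance state to the check above
    }
}
```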
|Additional Information||We have built a new keepalived package from the latest sources, but it will be some time before we can say whether the problem is gone.|
|Tags||No tags attached.|
The problem is still present in CentOS 8.2.2004.
Bug is fixed in keepalived 2.0.12 (as noted in bug 0017088)
I have used the workaround of setting a timeout on the keepalived check script, but today keepalived still went to 100% CPU and caused problems.
Without such a timeout, this always happens after some time.
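One way such a timeout might be expressed is sketched below. This is an assumption about what was meant, since the note does not say how the timeout was set. The sketch uses the timeout keyword of a vrrp_script block. The script name is the one from the log message in this report, and the checked process and all values are illustrative.

```
# Hypothetical sketch of the timeout workaround. Process name and values are assumed.
vrrp_script prcheck_nginx {
    script "/usr/bin/killall -0 nginx"
    interval 1      # run the check every second
    timeout 2       # treat the script as failed if it runs longer than 2 seconds
}
```

As reported in the note, a timeout of this kind did not prevent the 100% CPU condition.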