View Issue Details

ID: 0017405
Project: CentOS-8
Category: keepalived
View Status: public
Last Update: 2020-12-02 14:56
Reporter: mihkela
Assigned To: (none)
Priority: normal
Severity: major
Reproducibility: random
Status: new
Resolution: open
Platform: x86_64/VMware
OS: CentOS
OS Version: 8.1.1911
Product Version: 8.1.1911
Summary: 0017405: keepalived stops working after a while
Description: Sometimes keepalived daemons stop participating in the VRRP group and start using a lot of CPU. As a result, healthy cluster members lose the floating IP and less healthy (broken) members receive it.

Looks like this upstream issue: https://github.com/acassen/keepalived/issues/1364
Steps To Reproduce:
1. Set up keepalived with frequent check scripts. A vrrp_script that checks for a process with "killall -0" every second is a common pattern.
2. After some hours, days, or weeks, keepalived reports something like 'Track script prcheck_nginx is already running, expect idle - skipping run'.
3. After that, nothing more is logged; the daemon uses 100% CPU and no longer advertises its state.
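
For illustration, a check of the kind described in step 1 might look like the keepalived.conf fragment below. This is a minimal sketch, not the reporter's actual configuration: the script name prcheck_nginx is taken from the log message in step 2, while the killall path, the nginx process name, the interface, the virtual_router_id, the priority, and the floating address are all assumptions.

```
# Hypothetical keepalived.conf fragment; all values except the script name are assumed
vrrp_script prcheck_nginx {
    script "/usr/bin/killall -0 nginx"   # exits 0 if a process named nginx exists
    interval 1                           # run the check every second
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0                       # assumed interface name
    virtual_router_id 51                 # assumed router ID
    priority 100                         # assumed priority
    virtual_ipaddress {
        192.0.2.10                       # assumed floating IP (documentation range)
    }
    track_script {
        prcheck_nginx
    }
}
```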
Additional Information: We have built a new keepalived package from the latest sources, but it will be some time before we can say whether the problem is gone.
Tags: No tags attached.

Activities

sindre

2020-12-02 14:56

reporter   ~0038026

Still present in CentOS 8.2.2004.
The bug is fixed in keepalived 2.0.12 (as noted in bug 0017088).

I have used the workaround of setting a timeout on the keepalived check script, but today keepalived still went to 100% CPU and caused problems.
Without such a timeout, this always happens after a while.
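
For reference, the timeout workaround mentioned above can be sketched roughly as follows. This is an assumption about its general shape, not the exact configuration used here: the script name is borrowed from the original report, and the command and the timeout value are hypothetical.

```
# Hypothetical vrrp_script with a timeout; command and values are assumed
vrrp_script prcheck_nginx {
    script "/usr/bin/killall -0 nginx"
    interval 1      # run the check every second
    timeout 2       # seconds after which the script is considered to have failed
}
```

As noted, this only reduces how often the hang occurs; it did not prevent it.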

Issue History

Date Modified     Username  Field                 Change
2020-05-27 11:34  mihkela   New Issue
2020-12-02 14:56  sindre    Note Added: 0038026