View Issue Details

ID: 0013961
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2017-10-23 21:19
Reporter: stefanlasiewski
Priority: normal
Severity: minor
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 7.4.1708
Target Version:
Fixed in Version:

Summary
0013961: Bonded interfaces throwing thousands of errors: "kernel: bond1: link status up for interface enp2s0f0, enabling it in 200 ms"

Description
This host is a bare metal server with dual 10G interfaces (bond0) and dual 1G interfaces (bond1). We've configured bonding for both pairs of interfaces.

On September 18, we updated to kernel-3.10.0-693.2.2.el7.x86_64. After rebooting the host, the kernel started logging thousands upon thousands of messages like this:

Sep 18 16:53:23 host03 kernel: bond1: link status up for interface enp2s0f0, enabling it in 200 ms
Sep 18 16:53:23 host03 kernel: bond0: link status up for interface ens3f1, enabling it in 200 ms
Sep 18 16:53:23 host03 kernel: bond1: link status up for interface enp2s0f0, enabling it in 200 ms
Sep 18 16:53:23 host03 kernel: bond0: link status up for interface ens3f1, enabling it in 200 ms
Sep 18 16:53:23 host03 kernel: bond1: link status up for interface enp2s0f0, enabling it in 200 ms
Sep 18 16:53:23 host03 kernel: bond0: link status up for interface ens3f1, enabling it in 200 ms
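
To get a sense of how many of these messages each bond is producing, the log can be tallied per bond and interface. The following is a minimal sketch; the regular expression is an assumption derived from the sample lines above, not an official log format, and `tally_link_up` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines in this report (an assumption).
LINK_UP_RE = re.compile(
    r"kernel: (?P<bond>\S+): link status up for interface (?P<slave>\S+), "
    r"enabling it in (?P<delay>\d+) ms"
)

def tally_link_up(lines):
    """Count 'link status up' messages per (bond, slave) pair."""
    counts = Counter()
    for line in lines:
        match = LINK_UP_RE.search(line)
        if match:
            counts[(match["bond"], match["slave"])] += 1
    return counts

sample = [
    "Sep 18 16:53:23 host03 kernel: bond1: link status up for interface enp2s0f0, enabling it in 200 ms",
    "Sep 18 16:53:23 host03 kernel: bond0: link status up for interface ens3f1, enabling it in 200 ms",
    "Sep 18 16:53:23 host03 kernel: bond1: link status up for interface enp2s0f0, enabling it in 200 ms",
]

for (bond, slave), n in sorted(tally_link_up(sample).items()):
    print(f"{bond} {slave}: {n}")
```

On a real host, passing an open handle on /var/log/messages instead of `sample` should give the per-interface totals.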

The parameters for both bond0 and bond1 are fairly standard. Here are the settings for bond0:

DEVICE=bond0
NAME=bond0
BONDING_MASTER=yes
TYPE=Bond
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=balance-xor miimon=100 downdelay=200 updelay=200"
MTU=9000
NM_CONTROLLED=no
USERCTL=no
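
For reference, the bonding documentation states that updelay and downdelay should be multiples of miimon and are otherwise rounded down to the nearest multiple. The sketch below parses a BONDING_OPTS string and expresses the delays as whole miimon intervals; the function names are hypothetical, and it assumes a space-separated `name=value` list like the one above.

```python
def parse_bonding_opts(opts):
    """Split a BONDING_OPTS value into a dict of option name -> string value."""
    return dict(item.split("=", 1) for item in opts.split())

def delay_in_intervals(opts):
    """Express updelay/downdelay as whole miimon intervals (rounded down)."""
    parsed = parse_bonding_opts(opts)
    miimon = int(parsed.get("miimon", 0))
    if miimon <= 0:
        raise ValueError("miimon must be a positive number of milliseconds")
    return {
        name: int(parsed.get(name, 0)) // miimon
        for name in ("updelay", "downdelay")
    }

opts = "mode=balance-xor miimon=100 downdelay=200 updelay=200"
print(parse_bonding_opts(opts)["mode"])  # balance-xor
print(delay_in_intervals(opts))          # {'updelay': 2, 'downdelay': 2}
```

With miimon=100 and updelay=200, a link is expected to be enabled two monitor intervals after carrier is detected.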

Each of the child interfaces of bond0 (enp2s0f0 and enp2s0f1) is configured like this:

DEVICE=enp2s0f0
ONBOOT=yes
BOOTPROTO=none
NM_CONTROLLED=no
USERCTL=no
MASTER=bond0
SLAVE=yes
MTU=9000
IPV6_AUTOCONF=no

Steps To Reproduce
1. Configure bonding on your system. Add flags like BONDING_OPTS="mode=balance-xor miimon=100 downdelay=200 updelay=200"
2. Update the system to kernel-3.10.0-693.2.2.el7.x86_64 and reboot the node to activate the new kernel.
3. Look for the error in your system logs with `grep "link status up for interface" /var/log/messages`.
4. Change the "updelay=200" flag to "updelay=0" and restart the network.
5. Watch /var/log/messages. The "link status up for interface" errors should stop.
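
After step 4, it can help to confirm that the running bond actually picked up the new value. The bonding driver exposes its options under /sys/class/net/<bond>/bonding/; the sketch below reads one of them. `read_bond_option` is a hypothetical helper, and the `sysfs_root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path

def read_bond_option(bond, option, sysfs_root="/sys/class/net"):
    """Return the running value of a bonding option, or None if it is not exposed."""
    path = Path(sysfs_root) / bond / "bonding" / option
    if not path.is_file():
        return None
    return path.read_text().strip()

if __name__ == "__main__":
    # Bond names taken from this report; adjust for other hosts.
    for bond in ("bond0", "bond1"):
        print(bond, "updelay =", read_bond_option(bond, "updelay"))
```

A value of None here would suggest the bond does not exist or the bonding module is not loaded.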

Additional Information
Searching the internet for this error turns up only two recent reports.

1. The Red Hat Knowledgebase shows the same error here (Solution requires a RHEL subscription). The public version of that page suggests this is an issue with the MII monitor:

https://access.redhat.com/solutions/3152981

2. CoreOS shows the bug for their OS, running Kernel 4.x. They also suggest this is an error with the MII monitor: https://github.com/coreos/bugs/issues/2065


To work around this bug, I followed the solution presented by CoreOS and disabled the "updelay" option by changing this:

    BONDING_OPTS="mode=balance-xor miimon=100 downdelay=200 updelay=200"

to this:

    BONDING_OPTS="mode=balance-xor miimon=100 downdelay=200 updelay=0"

However, I believe setting `updelay=0` effectively disables that feature, which can have unintended consequences.

A recent kernel patch appears to address this bug, but I'm not certain. Neither Red Hat nor CentOS seems to have a public bug filed for this issue, and I assume the patch hasn't been incorporated into the EL7 kernel.

https://github.com/torvalds/linux/commit/d94708a553022bf012fa95af10532a134eeb5a52#diff-4fd608f7ba30987ab64415586df797f7

Tags: No tags attached.

Activities

toracat

2017-10-02 23:59

manager   ~0030284

Given the fact that RH is aware of the problem and a patch is available, I'd think an updated kernel is forthcoming.

In the meantime, we can provide the centosplus kernel with the patch. Or else, since the referenced patch was applied to kernel v4.13, one can use ELRepo's kernel-ml as a temporary solution.
toracat

2017-10-03 00:12

manager  

centos-linux-3.10-bonding-bug13961.patch (1,511 bytes)
centosplus patch [bug#13961]

commit	d94708a553022bf012fa95af10532a134eeb5a52

bonding: commit link status change after propose
Commit de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring")
moves link status commitment into bond_mii_monitor(), but it still relies
on the return value of bond_miimon_inspect() as the hint. We need to return
non-zero as long as we propose a link status change.

Fixes: de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring")
Reported-by: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Tested-by: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Applied-by: Akemi Yagi <toracat@centos.org>

--- a/drivers/net/bonding/bond_main.c	2017-09-09 00:06:42.000000000 -0700
+++ b/drivers/net/bonding/bond_main.c	2017-10-02 17:05:48.436241247 -0700
@@ -2063,6 +2063,7 @@ static int bond_miimon_inspect(struct bo
 				continue;
 
 			bond_propose_link_state(slave, BOND_LINK_FAIL);
+			commit++;
 			slave->delay = bond->params.downdelay;
 			if (slave->delay) {
 				netdev_info(bond->dev, "link status down for %sinterface %s, disabling it in %d ms\n",
@@ -2101,6 +2102,7 @@ static int bond_miimon_inspect(struct bo
 				continue;
 
 			bond_propose_link_state(slave, BOND_LINK_BACK);
+			commit++;
 			slave->delay = bond->params.updelay;
 
 			if (slave->delay) {
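
To illustrate why the two added `commit++` lines matter, here is a deliberately simplified Python model of one slave passing through the MII monitor. It is a sketch based on the commit message above, not the kernel code: the state names mirror the BOND_LINK_* constants, carrier is assumed to be always up, and the separate inspect and commit phases are collapsed into one loop.

```python
# Simplified, illustrative model of the MII monitor for a single slave.
DOWN, BACK, UP = "DOWN", "BACK", "UP"

def run_monitor(ticks, updelay, patched):
    """Simulate `ticks` monitor passes over a slave whose carrier is always up.

    Returns (final committed state, number of "link status up" messages).
    """
    link = DOWN   # committed state, analogous to slave->link
    delay = 0
    messages = 0
    for _ in range(ticks):
        proposed = None
        commit = 0
        state = link
        if state == DOWN:
            proposed = BACK
            if patched:
                commit += 1      # the line added by the patch
            delay = updelay
            if delay:
                messages += 1    # "link status up ..., enabling it in N ms"
            state = BACK         # fall through to the BACK handling
        if state == BACK:
            if delay <= 0:
                proposed = UP
                commit += 1
            else:
                delay -= 1
        # The proposed state only takes effect when commit is non-zero.
        if commit and proposed is not None:
            link = proposed
    return link, messages

print(run_monitor(10, updelay=2, patched=False))  # ('DOWN', 10)
print(run_monitor(10, updelay=2, patched=True))   # ('UP', 1)
print(run_monitor(10, updelay=0, patched=False))  # ('UP', 0)
```

In this model, the unpatched monitor with a non-zero updelay never commits the proposed BACK state, so every pass starts again from DOWN and logs the message once more. With updelay=0 the UP transition is committed on the first pass, which is consistent with the workaround described in the report.
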
toracat

2017-10-19 16:18

manager   ~0030414

The patch is in the kernel update released today (kernel-3.10.0-693.5.2.el7).

Issue History

Date Modified     Username         Field                                                  Change
2017-10-02 22:37  stefanlasiewski  New Issue
2017-10-02 23:59  toracat          Status                                                 new => assigned
2017-10-02 23:59  toracat          Note Added: 0030284
2017-10-03 00:12  toracat          File Added: centos-linux-3.10-bonding-bug13961.patch
2017-10-19 16:18  toracat          Note Added: 0030414
2017-10-19 16:18  toracat          Status                                                 assigned => resolved
2017-10-19 16:18  toracat          Resolution                                             open => fixed