View Issue Details

ID: 0017292    Project: CentOS-6    Category: kernel    View Status: public    Last Update: 2020-05-06 17:51
Reporter: digimer
Priority: high    Severity: major    Reproducibility: always
Status: new    Resolution: open
Product Version: 6.10
Target Version:    Fixed in Version:
Summary: 0017292: Bonding not failing over in mode=1 under 2.6.32-754.28.1 (...27.1 works OK)
Description:

Note: I'm copying RHBZ #1828604 as that bug is set to private (I am the reporter).

Summary:

The bonding driver doesn't fail over when a link drops on a mode=1 (active-backup) bond under kernel-2.6.32-754.28.1.el6.x86_64; it works under kernel-2.6.32-754.27.1.el6.x86_64.

Full:

With a two-interface active-backup bond, issuing 'ifdown <link1>' works: the backup link takes over. However, if you unplug a cable, /proc/net/bonding/<bond> shows the active interface as 'down', yet it remains the active slave, so traffic over the bond fails.

Configuration:

====
[root@an-a02n02 ~]# cat /etc/sysconfig/network-scripts/ifcfg-sn_link1
# Generated by: [InstallManifest.pm] on: [2020-03-24, 19:33:15].
# Storage Network - Link 1
DEVICE="sn_link1"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="sn_bond1"

[root@an-a02n02 ~]# cat /etc/sysconfig/network-scripts/ifcfg-sn_link2
# Generated by: [InstallManifest.pm] on: [2020-03-24, 19:33:15].
# Storage Network - Link 2
DEVICE="sn_link2"
NM_CONTROLLED="no"
BOOTPROTO="none"
ONBOOT="yes"
SLAVE="yes"
MASTER="sn_bond1"

[root@an-a02n02 ~]# cat /etc/sysconfig/network-scripts/ifcfg-sn_bond1
# Generated by: [InstallManifest.pm] on: [2020-03-24, 19:33:15].
# Storage Network - Bond 1
DEVICE="sn_bond1"
BOOTPROTO="static"
ONBOOT="yes"
BONDING_OPTS="mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=sn_link1 primary_reselect=always"
IPADDR="10.10.20.2"
NETMASK="255.255.0.0"
DEFROUTE="no"
====
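As a side note, the BONDING_OPTS string above is a space-separated list of key=value pairs that the initscripts hand to the bonding driver; when debugging a config like this it can help to split it out and eyeball each option. A minimal sketch (the opts string is copied from the config above):

```shell
# Split a BONDING_OPTS-style string into one "key -> value" line per option,
# to double-check what will be passed to the bonding driver.
opts='mode=1 miimon=100 use_carrier=1 updelay=120000 downdelay=0 primary=sn_link1 primary_reselect=always'
for kv in $opts; do
    printf '%s -> %s\n' "${kv%%=*}" "${kv#*=}"
done
```

On a running system the same values can be read back from /sys/class/net/<bond>/bonding/ to confirm the driver actually applied them.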

-=] Under 2.6.32-754.27.1.el6.x86_64 [=-

/proc/net/bonding/sn_bond1 pre-failure:

====
[root@an-a02n02 ~]# cat /proc/net/bonding/sn_bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: sn_link1 (primary_reselect always)
Currently Active Slave: sn_link1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 120000
Down Delay (ms): 0

Slave Interface: sn_link1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:4f:10:15
Slave queue ID: 0

Slave Interface: sn_link2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:4f:10:14
Slave queue ID: 0
====

/var/log/messages failing the sn_link1:

====
Apr 27 17:22:01 an-a02n02 kernel: ixgbe 0000:05:00.1: sn_link1: NIC Link is Down
Apr 27 17:22:01 an-a02n02 kernel: sn_bond1: link status definitely down for interface sn_link1, disabling it
Apr 27 17:22:01 an-a02n02 kernel: sn_bond1: making interface sn_link2 the new active one
====

/proc/net/bonding/sn_bond1 post-failure:

====
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: sn_link1 (primary_reselect always)
Currently Active Slave: sn_link2
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 120000
Down Delay (ms): 0

Slave Interface: sn_link1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: b4:96:91:4f:10:15
Slave queue ID: 0

Slave Interface: sn_link2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:4f:10:14
Slave queue ID: 0
====

Worked fine.

-=] Under 2.6.32-754.28.1.el6.x86_64 [=-

/proc/net/bonding/sn_bond1 pre-failure:

====
[root@an-a02n02 ~]# cat /proc/net/bonding/sn_bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: sn_link1 (primary_reselect always)
Currently Active Slave: sn_link1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 120000
Down Delay (ms): 0

Slave Interface: sn_link1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:4f:10:15
Slave queue ID: 0

Slave Interface: sn_link2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:4f:10:14
Slave queue ID: 0
====

/var/log/messages failing the sn_link1 (just the one line...):

====
Apr 27 17:32:08 an-a02n02 kernel: ixgbe 0000:05:00.1: sn_link1: NIC Link is Down
====

/proc/net/bonding/sn_bond1 post-failure:

====
[root@an-a02n02 ~]# cat /proc/net/bonding/sn_bond1
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: sn_link1 (primary_reselect always)
Currently Active Slave: sn_link1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 120000
Down Delay (ms): 0

Slave Interface: sn_link1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: b4:96:91:4f:10:15
Slave queue ID: 0

Slave Interface: sn_link2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b4:96:91:4f:10:14
Slave queue ID: 0
====
Steps To Reproduce:

1. Create bond as described above
2. Physically fail an interface (do not use 'ifdown')
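To confirm whether failover actually happened after pulling the cable, the active slave can be pulled out of the bonding status text. A sketch against a saved copy of the output shown above (on a live system, read /proc/net/bonding/sn_bond1 instead of the inlined sample):

```shell
# Extract the "Currently Active Slave" field from bonding status text.
# Sample text inlined for illustration; it mirrors the /proc output above.
status='Bonding Mode: fault-tolerance (active-backup)
Primary Slave: sn_link1 (primary_reselect always)
Currently Active Slave: sn_link2
MII Status: up'
active=$(printf '%s\n' "$status" | awk -F': ' '/^Currently Active Slave/ {print $2}')
echo "$active"
```

Under the .27 kernel this reports the backup slave after a physical failure; under .28 it keeps reporting the failed primary.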
Additional Information:

There are a lot of bonding-related entries in the .28 kernel's changelog:

====
[root@an-a02n02 ~]# rpm -q --changelog kernel-2.6.32-754.28.1.el6.x86_64 | grep bond | wc -l
794
====
Tags: regression

Activities

TrevorH   2020-04-27 22:00   manager   ~0036780

https://bugzilla.redhat.com/show_bug.cgi?id=1828604 is the upstream bugzilla
digimer   2020-05-06 17:51   reporter   ~0036879

This has been resolved upstream. I haven't gotten confirmation that it will be in .30, but I expect it will.

Issue History

Date Modified Username Field Change
2020-04-27 21:57 digimer New Issue
2020-04-27 21:57 digimer Tag Attached: regression
2020-04-27 22:00 TrevorH Note Added: 0036780
2020-05-06 17:51 digimer Note Added: 0036879