CentOS Bug Tracker
Mantis Bug Tracker

View Issue Details
ID: 0005859
Project: CentOS-6
Category: kernel
View Status: public
Date Submitted: 2012-07-25 00:49
Last Update: 2012-08-17 14:26
Reporter: jlawer
Priority: high
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Dell R320
OS: CentOS
OS Version: 6.3
Product Version: 6.3
Target Version:
Fixed in Version:
Summary: 0005859: Bridge is disabled when bonded slave is disabled despite bond still being active
Description: On a bonded Ethernet connection (tested in bonding modes 1, 2, 4 and 5, with both Intel and Broadcom NICs) that is VLAN-tagged and bridged (for KVM), the bridge is disabled when a bonded slave becomes disabled.

After the log message reporting that the slave is down, the following message appears within a second:

kernel: br1: port 1(bond0.8) entering disabled state

Interestingly, packets continue to flow in the moment between the slave losing its connection and the bridge's disabled message.

I browsed through the relevant kernel source and, although I am a complete amateur, I am curious whether the bridge receives the NETDEV notify message when the bond slave is lost.
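
When reproducing this, it can help to pull the bridge port state changes out of the system log automatically. The following is a minimal Python sketch that assumes log lines shaped like the one quoted in the description; the function name `bridge_port_events` and the regular expression are illustrative, not part of any tool mentioned in this report.

```python
import re

# Matches kernel bridge messages of the form quoted in this report, e.g.
# "kernel: br1: port 1(bond0.8) entering disabled state"
PORT_STATE = re.compile(
    r"kernel: (?P<bridge>\S+): port (?P<port>\d+)\((?P<dev>[^)]+)\) "
    r"entering (?P<state>\w+) state"
)


def bridge_port_events(lines):
    """Yield (bridge, device, state) for each bridge port state change."""
    for line in lines:
        match = PORT_STATE.search(line)
        if match:
            yield match.group("bridge"), match.group("dev"), match.group("state")


# The only input here is the message from the description above.
sample = ["kernel: br1: port 1(bond0.8) entering disabled state"]
events = list(bridge_port_events(sample))
print(events)  # [('br1', 'bond0.8', 'disabled')]
```

To use it against a real log, pass an open file object (for example the system log file, whose path depends on the syslog configuration) instead of `sample`.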

Steps To Reproduce:
1. Set up bonding between 2 Ethernet interfaces (and configure the switch for a LAG if using mode 4)
2. Create a VLAN on the bond
3. Bridge the VLAN
4. Ping across the connection (works)
5. Unplug a slave connection
6. Ping stops working
Additional Information: R320 server (2 Broadcom + 4 Intel NICs), PowerConnect 5524 switches in a stacked config.
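
To confirm at step 5 that the bond itself is still up while the bridge port goes down, the bonding driver's status file (`/proc/net/bonding/bond0` for a bond named `bond0`) can be parsed. The sketch below is a rough Python helper under that assumption; `parse_bonding` is a hypothetical name, and `SAMPLE` is an illustrative, shortened status text rather than output captured from the reporter's servers.

```python
def parse_bonding(text):
    """Return (bond_mii_status, {slave_name: mii_status}) from bonding status text."""
    bond_status = None
    slaves = {}
    current = None  # name of the slave section being read, if any
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status":
            if current is None:
                bond_status = value  # appears before any slave section
            else:
                slaves[current] = value
    return bond_status, slaves


# Illustrative sample only (assumed interface names eth0/eth1).
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth1
MII Status: up

Slave Interface: eth0
MII Status: down

Slave Interface: eth1
MII Status: up
"""

bond_status, slaves = parse_bonding(SAMPLE)
print(bond_status, slaves)
```

With the bug present, one would expect the bond to report `up` with one slave still `up`, while the log shows the bridge port entering the disabled state.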

Tags: No tags attached.

-  Notes
(0015535)
jlawer (reporter)
2012-07-25 00:52

Reproduced on second server. Unable to reproduce on Fedora 17.
(0015536)
jlawer (reporter)
2012-07-25 02:27

OK, I've installed kernel-ml from ELRepo (kernel 3.5) and this has fixed the issue. While this is not ideal for me, I am going to use it as a workaround.

As such, it is no longer a "major" severity for me.
(0015541)
toracat (developer)
2012-07-25 16:03

@jlawer

If not already reported, could you try filing a new issue with the upstream bugzilla (http://bugzilla.redhat.com)? Mentioning your finding that kernel 3.5 does not have the same problem _might_ trigger investigation.
(0015542)
jlawer (reporter)
2012-07-25 21:15

Thanks @Toracat,

I did another search of the upstream bugzilla and have since found BZ 841983, which suggests this may have just been resolved in 2.6.32-279.2.1 (even though I thought I had re-tested under this).

I will try to reproduce today on this 2.6 kernel version and let you know the outcome.
(0015543)
jlawer (reporter)
2012-07-25 22:49

Sorry, I misread: the patch will land after 2.6.32-279.2.1 and is currently targeting the Z-stream (i.e. 6.3.z).

I suppose there is nothing much to do except wait for this fix to land.

If anyone else finds this report, you can grab the .diff from the upstream bugzilla and rebuild a 2.6.32 kernel with it to fix this issue. The fix adds code to the VLAN driver that explicitly checks whether it is getting a NETDEV notify from a bonded slave and, if so, whether another slave is active. If another slave is active, the VLAN driver ignores the event.
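
The decision described in this note can be illustrated with a toy model. This is a Python sketch of the logic as summarised above, not the kernel patch itself; the `Bond` class and `slave_down` function are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bond:
    """Toy stand-in for a bond: maps slave name -> whether its link is up."""
    name: str
    slaves: dict = field(default_factory=dict)

    def has_active_slave(self):
        return any(self.slaves.values())


def slave_down(bond, slave):
    """Record that a slave lost link; return True if the VLAN on top
    of the bond should go down, False if the event should be ignored."""
    bond.slaves[slave] = False
    # Ignore the event while any other slave can still carry traffic.
    return not bond.has_active_slave()


bond0 = Bond("bond0", {"eth0": True, "eth1": True})
first = slave_down(bond0, "eth0")   # eth1 is still up, so the event is ignored
second = slave_down(bond0, "eth1")  # no slave left, so the VLAN goes down
print(first, second)
```

In this model the behaviour reported in this issue corresponds to returning True on the first call, which is what takes `bond0.8` and the bridge port down.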
(0015544)
toracat (developer)
2012-07-25 23:49

I think that the patch is a good candidate for the centosplus kernel.

jlawer, if I build a patched kernel, will you be able to test it? Which arch, i686 or x86_64?
(0015545)
jlawer (reporter)
2012-07-25 23:53

x86_64 mate, and yeah, I would be happy to use the test kernel on the two servers I have.

Thanks
(0015547)
toracat (developer)
2012-07-26 02:36

@jlawer

The patched kernel ( 2.6.32-279.2.1.bug5859.el6.centos.plus ) is now available at:

http://people.centos.org/toracat/kernel/6/plus/2.6.32-279.2.1.bug5859.el6.centos.plus/x86_64/

Please note that it is not signed. If you can confirm the patch resolves the issue, it will be included in the next release of the Cplus kernel.
(0015548)
jlawer (reporter)
2012-07-26 03:44

@toracat

The patched kernel worked on my test system. I will keep it running that kernel until there is a new release and report if there are any issues, but it's looking good.
(0015549)
toracat (developer)
2012-07-26 04:33

@jlawer

That's good news. Yes, please do update with anything you find.
(0015562)
toracat (developer)
2012-07-28 11:09

The patched kernel is now in:

http://people.centos.org/toracat/kernel/6/plus/2.6.32-279.2.1.bug5859bug5424.el6.centos.plus/
(0015674)
jlawer (reporter)
2012-08-17 12:34

This is now in the stock kernel-2.6.32-279.5.1.el6

[root@hydrogen etc]# rpm -q --changelog kernel | grep 841983
- [net] 8021q/vlan: filter device events on bonds (Neil Horman) [842429 841983]

Thanks toracat for the bug fix kernel, it helped keep my project going.
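
The changelog check in the note above can also be scripted, for example to verify several hosts. The following Python sketch assumes `rpm` is available on the host and that bug numbers appear in a trailing bracketed list, as in the changelog line quoted above; `kernel_changelog` and `lines_mentioning` are hypothetical helper names.

```python
import subprocess


def kernel_changelog():
    """Return the installed kernel's RPM changelog (requires rpm on the host)."""
    result = subprocess.run(
        ["rpm", "-q", "--changelog", "kernel"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def lines_mentioning(changelog, bug_id):
    """Return changelog lines whose trailing [bug list] contains bug_id."""
    hits = []
    for line in changelog.splitlines():
        start, end = line.rfind("["), line.rfind("]")
        if start != -1 and end > start and bug_id in line[start + 1:end].split():
            hits.append(line)
    return hits


# The changelog entry quoted in the note above.
entry = "- [net] 8021q/vlan: filter device events on bonds (Neil Horman) [842429 841983]"
matches = lines_mentioning(entry, "841983")
print(matches)
```

On a real system, `lines_mentioning(kernel_changelog(), "841983")` would return an empty list if the installed kernel predates the fix.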
(0015678)
toracat (developer)
2012-08-17 14:25

You are welcome. Yes, they got this patch in very quickly upstream. Closing this report as 'resolved'.

- Issue History
Date Modified Username Field Change
2012-07-25 00:49 jlawer New Issue
2012-07-25 00:52 jlawer Note Added: 0015535
2012-07-25 02:27 jlawer Note Added: 0015536
2012-07-25 16:03 toracat Note Added: 0015541
2012-07-25 16:03 toracat Status new => acknowledged
2012-07-25 21:15 jlawer Note Added: 0015542
2012-07-25 22:49 jlawer Note Added: 0015543
2012-07-25 23:49 toracat Note Added: 0015544
2012-07-25 23:53 jlawer Note Added: 0015545
2012-07-26 02:36 toracat Note Added: 0015547
2012-07-26 03:44 jlawer Note Added: 0015548
2012-07-26 04:33 toracat Note Added: 0015549
2012-07-28 11:09 toracat Note Added: 0015562
2012-08-17 12:34 jlawer Note Added: 0015674
2012-08-17 14:25 toracat Note Added: 0015678
2012-08-17 14:26 toracat Status acknowledged => resolved
2012-08-17 14:26 toracat Resolution open => fixed

