View Issue Details

ID: 0003672
Project: CentOS-5
Category: -OTHER
View Status: public
Last Update: 2009-10-17 22:36
Reporter: Jeff lang
Priority: normal
Severity: crash
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 5.3
Target Version:
Fixed in Version: 5.4
Summary: 0003672: jumbo Frames causes RPC to fail
Description: I enabled jumbo frames on a gigabit network, using an MTU of 9000. This network has NFS running between 34 client systems. After a short while, the file server on the network hung and required a forced reboot. After the reboot, checking the logs showed a lot of messages like:

messages.2:May 29 13:46:42 seismicfsm kernel: RPC: bad TCP reclen 0x31373130 (non-terminal)
messages.2:May 29 13:46:44 seismicfsm kernel: RPC: bad TCP reclen 0x4d512e5a (non-terminal)
messages.2:May 29 13:46:47 seismicfsm kernel: RPC: bad TCP reclen 0x61630000 (non-terminal)
messages.2:May 29 13:46:50 seismicfsm kernel: RPC: bad TCP reclen 0x61630000 (non-terminal)
messages.2:May 29 13:46:53 seismicfsm kernel: RPC: bad TCP reclen 0x2e30302e (non-terminal)
messages.2:May 29 13:46:56 seismicfsm kernel: RPC: bad TCP reclen 0x73616300 (non-terminal)
messages.2:May 29 13:46:59 seismicfsm kernel: RPC: bad TCP reclen 0x3833382e (non-terminal)
messages.2:May 29 13:47:02 seismicfsm kernel: RPC: bad TCP reclen 0x0327a91a (large)
messages.2:May 29 13:47:05 seismicfsm kernel: RPC: bad TCP reclen 0x20202020 (non-terminal)
messages.2:May 29 13:47:08 seismicfsm kernel: RPC: bad TCP reclen 0x2e2e253a (large)
messages.2:May 29 13:48:11 seismicfsm kernel: RPC: bad TCP reclen 0x00000014 (non-terminal)
messages.2:May 29 13:48:11 seismicfsm kernel: RPC: bad TCP reclen 0x2e637574 (non-terminal)
messages.2:May 29 13:48:14 seismicfsm kernel: RPC: bad TCP reclen 0x2e736163 (non-terminal)
messages.2:May 29 13:48:14 seismicfsm kernel: RPC: bad TCP reclen 0x5a76fef5 (non-terminal)
messages.2:May 29 14:43:54 seismicfsm kernel: RPC: bad TCP reclen 0x3b0a2020 (non-terminal)
messages.2:May 29 14:43:56 seismicfsm kernel: RPC: bad TCP reclen 0x00000000 (non-terminal)
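
For readers decoding these values: ONC RPC over TCP frames each message with a 4-byte big-endian record marker, in which the top bit flags the last fragment and the low 31 bits give the fragment length (RFC 5531, record marking). The Python sketch below splits a few of the logged reclen values that way and also renders the same four bytes as ASCII. The helper name decode_record_marker is made up for illustration, and the idea that printable bytes point to payload data being read as a marker is an assumption, not something confirmed in this report.

```python
import struct


def decode_record_marker(value: int) -> dict:
    """Split a 32-bit RPC record marker into its RFC 5531 fields."""
    raw = struct.pack(">I", value)  # the four bytes as they appear on the wire
    return {
        # Top bit set means this is the last fragment of the record.
        "last_fragment": bool(value & 0x80000000),
        # Low 31 bits are the fragment length in bytes.
        "length": value & 0x7FFFFFFF,
        # Same bytes shown as text; non-printable bytes become '.'.
        "ascii": "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw),
    }


# A few values copied from the log lines above.
samples = [0x31373130, 0x2E736163, 0x20202020, 0x00000014]

for value in samples:
    fields = decode_record_marker(value)
    print(
        f"0x{value:08x} last_fragment={fields['last_fragment']} "
        f"length={fields['length']} ascii={fields['ascii']!r}"
    )
```

Several of the logged values decode to printable text, for example 0x31373130 is "1710" and 0x2e736163 is ".sac", which would be consistent with the receiver losing track of record boundaries in the TCP stream.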

I immediately turned off jumbo frames by setting the MTU back to 1500, and I haven't seen this issue at that MTU since Thursday. When I tried enabling jumbo frames again, the file server hung again. I installed the latest kernel on Thursday morning, and the file server hung about 2 hours later.

The master node, which shares one common file system with all the nodes, is also logging these errors:

messages.1:Jun 4 18:11:17 seismicmstm kernel: RPC: bad TCP reclen 0x204d6179 (non-terminal)
messages.1:Jun 4 18:11:17 seismicmstm kernel: RPC: bad TCP reclen 0x5d972d7d (non-terminal)
messages.1:Jun 4 18:11:17 seismicmstm kernel: RPC: bad TCP reclen 0x204d6179 (non-terminal)
messages.1:Jun 4 18:11:17 seismicmstm kernel: RPC: bad TCP reclen 0x5d972d7d (non-terminal)



Additional Information: The environment is a cluster consisting of a master node, a file server and 32 compute nodes. The switches in the network are all Cisco 3750s. All nodes in the cluster are running CentOS 5.3 with kernel 2.6.18-128.1.10.el5.
Tags: fixed in 5.4

Activities

user430

2009-06-08 16:06

  ~0009451

Can you check if you are seeing the same as in https://bugzilla.redhat.com/show_bug.cgi?id=482747 ?

Jeff lang

2009-06-08 16:33

reporter   ~0009452

Having read through the bug report linked above, it does seem to be the same problem.

The problem first appeared after I enabled jumbo frames across the cluster and a user tried to untar a file from a node.

The hardware here is all IBM 3650 and 3550 systems, whose PCI device list shows the following network cards:

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
06:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)

The machines all run 64-bit kernels. I have tried the latest two kernel releases, CentOS (2.6.18-128.1.10.el5) and CentOS (2.6.18-128.1.6.el5), and both fail. The systems set to an MTU of 1500 are stable.

user430

2009-06-08 20:02

  ~0009454

Could you then please add to that bug report? We need upstream to fix that to get a fixed kernel for CentOS.

Thank you.

Jeff lang

2009-06-08 21:56

reporter   ~0009455

Note that I updated my bnx2 driver to the latest available on the Broadcom site. Version:

[root@seismicmstm broadcomm]# ethtool -i eth1
driver: bnx2
version: 1.8.5b
firmware-version: 4.0.3 ipms 1.6.0
bus-info: 0000:06:00.0


So far I have not seen the failure or hang.

sunnydavis

2009-09-07 04:39

reporter   ~0009889

I believe it is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=476897

It seems to be fixed in kernel-xen-2.6.18-141.el5 (bnx2 v1.9.3)

user430

2009-10-17 22:36

  ~0010080

This has been fixed with the latest kernel.

Issue History

Date Modified Username Field Change
2009-06-08 16:03 Jeff lang New Issue
2009-06-08 16:06 user430 Note Added: 0009451
2009-06-08 16:06 user430 Status new => feedback
2009-06-08 16:33 Jeff lang Note Added: 0009452
2009-06-08 20:02 user430 Note Added: 0009454
2009-06-08 21:56 Jeff lang Note Added: 0009455
2009-09-07 04:39 sunnydavis Note Added: 0009889
2009-09-07 05:00 toracat Tag Attached: fixed in 5.4
2009-10-17 22:36 user430 Note Added: 0010080
2009-10-17 22:36 user430 Status feedback => closed
2009-10-17 22:36 user430 Resolution open => fixed
2009-10-17 22:36 user430 Fixed in Version => 5.4