View Issue Details

ID: 0006476
Project: CentOS-5
Category: xen
View Status: public
Last Update: 2013-10-09 06:50
Reporter: q7joey
Assigned To:
Status: new
Resolution: open
Product Version: 5.9
Summary: 0006476: Frag is bigger than frame.
Description: a recent xen kernel update causes dom0 to emit "Frag is bigger than frame." and then disconnect the network from the domu. this appears to be caused by tcp offload generating frames that are too big. there is a thread at:
discussing it.

dom0 emits logs like:
vif14.0: Frag is bigger than frame.
vif14.0: fatal error; disabling device
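
Since only some guests are affected, it can help to scan the dom0 log for these two messages and list which vifs were shut down. The following is a minimal sketch, not something from the original report: the function name `find_disabled_vifs` is hypothetical, and it assumes the usual Xen naming where `vif14.0` means domain id 14, device 0.

```python
import re

# Matches the two dom0 kernel messages quoted above, with or without a
# syslog prefix, e.g. "vif14.0: Frag is bigger than frame."
VIF_MSG = re.compile(
    r"\bvif(?P<domid>\d+)\.(?P<devid>\d+): "
    r"(?P<msg>Frag is bigger than frame\.|fatal error; disabling device)"
)


def find_disabled_vifs(lines):
    """Return sorted (domid, devid) pairs that logged the frag error
    and were subsequently disabled."""
    frag_seen = set()
    disabled = set()
    for line in lines:
        m = VIF_MSG.search(line)
        if not m:
            continue
        vif = (int(m.group("domid")), int(m.group("devid")))
        if m.group("msg").startswith("Frag"):
            frag_seen.add(vif)
        elif vif in frag_seen:
            # "fatal error; disabling device" following a frag error
            disabled.add(vif)
    return sorted(disabled)
```

Passing an open log file works because the function only iterates over lines, for example `find_disabled_vifs(open("/var/log/messages"))`; the log path is an assumption and depends on the dom0 syslog setup.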

i think i have seen some domu messages, but the last instance of this didn't cause any logs.

a reboot of domu will hang during the ifdown phase (maybe trying to flush buffers?). doing a xm network-detach from dom0 will allow a clean shutdown, you can even attach a new vif via network-attach.
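
As a rough sketch of that recovery, the helper below turns a dom0 vif name into the two `xm` commands. It is illustrative only: `recovery_commands` is a hypothetical name, it assumes `vif14.0` maps to domain id 14 and device id 0, and it leaves out any `network-attach` options (such as `mac=` or `bridge=`) that a particular guest may need.

```python
import re


def recovery_commands(vif_name, domain=None):
    """Build the xm commands to detach a dead vif and attach a fresh one.

    vif_name: dom0 interface name such as "vif14.0"
    domain:   optional domain name to pass to xm; defaults to the
              numeric domain id parsed from vif_name
    """
    m = re.fullmatch(r"vif(\d+)\.(\d+)", vif_name)
    if m is None:
        raise ValueError(f"not a vif name: {vif_name!r}")
    domid, devid = m.groups()
    target = domain if domain is not None else domid
    return [
        ["xm", "network-detach", target, devid],
        ["xm", "network-attach", target],  # assumption: default options suffice
    ]
```

The function only builds argument lists, so the commands can be reviewed before being run by hand on dom0.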

supposedly, disabling offload functions will fix this. i have used:
ethtool -K eth0 tx off tso off sg off

which might be overkill, but seems to have fixed the issue.
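
For applying the same workaround to several interfaces or guests, a small wrapper can build the command consistently. This is a sketch under assumptions: the function names are hypothetical, the feature list simply mirrors the command above, and the default is a dry run that only prints what would be executed.

```python
import subprocess

# Offload features turned off in the workaround above; the report itself
# calls this set possibly overkill, so treat it as a starting point.
OFFLOAD_FEATURES = ("tx", "tso", "sg")


def ethtool_offload_cmd(iface, features=OFFLOAD_FEATURES):
    """Build the argv for `ethtool -K <iface> <feature> off ...`."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


def disable_offload(ifaces, dry_run=True):
    """Print (dry_run=True) or run the ethtool command for each interface."""
    for iface in ifaces:
        cmd = ethtool_offload_cmd(iface)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # requires root
```

`disable_offload(["eth0"])` prints the command without running it. Settings changed with `ethtool -K` do not survive a reboot on their own, so the command needs to be reapplied at boot for the workaround to stick.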

i have yet to be able to come up with a test case that will trigger this. i have only seen this on 3 out of 20 vms.

dom0 and domu are running 2.6.18-348.6.1.el5xen, 64bit. everything up to date.

i don't have an rhel system to test on, but i would guess this is an upstream issue.
Steps To Reproduce: haven't been able to find a specific trigger. i've tried writing code to generate large packets, with so_sndbuf set large as well. so far just normal web server load will cause the problem once a month or so.
Tags: No tags attached.




2013-05-27 22:52

reporter   ~0017512

I'm also experiencing this, there is a patch set here:


2013-06-03 23:58

reporter   ~0017523

We have seen this problem as well.

Jun 3 15:50:37 box38 kernel: vif1.0: Frag is bigger than frame.
Jun 3 15:50:37 box38 kernel: vif1.0: fatal error; disabling device

dom0: 2.6.18-348.4.1.el5xen
domU: 2.6.18-348.4.1.el5.i386.xen

(Running a custom built kernel on the domU)


2013-07-10 18:51

reporter   ~0017638

I have run into the same issues.
Does anyone know what the timeline is to fix this problem without using a custom patch or hack?


2013-10-09 06:49

reporter   ~0018169

I have run into the same issue running

Debian Wheezy 3.2.0-4-amd6

CentOS 6.4
stock 2.6.32-358.18.1.el6.x86_64 kernel

Should I create a different bug report? (since this one is CentOS 5)

Issue History

Date Modified Username Field Change
2013-05-27 16:12 q7joey New Issue
2013-05-27 22:52 LIV2 Note Added: 0017512
2013-06-03 23:58 Gene Note Added: 0017523
2013-07-10 18:51 mrabinormal Note Added: 0017638
2013-10-09 06:49 silviu.vulcan Note Added: 0018169