View Issue Details

ID: 0016242
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2019-10-04 18:27
Reporter: emrvb
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: x86_64
OS: CentOS
OS Version: 7.6.1810
Product Version: 7.6.1810
Target Version:
Fixed in Version:
Summary: 0016242: SMB/CIFS server connection stalls accessing files using kernel-3.10.0-957.21.3.el7
Description: When accessing files on an SMB share on a CentOS server running kernel 3.10.0-957.21.3.el7, the connection appears to stall. The client hangs and eventually logs something along the lines of:

kernel: CIFS VFS: Server *** has not responded in 120 seconds. Reconnecting...
kernel: CIFS VFS: Send error in read = -11

- The issue does not occur when the server is running an older kernel, including kernel-3.10.0-957.21.2.el7.x86_64.
- Reliably triggered by reading several files (e.g. `grep "something" * -R` or `git status`)
- Listing files does not appear to trigger the issue
- Tested with fully patched CentOS 7 and Ubuntu 18.04 as clients


Steps To Reproduce:
1. Run an SMB server with kernel 3.10.0-957.21.3.el7
2. Create a share with sufficient data to trigger the issue (a few hundred files, less than 10 MB in total, was sufficient for me)
3. Mount the share from another computer
4. (Rapidly) access files (e.g. grep "something" * -R)
5. Observe the client acting up (it might hang); it will eventually log CIFS VFS: Server *** has not responded in 120 seconds. Reconnecting...
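
Step 4 can be approximated without grep by reading every file under the mount point. The following is a minimal sketch and not part of the original report; the function name read_tree and the mount path mentioned afterwards are hypothetical.

```python
import os
import time


def read_tree(root, chunk_size=64 * 1024):
    """Read every regular file under root, much like `grep -R` would.

    Returns (files_read, bytes_read, elapsed_seconds).
    """
    files_read = 0
    bytes_read = 0
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, broken symlinks, etc.
            with open(path, "rb") as handle:
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    bytes_read += len(chunk)
            files_read += 1
    return files_read, bytes_read, time.monotonic() - start
```

Calling read_tree("/mnt/share") (a placeholder path) against an affected server would be expected to block inside a read until the client logs the timeout; against an unaffected kernel it should return promptly.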
Tags: 3.10.0-957.21.3.el7, cifs
abrt_hash:
URL:

Activities

sentos

2019-07-11 21:03

reporter   ~0034811

I have the same problem with Windows (7 and 10) clients hanging.
They eventually return the error "The specified network name is no longer available."

Completely patched CentOS 7 server.
Rebooted into 3.10.0-957.12.2.el7.x86_64 kernel and everything works normally.

pavelonline

2019-07-12 07:07

reporter   ~0034812

Same here. We have a custom system running on that same kernel, and when it sends a 15 KB message (over any interface, even loopback) the data gets stuck in Send-Q (as reported by netstat) and never reaches the destination socket buffers, nor do I see that data in tcpdump. Shorter messages are OK. I could not reproduce it in a straightforward way, though.

pavelonline

2019-07-12 08:51

reporter   ~0034814

I've managed to reproduce the issue. See the attached Python script (Python 3 is required).

centosbug.py (595 bytes)
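
The attached script is not reproduced on this page. As a rough illustration of the scenario described above (a 15 KB message sent over loopback), a sketch along the following lines could be used. It is not centosbug.py: the function name, the deliberately small SO_SNDBUF value, and the timeout are assumptions, chosen because a later note reports that a larger SO_SNDBUF helps.

```python
import socket
import threading


def send_over_loopback(payload_size=15 * 1024, sndbuf=4096, timeout=10.0):
    """Send payload_size bytes over a loopback TCP connection.

    Returns the number of bytes the receiver got before EOF or timeout.
    """
    received = 0

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    listener.settimeout(timeout)

    def receive():
        nonlocal received
        try:
            conn, _addr = listener.accept()
            with conn:
                conn.settimeout(timeout)
                while True:
                    chunk = conn.recv(65536)
                    if not chunk:
                        break
                    received += len(chunk)
        except socket.timeout:
            pass  # a stall shows up as a short byte count

    thread = threading.Thread(target=receive)
    thread.start()

    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumption: a small send buffer makes the stall more likely.
    sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    sender.settimeout(timeout)
    try:
        sender.connect(listener.getsockname())
        sender.sendall(b"x" * payload_size)
    except socket.timeout:
        pass
    finally:
        sender.close()

    thread.join()
    listener.close()
    return received


if __name__ == "__main__":
    print(send_over_loopback())
```

On a kernel without the problem the function should return the full payload size; a smaller number would suggest that data was left sitting in the send queue.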

TrevorH

2019-07-12 10:39

manager   ~0034816

CentOS is a rebuild of the sources used to create RHEL. We do not modify anything except to remove branding and logos. You will need to submit your request to Red Hat via bugzilla.redhat.com; if and when Red Hat accepts it, incorporates it into RHEL, and releases a patched version, CentOS will pick it up and rebuild it.

emrvb

2019-07-12 11:38

reporter   ~0034818

@TrevorH

I understand that, but I currently do not have any systems running RHEL to test with. Any suggestions on how to get this upstream?

pavelonline

2019-07-12 11:47

reporter   ~0034819

I've filed a bug in Red Hat's bug tracker:
https://bugzilla.redhat.com/show_bug.cgi?id=1729482

TrevorH

2019-07-12 11:57

manager   ~0034820

emrvb: By raising a Bugzilla request, which has now been done.

pavelonline: It's marked as private, probably because it is filed under the kernel and all kernel bugs are automatically marked private. For others to be able to see it, you would need to add their email addresses to the cc list in the bz.

pavelonline

2019-07-12 12:05

reporter   ~0034821

@TrevorH
I can only see a username in user details.

TrevorH

2019-07-12 12:07

manager   ~0034822

Yeah. People will need to provide their email addresses to you to get added or raise their own bz entry. Unfortunately not something we have any influence over.

holyspectral

2019-07-15 20:35

reporter   ~0034831

I'm wondering if it's related to this commit, which was added in 3.10.0-957.21.3:
https://github.com/torvalds/linux/commit/f070ef2ac66716357066b683fb0baf55f8191a2e

Does anyone else see the TCPWqueueTooBig counter increase after the connection is dropped?
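
One way to check that counter is to parse /proc/net/netstat, where TcpExt counters appear as a line of names followed by a line of values. This is a minimal sketch; the helper names are hypothetical, and the counter is only present on kernels that carry the commit above.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat content into {"Prefix.Name": value}.

    The file consists of line pairs: a header line with counter names
    and a following line with the matching values.
    """
    counters = {}
    lines = text.splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, _, names = header.partition(":")
        _, _, numbers = values.partition(":")
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix}.{name}"] = int(number)
    return counters


def read_wqueue_too_big(path="/proc/net/netstat"):
    """Return the TCPWqueueTooBig value, or None if the kernel lacks it."""
    with open(path) as handle:
        return parse_netstat(handle.read()).get("TcpExt.TCPWqueueTooBig")
```

Reading the value before and after a stalled transfer and comparing the two numbers shows whether the counter moved.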

holyspectral

2019-07-15 21:38

reporter   ~0034832

Increasing SO_SNDBUF has proven to help in my case.
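
For applications that create their own sockets, the send buffer can be enlarged with setsockopt. A minimal sketch, assuming a plain TCP socket; the helper name and the 256 KiB request are illustrative values, not ones taken from this report.

```python
import socket


def enlarge_send_buffer(sock, requested=256 * 1024):
    """Ask the kernel for a larger send buffer; return (before, after)."""
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
    after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return before, after


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(enlarge_send_buffer(s))
```

On Linux the value read back is typically double the requested size, and requests are capped by net.core.wmem_max, so the effective size should be checked rather than assumed.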

abhay2101

2019-08-01 01:07

reporter   ~0034895

The solution already exists; it just needs to be backported:

https://www.spinics.net/lists/netdev/msg586999.html

Can we please do this ASAP?

toracat

2019-08-01 05:05

manager   ~0034896

I confirm that one of the patches applied to kernel 3.10.0-957.21.3.el7 caused the current issue:

[net] tcp: tcp_fragment() should apply sane memory limits (Florian Westphal) [1719849 1719850] {CVE-2019-11478}
(commit f070ef2ac66716357066b683fb0baf55f8191a2e)

and that it was fixed by the patch referenced by @abhay2101 (commit b617158dc096709d8600c53b6052144d12b89fab).

@pavelonline please make sure you provide this info in the bugzilla entry you created.

CentOS can apply the fix to the centosplus kernel as an interim solution.
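
The effect of the two commits can be illustrated with a simplified model of the limit check in tcp_fragment(). This is a sketch based on a reading of the upstream commits, not the RHEL code: the function names, the allowance value, and the head/tail flag are placeholders for illustration only.

```python
def original_check_rejects(wmem_queued, sndbuf):
    """Model of the limit added by the CVE-2019-11478 fix: refuse to
    split a segment when half of the queued bytes exceed the send buffer."""
    return (wmem_queued >> 1) > sndbuf


def relaxed_check_rejects(wmem_queued, sndbuf, allowance=128 * 1024,
                          is_queue_head_or_tail=False):
    """Model of the follow-up fix: add some headroom on top of the send
    buffer and always allow the first and last queued segment to be split.

    The allowance here is a placeholder; the real value is kernel-specific.
    """
    if is_queue_head_or_tail:
        return False
    return (wmem_queued >> 1) > sndbuf + allowance


if __name__ == "__main__":
    # A small send buffer with a moderately full queue.
    print(original_check_rejects(20000, 4096))  # split refused
    print(relaxed_check_rejects(20000, 4096))   # split allowed
```

In this model a socket with a small send buffer trips the original check easily, which is consistent with the observation that raising SO_SNDBUF works around the stall.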

abhay2101

2019-08-01 17:15

reporter   ~0034898

@toracat: Do we know whether Red Hat has accepted this and is already working on it?

pgreco

2019-08-01 17:52

developer   ~0034899

@pavelonline, can you add toracat@elrepo.org and pablo@fliagreco.com.ar to the cc list of the bugzilla entry?
That way we can keep track for future releases.
@abhay2101, normally kernel bugs are made private, so unless we are added to the cc, we have no way of knowing.
That said, this concerns a fix for a recent CVE, so it is a safe bet that Red Hat is aware and working on it.

abhay2101

2019-08-01 17:58

reporter   ~0034900

@pavelonline please cc abhay2101@gmail.com as well. Thanks.

pavelonline

2019-08-01 19:49

reporter   ~0034902

@pgreco, done
@abhay2101, I can only add an email address that is known to Red Hat's bug tracker. Create an account there first.

holyspectral

2019-08-01 20:12

reporter   ~0034903

@pavelonline can you add sam_s_wang@trend.com.tw as well? Thanks.

toracat

2019-08-02 14:50

manager   ~0034906

I wrote in comment 34896 that the patch causing the issue was commit f070ef2ac66716357066b683fb0baf55f8191a2e. The actual patch used in RHEL (and therefore CentOS) was based on that upstream commit but is not identical, because the code in RHEL-7 deviates from upstream. Applying the patch that fixes the issue is therefore not as straightforward as originally thought.

pgreco

2019-08-02 16:05

developer   ~0034907

As @toracat said, the code doesn't match, but more importantly, part of the logic doesn't match.
We're working on a PoC patch, and hopefully we'll have a test build soon.

toracat

2019-08-02 17:28

manager   ~0034908

We have built a set of kernel-plus packages that include a patch candidate. They are available from:

https://people.centos.org/toracat/kernel/7/plus/bug16242/

Please test if you are able. Feedback appreciated.

abhay2101

2019-08-02 18:28

reporter   ~0034909

@toracat: Thanks. The attached kernel solves the issue. What is the progress on Red Hat merging this patch? By the way, did you backport just the one patch above, or also the other dependent patches that use an RB tree for the queue?
toracat

toracat

2019-08-02 18:33

manager   ~0034910

@abhay

Thanks for the report. Glad that it worked. Regarding the patch, it was taken from:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/?h=linux-4.4.y&id=46c7b5d6f2a51c355b29118814fbfbdb79c35656

as @pgreco suggested.

abhay2101

2019-08-06 17:38

reporter   ~0034926

@toracat: Any insight into when this fix will land in Red Hat's kernel? Any info will be much appreciated. Thanks.

toracat

2019-08-06 22:02

manager   ~0034927

@abhay2101

I'm waiting to hear from @pavelonline, the submitter of the bug. Once he, too, confirms that the patch works, he (or I) can propose the patch as a fix in the upstream BZ.

abhay2101

2019-08-07 05:12

reporter   ~0034929

@pavelonline @toracat Can you please try adding abhay.kumar@salesforce.com to the Red Hat bug?

pgreco

2019-08-07 11:06

developer   ~0034930

@abhay2101 It looks like this fix was added to the kernel for 7.7, so now it is just a matter of waiting until we finish CentOS 7.7.1908.

toracat

2019-10-04 18:27

manager   ~0035331

Fixed in 7.7.

Issue History

Date Modified Username Field Change
2019-07-05 14:14 emrvb New Issue
2019-07-05 14:14 emrvb Tag Attached: cifs
2019-07-05 14:14 emrvb Tag Attached: 3.10.0-957.21.3.el7
2019-07-11 21:03 sentos Note Added: 0034811
2019-07-12 07:07 pavelonline Note Added: 0034812
2019-07-12 08:51 pavelonline File Added: centosbug.py
2019-07-12 08:51 pavelonline Note Added: 0034814
2019-07-12 10:39 TrevorH Note Added: 0034816
2019-07-12 11:38 emrvb Note Added: 0034818
2019-07-12 11:47 pavelonline Note Added: 0034819
2019-07-12 11:57 TrevorH Note Added: 0034820
2019-07-12 12:05 pavelonline Note Added: 0034821
2019-07-12 12:07 TrevorH Note Added: 0034822
2019-07-15 20:35 holyspectral Note Added: 0034831
2019-07-15 21:38 holyspectral Note Added: 0034832
2019-08-01 01:07 abhay2101 Note Added: 0034895
2019-08-01 05:05 toracat Status new => assigned
2019-08-01 05:05 toracat Note Added: 0034896
2019-08-01 17:15 abhay2101 Note Added: 0034898
2019-08-01 17:52 pgreco Note Added: 0034899
2019-08-01 17:58 abhay2101 Note Added: 0034900
2019-08-01 19:49 pavelonline Note Added: 0034902
2019-08-01 20:12 holyspectral Note Added: 0034903
2019-08-02 14:50 toracat Note Added: 0034906
2019-08-02 16:05 pgreco Note Added: 0034907
2019-08-02 17:28 toracat Note Added: 0034908
2019-08-02 18:28 abhay2101 Note Added: 0034909
2019-08-02 18:33 toracat Note Added: 0034910
2019-08-06 17:38 abhay2101 Note Added: 0034926
2019-08-06 22:02 toracat Note Added: 0034927
2019-08-07 05:12 abhay2101 Note Added: 0034929
2019-08-07 11:06 pgreco Note Added: 0034930
2019-10-04 18:27 toracat Status assigned => resolved
2019-10-04 18:27 toracat Resolution open => fixed
2019-10-04 18:27 toracat Note Added: 0035331