View Issue Details

ID: 0016242
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2019-07-15 21:38
Reporter: emrvb
Priority: normal
Severity: major
Reproducibility: always
Status: new
Resolution: open
Platform: x86_64
OS: CentOS
OS Version: 7.6.1810
Product Version: 7.6.1810
Target Version:
Fixed in Version:
Summary: 0016242: SMB/CIFS server connection stalls accessing files using kernel-3.10.0-957.21.3.el7
Description: When accessing files on an SMB share on a CentOS server running kernel 3.10.0-957.21.3.el7, the connection appears to stall. The client hangs and eventually logs something along the lines of:

kernel: CIFS VFS: Server *** has not responded in 120 seconds. Reconnecting...
kernel: CIFS VFS: Send error in read = -11

- The issue does not occur when the server is running an older kernel, including kernel-3.10.0-957.21.2.el7.x86_64.
- Reliably triggered by reading several files (e.g. `grep "something" * -R` or `git status`)
- Listing files does not appear to trigger the issue
- Tested with fully patched CentOS 7 and Ubuntu 18.04 as clients


Steps To Reproduce:
1. Run an SMB server with kernel 3.10.0-957.21.3.el7
2. Create a share with sufficient data to trigger the issue (a few hundred files, less than 10MB in total, was sufficient for me)
3. Mount the share from another computer
4. (Rapidly) access files (e.g. grep "something" * -R)
5. Observe the client acting up (it might hang); it will eventually log CIFS VFS: Server *** has not responded in 120 seconds. Reconnecting...
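
Step 4 can also be driven from a script instead of grep. The following is only a minimal sketch, not something from the original report: the function name `read_tree`, the 5 second threshold, and the chunk size are assumptions made for illustration. It reads every file under a directory back to back and records any file whose read took suspiciously long, which is where a stall on the share would be expected to show up.

```python
import os
import time


def read_tree(root, slow_threshold=5.0, chunk_size=64 * 1024):
    """Read every file under root and return (files_read, bytes_read, slow_files).

    slow_files lists (path, seconds) for reads slower than slow_threshold.
    The threshold is an arbitrary illustrative value, not taken from the report.
    """
    files_read = 0
    bytes_read = 0
    slow_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            started = time.monotonic()
            with open(path, "rb") as handle:
                # Read the whole file in chunks, similar to what grep -R does.
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    bytes_read += len(chunk)
            elapsed = time.monotonic() - started
            files_read += 1
            if elapsed > slow_threshold:
                slow_files.append((path, elapsed))
    return files_read, bytes_read, slow_files
```

Pointing `read_tree` at the mount point of the share (for example a hypothetical `/mnt/share`) should either return quickly with an empty `slow_files` list or block for a long time on an affected server.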
Tags: 3.10.0-957.21.3.el7, cifs
abrt_hash:
URL:

Activities

sentos

2019-07-11 21:03

reporter   ~0034811

I have the same problem with Windows (7 and 10) clients hanging.
They eventually return the error "The specified network name is no longer available".

Fully patched CentOS 7 server.
Rebooted into 3.10.0-957.12.2.el7.x86_64 kernel and everything works normally.
pavelonline

2019-07-12 07:07

reporter   ~0034812

Same here. We have a custom system running on that same kernel, and when it sends a 15Kb message (over any interface, even loopback) the data gets stuck in Send-Q (as reported by netstat). It never reaches the destination socket buffers, nor do I see that data in tcpdump. Shorter messages are fine. I could not reproduce it in a straightforward way, though.
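
The symptom described above can be probed with a small loopback test. This is a hedged sketch only and is not the attached centosbug.py: the function name `loopback_roundtrip`, the 10 second timeout, and reading "15Kb" as 15 * 1024 bytes are all assumptions. It sends one message over a loopback TCP connection and reports how many bytes the receiving side actually got. As noted in the comment, a straightforward send like this may well succeed even on an affected kernel.

```python
import socket
import threading


def loopback_roundtrip(size=15 * 1024, timeout=10.0):
    """Send `size` bytes over loopback TCP and return the number of bytes received."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    received = bytearray()

    def serve():
        conn, _addr = listener.accept()
        with conn:
            conn.settimeout(timeout)
            try:
                while len(received) < size:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    received.extend(chunk)
            except socket.timeout:
                # Data never arrived in time: the stall described in the comment.
                pass

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
        client.sendall(b"x" * size)
        # Keep the connection open until the receiver finishes or gives up.
        worker.join(timeout + 1.0)
    listener.close()
    return len(received)
```

If the returned count is smaller than the requested size, the data was accepted by `sendall` but did not reach the receiver within the timeout.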
pavelonline

2019-07-12 08:51

reporter   ~0034814

I've managed to reproduce the issue. See the attached Python script (Python 3 is required).

centosbug.py (595 bytes)
TrevorH

2019-07-12 10:39

manager   ~0034816

CentOS is a rebuild of the sources used to create RHEL. We do not modify anything except to remove branding and logos. You will need to submit your request to Red Hat via bugzilla.redhat.com. If and when Red Hat accepts it, incorporates it into RHEL, and releases a patched version, CentOS will pick it up and rebuild it.
emrvb

2019-07-12 11:38

reporter   ~0034818

@TrevorH

I understand that, but I currently do not have any systems running RHEL to test with. Any suggestions on how to get this upstream?
pavelonline

2019-07-12 11:47

reporter   ~0034819

I've filed a bug in Red Hat's bug tracker:
https://bugzilla.redhat.com/show_bug.cgi?id=1729482
TrevorH

2019-07-12 11:57

manager   ~0034820

emrvb: By raising a bugzilla request, which has now been done.

pavelonline: it's marked as private, probably because it is filed under the kernel and all kernel bugs are automatically marked private. For others to be able to see it, you would need to add their email addresses to the cc list in the bz.
pavelonline

2019-07-12 12:05

reporter   ~0034821

@TrevorH
I can only see a username in user details.
TrevorH

2019-07-12 12:07

manager   ~0034822

Yeah. People will need to provide their email addresses to you to get added or raise their own bz entry. Unfortunately not something we have any influence over.
holyspectral

2019-07-15 20:35

reporter   ~0034831

I'm wondering if it's related to this commit, which was added in 3.10.0-957.21.3.
https://github.com/torvalds/linux/commit/f070ef2ac66716357066b683fb0baf55f8191a2e

Does anyone else see the TCPWqueueTooBig counter increase after the connection is dropped?
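
For anyone who wants to check that counter, here is a minimal sketch that reads it from /proc/net/netstat, where the kernel exposes the TcpExt counters as pairs of header and value lines. The helper names `parse_netstat` and `tcp_wqueue_too_big` are made up for this example.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat style text into {"Prefix.Name": int}.

    The file consists of line pairs: a header line with counter names and a
    value line with the matching numbers, both starting with the same prefix.
    """
    counters = {}
    lines = text.splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, _, names = header.partition(":")
        value_prefix, _, numbers = values.partition(":")
        if prefix != value_prefix:
            continue  # skip anything that is not a well-formed pair
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix}.{name}"] = int(number)
    return counters


def tcp_wqueue_too_big(path="/proc/net/netstat"):
    """Return the TCPWqueueTooBig counter, or None if this kernel lacks it."""
    with open(path) as handle:
        return parse_netstat(handle.read()).get("TcpExt.TCPWqueueTooBig")
```

Calling `tcp_wqueue_too_big()` before and after a stalled transfer and comparing the two values shows whether the counter moved.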
holyspectral

2019-07-15 21:38

reporter   ~0034832

Increasing SO_SNDBUF has proven to help in my case.
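
As an illustration of that workaround, a minimal sketch of raising SO_SNDBUF on a socket follows. The 256 KiB value and the function name `enlarge_send_buffer` are assumptions; the comment does not say which size was used. On Linux the kernel doubles the requested value and caps the request at net.core.wmem_max, so the effective size is read back rather than assumed.

```python
import socket


def enlarge_send_buffer(sock, requested=256 * 1024):
    """Request a larger send buffer and return (before, after) sizes in bytes.

    `requested` is an illustrative value, not one taken from the report.
    """
    before = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
    # Read the value back: the kernel may adjust or cap what was requested.
    after = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return before, after
```

The option has to be set by the application that owns the socket, so this only helps for software that can be changed or configured to do so.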

Issue History

Date Modified Username Field Change
2019-07-05 14:14 emrvb New Issue
2019-07-05 14:14 emrvb Tag Attached: cifs
2019-07-05 14:14 emrvb Tag Attached: 3.10.0-957.21.3.el7
2019-07-11 21:03 sentos Note Added: 0034811
2019-07-12 07:07 pavelonline Note Added: 0034812
2019-07-12 08:51 pavelonline File Added: centosbug.py
2019-07-12 08:51 pavelonline Note Added: 0034814
2019-07-12 10:39 TrevorH Note Added: 0034816
2019-07-12 11:38 emrvb Note Added: 0034818
2019-07-12 11:47 pavelonline Note Added: 0034819
2019-07-12 11:57 TrevorH Note Added: 0034820
2019-07-12 12:05 pavelonline Note Added: 0034821
2019-07-12 12:07 TrevorH Note Added: 0034822
2019-07-15 20:35 holyspectral Note Added: 0034831
2019-07-15 21:38 holyspectral Note Added: 0034832