View Issue Details

ID: 0002678
Project: CentOS-5
Category: samba
View Status: public
Last Update: 2008-08-11 23:08
Reporter: yafrank
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: feedback
Resolution: open
Product Version: 5.1
Summary: 0002678: XP SP2 clients report "Delayed write failed" after upgrading to samba-3.0.25b-1.el5_1.4
Description: After upgrading to CentOS 5.1 (i386), some XP SP2 clients intermittently report a "Delayed write failed" error when writing to the Samba share; read access works as before. According to XP, the source is MRxSmb, the event ID is 50, and the status is c000020c (STATUS_CONNECTION_DISCONNECTED). Microsoft's KB suggests bad hardware or a bad network connection, but that seems unlikely here. I finally managed to reproduce the problem on one of our XP machines by using WinRAR to compress roughly 300 MB of files on the share and store the archive on the same share. The compression aborts partway through with the same "Delayed write failed" error. It can be reproduced only if the XP client boots after the CentOS 5.1 server; the same operation succeeds on subsequent attempts. After downgrading to samba-3.0.23c-2.el5.2.0.2, I can no longer reproduce it.
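The reproduction above (writing a few hundred MB to the share right after the client boots) can perhaps be approximated without WinRAR by a small write-stress script run on the client. This is a minimal sketch, not the reporter's method: it assumes Python is available on the client and that the share is reachable as an ordinary path (a mapped drive or UNC path); the function name `stress_write`, the 1 MiB chunk size, and the 300 MiB default are hypothetical choices. It writes in chunks, forces a flush with `os.fsync`, then reads the file back and compares SHA-256 digests, so a failed or lost write may surface as an `OSError` or a verification error.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per write (arbitrary, hypothetical choice)


def stress_write(path, total_mb=300):
    """Write total_mb MiB of random data to path, force it out of the client
    cache, read it back, and verify the SHA-256 digest. Returns the hex digest."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(total_mb):
            chunk = os.urandom(CHUNK_SIZE)
            written.update(chunk)
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push cached writes to the server now
    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            read_back.update(chunk)
    if read_back.hexdigest() != written.hexdigest():
        raise RuntimeError("verification failed for %s" % path)
    return written.hexdigest()
```

For example, `stress_write(r"Z:\stress_test.bin")` against a hypothetical mapped drive `Z:`; running it once immediately after the client boots and once more afterwards would mirror the "first attempt fails, later attempts succeed" pattern described above.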
Additional Information: smb.conf
# Global parameters
[global]
        netbios name = XXXX
        workgroup = XXXXXX
        server string = XXXXXXXXXXXXXXX
        log file = /var/log/samba/%m.log
        max log size = 500
        dns proxy = No
        local master = Yes
        hosts allow = 127.0.0.1, 168.8.8.0/24
        hosts deny = 0.0.0.0/0
        interfaces = br0 lo
        bind interfaces only = Yes
        load printers = No
        show add printer wizard = No
        max smbd processes = 300
        max connections = 300
        deadtime = 15
        config file = /etc/samba/smb.conf.%G
        passdb backend = tdbsam:/etc/samba/passdb.tdb
        log level = 2
        username map = /etc/samba/smbusers
        printcap name = /dev/null
        disable spoolss = yes
        smb ports = 139
        socket options = tcp_nodelay so_keepalive so_sndbuf=16384 so_rcvbuf=16384

[dep]
        comment = Department share
        path = /var/smb/%G
        valid users= +%G
        read only = No
Tags: No tags attached.

Activities

jsharper

2008-04-06 17:37

reporter   ~0007102

I believe I am seeing this same or a very similar issue on two different sets of boxes -- one CentOS 5 on i386 and WinXP SP2 as the client, the other CentOS 5 on x86_64 and Win Server 2003 SP1 as the client (although 2k3 handles the error reporting to the user a little differently). It is intermittent and hard to pin down details, but I am working on trying to document things before trying to downgrade samba (or try a stock samba.org install). I can't be sure exactly when my problems started, but it seems to have been right around when I upgraded to samba-3.0.25b-1.el5_1.4 in January. I know my report is vague and without many details, but I wanted to let you know you may not be alone while I try to pin this down further.

jsharper

2008-04-06 17:42

reporter   ~0007103

I will add that the client-side errors often correspond with messages similar to these in the server logs. Also, I believe I may be seeing it on both writes and reads.

Apr 6 10:16:50 randal smbd[20393]: [2008/04/06 10:16:50, 0] lib/util_sock.c:read_data(534)
Apr 6 10:16:50 randal smbd[20393]: read_data: read failure for 4 bytes to client 10.10.89.10. Error = Connection reset by peer

Apr 3 21:32:20 titan smbd[26897]: [2008/04/03 21:32:20, 0] lib/util_sock.c:read_data(534)
Apr 3 21:32:20 titan smbd[26897]: read_data: read failure for 4 bytes to client 10.10.98.10. Error = Connection reset by peer
Apr 3 21:32:20 titan smbd[26897]: [2008/04/03 21:32:20, 0] lib/util_sock.c:write_data(562)
Apr 3 21:32:20 titan smbd[26897]: write_data: write failure in writing to client 10.10.98.10. Error Broken pipe
Apr 3 21:32:20 titan smbd[26897]: [2008/04/03 21:32:20, 0] lib/util_sock.c:send_smb(769)
Apr 3 21:32:20 titan smbd[26897]: Error writing 75 bytes to client. -1. (Broken pipe)
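To check how often these server-side failures occur and for which clients, the smbd messages can be tallied per client IP. A minimal sketch, assuming log lines shaped like the samples above (syslog lines, or the per-machine files set by `log file`, where the message sits on its own indented line); the regular expression is written against these samples only and may need adjusting for other Samba versions, and the name `tally_failures` is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches the smbd message bodies quoted above, e.g.
#   "read_data: read failure for 4 bytes to client 10.10.89.10. Error = Connection reset by peer"
#   "write_data: write failure in writing to client 10.10.98.10. Error Broken pipe"
FAILURE_RE = re.compile(
    r"(?P<func>read_data|write_data): (?P<kind>read|write) failure .*?client "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\.\s+Error =?\s*(?P<error>.+)$"
)


def tally_failures(lines):
    """Return {client_ip: Counter({(function, error_text): count})}."""
    per_client = defaultdict(Counter)
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            key = (m.group("func"), m.group("error").strip())
            per_client[m.group("ip")][key] += 1
    return per_client
```

Usage might look like `tally_failures(open("/var/log/messages"))`, adjusted to wherever smbd logs on a given system; comparing the per-client counts with the clients that show "Delayed write failed" would indicate whether the two line up.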

kmyerqsv

2008-04-08 02:29

reporter   ~0007112

Similar problems here as well. It happens only when copying a file from one location on the server to another. Copying from PC to server and from server to PC works fine, but copying a file in Folder A on the server to anywhere else on the server triggers the disconnect. After a few of these, users are able to copy the files that had previously given them errors, but obviously with some level of annoyance.

Verified with a remote PC running Windows XP SP2 with current updates, and with Win2K3 Enterprise running under VMware Server on the file server itself, which would seem to rule out a hardware error or a cabling issue.

dcp

2008-04-16 16:20

reporter   ~0007136

I have been seeing this on a new CentOS 5.1 install running Samba version 3.0.25b-1.el5_1.4.

Typical log entries are similar to those described above, for example:

[2008/04/07 17:29:13, 1] smbd/service.c:close_cnum(1230)
  cindy (192.168.20.128) closed connection to service public
[2008/04/07 17:46:42, 0] lib/util_sock.c:read_data(534)
  read_data: read failure for 4 bytes to client 192.168.20.128. Error = No route to host

Most of the day she can save from her PC (typically from MS Word) directly to the server without incident. I just completed 5000 pings from the server to her PC with 1000-byte packets at a 200 ms interval, with 0% packet loss. It is not an exhaustive test, but perhaps a useful sanity check.

Three other PCs on our LAN have seen the error. It is nearly always preceded by a false "Disk Full" error just before the "Delayed Write Error" popup.
We have also seen, at least once or twice, a file that failed to open (it shows up blank) and an open that failed with "path cannot be found".

The machine with the most failures (a few per day) has its firewall turned off and no virus scanner (turned off for this test). All clients are at XP Pro SP2. As suspected here and in a similar thread from 2005
http://lists.samba.org/archive/samba/2005-February/099764.html
the problem appears to come and go depending on the Samba/Linux server rather than the Windows client. (I've researched this for 3 weeks; there are many descriptions on the web but no good analysis or solution.) I suspect that several different errors are showing up on Windows as a single error (Disk Full, Delayed Write Error), masking the cause. That would explain why a variety of different fixes each seem to work for some people.
Since this has been reported off and on for 4 years or so (search Google), it would really help to diagnose it, develop a test, or find a workaround that works even if we don't know why.
I haven't posted anywhere else, partly because the other descriptions were years old. This report, at least, is from 2008 and even matches CentOS 5.1.
If I get a chance, I'll 1) move the server under the desk of one PC (eliminating nearly all LAN switch issues) and 2) reinstall FC7 and Samba to see if anything changes.
I hope developers eventually get involved and analyze this further.
Suggestions?

dcp

2008-04-16 16:24

reporter   ~0007137

Here's an excerpt from smb.conf with the settings I have collected from others for the "Disk Full" error:

# things to try to stop the disk full error on word, excel, 97 and 2003
smb ports = 139
strict allocate = yes

inherit acls = yes
inherit permissions = yes

#later for disk full error
max disk size = 1000

One person postulated that a network glitch occurs (corroborated by "4 byte write failure" and "disconnect" log entries at the same time as the failure): Windows successfully completes an operation, gets a result or return code it does not understand, and repeats the operation, producing the "delayed write" error. That would match several things we see: saves that report failure but are actually there, a save that ends with "Disk Full" but works perfectly five minutes later (after just leaving the window open a while), files that open with a blank result, and "path not found" errors that are fine when retried.
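One way to test this hypothesis is to check whether each client-side Event ID 50 falls within a few seconds of a server-side disconnect message. A minimal sketch, assuming both sets of timestamps have already been extracted (for example from the XP Event Viewer and the smbd log) and that client and server clocks are roughly in sync; the function name `match_events` and the 5-second window are hypothetical choices.

```python
from bisect import bisect_left
from datetime import timedelta


def match_events(client_times, server_times, window=timedelta(seconds=5)):
    """Pair each client error time with the nearest server disconnect time
    within `window`. Returns (matched_pairs, unmatched_client_times)."""
    server_sorted = sorted(server_times)
    matched, unmatched = [], []
    for t in sorted(client_times):
        i = bisect_left(server_sorted, t)
        # Only the nearest server events just before and at/after t can be closest.
        candidates = server_sorted[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda s: abs(s - t), default=None)
        if best is not None and abs(best - t) <= window:
            matched.append((t, best))
        else:
            unmatched.append(t)
    return matched, unmatched
```

If nearly every client error has a matching server disconnect, that supports the network-glitch theory; many unmatched client errors would point elsewhere.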

kmyerqsv

2008-04-16 17:18

reporter   ~0007138

Responding to dcp, note 007137:

Per my testing, the problem occurs even from a VMware Server guest running as a process on the same server as the Samba process. So I am seeing the issue on what amounts to a loopback interface: no cabling or network equipment involved at all, just a network stack. I saw this with iptables both enabled and disabled on the server. The OS running in VMware Server was Win2K3 Enterprise, so either there is a network component common to XP and Win2K3 that exposes the bug, or it's a Samba bug.

edwardvdv

2008-04-21 11:42

reporter   ~0007158

I can confirm this exact same problem. No hardware or cabling issues exist, because the WXP machine runs under Xen on the same machine as the Samba server.

If needed, I can supply more technical details.

kmyerqsv

2008-04-21 18:25

reporter   ~0007162

Upstream issue (I think):
https://bugzilla.redhat.com/show_bug.cgi?id=435316

Further upstream issue with SAMBA:
https://bugzilla.samba.org/show_bug.cgi?id=4763

I will be testing the patch from the Samba bug tracker tonight. Red Hat shows that bug as release pending, so I suppose a fix might be in the pipeline.

dcp

2008-05-29 22:29

reporter   ~0007367

After switching to Fedora 7, the error has not recurred.
Back with some results that may help others looking for workarounds. Having read from others that moving from CentOS 5.1 to Fedora 7 helped, we decided to try it. After 4 days of average use on Fedora 7, the error has not reappeared; previously it tended to occur once or a few times per day for 3 users, with no other recognizable pattern. Samba on CentOS 5.1 was 3.0.25b; Samba on F7 is 3.0.28. It's a drastic and mysterious workaround, but those are the results so far. By the way, the Bugzilla links above describe others' experience and progress on diagnosis.
(Clients are all 32-bit XP SP2, a mix of old and new PCs; the error occurred on a brand-new fast PC. The server is a simple Asus Core 2 Duo box with 1 GB RAM and a 160 GB disk. About 10 users, only 3 of them heavy SMB users.)

user430

2008-05-30 14:04

  ~0007368

Regarding the last comment: CentOS 5.2 will ship with Samba 3.0.28, so you may have to wait a little while until it comes out and then test with that version.

agentblueuk

2008-06-26 12:42

reporter   ~0007485

Issue still exists in samba-3.0.25b-1.el4_6.5

agentblueuk

2008-06-26 12:43

reporter   ~0007486

Why has CentOS not yet applied this update downstream? http://rhn.redhat.com/errata/RHBA-2008-0372.html

user430

2008-06-26 13:37

  ~0007487

Because we (and our mirror network) are lagging a bit behind due to the 5.2 release.

The samba update has just been released, so it should be on a mirror near you in a few hours.

agentblueuk

2008-06-29 18:10

reporter   ~0007519

no sign of the update yet

agentblueuk

2008-07-17 08:23

reporter   ~0007655

Please note that I am referring to the same Samba bug, but I am looking for a fix for EL4, not EL5.

user430

2008-07-17 09:59

  ~0007666

Yes, but as upstream hasn't released anything for 4.6, we won't either.

ipguy

2008-08-11 23:08

reporter   ~0007828

Any update on when this bug will be resolved?

Issue History

Date Modified Username Field Change
2008-02-14 06:52 yafrank New Issue
2008-04-06 17:37 jsharper Note Added: 0007102
2008-04-06 17:42 jsharper Note Added: 0007103
2008-04-08 02:29 kmyerqsv Note Added: 0007112
2008-04-16 16:20 dcp Note Added: 0007136
2008-04-16 16:24 dcp Note Added: 0007137
2008-04-16 17:18 kmyerqsv Note Added: 0007138
2008-04-21 11:42 edwardvdv Note Added: 0007158
2008-04-21 18:25 kmyerqsv Note Added: 0007162
2008-05-29 22:29 dcp Note Added: 0007367
2008-05-30 14:04 user430 Note Added: 0007368
2008-06-26 12:42 agentblueuk Note Added: 0007485
2008-06-26 12:43 agentblueuk Note Added: 0007486
2008-06-26 13:37 user430 Note Added: 0007487
2008-06-26 22:55 kbsingh@karan.org Status new => feedback
2008-06-29 18:10 agentblueuk Note Added: 0007519
2008-07-17 08:23 agentblueuk Note Added: 0007655
2008-07-17 09:59 user430 Note Added: 0007666
2008-08-11 23:08 ipguy Note Added: 0007828