2017-10-11 09:46 UTC

ID: 0004089
Project: CentOS-5
Category: kernel
View Status: public
Last Update: 2010-01-14 00:33
Reporter: antonl
Priority: normal
Severity: block
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 5.4
Target Version:
Fixed in Version:
Summary: 0004089: After update from 5.2 to 5.4, XFS module crashes "on the fly", causing mounted RAID to disappear
Description:
Configuration:
10 x SATA units, 1.5 TB each, combined in RAID 5 (~15 TB). x86_64 version.
Because ext3 does not support a filesystem this large, the CentOS Plus kernel with XFS enabled was used. Yesterday, after upgrading from 5.2 to 5.4 (and consequently upgrading the kernel), I received complaints that people could no longer access the central storage. After a reboot, functionality would be restored for a few hours (4-6) and then the "access denied" error would return.
The reason is this:

Dec 22 02:18:36 athena kernel: 00000000: 79 1c be c8 90 5f fd b9 69 92 e8 96 9d c7 50 76 y.ŸÈ._ý¹i.è..ÇPv
Dec 22 02:18:36 athena kernel: Filesystem "md0": XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff88532826
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: Call Trace:
Dec 22 02:18:36 athena kernel: [<ffffffff88532725>] :xfs:xfs_da_do_buf+0x503/0x5b1
Dec 22 02:18:36 athena kernel: [<ffffffff88532826>] :xfs:xfs_da_read_buf+0x16/0x1b
Dec 22 02:18:36 athena kernel: [<ffffffff88532826>] :xfs:xfs_da_read_buf+0x16/0x1b
Dec 22 02:18:36 athena kernel: [<ffffffff88537b04>] :xfs:xfs_dir2_leaf_getdents+0x354/0x5ec
Dec 22 02:18:36 athena kernel: [<ffffffff88537b04>] :xfs:xfs_dir2_leaf_getdents+0x354/0x5ec
Dec 22 02:18:36 athena kernel: [<ffffffff88560d84>] :xfs:xfs_hack_filldir+0x0/0x5b
Dec 22 02:18:36 athena kernel: [<ffffffff88560d84>] :xfs:xfs_hack_filldir+0x0/0x5b
Dec 22 02:18:36 athena kernel: [<ffffffff88534860>] :xfs:xfs_readdir+0xa7/0xb6
Dec 22 02:18:36 athena kernel: [<ffffffff88561419>] :xfs:xfs_file_readdir+0xff/0x14c
Dec 22 02:18:36 athena kernel: [<ffffffff80025689>] filldir+0x0/0xb7
Dec 22 02:18:36 athena kernel: [<ffffffff80025689>] filldir+0x0/0xb7
Dec 22 02:18:36 athena kernel: [<ffffffff8003527d>] vfs_readdir+0x77/0xa9
Dec 22 02:18:36 athena kernel: [<ffffffff80038b32>] sys_getdents+0x75/0xbd
Dec 22 02:18:36 athena kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
Dec 22 02:18:36 athena kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: 00000000: 79 1c be c8 90 5f fd b9 69 92 e8 96 9d c7 50 76 y.ŸÈ._ý¹i.è..ÇPv
Dec 22 02:18:36 athena kernel: Filesystem "md0": XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff88532826
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: Call Trace:
Dec 22 02:18:36 athena kernel: [<ffffffff88532725>] :xfs:xfs_da_do_buf+0x503/0x5b1
Dec 22 02:18:36 athena kernel: [<ffffffff88532826>] :xfs:xfs_da_read_buf+0x16/0x1b
Dec 22 02:18:36 athena kernel: [<ffffffff88532826>] :xfs:xfs_da_read_buf+0x16/0x1b
Dec 22 02:18:36 athena kernel: [<ffffffff88537b04>] :xfs:xfs_dir2_leaf_getdents+0x354/0x5ec
Dec 22 02:18:36 athena kernel: [<ffffffff88537b04>] :xfs:xfs_dir2_leaf_getdents+0x354/0x5ec
Dec 22 02:18:36 athena kernel: [<ffffffff88560d84>] :xfs:xfs_hack_filldir+0x0/0x5b
Dec 22 02:18:36 athena kernel: [<ffffffff88560d84>] :xfs:xfs_hack_filldir+0x0/0x5b
Dec 22 02:18:36 athena kernel: [<ffffffff88534860>] :xfs:xfs_readdir+0xa7/0xb6
Dec 22 02:18:36 athena kernel: [<ffffffff88561419>] :xfs:xfs_file_readdir+0xff/0x14c
Dec 22 02:18:36 athena kernel: [<ffffffff80025689>] filldir+0x0/0xb7
Dec 22 02:18:36 athena kernel: [<ffffffff80025689>] filldir+0x0/0xb7
Dec 22 02:18:36 athena kernel: [<ffffffff8003527d>] vfs_readdir+0x77/0xa9
Dec 22 02:18:36 athena kernel: [<ffffffff80038b32>] sys_getdents+0x75/0xbd
Dec 22 02:18:36 athena kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
Dec 22 02:18:36 athena kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: 00000000: 79 1c be c8 90 5f fd b9 69 92 e8 96 9d c7 50 76 y.ŸÈ._ý¹i.è..ÇPv
Dec 22 02:18:36 athena kernel: Filesystem "md0": XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff88532826
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: Call Trace:
Dec 22 02:18:36 athena kernel: [<ffffffff88532725>] :xfs:xfs_da_do_buf+0x503/0x5b1
Dec 22 02:18:36 athena kernel: [<ffffffff88532826>] :xfs:xfs_da_read_buf+0x16/0x1b
Dec 22 02:18:36 athena kernel: [<ffffffff88532826>] :xfs:xfs_da_read_buf+0x16/0x1b
Dec 22 02:18:36 athena kernel: [<ffffffff88537b04>] :xfs:xfs_dir2_leaf_getdents+0x354/0x5ec
Dec 22 02:18:36 athena kernel: [<ffffffff88537b04>] :xfs:xfs_dir2_leaf_getdents+0x354/0x5ec
Dec 22 02:18:36 athena kernel: [<ffffffff88560d84>] :xfs:xfs_hack_filldir+0x0/0x5b
Dec 22 02:18:36 athena kernel: [<ffffffff88560d84>] :xfs:xfs_hack_filldir+0x0/0x5b
Dec 22 02:18:36 athena kernel: [<ffffffff88534860>] :xfs:xfs_readdir+0xa7/0xb6
Dec 22 02:18:36 athena kernel: [<ffffffff88561419>] :xfs:xfs_file_readdir+0xff/0x14c
Dec 22 02:18:36 athena kernel: [<ffffffff80025689>] filldir+0x0/0xb7
Dec 22 02:18:36 athena kernel: [<ffffffff80025689>] filldir+0x0/0xb7
Dec 22 02:18:36 athena kernel: [<ffffffff8003527d>] vfs_readdir+0x77/0xa9
Dec 22 02:18:36 athena kernel: [<ffffffff80038b32>] sys_getdents+0x75/0xbd
Dec 22 02:18:36 athena kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
Dec 22 02:18:36 athena kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: 00000000: 79 1c be c8 90 5f fd b9 69 92 e8 96 9d c7 50 76 y.ŸÈ._ý¹i.è..ÇPv
Dec 22 02:18:36 athena kernel: Filesystem "md0": XFS internal error xfs_da_do_buf(2) at line 2112 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff88532826
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: Call Trace:
Dec 22 02:18:36 athena kernel: [<ffffffff88532725>] :xfs:xfs_da_do_buf+0x503/0x5b1
Dec 22 02:18:36 athena kernel: [<ffffffff88532826>] :xfs:xfs_da_read_buf+0x16/0x1b
Dec 22 02:18:36 athena kernel: [<ffffffff88532826>] :xfs:xfs_da_read_buf+0x16/0x1b
Dec 22 02:18:36 athena kernel: [<ffffffff885388d5>] :xfs:xfs_dir2_leaf_addname+0x3ae/0x761
Dec 22 02:18:36 athena kernel: [<ffffffff885388d5>] :xfs:xfs_dir2_leaf_addname+0x3ae/0x761
Dec 22 02:18:36 athena kernel: [<ffffffff88534e67>] :xfs:xfs_dir_createname+0x132/0x14e
Dec 22 02:18:36 athena kernel: [<ffffffff8855a66b>] :xfs:xfs_create+0x2be/0x45c
Dec 22 02:18:36 athena kernel: [<ffffffff8851fd3f>] :xfs:xfs_attr_get+0x8e/0x9f
Dec 22 02:18:36 athena kernel: [<ffffffff88563e50>] :xfs:xfs_vn_mknod+0x144/0x215
Dec 22 02:18:36 athena kernel: [<ffffffff8003a5ce>] vfs_create+0xe6/0x158
Dec 22 02:18:36 athena kernel: [<ffffffff8001aeed>] open_namei+0x19d/0x6d5
Dec 22 02:18:36 athena kernel: [<ffffffff8002732f>] do_filp_open+0x1c/0x38
Dec 22 02:18:36 athena kernel: [<ffffffff80019d02>] do_sys_open+0x44/0xbe
Dec 22 02:18:36 athena kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: Filesystem "md0": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller 0xffffffff8855a783
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: Call Trace:
Dec 22 02:18:36 athena kernel: [<ffffffff88555b2f>] :xfs:xfs_trans_cancel+0x55/0xfa
Dec 22 02:18:36 athena kernel: [<ffffffff8855a783>] :xfs:xfs_create+0x3d6/0x45c
Dec 22 02:18:36 athena kernel: [<ffffffff8851fd3f>] :xfs:xfs_attr_get+0x8e/0x9f
Dec 22 02:18:36 athena kernel: [<ffffffff88563e50>] :xfs:xfs_vn_mknod+0x144/0x215
Dec 22 02:18:36 athena kernel: [<ffffffff8003a5ce>] vfs_create+0xe6/0x158
Dec 22 02:18:36 athena kernel: [<ffffffff8001aeed>] open_namei+0x19d/0x6d5
Dec 22 02:18:36 athena kernel: [<ffffffff8002732f>] do_filp_open+0x1c/0x38
Dec 22 02:18:36 athena kernel: [<ffffffff80019d02>] do_sys_open+0x44/0xbe
Dec 22 02:18:36 athena kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Dec 22 02:18:36 athena kernel:
Dec 22 02:18:36 athena kernel: xfs_force_shutdown(md0,0x8) called from line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff88555b48
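
A log like the one above can be condensed before posting it. The following is a minimal sketch, not part of the original report: the function name `summarize_xfs_errors` is hypothetical, and it assumes syslog-style lines in which the failing function name follows the words "XFS internal error", as in the excerpt above.

```shell
#!/bin/sh
# Summarize XFS errors in a syslog-style kernel log read from stdin.
# Prints one "count function" line per distinct "XFS internal error" site
# (in no particular order), then a final line saying whether a forced
# shutdown was logged.
# summarize_xfs_errors is a hypothetical helper name, not an existing tool.
summarize_xfs_errors() {
    awk '
        /XFS internal error/ {
            # The function name is the field right after the word "error".
            for (i = 1; i < NF; i++)
                if ($i == "error") { count[$(i + 1)]++; break }
        }
        index($0, "xfs_force_shutdown(") { shutdown = 1 }
        END {
            for (fn in count) print count[fn], fn
            print (shutdown ? "forced shutdown: yes" : "forced shutdown: no")
        }
    '
}
```

Typical use would be `summarize_xfs_errors < /var/log/messages`, assuming kernel messages are logged to that file.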
Tags: No tags attached.
Notes

~0010583

antonl (reporter)

Kernel version - 2.6.18-164.9.1.el5.centos.plus #1 SMP Wed Dec 16 11:24:24 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

"Problem free" was Linux version 2.6.18-92.1.22.el5.centos.plus

~0010584

toracat (manager)

Under CentOS 5.2, the xfs kernel module was provided through an external package, kmod-xfs. As of CentOS 5.4 (kernel >= 2.6.18-164), xfs is enabled in the kernel itself, and it is also a newer version.

Could you show us the output returned by:

rpm -qa kmod\*

ls -l `find /lib/modules -name xfs.ko`
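
The `ls -l` output can be hard to scan when several kernels are installed. As a rough sketch of the same check (the function name `list_xfs_modules` is hypothetical, and `readlink` is assumed to be available), the following marks each xfs.ko as a regular file or a symlink, which makes weak-updates links to an older external module easy to spot:

```shell
#!/bin/sh
# List every xfs.ko under a modules tree and say whether it is a regular
# file (shipped for that kernel) or a symlink (for example a weak-updates
# link to a module built for another kernel).
# $1: modules root; defaults to /lib/modules.
# list_xfs_modules is a hypothetical helper name.
list_xfs_modules() {
    root=${1:-/lib/modules}
    find "$root" -name xfs.ko | sort | while IFS= read -r path; do
        if [ -L "$path" ]; then
            printf 'symlink %s -> %s\n' "$path" "$(readlink "$path")"
        else
            printf 'file    %s\n' "$path"
        fi
    done
}
```

Running `list_xfs_modules` with no argument inspects /lib/modules.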

~0010585

antonl (reporter)

Currently booted under 2.6.18-92.1.22.el5.centos.plus

[root@athena ~]# rpm -qa kmod\*
kmod-xfs-0.4-2

[root@athena ~]# ls -l `find /lib/modules -name xfs.ko`
-rwxr--r-- 1 root root 694704 Dec 16 13:01 /lib/modules/2.6.18-164.9.1.el5.centos.plus/kernel/fs/xfs/xfs.ko
lrwxrwxrwx 1 root root 48 Dec 21 05:32 /lib/modules/2.6.18-164.9.1.el5.centos.plus/weak-updates/xfs/xfs.ko -> /lib/modules/2.6.18-92.1.13.el5/extra/xfs/xfs.ko
-rw-r--r-- 1 root root 697232 Oct 3 2008 /lib/modules/2.6.18-92.1.13.el5/extra/xfs/xfs.ko
lrwxrwxrwx 1 root root 48 Mar 12 2009 /lib/modules/2.6.18-92.1.22.el5.centos.plus/weak-updates/xfs/xfs.ko -> /lib/modules/2.6.18-92.1.13.el5/extra/xfs/xfs.ko
lrwxrwxrwx 1 root root 48 Mar 12 2009 /lib/modules/2.6.18-92.el5/weak-updates/xfs/xfs.ko -> /lib/modules/2.6.18-92.1.13.el5/extra/xfs/xfs.ko

Hope it helps.

~0010586

toracat (manager)

Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=512552 .

You may want to try dzickus' test kernel referenced in that bugzilla:

http://people.redhat.com/dzickus/el5/

~0010587

toracat (manager)

If that test kernel fixes the problem you are seeing and if the patch is not going to be included in the upstream kernel for a while, we can provide it in the centosplus kernel.

~0010588

antonl (reporter)

I just installed it. Do you want me to remove the XFS module?

[root@athena ~]# ls -l `find /lib/modules -name xfs.ko`
-rwxr--r-- 1 root root 694832 Dec 15 21:55 /lib/modules/2.6.18-182.el5/kernel/fs/xfs/xfs.ko
lrwxrwxrwx 1 root root 48 Dec 23 10:57 /lib/modules/2.6.18-182.el5/weak-updates/xfs/xfs.ko -> /lib/modules/2.6.18-92.1.13.el5/extra/xfs/xfs.ko
-rw-r--r-- 1 root root 697232 Oct 3 2008 /lib/modules/2.6.18-92.1.13.el5/extra/xfs/xfs.ko
lrwxrwxrwx 1 root root 48 Mar 12 2009 /lib/modules/2.6.18-92.1.22.el5.centos.plus/weak-updates/xfs/xfs.ko -> /lib/modules/2.6.18-92.1.13.el5/extra/xfs/xfs.ko
lrwxrwxrwx 1 root root 48 Mar 12 2009 /lib/modules/2.6.18-92.el5/weak-updates/xfs/xfs.ko -> /lib/modules/2.6.18-92.1.13.el5/extra/xfs/xfs.ko

~0010589

toracat (manager)

You can leave it for now. The test kernel should be using the in-kernel xfs module. Check with:

/sbin/modinfo xfs
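
To make that check scriptable, one option is to classify the path that modinfo reports. This is only a sketch: `classify_xfs_module` is a hypothetical helper, and the path patterns are assumptions taken from the directory layout shown in the listings in this report (kernel/fs/xfs for the in-kernel module, weak-updates and extra for the kmod-xfs one).

```shell
#!/bin/sh
# Classify an xfs.ko path as reported by modinfo.
# Prints "in-kernel" for a path under kernel/fs/xfs/, "external" for
# weak-updates/ or extra/ paths, and "unknown" for anything else.
# classify_xfs_module is a hypothetical helper name.
classify_xfs_module() {
    case "$1" in
        */kernel/fs/xfs/xfs.ko) echo in-kernel ;;
        */weak-updates/*|*/extra/*) echo external ;;
        *) echo unknown ;;
    esac
}
```

It could be called as `classify_xfs_module "$(/sbin/modinfo -F filename xfs)"`, assuming the installed modinfo supports the `-F` field option.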

~0010590

antonl (reporter)

[root@athena ~]# modinfo xfs
filename: /lib/modules/2.6.18-182.el5/kernel/fs/xfs/xfs.ko
license: GPL
description: SGI XFS with ACLs, security attributes, large block/inode numbers, no debug enabled
author: Silicon Graphics, Inc.
srcversion: DE0AE7E45DF5E1EA03F6EC6
depends:
vermagic: 2.6.18-182.el5 SMP mod_unload gcc-4.1
module_sig: 883f3504b284765a0558d65b054fb5711254140a09b3827f295ecd199b12d268c9b24353b74efdb0a09270ef3f8498244c1fb5ef669ea5d17058b4a

I will let you know if the issue is resolved in a couple of days.

~0010591

toracat (manager)

Hope that fixes the problem.

~0010609

antonl (reporter)

I consider the problem fixed. At least, I did not see a crash even after an 8-hour upload process. If the problem returns, I will report it here.
The current kernel is build 182 from http://people.redhat.com/dzickus/el5/

~0010610

toracat (manager)

Good news. Keep us posted. I will see if I can get the patch(es) into the centosplus kernel.

~0010611

toracat (manager)

It is this patch that fixes the issue (appeared in test kernel -179 and newer):

2312 Dec 10 17:56 linux-2.6-md-raid5-mark-cancelled-readahead-bios-with-eio.patch

From: Eric Sandeen <sandeen@redhat.com>
Date: Tue, 1 Dec 2009 23:24:14 -0500
Subject: [md] raid5: mark cancelled readahead bios with -EIO
Message-id: <4B15A59E.6040602@redhat.com>
Patchwork-id: 21627
O-Subject: [PATCH RHEL5.5] md raid5: mark cancelled readahead bios with -EIO
        error
Bugzilla: 512552
RH-Acked-by: Doug Ledford <dledford@redhat.com>

This is for bug
512552 - Can't write to XFS mount during raid5 resync
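
Since that bug is tied to a raid5 resync, it may help when reproducing the problem to know whether an md array is resyncing at the time of the failure. The following is a minimal sketch under stated assumptions: the function name `md_arrays_resyncing` is hypothetical, and it assumes the usual /proc/mdstat layout, where an array header line such as `md0 : active raid5 ...` is followed by a progress line containing `resync =` or `recovery =`.

```shell
#!/bin/sh
# Read /proc/mdstat-style text from stdin and print the name of each md
# array that has a resync or recovery line below its header. A delayed
# resync ("resync=DELAYED") is reported as well.
# md_arrays_resyncing is a hypothetical helper name.
md_arrays_resyncing() {
    awk '
        /^md[0-9]+ :/ { array = $1 }
        /resync *=|recovery *=/ { if (array != "") print array }
    '
}
```

Typical use would be `md_arrays_resyncing < /proc/mdstat`; empty output means no array was found resyncing.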

~0010617

toracat (manager)

The patch referenced in note 10611 will be added to the next centosplus update if:

(1) it fixes the issue reported here.
(2) it does not appear in the distro kernel until 5.5.

I have built the cplus kernel with the patch and made it available from:

http://centos.toracat.org/kernel/centos5/centosplus-testing/x86_64/

The name is kernel-2.6.18-164.9.1.kvmmd.el5.ayplus . It was built on top of the previous test cplus kernel with kvm fixes (see bug #4058).

So, please test if you can.

~0010619

antonl (reporter)

I am going to install your kernel on a computer with almost identical configuration (it was not upgraded yet due to the failure of the first one). Let us give it a week for testing.

~0010620

antonl (reporter)

[root@vstorage ~]# uname -a
Linux vstorage 2.6.18-164.9.1.kvmmd.el5.ayplus #1 SMP Sat Dec 26 12:28:00 PST 2009 x86_64 x86_64 x86_64 GNU/Linux

~0010621

toracat (manager)

@antonl,

Thanks for testing. We look forward to seeing the result in a week or so. Hope that is all the fix needed.

~0010726

antonl (reporter)

I consider the issue resolved. Since my last note, both servers have worked fine with no problems reported. Thank you, toracat, for the quick resolution!

~0010727

toracat (manager)

Thanks for the good news. As if we were waiting for this moment ... the patch is now included in the centosplus kernel update released today ( kernel-2.6.18-164.10.1.el5.centos.plus ). The cplus kernel will continue to provide the fix until the patch finally appears in the distro kernel (possibly in CentOS 5.5).

~0010767

toracat (manager)

Changing the status to "resolved".

Issue History
Date Modified Username Field Change
2009-12-22 22:36 antonl New Issue
2009-12-23 12:47 antonl Note Added: 0010583
2009-12-23 13:06 toracat Note Added: 0010584
2009-12-23 13:52 antonl Note Added: 0010585
2009-12-23 15:35 toracat Note Added: 0010586
2009-12-23 15:41 toracat Status new => acknowledged
2009-12-23 15:41 toracat Category CentOS-5-Plus => kernel
2009-12-23 15:48 toracat Note Added: 0010587
2009-12-23 16:03 antonl Note Added: 0010588
2009-12-23 16:16 toracat Note Added: 0010589
2009-12-23 16:20 antonl Note Added: 0010590
2009-12-23 16:32 toracat Note Added: 0010591
2009-12-25 23:02 antonl Note Added: 0010609
2009-12-26 01:49 toracat Note Added: 0010610
2009-12-26 12:10 toracat Note Added: 0010611
2009-12-28 16:16 toracat Note Added: 0010617
2009-12-28 17:54 antonl Note Added: 0010619
2009-12-28 18:12 antonl Note Added: 0010620
2009-12-28 19:00 toracat Note Added: 0010621
2010-01-09 03:44 antonl Note Added: 0010726
2010-01-09 04:53 toracat Note Added: 0010727
2010-01-14 00:33 toracat Note Added: 0010767
2010-01-14 00:33 toracat Status acknowledged => resolved
2010-01-14 00:33 toracat Resolution open => fixed