View Issue Details

ID: 0001201
Project: CentOS-4
Category: kernel
View Status: public
Last Update: 2006-09-21 14:35
Reporter: sliqua
Assigned To:
Priority: normal
Severity: minor
Reproducibility: always
Status: assigned
Resolution: open
Product Version: 4.2 - i386
Summary: 0001201: Kernel panic - not syncing: fs/block_dev.c
Description: On multiple CentOS 4.2 i386 machines that I'm running with the latest kernel (2.6.9-22.0.2.EL #1), we're experiencing random crashes that force us to reboot the machines at least once a day. Nothing was being reported in /var/log, so I kept a serial console session open to capture the error printed at the console. The error shown is "root@ttyS0 log # Kernel panic - not syncing: fs/block_dev.c:396: spin_lock(fs/block_dev.c:c035fc80) already locked by fs/block_dev.c/396".
Tags: No tags attached.

Activities

sliqua

2006-02-06 16:44

reporter   ~0003139

Oops, forgot to change this bug to "crash" status rather than low.
kbsingh@karan.org

2006-02-06 16:57

reporter   ~0003140

On a test machine, you could try the CentOS 4.3 beta kernels, hosted here:
http://www.karan.org/mock/centos/c4b3kernel/
sliqua

2006-02-07 00:21

reporter   ~0003144

OK, I'm running the CentOS 4.3 beta kernels on both machines. I'm already noticing an improvement, but will let you know if the panics come back. I'm monitoring both over serial console now and will continue to do so for the next few days.

2006-02-07 18:17

 

2.6.9-22.0.2.EL.panic.txt (864 bytes)   
[<c01e63b1>] __delay+0x9/0xa
[<c0120ba2>] panic+0x13b/0x13d
[<c017140a>] nr_blockdev_pages+0x6f/0xfa
[<c014c546>] si_meminfo+0x1f/0x3b
[<d8b61d7b>] update_defense_level+0xf/0x332 [ip_vs]
[<d8b6209e>] defense_timer_handler+0x0/0x29 [ip_vs]
[<d8b620a3>] defense_timer_handler+0x5/0x29 [ip_vs]
[<c012b273>] run_timer_softirq+0x1eb/0x2d4
[<c0126781>] __do_softirq+0x35/0x79
[<c0109350>] do_softirq+0x46/0x4d
=======================
[<c0108914>] do_IRQ+0x2b3/0x2bf
[<c03102d8>] common_interrupt+0x18/0x20
[<c017144a>] nr_blockdev_pages+0xaf/0xfa
[<c014c546>] si_meminfo+0x1f/0x3b
[<c01a8593>] meminfo_read_proc+0x41/0x191
[<c018189f>] dput+0x33/0x423
[<c014bf59>] buffered_rmqueue+0x1c4/0x1e7
[<c01fc051>] __alloc_pages+0xd5/0x2f7
[<c01a636b>] proc_file_read+0x115/0x269
[<c0168812>] vfs_read+0xb6/0xe2
[<c0168a25>] sys_read+0x3c/0x62
[<c0310193>] syscall_call+0x7/0xb

2006-02-07 18:17

 

2.6.9-27.EL-beta.panic.txt (913 bytes)   
[<c01e67d5>] __delay+0x9/0xa
[<c0120f26>] panic+0x13b/0x13d
[<c01717ae>] nr_blockdev_pages+0x6f/0xfa
[<c014c7f6>] si_meminfo+0x1f/0x3b
[<d097dd7b>] update_defense_level+0xf/0x332 [ip_vs]
[<c011c50c>] activate_task+0x53/0x5f
[<d097e09e>] defense_timer_handler+0x0/0x29 [ip_vs]
[<d097e0a3>] defense_timer_handler+0x5/0x29 [ip_vs]
[<c012b61b>] run_timer_softirq+0x1eb/0x2d4
[<c0126b29>] __do_softirq+0x35/0x79
[<c010934c>] do_softirq+0x46/0x4d
=======================
[<c0108910>] do_IRQ+0x2b3/0x2bf
[<c03116f4>] common_interrupt+0x18/0x20
[<c01717eb>] nr_blockdev_pages+0xac/0xfa
[<c014c7f6>] si_meminfo+0x1f/0x3b
[<c01a8caf>] meminfo_read_proc+0x41/0x191
[<c0178ddb>] __link_path_walk+0xce0/0xd98
[<c014c209>] buffered_rmqueue+0x1c4/0x1e7
[<c014c301>] __alloc_pages+0xd5/0x2f7
[<c01a691f>] proc_file_read+0xd1/0x225
[<c0168bb2>] vfs_read+0xb6/0xe2
[<c0168dc5>] sys_read+0x3c/0x62
[<c0311fa5>] syscall_call+0x7/0xb
socheat

2006-02-07 18:28

reporter   ~0003153

I installed the beta kernel (2.6.9-27) on one of our boxes, and the same kernel panic happened last night. Another box, running the most recent stable kernel (2.6.9-22.0.2) also panicked again last night. I did an Alt+SysRq+q on both boxes, and was able to get the last page of output. I've attached those two files.

This seems to be similar to the following reports I've found:
http://archive.linuxvirtualserver.org/html/lvs-users/2005-11/msg00280.html
http://lkml.org/lkml/2004/11/24/375

Original post on lkml.org: http://lkml.org/lkml/2004/10/4/11
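
Both attached traces show the same shape: nr_blockdev_pages and si_meminfo appear once below the "=====" separator (the interrupted /proc/meminfo read) and again above it (the ip_vs defense timer running in softirq context). The following is a minimal sketch of how that re-entry can be spotted mechanically. The function name reentered_functions and the regular expression are illustrative assumptions, and the sample is an excerpt of the 2.6.9-22.0.2.EL attachment, not the full file.

```python
import re

# Frame lines look like: [<c017140a>] nr_blockdev_pages+0x6f/0xfa
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reentered_functions(trace):
    """Return functions seen both above and below the '=====' separator.

    Assumption: frames above the separator ran on top of the interrupt,
    and frames below it belong to the task that was interrupted.
    """
    top, _, bottom = trace.partition("=======")
    above = FRAME.findall(top)
    below = set(FRAME.findall(bottom))
    seen = []
    for name in above:
        if name in below and name not in seen:
            seen.append(name)
    return seen


# Excerpt from the attached 2.6.9-22.0.2.EL trace.
SAMPLE = """\
[<c0120ba2>] panic+0x13b/0x13d
[<c017140a>] nr_blockdev_pages+0x6f/0xfa
[<c014c546>] si_meminfo+0x1f/0x3b
[<d8b61d7b>] update_defense_level+0xf/0x332 [ip_vs]
[<c012b273>] run_timer_softirq+0x1eb/0x2d4
=======================
[<c0108914>] do_IRQ+0x2b3/0x2bf
[<c017144a>] nr_blockdev_pages+0xaf/0xfa
[<c014c546>] si_meminfo+0x1f/0x3b
[<c01a8593>] meminfo_read_proc+0x41/0x191
"""

print(reentered_functions(SAMPLE))  # ['nr_blockdev_pages', 'si_meminfo']
```

A function that shows up on both sides is only a hint, but combined with the "already locked" panic message it points at the lock taken inside nr_blockdev_pages.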
socheat

2006-02-13 23:34

reporter   ~0003189

Update: from the previous lkml.org thread I posted, I found the corresponding patches, and incorporated them into a custom kernel built from the official CentOS 2.6.9-22 source RPM. We've put it on several client boxes and about 10 of our own, and they have all been running crash free for over a week now. Here are the two patches I used:

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9/2.6.9-mm1/broken-out/ipvs-deadlock-fix.patch
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.9/2.6.9-mm1/broken-out/cancel_rearming_delayed_work.patch

I wasn't sure if these patches were included in the beta kernel Karan suggested or not. Just an FYI.
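
To illustrate why patches like these would help, here is a small user-space model of the deadlock. It is not kernel code. It assumes, based on the patch names and the traces above, that the fix moves the ip_vs defense-level update out of the timer softirq and into deferred work that runs after the lock holder has finished. All names in the sketch (bdev_lock, deferred, timer_handler_direct, timer_handler_deferred) are hypothetical stand-ins.

```python
import threading
from collections import deque

bdev_lock = threading.Lock()  # stands in for the non-recursive kernel spinlock
deferred = deque()            # stands in for a work queue run in process context


def count_blockdev_pages():
    """Model of nr_blockdev_pages(): needs the lock while it counts."""
    # A real spinlock would spin forever here; a non-blocking acquire lets
    # the model report the conflict instead of hanging.
    if not bdev_lock.acquire(blocking=False):
        return "already locked"
    try:
        return "ok"
    finally:
        bdev_lock.release()


def timer_handler_direct():
    """Buggy shape: the timer handler takes the lock in interrupt context."""
    return count_blockdev_pages()


def timer_handler_deferred():
    """Fixed shape (assumed): the handler only queues the work for later."""
    deferred.append(count_blockdev_pages)
    return "queued"


def read_meminfo(timer_handler):
    """Model of a /proc/meminfo read interrupted while it holds the lock."""
    with bdev_lock:
        in_interrupt = timer_handler()  # the interrupt fires on this CPU
    later = [work() for work in list(deferred)]  # queue drains after unlock
    deferred.clear()
    return in_interrupt, later


print(read_meminfo(timer_handler_direct))    # ('already locked', [])
print(read_meminfo(timer_handler_deferred))  # ('queued', ['ok'])
```

In the direct case the handler runs on top of code that already holds the lock, which matches the panic text. In the deferred case the same work succeeds because it runs only after the lock has been released.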
sliqua

2006-02-19 17:23

reporter   ~0003206

I experienced a kernel panic after using the kernel from http://www.karan.org/mock/centos/c4b3kernel/, but wasn't able to catch the output. It works fine on a P4 2.8 GHz machine, but panics on my Opteron machines (running in 32-bit mode). After installing the kernel Socheat set me up with, I've experienced no problems so far on the Opterons.

-Alex
tmellon

2006-04-20 05:13

reporter   ~0003417

We're having a number of issues regarding spin_lock errors on v4.3 - with the 2.6.9-34EL kernel (latest from yum).

Anyone have a solution for this?
kbsingh@karan.org

2006-04-20 10:44

reporter   ~0003419

tmellon, consider opening a new bug report for a new issue. It's easier to track there.
JohnnyHughes

2006-04-20 11:25

administrator   ~0003420

OK ... one thing I want to make clear here ...

The upstream kernel is rebuilt for the standard CentOS-4; we won't be changing that.

We can help the upstream people by providing patches that fix certain problems to them for inclusion in upcoming kernels.

We already know about a couple of spinlock problems.

Let me see what I can build ....
JohnnyHughes

2006-04-20 12:24

administrator   ~0003421

Concerning note ~0003189:

I am compiling a new test kernel (2.6.9-34.19.EL) that contains many patches ... including

1. ipvs-deadlock-fix.patch is included in linux-2.6.12-network.patch
2. cancel_rearming_delayed_work.patch .. this is not addressed, but please try the new kernel and see if it is still a problem.

Concerning note ~0003417:

It contains a change to linux-2.6.9-spinlock-define.patch as well.

And many other changes ... I'll post the fixes to an e-mail to the CentOS-Devel mailing list and this bug.
SagoMax

2006-04-25 00:31

reporter   ~0003434

Last edited: 2006-04-25 19:20

I work for a dedicated server data center. We easily house thousands of servers with various operating systems. We first encountered this issue back in February with one of our customers. We have found no solution to this issue yet.

Recently the problem has begun to crop up more and more and is becoming a really big problem. Any help would be greatly appreciated in solving this! We have attempted to apply the patch listed here with no luck.

Any word from CentOS when we might see an official fix for this issue?

socheat

2006-09-21 14:35

reporter   ~0004002

I've confirmed that this problem has been resolved in the 2.6.9-42.0.2.EL kernel. Thanks!
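
For anyone checking a fleet of machines against that release, a rough sketch of the comparison follows. It compares only the leading numeric fields of the kernel release string, which is a simplification of real RPM version ordering, and it cannot tell whether a custom-built kernel already carries the patches. The helper names release_key and has_fix are illustrative.

```python
import re

FIXED_IN = "2.6.9-42.0.2.EL"  # release reported as fixed in this issue


def release_key(release):
    """Turn '2.6.9-42.0.2.EL' into (2, 6, 9, 42, 0, 2) for comparison."""
    # Keep only the leading numeric fields and stop at the first
    # non-numeric one, such as 'EL' or 'ELsmp'.
    key = []
    for field in re.split(r"[.-]", release):
        if not field.isdigit():
            break
        key.append(int(field))
    return tuple(key)


def has_fix(release, fixed_in=FIXED_IN):
    """True if the release sorts at or after the fixed release."""
    return release_key(release) >= release_key(fixed_in)


for rel in ("2.6.9-22.0.2.EL", "2.6.9-34.EL", "2.6.9-42.0.2.EL"):
    print(rel, has_fix(rel))
```

On a live machine the string to test would come from `uname -r` (or `platform.release()` in Python).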

Issue History

Date Modified Username Field Change
2006-02-06 16:43 sliqua New Issue
2006-02-06 16:43 sliqua Status new => assigned
2006-02-06 16:44 sliqua Note Added: 0003139
2006-02-06 16:57 kbsingh@karan.org Note Added: 0003140
2006-02-07 00:21 sliqua Note Added: 0003144
2006-02-07 18:17 socheat File Added: 2.6.9-22.0.2.EL.panic.txt
2006-02-07 18:17 socheat File Added: 2.6.9-27.EL-beta.panic.txt
2006-02-07 18:28 socheat Note Added: 0003153
2006-02-13 23:34 socheat Note Added: 0003189
2006-02-19 17:23 sliqua Note Added: 0003206
2006-04-20 05:13 tmellon Note Added: 0003417
2006-04-20 10:44 kbsingh@karan.org Note Added: 0003419
2006-04-20 11:25 JohnnyHughes Note Added: 0003420
2006-04-20 12:24 JohnnyHughes Note Added: 0003421
2006-04-25 00:31 SagoMax Note Added: 0003434
2006-04-25 19:20 SagoMax Note Edited: 0003434
2006-09-21 14:35 socheat Note Added: 0004002