View Issue Details

IDProjectCategoryView StatusLast Update
0004515CentOS-5kernelpublic2013-02-18 08:58
Reporteraajn 
PrioritynormalSeveritycrashReproducibilityunable to reproduce
Status newResolutionopen 
Product Version5.5 
Target VersionFixed in Version 
Summary0004515: Server hangs, processes being blocked for more than 120 seconds
DescriptionServer hangs, processes being blocked for more than 120 seconds
hp dl385 g6, hardware raid on cciss

Sep 19 02:06:10 xxx kernel: INFO: task kjournald:2296 blocked for more than 120 seconds.

Sep 19 02:06:10 xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Sep 19 02:06:10 xxx kernel: kjournald D ffff810001025e20 0 2296 97 2557 2287 (L-TLB)

Sep 19 02:06:10 xxx kernel: ffff81022f071cf0 0000000000000046 0000000000000086 ffffffff8003da83

Sep 19 02:06:10 xxx kernel: 0000000000000000 000000000000000a ffff81022aa3a100 ffff81022fe2a080

Sep 19 02:06:10 xxx kernel: 0001c465c04872e3 000000000000202d ffff81022aa3a2e8 000000042fa7a478

Sep 19 02:06:10 xxx kernel: Call Trace:

Sep 19 02:06:10 xxx kernel: [<ffffffff8003da83>] lock_timer_base+0x1b/0x3c

Sep 19 02:06:10 xxx kernel: [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90

Sep 19 02:06:10 xxx kernel: [<ffffffff8001552b>] sync_buffer+0x0/0x3f

Sep 19 02:06:10 xxx kernel: [<ffffffff800637ea>] io_schedule+0x3f/0x67

Sep 19 02:06:10 xxx kernel: [<ffffffff80015566>] sync_buffer+0x3b/0x3f

Sep 19 02:06:10 xxx kernel: [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e

Sep 19 02:06:10 xxx kernel: [<ffffffff8001552b>] sync_buffer+0x0/0x3f

Sep 19 02:06:10 xxx kernel: [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78

Sep 19 02:06:10 xxx kernel: [<ffffffff800a0a06>] wake_bit_function+0x0/0x23

Sep 19 02:06:10 xxx kernel: [<ffffffff880339a5>] :jbd:journal_commit_transaction+0x543/0x1066

Sep 19 02:06:10 xxx kernel: [<ffffffff8003da83>] lock_timer_base+0x1b/0x3c

Sep 19 02:06:10 xxx kernel: [<ffffffff8004b132>] try_to_del_timer_sync+0x7f/0x88

Sep 19 02:06:10 xxx kernel: [<ffffffff880375d3>] :jbd:kjournald+0xc1/0x213

Sep 19 02:06:10 xxx kernel: [<ffffffff800a09d8>] autoremove_wake_function+0x0/0x2e

Sep 19 02:06:10 xxx kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4

Sep 19 02:06:10 xxx kernel: [<ffffffff88037512>] :jbd:kjournald+0x0/0x213

Sep 19 02:06:10 xxx kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4

Sep 19 02:06:10 xxx kernel: [<ffffffff8003290a>] kthread+0xfe/0x132

Sep 19 02:06:10 xxx kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11

Sep 19 02:06:10 xxx kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4

Sep 19 02:06:10 xxx kernel: [<ffffffff8003280c>] kthread+0x0/0x132

Sep 19 02:06:10 xxx kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

Sep 19 02:06:10 xxx kernel:



Sep 19 02:08:10 xxx kernel: INFO: task kjournald:2296 blocked for more than 120 seconds.

Sep 19 02:08:10 xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Sep 19 02:08:10 xxx kernel: kjournald D ffff810001025e20 0 2296 97 2557 2287 (L-TLB)

Sep 19 02:08:10 xxx kernel: ffff81022f071cf0 0000000000000046 0000000000000086 ffffffff8003da83

Sep 19 02:08:10 xxx kernel: 0000000000000000 000000000000000a ffff81022aa3a100 ffff81022fe2a080

Sep 19 02:08:10 xxx kernel: 0001c465c04872e3 000000000000202d ffff81022aa3a2e8 000000042fa7a478

Sep 19 02:08:10 xxx kernel: Call Trace:

Sep 19 02:08:10 xxx kernel: [<ffffffff8003da83>] lock_timer_base+0x1b/0x3c

Sep 19 02:08:10 xxx kernel: [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90

Sep 19 02:08:10 xxx kernel: [<ffffffff8001552b>] sync_buffer+0x0/0x3f

Sep 19 02:08:10 xxx kernel: [<ffffffff800637ea>] io_schedule+0x3f/0x67

Sep 19 02:08:10 xxx kernel: [<ffffffff80015566>] sync_buffer+0x3b/0x3f

Sep 19 02:08:10 xxx kernel: [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e

Sep 19 02:08:10 xxx kernel: [<ffffffff8001552b>] sync_buffer+0x0/0x3f

Sep 19 02:08:10 xxx kernel: [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78

Sep 19 02:08:10 xxx kernel: [<ffffffff800a0a06>] wake_bit_function+0x0/0x23

Sep 19 02:08:10 xxx kernel: [<ffffffff880339a5>] :jbd:journal_commit_transaction+0x543/0x1066

Sep 19 02:08:10 xxx kernel: [<ffffffff8003da83>] lock_timer_base+0x1b/0x3c

Sep 19 02:08:10 xxx kernel: [<ffffffff8004b132>] try_to_del_timer_sync+0x7f/0x88

Sep 19 02:08:10 xxx kernel: [<ffffffff880375d3>] :jbd:kjournald+0xc1/0x213

Sep 19 02:08:10 xxx kernel: [<ffffffff800a09d8>] autoremove_wake_function+0x0/0x2e

Sep 19 02:08:11 xxx kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4

Sep 19 02:08:11 xxx kernel: [<ffffffff88037512>] :jbd:kjournald+0x0/0x213

Sep 19 02:08:11 xxx kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4

Sep 19 02:08:11 xxx kernel: [<ffffffff8003290a>] kthread+0xfe/0x132

Sep 19 02:08:11 xxx kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11

Sep 19 02:08:11 xxx kernel: [<ffffffff800a07c0>] keventd_create_kthread+0x0/0xc4

Sep 19 02:08:11 xxx kernel: [<ffffffff8003280c>] kthread+0x0/0x132

Sep 19 02:08:11 xxx kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11

Sep 19 02:08:11 xxx kernel:
TagsNo tags attached.

Activities

Spock

Spock

2010-09-27 01:44

reporter   ~0011893

I can confirm this, I've had it happen a couple of times over the past several days.

According to similar bug reports on other distros it is a problem in the kernel scheduler that needs patching.
intel352

intel352

2010-10-05 13:47

reporter   ~0011904

Same issue for me, using kernel 2.6.18-194.11.3.el5xen
toracat

toracat

2010-10-05 15:53

manager   ~0011906

Related bugzilla entries:

https://bugzilla.redhat.com/show_bug.cgi?id=605444
https://bugzilla.redhat.com/show_bug.cgi?id=605207

Users with the affected system may want to have their voice heard?
intel352

intel352

2010-10-05 18:08

reporter   ~0011907

Posted at the 2nd link (as that seemed most related).
Thanks toracat.
aajn

aajn

2010-11-01 09:53

reporter   ~0011993

now im using 2.6.18-194.17.1.el5, it's not solved the smae problem again:
Oct 31 21:37:08 xxx kernel: INFO: task kjournald:2291 blocked for more than 120 seconds.
Oct 31 21:37:08 xxx kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 31 21:37:08 xxx kernel: kjournald D ffff81022fac3820 0 2291 97 2543 2288 (L-TLB)
Oct 31 21:37:08 xxx kernel: ffff81022b693cf0 0000000000000046 ffff81022e78d7a0 ffff81019a9010c0
Oct 31 21:37:08 xxx kernel: 00057f0daaa843e8 000000000000000a ffff81022e78d7a0 ffff81022fac3820
Oct 31 21:37:08 xxx kernel: 00057f0ffe11e747 0000000000002451 ffff81022e78d988 000000022c64bc78
Oct 31 21:37:08 xxx kernel: Call Trace:
Oct 31 21:37:08 xxx kernel: [<ffffffff8006e1d7>] do_gettimeofday+0x40/0x90
Oct 31 21:37:08 xxx kernel: [<ffffffff8005a7d6>] getnstimeofday+0x10/0x28
Oct 31 21:37:08 xxx kernel: [<ffffffff8001552b>] sync_buffer+0x0/0x3f
Oct 31 21:37:08 xxx kernel: [<ffffffff800637ea>] io_schedule+0x3f/0x67
Oct 31 21:37:08 xxx kernel: [<ffffffff80015566>] sync_buffer+0x3b/0x3f
Oct 31 21:37:08 xxx kernel: [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
Oct 31 21:37:08 xxx kernel: [<ffffffff8001552b>] sync_buffer+0x0/0x3f
Oct 31 21:37:08 xxx kernel: [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
Oct 31 21:37:08 xxx kernel: [<ffffffff800a0a02>] wake_bit_function+0x0/0x23
Oct 31 21:37:08 xxx kernel: [<ffffffff880339a5>] :jbd:journal_commit_transaction+0x543/0x1066
Oct 31 21:37:08 xxx kernel: [<ffffffff8003da83>] lock_timer_base+0x1b/0x3c
Oct 31 21:37:08 xxx kernel: [<ffffffff8004b132>] try_to_del_timer_sync+0x7f/0x88
Oct 31 21:37:08 xxx kernel: [<ffffffff880375d3>] :jbd:kjournald+0xc1/0x213
Oct 31 21:37:08 xxx kernel: [<ffffffff800a09d4>] autoremove_wake_function+0x0/0x2e
Oct 31 21:37:08 xxx kernel: [<ffffffff800a07bc>] keventd_create_kthread+0x0/0xc4
Oct 31 21:37:08 xxx kernel: [<ffffffff88037512>] :jbd:kjournald+0x0/0x213
Oct 31 21:37:08 xxx kernel: [<ffffffff800a07bc>] keventd_create_kthread+0x0/0xc4
Oct 31 21:37:08 xxx kernel: [<ffffffff8003290a>] kthread+0xfe/0x132
Oct 31 21:37:08 xxx kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Oct 31 21:37:08 xxx kernel: [<ffffffff800a07bc>] keventd_create_kthread+0x0/0xc4
Oct 31 21:37:08 xxx kernel: [<ffffffff8003280c>] kthread+0x0/0x132
Oct 31 21:37:08 xxx kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Oct 31 21:37:08 xxx kernel:
robinbowes

robinbowes

2010-11-06 22:32

reporter   ~0012025

"me too"

We're seeing similar lock ups on kernel-2.6.18-194.17.1.el5

Using cfq scheduler. Have switched one machine to noop to see if that makes any difference.
dhzh

dhzh

2011-01-07 05:40

reporter   ~0012297

I have just observed this as well. Had to reboot server.
devurandom

devurandom

2011-01-26 13:05

reporter   ~0012371

CentOS release 5.5 (Final)
Kernel 2.6.18-194.el5xen on an x86_64

INFO: task yum-complete-tr:1709 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
yum-complete- D ffff88001f500e00 0 1709 1684 (NOTLB)
 ffff880003a53d88 0000000000000286 ffff880003a53ce8 ffffffff80288a48
 0000000000000007 ffff88001cd27080 ffff88001f4217e0 0000000000001852
 ffff88001cd27268 ffffffff802490c3
Call Trace:
 [<ffffffff80288a48>] __activate_task+0x56/0x6d
 [<ffffffff802490c3>] try_to_wake_up+0x392/0x3a4
 [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
 [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
 [<ffffffff802e555b>] sync_inode+0x24/0x33
 [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
 [<ffffffff80252276>] do_fsync+0x52/0xa4
 [<ffffffff802d37f5>] __do_fsync+0x23/0x36
 [<ffffffff802602f9>] tracesys+0xab/0xb6

INFO: task syslogd:1229 blocked for more than 120 seconds.

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd D ffff88001fbdf7a0 0 1229 1 1232 1221 (NOTLB)
 ffff880016b55d88 0000000000000286 0000000000000000 0000000000000001
 0000000000000009 ffff88001fbdf7a0 ffff88001cd27080 00000000000038cd
 ffff88001fbdf988 ffff88001f4217e0
Call Trace:
 [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
 [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
 [<ffffffff802d2ff1>] do_readv_writev+0x26e/0x291
 [<ffffffff802e555b>] sync_inode+0x24/0x33
 [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
 [<ffffffff80252276>] do_fsync+0x52/0xa4
 [<ffffffff802d37f5>] __do_fsync+0x23/0x36
 [<ffffffff802602f9>] tracesys+0xab/0xb6

INFO: task yum-complete-tr:1709 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
yum-complete- D ffff88000182b460 0 1709 1684 (NOTLB)
 ffff880003a53d88 0000000000000286 ffff880003a53d98 ffffffff80262dd3
 0000000000000007 ffff88001cd27080 ffffffff804f4b80 0000000000001393
 ffff88001cd27268 ffff88001fbdf7a0
Call Trace:
 [<ffffffff80262dd3>] thread_return+0x6c/0x113
 [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
 [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
 [<ffffffff802e555b>] sync_inode+0x24/0x33
 [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
 [<ffffffff80252276>] do_fsync+0x52/0xa4
 [<ffffffff802d37f5>] __do_fsync+0x23/0x36
 [<ffffffff802602f9>] tracesys+0xab/0xb6

INFO: task yum:2957 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
yum D ffff88000182b460 0 2957 1684 (NOTLB)
 ffff88000d325d88 0000000000000286 ffff88000d325d98 ffffffff80262dd3

 0000000000000007 ffff88001cd27080 ffffffff804f4b80 0000000000000600
 ffff88001cd27268 000000000000c85e
Call Trace:
 [<ffffffff80262dd3>] thread_return+0x6c/0x113
 [<ffffffff88036d5a>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
 [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
 [<ffffffff802e555b>] sync_inode+0x24/0x33
 [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
 [<ffffffff80252276>] do_fsync+0x52/0xa4
 [<ffffffff802d37f5>] __do_fsync+0x23/0x36
 [<ffffffff802602f9>] tracesys+0xab/0xb6
ngochai

ngochai

2011-03-02 04:28

reporter   ~0012465

I'm seeing this as well

uname -a
Linux ec2-175-41-141-122.ap-southeast-1.compute.amazonaws.com 2.6.18-194.el5xen #1 SMP Fri Apr 2 15:34:40 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

=======================

INFO: task java:26116 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 001964078659265a 0 26116 1 28137 (NOTLB)
 ffff880393351d88 0000000000000286 0000000000000000 0000000000000001
 0000000000000008 ffff8801e6e9d860 ffff880013bbb080 00000000000468d3
 ffff8801e6e9da48 0000000000000000
Call Trace:
 [<ffffffff880d8d5a>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff880d378a>] :jbd:journal_stop+0x1cf/0x1ff
 [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
 [<ffffffff802e555b>] sync_inode+0x24/0x33
 [<ffffffff880ee36d>] :ext3:ext3_sync_file+0xc9/0xdc
 [<ffffffff80252276>] do_fsync+0x52/0xa4
 [<ffffffff802d37f5>] __do_fsync+0x23/0x36
 [<ffffffff802602f9>] tracesys+0xab/0xb6

INFO: task java:26116 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
java D 001964078659265a 0 26116 1 26694 28137 (NOTLB)
 ffff880393351d88 0000000000000286 0000000000000000 0000000000000001
 0000000000000008 ffff8801e6e9d860 ffff880013bbb080 00000000000468d3
 ffff8801e6e9da48 0000000000000000
Call Trace:
 [<ffffffff880d8d5a>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff880d378a>] :jbd:journal_stop+0x1cf/0x1ff
 [<ffffffff8023138e>] __writeback_single_inode+0x1e9/0x328
 [<ffffffff802e555b>] sync_inode+0x24/0x33
 [<ffffffff880ee36d>] :ext3:ext3_sync_file+0xc9/0xdc
 [<ffffffff80252276>] do_fsync+0x52/0xa4
 [<ffffffff802d37f5>] __do_fsync+0x23/0x36
 [<ffffffff802602f9>] tracesys+0xab/0xb6

INFO: task postmaster:5379 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postmaster D 001e58b2abe47efa 0 5379 4470 5382 4478 (NOTLB)
 ffff88010e831ec8 0000000000000286 ffff88000ebf5a30 ffff88000fb88118
 0000000000000009 ffff88027ba90820 ffffffff804f4b80 0000000000001026
 ffff88027ba90a08 ffff88000a695f78
Call Trace:
 [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
 [<ffffffff80252265>] do_fsync+0x41/0xa4
 [<ffffffff802d37f5>] __do_fsync+0x23/0x36
 [<ffffffff802602f9>] tracesys+0xab/0xb6

INFO: task postmaster:5576 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postmaster D ffffffff80214687 0 5576 4470 5573 (NOTLB)
 ffff880056e8fea8 0000000000000286 00000000394449f8 ffff8803be62d540
 0000000000000009 ffff88012b2900c0 ffff8801d21e17e0 000000000023f742
 ffff88012b2902a8 00000000ffffff9c
Call Trace:
 [<ffffffff802676cd>] do_page_fault+0xfae/0x12e0
 [<ffffffff80263a7e>] __mutex_lock_slowpath+0x60/0x9b
 [<ffffffff80263ac8>] .text.lock.mutex+0xf/0x14
 [<ffffffff802146b1>] generic_file_llseek+0x2a/0x8b
 [<ffffffff80225ca6>] sys_lseek+0x40/0x60
 [<ffffffff80260295>] tracesys+0x47/0xb6
 [<ffffffff802602f9>] tracesys+0xab/0xb6

INFO: task kjournald:406 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D 0021737bc22fc011 0 406 7 428 380 (L-TLB)
 ffff8803be825cf0 0000000000000246 ffff88000000a040 ffff8803bf6f8048
 000000000000000a ffff8803bed9f040 ffffffff804f4b80 000000000000115e
 ffff8803bed9f228 ffffffff80264931
Call Trace:
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802635f9>] io_schedule+0x3f/0x67
 [<ffffffff80215ce5>] sync_buffer+0x3b/0x3f
 [<ffffffff80263825>] __wait_on_bit+0x40/0x6e
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802638bf>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8029c638>] wake_bit_function+0x0/0x23
 [<ffffffff880d5971>] :jbd:journal_commit_transaction+0x543/0x106a
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023fa20>] lock_timer_base+0x1b/0x3c
 [<ffffffff8024d52c>] try_to_del_timer_sync+0x7f/0x88
 [<ffffffff880d95a3>] :jbd:kjournald+0xc1/0x213
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880d94e2>] :jbd:kjournald+0x0/0x213
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233e47>] kthread+0xfe/0x132
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233d49>] kthread+0x0/0x132
 [<ffffffff80260b22>] child_rip+0x0/0x12

INFO: task kjournald:877 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880001027760 0 877 7 1338 829 (L-TLB)
 ffff8803be077cf0 0000000000000246 ffff8800bd80f140 000000001e0d2037
 000000000000000a ffff8803bf6df820 ffff88011f5ec7e0 000000000000119a
 ffff8803bf6dfa08 ffffffff80264931
Call Trace:
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8026f753>] do_gettimeofday+0x1f8/0x213
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff8025d117>] getnstimeofday+0x10/0x28
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802635f9>] io_schedule+0x3f/0x67
 [<ffffffff80215ce5>] sync_buffer+0x3b/0x3f
 [<ffffffff80263825>] __wait_on_bit+0x40/0x6e
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802638bf>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8029c638>] wake_bit_function+0x0/0x23
 [<ffffffff880d5971>] :jbd:journal_commit_transaction+0x543/0x106a
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023fa20>] lock_timer_base+0x1b/0x3c
 [<ffffffff8024d52c>] try_to_del_timer_sync+0x7f/0x88
 [<ffffffff880d95a3>] :jbd:kjournald+0xc1/0x213
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880d94e2>] :jbd:kjournald+0x0/0x213
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233e47>] kthread+0xfe/0x132
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233d49>] kthread+0x0/0x132
 [<ffffffff80260b22>] child_rip+0x0/0x12

INFO: task postmaster:14341 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
postmaster D 0000000000000000 0 14341 4470 14387 14146 (NOTLB)
 ffff8802814cfd88 0000000000000286 ffff8803bf572040 00000700000001ec
 000000000000000a ffff88026cfee0c0 ffff88024c776820 000000000007dbbf
 ffff88026cfee2a8 ffffffff80264931
Call Trace:
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff802291d0>] sync_page+0x0/0x43
 [<ffffffff8025d117>] getnstimeofday+0x10/0x28
 [<ffffffff802291d0>] sync_page+0x0/0x43
 [<ffffffff802635f9>] io_schedule+0x3f/0x67
 [<ffffffff8022920e>] sync_page+0x3e/0x43
 [<ffffffff80263825>] __wait_on_bit+0x40/0x6e
 [<ffffffff8023695b>] wait_on_page_bit+0x6c/0x72
 [<ffffffff8029c638>] wake_bit_function+0x0/0x23
 [<ffffffff8024a136>] pagevec_lookup_tag+0x1a/0x21
 [<ffffffff8024c49c>] wait_on_page_writeback_range+0x62/0x12e
 [<ffffffff8025d56b>] do_writepages+0x29/0x2f
 [<ffffffff80251b4a>] __filemap_fdatawrite_range+0x50/0x5b
 [<ffffffff80252294>] do_fsync+0x70/0xa4
 [<ffffffff802d37f5>] __do_fsync+0x23/0x36
 [<ffffffff802602f9>] tracesys+0xab/0xb6

INFO: task kjournald:406 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D 0021737bc22fc011 0 406 7 428 380 (L-TLB)
 ffff8803be825cf0 0000000000000246 ffff88000000a040 ffff8803bf6f8048
 000000000000000a ffff8803bed9f040 ffffffff804f4b80 000000000000115e
 ffff8803bed9f228 ffffffff80264931
Call Trace:
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802635f9>] io_schedule+0x3f/0x67
 [<ffffffff80215ce5>] sync_buffer+0x3b/0x3f
 [<ffffffff80263825>] __wait_on_bit+0x40/0x6e
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802638bf>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8029c638>] wake_bit_function+0x0/0x23
 [<ffffffff880d5971>] :jbd:journal_commit_transaction+0x543/0x106a
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023fa20>] lock_timer_base+0x1b/0x3c
 [<ffffffff8024d52c>] try_to_del_timer_sync+0x7f/0x88
 [<ffffffff880d95a3>] :jbd:kjournald+0xc1/0x213
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880d94e2>] :jbd:kjournald+0x0/0x213
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233e47>] kthread+0xfe/0x132
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233d49>] kthread+0x0/0x132
 [<ffffffff80260b22>] child_rip+0x0/0x12

INFO: task kjournald:877 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff880001027760 0 877 7 1338 829 (L-TLB)
 ffff8803be077cf0 0000000000000246 ffff8800bd80f140 000000001e0d2037
 000000000000000a ffff8803bf6df820 ffff88011f5ec7e0 000000000000119a
 ffff8803bf6dfa08 ffffffff80264931
Call Trace:
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8026f753>] do_gettimeofday+0x1f8/0x213
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff8025d117>] getnstimeofday+0x10/0x28
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802635f9>] io_schedule+0x3f/0x67
 [<ffffffff80215ce5>] sync_buffer+0x3b/0x3f
 [<ffffffff80263825>] __wait_on_bit+0x40/0x6e
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802638bf>] out_of_line_wait_on_bit+0x6c/0x78
 [<ffffffff8029c638>] wake_bit_function+0x0/0x23
 [<ffffffff880d5971>] :jbd:journal_commit_transaction+0x543/0x106a
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff8023fa20>] lock_timer_base+0x1b/0x3c
 [<ffffffff8024d52c>] try_to_del_timer_sync+0x7f/0x88
 [<ffffffff880d95a3>] :jbd:kjournald+0xc1/0x213
 [<ffffffff8029c60a>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff880d94e2>] :jbd:kjournald+0x0/0x213
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233e47>] kthread+0xfe/0x132
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff8029c3f2>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233d49>] kthread+0x0/0x132
 [<ffffffff80260b22>] child_rip+0x0/0x12

INFO: task crond:1758 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
crond D 00217387317ee73e 0 1758 1 1773 1744 (NOTLB)
 ffff8803bfeb9ca8 0000000000000286 ffff8803be858094 0000000000000000
 000000000000000a ffff8803bf52d7a0 ffff880013bbb080 00000000000181ef
 ffff8803bf52d988 ffffffff80264931
Call Trace:
 [<ffffffff80264931>] _spin_lock_irqsave+0x9/0x14
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802635f9>] io_schedule+0x3f/0x67
 [<ffffffff80215ce5>] sync_buffer+0x3b/0x3f
 [<ffffffff8026373d>] __wait_on_bit_lock+0x36/0x66
 [<ffffffff80215caa>] sync_buffer+0x0/0x3f
 [<ffffffff802637d9>] out_of_line_wait_on_bit_lock+0x6c/0x78
 [<ffffffff8029c638>] wake_bit_function+0x0/0x23
 [<ffffffff880d3812>] :jbd:do_get_write_access+0x58/0x4fc
 [<ffffffff8021a068>] __getblk+0x25/0x236
 [<ffffffff880d3cd8>] :jbd:journal_get_write_access+0x22/0x33
 [<ffffffff880efcbc>] :ext3:ext3_reserve_inode_write+0x38/0x90
 [<ffffffff880efd35>] :ext3:ext3_mark_inode_dirty+0x21/0x3c
 [<ffffffff880f2c1c>] :ext3:ext3_dirty_inode+0x63/0x7b
 [<ffffffff8021452e>] __mark_inode_dirty+0x29/0x16e
 [<ffffffff80225f67>] filldir+0x0/0xb7
 [<ffffffff802369ed>] vfs_readdir+0x8c/0xa9
 [<ffffffff8023a20b>] sys_getdents+0x75/0xbd
 [<ffffffff80260295>] tracesys+0x47/0xb6
 [<ffffffff802602f9>] tracesys+0xab/0xb6
aajn

aajn

2011-03-08 21:14

reporter   ~0012499

the same

Linux xxx.xx 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

Mar 8 18:05:18 web kernel: INFO: task kjournald:2292 blocked for more than 120 seconds.
Mar 8 18:05:18 web kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 8 18:05:18 web kernel: kjournald D ffff810107a9aa48 0 2292 97 2558 2289 (L-TLB)
Mar 8 18:05:18 web kernel: ffff810228e53cf0 0000000000000046 0000000000000086 ffffffff8003db0d
Mar 8 18:05:18 web kernel: 0000000000000000 000000000000000a ffff81022b8a4860 ffff81001c442080
Mar 8 18:05:18 web kernel: 00010bc0714505f7 000000000000124f ffff81022b8a4a48 000000022fa7a3b8
Mar 8 18:05:18 web kernel: Call Trace:
Mar 8 18:05:18 web kernel: [<ffffffff8003db0d>] lock_timer_base+0x1b/0x3c
Mar 8 18:05:18 web kernel: [<ffffffff8006e1d7>] do_gettimeofday+0x40/0x90
Mar 8 18:05:18 web kernel: [<ffffffff8005a866>] getnstimeofday+0x10/0x28
Mar 8 18:05:18 web kernel: [<ffffffff80015505>] sync_buffer+0x0/0x3f
Mar 8 18:05:18 web kernel: [<ffffffff800637ea>] io_schedule+0x3f/0x67
Mar 8 18:05:18 web kernel: [<ffffffff80015540>] sync_buffer+0x3b/0x3f
Mar 8 18:05:18 web kernel: [<ffffffff80063a16>] __wait_on_bit+0x40/0x6e
Mar 8 18:05:18 web kernel: [<ffffffff80015505>] sync_buffer+0x0/0x3f
Mar 8 18:05:18 web kernel: [<ffffffff80063ab0>] out_of_line_wait_on_bit+0x6c/0x78
Mar 8 18:05:18 web kernel: [<ffffffff800a0b8d>] wake_bit_function+0x0/0x23
Mar 8 18:05:18 web kernel: [<ffffffff880339a5>] :jbd:journal_commit_transaction+0x543/0x1062
Mar 8 18:05:18 web kernel: [<ffffffff8003db0d>] lock_timer_base+0x1b/0x3c
Mar 8 18:05:18 web kernel: [<ffffffff8004b1bc>] try_to_del_timer_sync+0x7f/0x88
Mar 8 18:05:18 web kernel: [<ffffffff880375cf>] :jbd:kjournald+0xc1/0x213
Mar 8 18:05:18 web kernel: [<ffffffff800a0b5f>] autoremove_wake_function+0x0/0x2e
Mar 8 18:05:18 web kernel: [<ffffffff800a0947>] keventd_create_kthread+0x0/0xc4
Mar 8 18:05:18 web kernel: [<ffffffff8803750e>] :jbd:kjournald+0x0/0x213
Mar 8 18:05:18 web kernel: [<ffffffff800a0947>] keventd_create_kthread+0x0/0xc4
Mar 8 18:05:18 web kernel: [<ffffffff8003296e>] kthread+0xfe/0x132
Mar 8 18:05:18 web kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Mar 8 18:05:18 web kernel: [<ffffffff800a0947>] keventd_create_kthread+0x0/0xc4
Mar 8 18:05:18 web kernel: [<ffffffff80032870>] kthread+0x0/0x132
Mar 8 18:05:18 web kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Mar 8 18:05:18 web kernel:
Mar 8 18:05:18 web kernel: INFO: task tcprules:23733 blocked for more than 120 seconds.
Mar 8 18:05:18 web kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 8 18:05:18 web kernel: tcprules D ffff81022b8a4860 0 23733 23732 (NOTLB)
Mar 8 18:05:18 web kernel: ffff81013140bd88 0000000000000086 000000023140bd98 0000000000000001
Mar 8 18:05:18 web kernel: 000000023140bd18 0000000000000007 ffff8102158fa7e0 ffff81022b8a4860
Mar 8 18:05:18 web kernel: 00010bbf3d825a82 0000000000001104 ffff8102158fa9c8 0000000200000096
Mar 8 18:05:18 web kernel: Call Trace:
Mar 8 18:05:18 web kernel: [<ffffffff88036d86>] :jbd:log_wait_commit+0xa3/0xf5
Mar 8 18:05:18 web kernel: [<ffffffff800a0b5f>] autoremove_wake_function+0x0/0x2e
Mar 8 18:05:18 web kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
Mar 8 18:05:18 web kernel: [<ffffffff8002fcd2>] __writeback_single_inode+0x1e9/0x328
Mar 8 18:05:18 web kernel: [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
Mar 8 18:05:18 web kernel: [<ffffffff800f3575>] sync_inode+0x24/0x33
Mar 8 18:05:18 web kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
Mar 8 18:05:18 web kernel: [<ffffffff80050246>] do_fsync+0x52/0xa4
Mar 8 18:05:18 web kernel: [<ffffffff800e11ad>] __do_fsync+0x23/0x36
Mar 8 18:05:18 web kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Mar 8 18:05:18 web kernel:
Mar 8 18:05:18 web kernel: INFO: task tcprules:23754 blocked for more than 120 seconds.
Mar 8 18:05:18 web kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 8 18:05:18 web kernel: tcprules D ffff810001015120 0 23754 23745 (NOTLB)
Mar 8 18:05:18 web kernel: ffff8100c8af5d88 0000000000000086 ffff8100c8af5d98 ffffffff80062ff8
Mar 8 18:05:18 web kernel: ffff8100c8af5d18 0000000000000008 ffff810216e13080 ffff81022ff27080
Mar 8 18:05:18 web kernel: 00010bc448f0102d 000000000000128c ffff810216e13268 000000020017fc2e
Mar 8 18:05:18 web kernel: Call Trace:
Mar 8 18:05:18 web kernel: [<ffffffff80062ff8>] thread_return+0x62/0xfe
Mar 8 18:05:18 web kernel: [<ffffffff88036d86>] :jbd:log_wait_commit+0xa3/0xf5
Mar 8 18:05:18 web kernel: [<ffffffff800a0b5f>] autoremove_wake_function+0x0/0x2e
Mar 8 18:05:18 web kernel: [<ffffffff8803178a>] :jbd:journal_stop+0x1cf/0x1ff
Mar 8 18:05:18 web kernel: [<ffffffff8002fcd2>] __writeback_single_inode+0x1e9/0x328
Mar 8 18:05:18 web kernel: [<ffffffff80066b88>] do_page_fault+0x4fe/0x874
Mar 8 18:05:18 web kernel: [<ffffffff800f3575>] sync_inode+0x24/0x33
Mar 8 18:05:18 web kernel: [<ffffffff8804c36d>] :ext3:ext3_sync_file+0xc9/0xdc
Mar 8 18:05:18 web kernel: [<ffffffff80050246>] do_fsync+0x52/0xa4
Mar 8 18:05:18 web kernel: [<ffffffff800e11ad>] __do_fsync+0x23/0x36
Mar 8 18:05:18 web kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Mar 8 18:05:18 web kernel:
hola

hola

2011-04-26 17:17

reporter   ~0012675

Any news on this? I am having the same issue on ASL kernel in Centos
Linux 2.6.32.28-1.art.x86_64 #1 SMP Mon Feb 14 11:06:49 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
and on kernel-2.6.18-194.26.1.el5 standard Centos.
Really odd to see a bug that old not solved yet.
darklca

darklca

2011-05-24 19:09

reporter   ~0012778

Well i will refresh this topic :) same here...
2.6.18-238.9.1.el5

May 22 04:02:09 test syslogd 1.4.1: restart.
May 22 04:40:52 test kernel: INFO: task qemu-kvm:1576 blocked for more than 120 seconds.
May 22 04:40:52 test kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 22 04:40:52 test kernel: qemu-kvm D ffffffff8015389d 0 1576 1 1766 1528 (NOTLB)
May 22 04:40:52 test kernel: ffff81032ca73b98 0000000000000086 0000000000000400 ffffffff8001c41d
May 22 04:40:52 test kernel: ffff810283dec240 000000000000000a ffff81033f3d20c0 ffff8101bc1487e0
May 22 04:40:52 test kernel: 000037be7ef9876a 0000000000004f6d ffff81033f3d22a8 00000005fffffff4
May 22 04:40:52 test kernel: Call Trace:
May 22 04:40:52 test kernel: [<ffffffff8001c41d>] generic_make_request+0x211/0x228
May 22 04:40:52 test kernel: [<ffffffff8006ec4e>] do_gettimeofday+0x40/0x90
May 22 04:40:52 test kernel: [<ffffffff8005a9f9>] getnstimeofday+0x10/0x28
May 22 04:40:52 test kernel: [<ffffffff800637ca>] io_schedule+0x3f/0x67
May 22 04:40:52 test kernel: [<ffffffff800f74cd>] __blockdev_direct_IO+0x8da/0xa82
May 22 04:42:24 test kernel: [<ffffffff800e7fc3>] blkdev_direct_IO+0x32/0x37
May 22 04:42:24 test kernel: [<ffffffff800e7efb>] blkdev_get_blocks+0x0/0x96
May 22 04:42:24 test kernel: [<ffffffff800c8791>] generic_file_direct_IO+0x108/0x122
May 22 04:42:24 test kernel: [<ffffffff8001f0d6>] generic_file_direct_write+0x60/0xf2
May 22 04:42:24 test kernel: [<ffffffff800166b8>] __generic_file_aio_write_nolock+0x2b8/0x3b6
May 22 04:42:24 test kernel: [<ffffffff8003e5cd>] do_futex+0x2c2/0xce3
May 22 04:42:24 test kernel: [<ffffffff80047148>] try_to_wake_up+0x472/0x484
May 22 04:42:24 test kernel: [<ffffffff800c8916>] generic_file_aio_write_nolock+0x20/0x6c
May 22 04:42:24 test kernel: [<ffffffff800c8ced>] generic_file_write_nolock+0x8f/0xa8
May 22 04:42:24 test kernel: [<ffffffff800a28f3>] autoremove_wake_function+0x0/0x2e
May 22 04:42:24 test kernel: [<ffffffff8009b656>] kill_proc_info+0x48/0x61
May 22 04:42:24 test kernel: [<ffffffff800e6d77>] blkdev_file_write+0x1a/0x1f
May 22 04:42:24 test kernel: [<ffffffff80016b71>] vfs_write+0xce/0x174
May 22 04:42:24 test kernel: [<ffffffff80044174>] sys_pwrite64+0x50/0x70
May 22 04:42:24 test kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
nadaoneal

nadaoneal

2011-06-20 16:47

reporter   ~0012834

I'm still seeing this on HP BL360c machines with the latest HP firmware and drivers, and CentOS 5.6 / 2.6.18-238.9.1.el5xen .

I just took another look at some RedHat bugs and was able to see that some people can fix this, on certain HP raids, just by updating the firmware and driver:
https://bugzilla.redhat.com/show_bug.cgi?id=615543
(see comment 28+ especially)... but this is only working for some of the raids, not all of them.

Also, I've gotten some comments on my blog suggesting that people with Dell hardware are getting the same thing, and for them a downgrade to 2.6.18-194 fixes the issue, though personally I still had the issue with 2.6.18-194xen kernels. The only fix that seems to work for me, with an HP hardware raid, is setting the scheduler on both the dom0 and the domU to noop, and capping /proc/sys/dev/raid/speed_limit_max to 50K, again on both.
aajn

aajn

2011-06-20 18:39

reporter   ~0012835

i have set the io scheduler to noop

echo noop > /sys/block/cciss\!c0d0/queue/scheduler
echo noop > /sys/block/cciss\!c0d1/queue/scheduler
echo noop > /sys/block/cciss\!c0d2/queue/scheduler

uptime record: 20:38:22 up 28 days, 21:18,

:-)

but is there a possible way to put the io scheduler value into a config file ?
nadaoneal

nadaoneal

2011-06-20 19:23

reporter   ~0012836

Yep, edit /etc/grub.conf and put elevator=noop at the end of the kernel line, like so:

title CentOS (2.6.18-238.12.1.el5xen)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-238.12.1.el5xen ro root=/dev/VolGroup00/LogVol00 elevator=noop
        initrd /initrd-2.6.18-238.12.1.el5xen.img

... then you will also be using noop when you reboot.
aajn

aajn

2011-06-20 21:55

reporter   ~0012837

thanx nadaoneal!
seanfulton

seanfulton

2011-07-09 11:54

reporter   ~0012919

I believe I have this issue as well. Relevant info is that it started for me with the 194 kernels, and I am having the problem on non-HP/non-Dell hardware in both software raid1 configurations and single-drive configurations. We are running OpenVZ and so I thought it was an OpenVZ issue until I saw this thread. We have been using both deadline and cfq schedulers.

The way to reproduce is to try to duplicate a large mysql database with:

mysqldump largedb | mysql newlargedb

Pushes the system load to more than 100 (8-core server) and blocks all IO. Also get a bunch of error messages like the original poster with tasks being blocked for more than 300 seconds (not 120).

As I say, this started for me last fall, and at the time I noted that it did not happen with kernels prior to 194. It has seemed to kind of come and go with releases. On OpenVZ, some releases work fine, others trigger the lock-up, which is what led me to believe it was OpenVZ related.

sean
antimatter

antimatter

2011-09-14 18:19

reporter   ~0013260

I am seeing this issue on openvz with kernel 2.6.18-238.

2.6.18-238.19.1.el5.028stab092.2 #1 SMP Thu Jul 21 19:23:22 MSD 2011 x86_64
vzctl version 3.0.28.3

Sep 13 01:48:01 kernel: INFO: task vzctl:217710 blocked for more than 300 seconds.
Sep 13 01:48:01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 13 01:48:01 kernel: vzctl D ffff8103090da920 0 217710 217709 (NOTLB)
Sep 13 01:48:01 kernel: ffff8103c1215cc8 0000000000000086 00042e9d02d2eab9 00097975d48e894e
Sep 13 01:48:01 kernel: ffff8103090da920 ffffffff80316b20 00042e9d02d2ecff 00097975d48e8e7c
Sep 13 01:48:01 kernel: ffff8103090dab28 00000000c1215c78 0000000000000000 ffffffff8049f280
Sep 13 01:48:01 kernel: Call Trace:
Sep 13 01:48:01 kernel: [<ffffffff800646dd>] wait_for_completion+0x79/0xa2
Sep 13 01:48:01 kernel: [<ffffffff8008b605>] default_wake_function+0x0/0xe
Sep 13 01:48:01 kernel: [<ffffffff8008df50>] vsched_mvpr+0x6b/0x9b
Sep 13 01:48:01 kernel: [<ffffffff800a156b>] synchronize_rcu+0x30/0x36
Sep 13 01:48:01 kernel: [<ffffffff800a123a>] wakeme_after_rcu+0x0/0x9
Sep 13 01:48:01 kernel: [<ffffffff884e4831>] :vzmon:__ve_move_task+0xf8/0x1a9
Sep 13 01:48:01 kernel: [<ffffffff884e48eb>] :vzmon:ve_move_task+0x9/0x93
Sep 13 01:48:01 kernel: [<ffffffff884e5dbf>] :vzmon:real_env_create+0x1d8/0x24d
Sep 13 01:48:01 kernel: [<ffffffff884e606e>] :vzmon:vzcalls_ioctl+0x23a/0x71e
Sep 13 01:48:01 kernel: [<ffffffff8002daa2>] mntput_no_expire+0x19/0x81
Sep 13 01:48:01 kernel: [<ffffffff884d716f>] :vzdev:vzctl_ioctl+0x34/0x50
Sep 13 01:48:01 kernel: [<ffffffff80044319>] do_ioctl+0x21/0x6b
Sep 13 01:48:01 kernel: [<ffffffff80031842>] vfs_ioctl+0x457/0x4b9
Sep 13 01:48:01 kernel: [<ffffffff800c3571>] audit_syscall_entry+0x1a4/0x1cf
Sep 13 01:48:01 kernel: [<ffffffff8004f100>] sys_ioctl+0x3c/0x5c
Sep 13 01:48:01 kernel: [<ffffffff800602dd>] tracesys+0xd5/0xe0

The message repeats several times for various running services and then the logs stop as the server hung.

Has happened during the overnight backup process a couple of times in the last month since an upgrade to this kernel / vzctl version.

The previous kernel before this 2.6.18-164.15.1.el5.028stab068.9 did not have this issue.
paszczak000

paszczak000

2012-03-13 15:59

reporter   ~0014667

I have the same issue at virtual and phisical machines:

2.6.18-238.9.1.el5 (RH 5.6)
2.6.18-194.11.4.el5xen (RH 5.8)

I've read that to fix this problem it is recommended to use noop scheduler. But it won't fix a problem.


BTW: https://bugzilla.redhat.com/show_bug.cgi?id=605444
Status: CLOSED NOTABUG
michaeleconomy

michaeleconomy

2012-03-23 16:53

reporter   ~0014723

I'm having this issue:



Mar 23 04:32:55 db2 kernel: INFO: task mysqld:7149 blocked for more than 120 seconds.
Mar 23 07:05:01 db2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 23 07:05:01 db2 kernel: mysqld D ffffffff801563dc 0 7149 7099 7150 7148 (NOTLB)
Mar 23 07:05:01 db2 kernel: ffff812838fc7d28 0000000000000086 ffff81183ffbc190 ffff812838fc7d98
Mar 23 07:05:01 db2 kernel: ffff812838fc7e08 0000000000000009 ffff81283fde87a0 ffff8118783c37a0
Mar 23 07:05:01 db2 kernel: 000026a68589352a 00000000000074da ffff81283fde8988 0000000b8588d054
Mar 23 07:05:01 db2 kernel: Call Trace:
Mar 23 07:05:01 db2 kernel: [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b
Mar 23 07:05:01 db2 kernel: [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14
Mar 23 07:05:01 db2 kernel: [<ffffffff80021874>] generic_file_aio_write+0x50/0xc3
Mar 23 07:05:01 db2 kernel: [<ffffffff8804c1c2>] :ext3:ext3_file_write+0x16/0x91
Mar 23 07:05:01 db2 kernel: [<ffffffff800183c0>] do_sync_write+0xc7/0x104
Mar 23 07:05:01 db2 kernel: [<ffffffff800a34a8>] autoremove_wake_function+0x0/0x2e
Mar 23 07:05:01 db2 kernel: [<ffffffff80016b3d>] vfs_write+0xce/0x174
Mar 23 07:05:01 db2 kernel: [<ffffffff80043be0>] sys_pwrite64+0x50/0x70
Mar 23 07:05:01 db2 kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
Mar 23 07:05:01 db2 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0





replicated with 2.6.18-308.1.1.el5, 2.6.18-238.19.1.el5, and 2.6.18-238.el5



:(
rusini

rusini

2012-03-26 15:42

reporter   ~0014736

I'm afraid to seem a bit pessimistic, I saw these symptoms reported all over there, maybe the cause is different but for me it shows up:

1. With any io scheduler
2. With any filesystem, tried ext3, etx4, xfs
3. With different combinations of SAS controller, RAID and disks behind the scenes
4. With journaling on a separate spindle
5. When the system cannot catch up with io, just have hit the issue when creating 12 LVM snapshots (yeah, I'd rather need ZFS...)

With 12 snapshots, the io transfer rate is not so bad - up to 6 MB/s, but it blocks eventually with async io mode. When the FS is mounted sync, everything is ok, but of course it may be slow depending on the buffer size used in userspace.

Strangely, when I mount that volume via NFS, just to take advantage of cache on the client side, the application on the client side is hard blocked anyway.

Anyone from Linux kernel team, will this (rather tricky) issue be ever fixed?
rusini

rusini

2012-03-26 15:44

reporter   ~0014737

Of course, this does not seem like kernel release/version-specific, at least for CentOS 5.x.
michaeleconomy

michaeleconomy

2012-03-26 19:00

reporter   ~0014741

Ok, I figured out my problem, it wasn't a centos problem.

I had a few ram chips not properly installed, so they system was running with like 30gb less memory than it should have been. Since i didn't notice, mysql was the configured to use WAY to much memory (innodb_buffer_pool_size).


Sorry for misreporting, but i hope that helps someone out there!
foggy

foggy

2012-04-13 11:55

reporter   ~0014829

Hi,
I have same problem.
My server has a panic.
It begins when appears high load on disks.
it's last stable kernel for Centos5.8
I installed least stable kernel for Centos5.8.
But problem exists yet.
I read many pages about this problem, and understand only - problem don't solved

INFO: task nfsd:6619 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
nfsd D 0000000000000002 0 6619 1 6615 6613 (L-TLB)
 ffff81032abafb10 0000000000000046 ffff81062f189928 ffffffff80152f55
 0000000000000000 0000000000000009 ffff81032ea9f7a0 ffff81032f4ac820
 000006faba8e7377 000000000001e54b ffff81032ea9f988 0000000280148e7a
Call Trace:
 [<ffffffff80152f55>] cfq_dispatch_requests+0x332/0x639
 [<ffffffff8006ecd9>] do_gettimeofday+0x40/0x90
 [<ffffffff8005a3ad>] getnstimeofday+0x10/0x29
 [<ffffffff80028bcb>] sync_page+0x0/0x43
 [<ffffffff800637de>] io_schedule+0x3f/0x67
 [<ffffffff80028c09>] sync_page+0x3e/0x43
 [<ffffffff80063922>] __wait_on_bit_lock+0x36/0x66
 [<ffffffff8003f99b>] __lock_page+0x5e/0x64
 [<ffffffff800a34d6>] wake_bit_function+0x0/0x23
 [<ffffffff8000c439>] do_generic_mapping_read+0x1df/0x359
 [<ffffffff88861eb4>] :nfsd:nfsd_read_actor+0x0/0xed

Apr 13 14:55:24 ss01 kernel: sd 1:2:6:0: megasas: RESET -18026589 cmd=2a retries=0
Apr 13 14:55:24 ss01 kernel: megaraid_sas: HBA reset handler invoked without an internal reset condition.
Apr 13 14:55:24 ss01 kernel: megasas: [ 0]waiting for 2 commands to complete
Apr 13 14:55:24 ss01 kernel: megaraid_sas: no more pending commands remain after reset handling.
Apr 13 14:55:24 ss01 kernel: megasas: reset successful

uname -a
Linux ss01 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
foggy

foggy

2012-05-05 06:53

reporter   ~0015025

I have solved problem.
I have downgraded my kernel to 2.6.18-194.32.1
it helped.
Nobody from Redhat and CentOS didn't write it yet, but they know about it.
Have a nice day :)
giesen

giesen

2012-06-05 18:42

reporter   ~0015218

I too have this problem on a HP DL160 G6 that I recently upgraded from 5.5 to 5.8, running kernel 308.8.1.el5 (x86_64).

Receiving the following messages:

INFO: task httpd:28903 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D ffff81000100caa0 0 28903 9235 29111 28735 (NOTLB)
 ffff81001e51bd28 0000000000000082 ffff81011e73b2a8 ffff810020294378
 ffff810020294378 0000000000000009 ffff81011e73b0c0 ffff810104709100
 000212b93652edd1 000000000000a007 ffff81011e73b2a8 000000018002b44d Call Trace:
 [<ffffffff80063c63>] __mutex_lock_slowpath+0x60/0x9b [<ffffffff8003f980>] __lock_page+0x5e/0x64 [<ffffffff80063cad>] .text.lock.mutex+0xf/0x14 [<ffffffff800d5bab>] swap_info_get+0x6a/0xa3 [<ffffffff8002185b>] generic_file_aio_write+0x50/0xc3 [<ffffffff8804c1c2>] :ext3:ext3_file_write+0x16/0x91 [<ffffffff800183ac>] do_sync_write+0xc7/0x104 [<ffffffff80067235>] do_page_fault+0x4cc/0x842 [<ffffffff800a34a7>] autoremove_wake_function+0x0/0x2e [<ffffffff80033902>] do_setitimer+0x5b4/0x689 [<ffffffff8009c5f3>] recalc_sigpending_and_wake+0x9/0x1a
 [<ffffffff80028eef>] do_sigaction+0x14d/0x199 [<ffffffff80016b29>] vfs_write+0xce/0x174 [<ffffffff800173f2>] sys_write+0x45/0x6e [<ffffffff8005d28d>] tracesys+0xd5/0xe0


I've changed the scheduler from cfq to noop, and updated the firmware on the GB0500EAFYL drives from version HPG2 to HPG4. We'll see if the issue reoccurs.
giesen

giesen

2012-06-05 18:43

reporter   ~0015219

Just a quick note that I'm using the B110i controller (which does use not the cciss module), and running mdraid (not the HP fake raid).
giesen

giesen

2012-06-06 00:44

reporter   ~0015220

Looks like changing the scheduler had no effect, as the box just crashed again...

Also, it looks like my firmware update for the hard drives didn't take, on more careful examination of the release notes the updater only works through a P series RAID controller, not the B110i.
itmcontact

itmcontact

2012-07-06 01:03

reporter   ~0015370

I have the same problem and I had tried with different kernels and also change the main board, add RAM, etc. but issue persist.

Linux 2.6.18-308.8.2.el5PAE
Centos5.8

INFO: task syslogd:2465 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd D 000032D4 2064 2465 1 2468 2066 (NOTLB)
       f7ba6ed0 00000082 95feac36 000032d4 000032d1 0000000e 00000000 00000009
       f7debaa0 95feb09d 000032d4 00000467 00000002 f7debbac c5410588 f7a58740
       c56e9068 00000000 00000000 f7ba6ecc c041f0c8 00000000 00000000 ffffffff
Call Trace:
 [<c041f0c8>] __wake_up+0x2a/0x3d
 [<c04374f3>] prepare_to_wait+0x24/0x46
 [<f886a2e5>] log_wait_commit+0x80/0xc7 [jbd]
 [<c04373a7>] autoremove_wake_function+0x0/0x2d
 [<f8865679>] journal_stop+0x196/0x1bb [jbd]
 [<c0495f86>] __writeback_single_inode+0x199/0x2a5
 [<c045d76c>] do_writepages+0x2b/0x32
 [<c045923f>] __filemap_fdatawrite_range+0x66/0x72
 [<c0496624>] sync_inode+0x19/0x24
 [<f889d019>] ext3_sync_file+0xb1/0xdc [ext3]
 [<c0479065>] do_fsync+0x41/0x83
 [<c04790c4>] __do_fsync+0x1d/0x2b
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
 =======================
INFO: task MailScanner:1510 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
MailScanner D 000032D4 2348 1510 2983 4147 (NOTLB)
       dc5a1f18 00000086 9654d1d6 000032d4 000032d4 00000002 00000000 00000007
       f06ecaa0 9654d3fe 000032d4 00000228 00000000 f06ecbac c5402900 f796dac0
       c56e9068 00000000 00000001 dc5a1f14 c041f0c8 00000000 00000000 ffffffff
Call Trace:
 [<c041f0c8>] __wake_up+0x2a/0x3d
 [<f886a2e5>] log_wait_commit+0x80/0xc7 [jbd]
 [<c04373a7>] autoremove_wake_function+0x0/0x2d
 [<f8865679>] journal_stop+0x196/0x1bb [jbd]
 [<f889cfe8>] ext3_sync_file+0x80/0xdc [ext3]
 [<c06237e2>] mutex_lock+0xb/0x19
 [<c0479065>] do_fsync+0x41/0x83
 [<c04790c4>] __do_fsync+0x1d/0x2b
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
 =======================
INFO: task pop3:11309 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pop3 D 000032D4 2720 11309 2883 11344 2886 (NOTLB)
       e87a0ebc 00000082 9654ce1b 000032d4 000032d4 0000000e 00000000 00000007
       f76b1550 9654d1d6 000032d4 000003bb 00000000 f76b165c c5402900 f7880740
       c56e9068 00000000 00000000 e87a0eb8 c041f0c8 00000000 00000000 00000003
Call Trace:
 [<c041f0c8>] __wake_up+0x2a/0x3d
 [<c04374f3>] prepare_to_wait+0x24/0x46
 [<f886a2e5>] log_wait_commit+0x80/0xc7 [jbd]
 [<c04373a7>] autoremove_wake_function+0x0/0x2d
 [<f8865679>] journal_stop+0x196/0x1bb [jbd]
 [<c0495f86>] __writeback_single_inode+0x199/0x2a5
 [<c045d76c>] do_writepages+0x2b/0x32
 [<c045923f>] __filemap_fdatawrite_range+0x66/0x72
 [<c0496624>] sync_inode+0x19/0x24
 [<f889d019>] ext3_sync_file+0xb1/0xdc [ext3]
 [<c0479065>] do_fsync+0x41/0x83
 [<c046a388>] sys_msync+0x100/0x16c
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
 =======================
INFO: task pop3:11344 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pop3 D 000032D4 2612 11344 2883 11349 11309 (NOTLB)
       e78f4ebc 00000082 96130418 000032d4 000032d2 0000000e 00000000 00000007
       d83c1000 961309f8 000032d4 000005e0 00000003 d83c110c c54173cc f7a15580
       c56e9068 00000000 00000000 e78f4eb8 c041f0c8 00000000 c5417410 ffffffff
Call Trace:
 [<c041f0c8>] __wake_up+0x2a/0x3d
 [<c04374f3>] prepare_to_wait+0x24/0x46
 [<f886a2e5>] log_wait_commit+0x80/0xc7 [jbd]
 [<c04373a7>] autoremove_wake_function+0x0/0x2d
 [<f8865679>] journal_stop+0x196/0x1bb [jbd]
 [<c0495f86>] __writeback_single_inode+0x199/0x2a5
 [<c045d76c>] do_writepages+0x2b/0x32
 [<c045923f>] __filemap_fdatawrite_range+0x66/0x72
 [<c0496624>] sync_inode+0x19/0x24
 [<f889d019>] ext3_sync_file+0xb1/0xdc [ext3]
 [<c0479065>] do_fsync+0x41/0x83
 [<c046a388>] sys_msync+0x100/0x16c
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
 =======================
INFO: task pop3:11349 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pop3 D 000032D4 2720 11349 2883 11377 11344 (NOTLB)
       d67f7ebc 00000086 961b3ed1 000032d4 000032d2 0000000e 00000000 00000007
       f76da000 961b42bd 000032d4 000003ec 00000002 f76da10c c5410588 c55b1200
       c56e9068 00000000 00000001 d67f7eb8 c041f0c8 00000000 00000000 ffffffff
Call Trace:
 [<c041f0c8>] __wake_up+0x2a/0x3d
 [<f886a2e5>] log_wait_commit+0x80/0xc7 [jbd]
 [<c04373a7>] autoremove_wake_function+0x0/0x2d
 [<f8865679>] journal_stop+0x196/0x1bb [jbd]
 [<c0495f86>] __writeback_single_inode+0x199/0x2a5
 [<c045d76c>] do_writepages+0x2b/0x32
 [<c045923f>] __filemap_fdatawrite_range+0x66/0x72
 [<c0496624>] sync_inode+0x19/0x24
 [<f889d019>] ext3_sync_file+0xb1/0xdc [ext3]
 [<c0479065>] do_fsync+0x41/0x83
 [<c046a388>] sys_msync+0x100/0x16c
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
 =======================
INFO: task pop3:11377 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
pop3 D 000032D4 2580 11377 2883 11467 11349 (NOTLB)
       e8480ebc 00000082 95fead47 000032d4 000032d3 0000000e 00000000 00000009
       f75a1550 95feb385 000032d4 0000063e 00000003 f75a165c c54173cc ebe65580
       c56e9068 00000000 00000000 e8480eb8 c041f0c8 00000000 00000000 ffffffff
Call Trace:
 [<c041f0c8>] __wake_up+0x2a/0x3d
 [<c04374f3>] prepare_to_wait+0x24/0x46
 [<f886a2e5>] log_wait_commit+0x80/0xc7 [jbd]
 [<c04373a7>] autoremove_wake_function+0x0/0x2d
 [<f8865679>] journal_stop+0x196/0x1bb [jbd]
 [<c0495f86>] __writeback_single_inode+0x199/0x2a5
 [<c045d76c>] do_writepages+0x2b/0x32
 [<c045923f>] __filemap_fdatawrite_range+0x66/0x72
 [<c0496624>] sync_inode+0x19/0x24
 [<f889d019>] ext3_sync_file+0xb1/0xdc [ext3]
 [<c0479065>] do_fsync+0x41/0x83
 [<c046a388>] sys_msync+0x100/0x16c
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
 =======================
INFO: task MailScanner:12610 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
MailScanner D 00003475 2212 12610 2983 15390 (NOTLB)
       edeaff38 00000086 cbe31ab2 00003475 00000000 edeaff3c c0484434 00000009
       f7f05000 cca34b76 00003475 00c030c4 00000001 f7f0510c c5409744 f7b2b900
       eeafb0c0 00000000 c0475404 c5633dc0 f02cccc8 eeafb0c0 f88ac200 ffffffff
Call Trace:
 [<c0484434>] may_open+0x125/0x207
 [<c0475404>] __dentry_open+0xea/0x1ab
 [<c0623907>] __mutex_lock_slowpath+0x4d/0x7c
 [<c0623945>] .text.lock.mutex+0xf/0x14
 [<c047732c>] generic_file_llseek+0x2a/0xd2
 [<c0477302>] generic_file_llseek+0x0/0xd2
 [<c0476645>] vfs_llseek+0x30/0x34
 [<c0477535>] sys_llseek+0x43/0x84
 [<c0487a45>] sys_ioctl+0x48/0x5f
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
 =======================
INFO: task MailScanner:15390 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
MailScanner D 00003475 2160 15390 2983 23439 12610 (NOTLB)
       dbd52f38 00000086 31bac104 00003475 00000000 dbd52f3c c0484434 00000009
       f76ce550 31f201b9 00003475 003740b5 00000001 f76ce65c c5409744 f75c8200
       f78e9740 00000000 c0475404 c5633dc0 f02cccc8 f78e9740 f88ac200 ffffffff
Call Trace:
 [<c0484434>] may_open+0x125/0x207
 [<c0475404>] __dentry_open+0xea/0x1ab
 [<c0623907>] __mutex_lock_slowpath+0x4d/0x7c
 [<c0623945>] .text.lock.mutex+0xf/0x14
 [<c047732c>] generic_file_llseek+0x2a/0xd2
 [<c0477302>] generic_file_llseek+0x0/0xd2
 [<c0476645>] vfs_llseek+0x30/0x34
 [<c0477535>] sys_llseek+0x43/0x84
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
 =======================
INFO: task MailScanner:23439 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
MailScanner D 00003474 2348 23439 2983 29689 15390 (NOTLB)
       ed793f38 00000082 ae822723 00003474 00000000 ed793f3c c0484434 00000007
       f7b60000 ae8d2ff4 00003474 000b08d1 00000003 f7b6010c c54173cc c5621740
       e9dbc840 00000000 c0475404 c5633dc0 f02cccc8 e9dbc840 f88ac200 ffffffff
Call Trace:
 [<c0484434>] may_open+0x125/0x207
 [<c0475404>] __dentry_open+0xea/0x1ab
 [<c0623907>] __mutex_lock_slowpath+0x4d/0x7c
 [<c0623945>] .text.lock.mutex+0xf/0x14
 [<c047732c>] generic_file_llseek+0x2a/0xd2
 [<c0477302>] generic_file_llseek+0x0/0xd2
 [<c0476645>] vfs_llseek+0x30/0x34
 [<c0477535>] sys_llseek+0x43/0x84
 [<c0487a45>] sys_ioctl+0x48/0x5f
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
 =======================
INFO: task MailScanner:31602 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
MailScanner D 00003475 2444 31602 2983 31848 29689 (NOTLB)
       dcfe6f38 00000082 227bb357 00003475 00000000 dcfe6f3c c0484434 0000000a
       f7ad5000 2286c0f1 00003475 000b0d9a 00000000 f7ad510c c5402900 f755e580
       db9e79c0 00000000 c0475404 c5633dc0 f02cccc8 db9e79c0 f88ac200 ffffffff
Call Trace:
 [<c0484434>] may_open+0x125/0x207
 [<c0475404>] __dentry_open+0xea/0x1ab
 [<c0623907>] __mutex_lock_slowpath+0x4d/0x7c
 [<c0623945>] .text.lock.mutex+0xf/0x14
 [<c047732c>] generic_file_llseek+0x2a/0xd2
 [<c0477302>] generic_file_llseek+0x0/0xd2
 [<c0476645>] vfs_llseek+0x30/0x34
 [<c0477535>] sys_llseek+0x43/0x84
 [<c0487a45>] sys_ioctl+0x48/0x5f
 [<c0404ee1>] sysenter_past_esp+0x56/0x79
paolo_losi

paolo_losi

2012-09-30 14:13

reporter   ~0015860

Same here. kernel version Linux oracle2.enuan.com x86_64

INFO: task syslogd:1657 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd D ffff8100d1aa7390 0 1657 1 1660 1644 (NOTLB)
 ffff810114659d98 0000000000000082 ffff810114659da8 ffffffff80063002
 0000000000000000 0000000000000009 ffff81011d2337e0 ffff8100d1e80100
 00000170e40a90bf 000000000000189d ffff81011d2339c8 0000000000001082
Call Trace:
 [<ffffffff80063002>] thread_return+0x62/0xfe
 [<ffffffff88036e61>] :jbd:log_wait_commit+0xa3/0xf5
 [<ffffffff800a34ab>] autoremove_wake_function+0x0/0x2e
 [<ffffffff880317a1>] :jbd:journal_stop+0x1d3/0x203
 [<ffffffff8002fa35>] __writeback_single_inode+0x1dd/0x31c
 [<ffffffff800e4121>] do_readv_writev+0x26e/0x291
 [<ffffffff800f7619>] sync_inode+0x24/0x33
 [<ffffffff8804c37e>] :ext3:ext3_sync_file+0xce/0xf8
 [<ffffffff8004ffba>] do_fsync+0x52/0xa4
 [<ffffffff800e49ae>] __do_fsync+0x23/0x36
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
toracat

toracat

2012-10-02 16:59

manager   ~0015872

Would any of you be able to try the kernel from kernel.org and see if the issue is resolved there? This test can be done by installing kernel-ml from ELRepo:

http://elrepo.org/tiki/kernel-ml

Because it installs in parallel with the existing kernel, it's easy to install/uninstall without any conflicts. Currently the kernel version is 3.0.43 for EL5.
dunch

dunch

2013-01-07 17:17

reporter   ~0016245

Hi Folks,
Is there any sign of a resolution for this issue? Has anyone had any results with the newer kernel?
We're seeing this issue on a number of customer sites, all HP Blades, all running Oracle.


> INFO: task oracle:31350 blocked for more than 120 seconds
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> oracle D ffff81085c13b7a0 0 31350 1 31352 31348 (NOTLB)
> ffff8110b539dcb8 0000000000000082 ffff81091bbe72c8 ffffffff880baa3d
> 0000000000000007 0000000000000009 ffff8110c80b8860 ffff81203fe76080
> 00025dc85dc14470 000000000028e9e7 ffff8110c80b8a48 000000123c654088
> Call Trace:
> [<ffffffff880baa3d>] :cciss:do_cciss_request+0x32/0x4ed
> [<ffffffff8006ece7>] do_gettimeofday+0x40/0x90 [<ffffffff800637de>]
> io_schedule+0x3f/0x67 [<ffffffff800f900d>]
> __blockdev_direct_IO+0x8d5/0xa88 [<ffffffff800e944e>]
> blkdev_direct_IO+0x32/0x37 [<ffffffff800e9386>]
> blkdev_get_blocks+0x0/0x96 [<ffffffff8000c657>]
> __generic_file_aio_read+0xb8/0x198
> [<ffffffff800c9eca>] generic_file_read+0xac/0xc5
> [<ffffffff8001ce98>] vma_link+0xd0/0xfd [<ffffffff800a34b7>]
> autoremove_wake_function+0x0/0x2e [<ffffffff8000e283>]
> do_mmap_pgoff+0x615/0x780 [<ffffffff800ba777>]
> audit_syscall_entry+0x1a8/0x1d3 [<ffffffff8000b721>]
> vfs_read+0xcb/0x171 [<ffffffff800136e6>] sys_pread64+0x50/0x70
> [<ffffffff8005d229>] tracesys+0x71/0xe0 [<ffffffff8005d28d>]
> tracesys+0xd5/0xe0 {noformat}
toracat

toracat

2013-01-07 17:44

manager   ~0016246

Looks like in some (if not all) cases, firmware upgrade seems to help fix the issue. Please try if you are not at the latest version.
jonry

jonry

2013-02-17 18:17

reporter   ~0016500

Hi & thanks to toracat!

I got this issue when I upgraded to CentOS 6. I thought it was related to a disk or my SW raid or something - I had to restart server every 2-3 days or so. - Before this I had my server running on CentOS 5 for years without hiccups, but I had compiled my own kernel, so that's probably why it worked.

I followed your suggestion and installed the kernel from kernel.org. Current "uname -r":
     3.7.4-1.el6.elrepo.x86_64

It's been running 22 days without any hiccups now, which to me means that it works... - Thanks a lot for the suggestion toracat - I'm now 100% convinced that it is the kernel.
toracat

toracat

2013-02-17 18:42

manager   ~0016501

@jonry

Good hear that the mainline kernel from kernel.org (by using ELRepo's kernel-ml) has fixed the issue for you. But that is CentOS 6.

For whose who are running CentOS 5 may want to try kernel-lt ( http://elrepo.org/tiki/kernel-lt ) to test the latest stable kernel from kernel.org.
jonry

jonry

2013-02-18 08:58

reporter   ~0016502

@toracat

Ahh, yes, you're right that I'm now on CentOS 6, but my HW has not changed since I ran CentOS 5, and the error messages I got in CentOS 6 led me to this post - more or less exactly the same errors as reported here.

AND before, when I had this HW running for years on CentOS 5, it was running on a kernel from kernel.org (although I then had compiled it on my own, and not downloaded it through ELRepo).

Your suggestion worked on CentOS 6, and the issue was the same, so I just thought it was relevant information. - And btw. I'm not on any HP HW.

So anyway: Thanks!

Issue History

Date Modified Username Field Change
2010-09-20 04:45 aajn New Issue
2010-09-27 01:44 Spock Note Added: 0011893
2010-10-05 13:47 intel352 Note Added: 0011904
2010-10-05 15:53 toracat Note Added: 0011906
2010-10-05 18:08 intel352 Note Added: 0011907
2010-11-01 09:53 aajn Note Added: 0011993
2010-11-06 22:32 robinbowes Note Added: 0012025
2011-01-07 05:40 dhzh Note Added: 0012297
2011-01-26 13:05 devurandom Note Added: 0012371
2011-03-02 04:28 ngochai Note Added: 0012465
2011-03-08 21:14 aajn Note Added: 0012499
2011-04-26 17:17 hola Note Added: 0012675
2011-05-24 19:09 darklca Note Added: 0012778
2011-06-20 16:47 nadaoneal Note Added: 0012834
2011-06-20 18:39 aajn Note Added: 0012835
2011-06-20 19:23 nadaoneal Note Added: 0012836
2011-06-20 21:55 aajn Note Added: 0012837
2011-07-09 11:54 seanfulton Note Added: 0012919
2011-09-14 18:19 antimatter Note Added: 0013260
2012-03-13 15:59 paszczak000 Note Added: 0014667
2012-03-23 16:53 michaeleconomy Note Added: 0014723
2012-03-26 15:42 rusini Note Added: 0014736
2012-03-26 15:44 rusini Note Added: 0014737
2012-03-26 19:00 michaeleconomy Note Added: 0014741
2012-04-13 11:55 foggy Note Added: 0014829
2012-05-05 06:53 foggy Note Added: 0015025
2012-06-05 18:42 giesen Note Added: 0015218
2012-06-05 18:43 giesen Note Added: 0015219
2012-06-06 00:44 giesen Note Added: 0015220
2012-07-06 01:03 itmcontact Note Added: 0015370
2012-09-30 14:13 paolo_losi Note Added: 0015860
2012-10-02 16:59 toracat Note Added: 0015872
2013-01-07 17:17 dunch Note Added: 0016245
2013-01-07 17:44 toracat Note Added: 0016246
2013-02-17 18:17 jonry Note Added: 0016500
2013-02-17 18:42 toracat Note Added: 0016501
2013-02-18 08:58 jonry Note Added: 0016502