CentOS Bug Tracker

View Issue Details
ID: 0004018
Project: CentOS-5
Category: mdadm
View Status: public
Date Submitted: 2009-11-19 23:12
Last Update: 2010-02-15 17:07
Reporter: guezim
Priority: normal
Severity: major
Reproducibility: always
Status: new
Resolution: open
Platform:
OS:
OS Version:
Product Version: 5.4
Target Version:
Fixed in Version:
Summary: 0004018: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Description: Every 30 seconds I get the following message in my messages file and dmesg.

BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
CPU 0:
Modules linked in: dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod shpchp ahci ata_piix libata sd_
mod scsi_mod raid10 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 567, comm: md2_raid10 Not tainted 2.6.18-164.6.1.el5xen #1
RIP: e030:[<ffffffff8020be55>] [<ffffffff8020be55>] memcmp+0xf/0x22
RSP: e02b:ffff8801db1e1dc8 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff8800001e5840 RCX: 0000000000000a7c
RDX: 0000000000000000 RSI: ffff8801daa16584 RDI: ffff880000fd4584
RBP: ffff8801dc330880 R08: ffff8801dc330918 R09: ffff880000fdf2c0
R10: ffff8801dc330880 R11: ffffffff80246ce3 R12: ffff880000fcfb40
R13: 0000000000000000 R14: 0000000000000001 R15: ffff8801db69b400
FS: 0000000000000000(0000) GS:ffffffff805ca000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000

Call Trace:
 [<ffffffff8807653f>] :raid10:raid10d+0x226/0x9ba
 [<ffffffff8026082b>] error_exit+0x0/0x6e
 [<ffffffff8029b9ec>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8029b9ec>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8026361e>] schedule_timeout+0x1e/0xad
 [<ffffffff8029b9ec>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80407389>] md_thread+0xf8/0x10e
 [<ffffffff8029bc04>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80407291>] md_thread+0x0/0x10e
 [<ffffffff80233bcd>] kthread+0xfe/0x132
 [<ffffffff80260b2c>] child_rip+0xa/0x12
 [<ffffffff8029b9ec>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80233acf>] kthread+0x0/0x132
 [<ffffffff80260b22>] child_rip+0x0/0x12
Additional Information:
System
CPU: Intel Core i5
RAM: 8 GB
Mainboard: Gigabyte GA-P55-UD3R
HD: 8x Samsung SpinPoint F3 1000GB, SATA II (HD103SJ)
Lan: Intel Quad Port Server Adapter PCIe (4x1000)

All disks are partitioned the same way:
[root@vserver log]# fdisk -l /dev/sda

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot Start End Blocks Id System
/dev/sda1 * 1 17 136521 fd Linux raid autodetect
/dev/sda2 18 267 2008125 fd Linux raid autodetect
/dev/sda3 268 121601 974615355 fd Linux raid autodetect

[root@vserver log]# cat /proc/mdstat
Personalities : [raid10] [raid1]
md0 : active raid1 sda1[0] sdb1[1] sdc1[2] sdd1[3] sde1[4] sdf1[5] sdg1[6] sdh1[7]
      136448 blocks [8/8] [UUUUUUUU]
      
md2 : active raid10 sdh3[7] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1] sda3[0]
      3898460928 blocks 4K chunks 2 near-copies [8/8] [UUUUUUUU]
      [>....................] resync = 2.5% (99184648/3898460928) finish=603.5min speed=104916K/sec
      
md1 : active raid10 sdh2[7] sdg2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
      8031232 blocks 256K chunks 2 near-copies [8/8] [UUUUUUUU]
      
unused devices: <none>

The load seems to be a bit high:
top - 00:10:26 up 26 min, 1 user, load average: 6.01, 5.91, 4.60
Tasks: 137 total, 6 running, 131 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 95.3%sy, 0.0%ni, 4.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 1.0%sy, 0.0%ni, 92.7%id, 6.0%wa, 0.0%hi, 0.0%si, 0.3%st
Cpu2 : 0.0%us, 14.3%sy, 0.0%ni, 85.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7805952k total, 536304k used, 7269648k free, 34228k buffers
Swap: 0k total, 0k used, 0k free, 163368k cached
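
If the resync itself is the trigger, one mitigation sketch would be to throttle it through the standard md tunables, which should take some pressure off CPU0 (the values below are only illustrative, not what this system is actually running):

cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max   # current per-device limits in KB/s
echo 50000 > /proc/sys/dev/raid/speed_limit_max                             # cap the resync rate at roughly 50 MB/s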

[root@vserver log]# grep BUG messages
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:37:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:46:39 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:46:39 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:51:34 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:52:34 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:52:45 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:53:58 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
Nov 19 23:54:19 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [events/0:14]
Nov 19 23:54:39 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [migration/0:2]
Nov 19 23:54:57 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [events/0:14]
Nov 19 23:55:24 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:55:34 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:57:16 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:58:16 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 19 23:58:38 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [events/0:14]
Nov 19 23:59:54 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [kondemand/0:2652]
Nov 20 00:00:34 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 20 00:00:52 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [events/0:14]
Nov 20 00:01:20 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 20 00:01:34 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 20 00:01:54 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 20 00:02:04 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 20 00:02:27 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [events/0:14]
Nov 20 00:02:59 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [kblockd/0:27]
Nov 20 00:03:25 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [watchdog/0:4]
Nov 20 00:04:22 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [migration/0:2]
Nov 20 00:04:49 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [watchdog/0:4]
Nov 20 00:05:06 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [watchdog/0:4]
Nov 20 00:05:34 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [kondemand/0:2652]
Nov 20 00:05:49 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
Nov 20 00:06:01 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 20 00:06:19 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 20 00:06:37 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 20 00:07:18 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
Nov 20 00:07:57 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [kondemand/0:2652]
Nov 20 00:09:19 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
Nov 20 00:09:39 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
Nov 20 00:10:01 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
Nov 20 00:10:41 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid10:567]
Nov 20 00:11:22 vserver kernel: BUG: soft lockup - CPU#0 stuck for 10s! [events/0:14]
Tags: No tags attached.
Attached Files: dmesg.bug (38,519 bytes) 2009-11-19 23:12

- Relationships

- Notes
(0010386)
guezim (reporter)
2009-11-20 07:42

I increased the softlockup threshold to 60 seconds because I found that value on a running Ubuntu system. There are no more BUG messages now.
echo 60 > /proc/sys/kernel/softlockup_thresh
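
To make that survive a reboot, I assume the matching sysctl key (kernel.softlockup_thresh should be the sysctl name for that /proc entry) can simply go into /etc/sysctl.conf:

echo "kernel.softlockup_thresh = 60" >> /etc/sysctl.conf
sysctl -p   # re-apply, then verify with: sysctl kernel.softlockup_thresh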
(0010397)
t--om (reporter)
2009-11-23 08:02

Seeing this too with RAID1 on CentOS 5.4 (2.6.18-164.6.1.el5). It should be relatively harmless. It was apparently fixed in later kernels by allowing a reschedule during the long memcmp in the RAID code [1].

[1] http://neil.brown.name/git?p=md;a=commitdiff;h=1d9d52416c0445019ccc1f0fddb9a227456eb61b
(0010398)
t--om (reporter)
2009-11-23 08:04

BUG: soft lockup - CPU#0 stuck for 10s! [md2_raid1:506]
CPU 0:
Modules linked in: ipt_MASQUERADE iptable_nat ip_nat bridge autofs4 lockd sunrpc r8169 mii ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi cpufreq_ondemand acpi_cpufreq freq_table dm_mirror dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ksm(U) kvm_intel(U) kvm(U) snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd i2c_i801 soundcore i2c_core shpchp pcspkr serio_raw sg dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache usb_storage ahci libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 506, comm: md2_raid1 Tainted: G 2.6.18-164.6.1.el5 #1
RIP: 0010:[<ffffffff8000b90f>] [<ffffffff8000b90f>] memcmp+0x14/0x22
RSP: 0000:ffff81022eb23d98 EFLAGS: 00000286
RAX: 0000000000000000 RBX: 000000000000000b RCX: 0000000000000014
RDX: 0000000000000000 RSI: ffff8101fa3a0fec RDI: ffff81020d429fed
RBP: 0000000000000000 R08: 000000020d429000 R09: ffff810001000208
R10: 0000000000000000 R11: 00000000000002ff R12: ffff81022fbe2a00
R13: 0000000000000000 R14: 0000000000000001 R15: ffff81022fbe2a08
FS: 0000000000000000(0000) GS:ffffffff803c1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000002a99c4f000 CR3: 0000000000201000 CR4: 00000000000026e0

Call Trace:
 [<ffffffff88076e47>] :raid1:raid1d+0x2c2/0xc6b
 [<ffffffff8009f9f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8006380f>] schedule_timeout+0x1e/0xad
 [<ffffffff8009f9f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8021590d>] md_thread+0xf8/0x10e
 [<ffffffff8009fc08>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80215815>] md_thread+0x0/0x10e
 [<ffffffff8003297c>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009f9f0>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003287e>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
(0010401)
t--om (reporter)
2009-11-23 08:25

This started happening after expanding system memory from 2 GB to 8 GB, which perhaps allows larger chunks of the mirrored RAID1 disks to be compared at once.
(0010907)
Marcuccio (reporter)
2010-02-01 11:01

Hi,
I have an almost identical issue on a CentOS 5.4 PC with software RAID 1 partitions.
I get the error during a resync operation (CentOS 5.4 introduced a weekly scheduled 'check' of all the md partitions; the script is named '99-raid-check').
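
The same kind of check can be started and inspected by hand through the md sysfs interface (standard md controls; md1 is just the array from my trace), which makes it easy to reproduce outside the weekly cron job:

echo check > /sys/block/md1/md/sync_action    # request a consistency check, as 99-raid-check does
cat /sys/block/md1/md/sync_action             # current action (check/resync/idle)
cat /proc/mdstat                              # progress
cat /sys/block/md1/md/mismatch_cnt            # mismatch count reported by the check
echo idle > /sys/block/md1/md/sync_action     # abort the check if needed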

This is the message in the log, repeated several times:

--------------------
kernel: BUG: soft lockup - CPU#1 stuck for 10s! [md1_raid1:488]
kernel: CPU 1:
kernel: Modules linked in: autofs4 vmnet(U) vsock(FU) vmci(U) vmmon(U) lockd sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport floppy sr_mod cdrom sg snd_hda_intel nvidia(PU) snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device pcspkr shpchp i2c_i801 i2c_core tpm_infineon e1000e tpm tpm_bios serio_raw snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd soundcore dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Jan 29 10:41:48 g328 kernel: Pid: 488, comm: md1_raid1 Tainted: PF 2.6.18-164.11.1.el5 #1
kernel: RIP: 0010:[<ffffffff8000b903>] [<ffffffff8000b903>] memcmp+0x8/0x22
kernel: RSP: 0018:ffff81012cacdd98 EFLAGS: 00000202
kernel: RAX: 0000000000000000 RBX: 0000000000000007 RCX: 0000000000000d32
kernel: RDX: 0000000000000000 RSI: ffff8101069372ce RDI: ffff81010d17f2ce
kernel: RBP: 0000000000000000 R08: 000000010d17f000 R09: ffff810001000108
kernel: R10: ffff81012cacdbc0 R11: 000000000000032a R12: ffff81012d3ea940
kernel: R13: 0000000000000000 R14: 0000000000000001 R15: ffff81012d3ea948
kernel: FS: 0000000000000000(0000) GS:ffff810104394840(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
kernel: CR2: 00002aab0263700a CR3: 0000000037f90000 CR4: 00000000000006e0
kernel:
kernel: Call Trace:
kernel: [<ffffffff88076e47>] :raid1:raid1d+0x2c2/0xc6b
kernel: [<ffffffff8009fe9f>] keventd_create_kthread+0x0/0xc4
kernel: [<ffffffff8006383f>] schedule_timeout+0x1e/0xad
kernel: [<ffffffff8009fe9f>] keventd_create_kthread+0x0/0xc4
kernel: [<ffffffff80216160>] md_thread+0xf8/0x10e
kernel: [<ffffffff800a00b7>] autoremove_wake_function+0x0/0x2e
kernel: [<ffffffff80216068>] md_thread+0x0/0x10e
kernel: [<ffffffff80032950>] kthread+0xfe/0x132
kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
kernel: [<ffffffff8009fe9f>] keventd_create_kthread+0x0/0xc4
kernel: [<ffffffff80032852>] kthread+0x0/0x132
kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
--------------------
(0011006)
simpfeld (reporter)
2010-02-15 17:07

We are seeing this issue now on a RHEL 5.4 system. This was already an issue on Fedora several months back (I saw it on my home machine); it is really annoying that some backport to RHEL 5 introduced this bug without bringing the Fedora fix along with it.

There was a mailing list thread about it:
http://kerneltrap.org/mailarchive/linux-raid/2009/9/21/6436583/thread

This bug looks very similar:
https://bugzilla.redhat.com/show_bug.cgi?id=501075

I've not had a chance to try the test kernels, and the RH bug reporter hasn't replied as to whether they work.

Sadly I can't find the original Fedora bug that was tracking this problem, which RH could be pointed to in order to fix it in RHEL 5.

- Issue History
Date Modified Username Field Change
2009-11-19 23:12 guezim New Issue
2009-11-19 23:12 guezim File Added: dmesg.bug
2009-11-20 07:42 guezim Note Added: 0010386
2009-11-23 08:02 t--om Note Added: 0010397
2009-11-23 08:04 t--om Note Added: 0010398
2009-11-23 08:25 t--om Note Added: 0010401
2010-02-01 11:01 Marcuccio Note Added: 0010907
2010-02-15 17:07 simpfeld Note Added: 0011006

