CentOS Bug Tracker

View Issue Details
ID: 0005823
Project: CentOS-6
Category: kernel
View Status: public
Date Submitted: 2012-07-12 09:08
Last Update: 2014-05-19 15:30
Reporter: younik_1
Priority: high
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Platform: NEC EXPRESS 6800 120 RH-1
OS: CentOS
OS Version: 6.3
Product Version: 6.3
Target Version:
Fixed in Version:
Summary: 0005823: BUG: soft lockup - CPU#1 stuck for 67s! [scsi_eh_8:334]
Description: When a DELL TL2000 tape library is connected (with a SAS cable and an Adaptec ASC-1405 Unified Serial HBA serial attached SCSI controller), the system (kernel 2.6.32-279.1.1.el6.x86_64) hangs.


Jul 11 19:23:25 backup92a kernel: BUG: soft lockup - CPU#1 stuck for 67s! [scsi_eh_8:334]
Jul 11 19:23:25 backup92a kernel: Modules linked in: ch osst st autofs4 bnx2fc fcoe 8021q libfcoe garp stp libfc llc scsi_transport_fc scsi_tgt sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support i5000_edac edac_core i5k_amb ioatdma dca shpchp ext4 mbcache jbd2 sd_mod crc_t10dif e1000e megaraid_sas sr_mod cdrom mvsas libsas scsi_transport_sas ahci pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
Jul 11 19:23:25 backup92a kernel: CPU 1
Jul 11 19:23:25 backup92a kernel: Modules linked in: ch osst st autofs4 bnx2fc fcoe 8021q libfcoe garp stp libfc llc scsi_transport_fc scsi_tgt sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support i5000_edac edac_core i5k_amb ioatdma dca shpchp ext4 mbcache jbd2 sd_mod crc_t10dif e1000e megaraid_sas sr_mod cdrom mvsas libsas scsi_transport_sas ahci pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]
Jul 11 19:23:25 backup92a kernel:
Jul 11 19:23:25 backup92a kernel: Pid: 334, comm: scsi_eh_8 Not tainted 2.6.32-279.1.1.el6.x86_64 #1 NEC Express5800/120Rh-1 [N8100-1387E]/MS-9192-01S
Jul 11 19:23:25 backup92a kernel: RIP: 0010:[<ffffffff81500128>] [<ffffffff81500128>] _spin_lock+0x18/0x30
Jul 11 19:23:25 backup92a kernel: RSP: 0018:ffff880125f29c70 EFLAGS: 00000202
Jul 11 19:23:25 backup92a kernel: RAX: 0000000000000044 RBX: ffff880125f29c70 RCX: 000000000000797c
Jul 11 19:23:25 backup92a kernel: RDX: 0000000000000045 RSI: 0000000000000046 RDI: ffff880125f40008
Jul 11 19:23:25 backup92a kernel: RBP: ffffffff8100bc0e R08: 00000000000101ce R09: 00000000ffffffff
Jul 11 19:23:25 backup92a kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff880125f29c40
Jul 11 19:23:25 backup92a kernel: R13: 0000000000000000 R14: ffff880127523a00 R15: ffff880125489000
Jul 11 19:23:25 backup92a kernel: FS: 0000000000000000(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
Jul 11 19:23:25 backup92a kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jul 11 19:23:25 backup92a kernel: CR2: 00007feab404fc30 CR3: 0000000125bf4000 CR4: 00000000000006e0
Jul 11 19:23:25 backup92a kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 11 19:23:25 backup92a kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 11 19:23:25 backup92a kernel: Process scsi_eh_8 (pid: 334, threadinfo ffff880125f28000, task ffff880125d96080)
Jul 11 19:23:25 backup92a kernel: Stack:
Jul 11 19:23:25 backup92a kernel: ffff880125f29ce0 ffffffffa0222f8a 0000000000000060 ffff880125534bd8
Jul 11 19:23:25 backup92a kernel: <d> 0000000000000000 0000000000000001 ffff880125f514e0 0000000075792f95
Jul 11 19:23:25 backup92a kernel: <d> ffff8801268971c0 0000000000000000 ffff880125534bc0 ffff880125f40000
Jul 11 19:23:25 backup92a kernel: Call Trace:
Jul 11 19:23:25 backup92a kernel: [<ffffffffa0222f8a>] ? mvs_slot_complete+0x17a/0x4f0 [mvsas]
Jul 11 19:23:25 backup92a kernel: [<ffffffffa0224e44>] ? mvs_abort_task_set+0x194/0x1d0 [mvsas]
Jul 11 19:23:25 backup92a kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jul 11 19:23:25 backup92a kernel: [<ffffffffa0161a39>] ? sas_scsi_recover_host+0x389/0xdd0 [libsas]
Jul 11 19:23:25 backup92a kernel: [<ffffffff8136945a>] ? scsi_error_handler+0x13a/0x6d0
Jul 11 19:23:25 backup92a kernel: [<ffffffff81369320>] ? scsi_error_handler+0x0/0x6d0
Jul 11 19:23:25 backup92a kernel: [<ffffffff81091d66>] ? kthread+0x96/0xa0
Jul 11 19:23:25 backup92a kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
Jul 11 19:23:25 backup92a kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0
Jul 11 19:23:25 backup92a kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jul 11 19:23:25 backup92a kernel: Code: 44 00 00 f0 81 2f 00 00 00 01 74 05 e8 e2 e3 d7 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 <39> c2 74 0e f3 90 0f b7 17 eb f5 83 3f 00 75 f4 eb df c9 c3 0f
Jul 11 19:23:25 backup92a kernel: Call Trace:
Jul 11 19:23:25 backup92a kernel: [<ffffffffa0222f8a>] ? mvs_slot_complete+0x17a/0x4f0 [mvsas]
Jul 11 19:23:25 backup92a kernel: [<ffffffffa0224e44>] ? mvs_abort_task_set+0x194/0x1d0 [mvsas]
Jul 11 19:23:25 backup92a kernel: [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
Jul 11 19:23:25 backup92a kernel: [<ffffffffa0161a39>] ? sas_scsi_recover_host+0x389/0xdd0 [libsas]
Jul 11 19:23:25 backup92a kernel: [<ffffffff8136945a>] ? scsi_error_handler+0x13a/0x6d0
Jul 11 19:23:25 backup92a kernel: [<ffffffff81369320>] ? scsi_error_handler+0x0/0x6d0
Jul 11 19:23:25 backup92a kernel: [<ffffffff81091d66>] ? kthread+0x96/0xa0
Jul 11 19:23:25 backup92a kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
Jul 11 19:23:25 backup92a kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0
Jul 11 19:23:25 backup92a kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Jul 11 19:23:38 backup92a kernel: INFO: task scsi_id:1632 blocked for more than 120 seconds.
Tags: No tags attached.
Attached Files

- Relationships
related to | 0005945 | new | Issue Tracker | BUG: soft lockup CPU#X stuck for XXs! [NetworkManager,auditd,rcpbind,irqbalance,ip6tables]

-  Notes
(0015737)
msz59 (reporter)
2012-09-04 12:07

I got the same problem on a Supermicro X7DCL-3 motherboard system with an additional Marvell 88SE6440 SAS controller for an IBM LTO-4 tape drive. I was unable to install CentOS 6.3: the soft lockups and stuck CPUs appeared while booting from the installation DVD. It worked fine with version 6.2 (2.6.32-220.el6.x86_64 kernel). When the system was updated (including the 2.6.32-279.5.2.el6.x86_64 kernel), it hung at the first attempt to use the tape drive, giving similar messages on screen:
BUG: soft lockup - CPU#4 stuck for 67s! [scsi_eh_7:XXX]
I rebooted with the old kernel and everything seems to work fine again.
(0015758)
vacantserver (reporter)
2012-09-08 17:38

I am having a similar problem with a Dell 2650 after upgrading to 6.3.

System boots up OK, but ALWAYS crashes on reboot or shutdown.
BUG: soft lockup - CPU#X stuck for XXs!

Will file my own report after turning on console log...
(0015957)
byrnejb (reporter)
2012-10-18 19:26

We have encountered this problem on a Compaq D510 running CentOS-5.8. This system is used as a fax server, and since 2012-Oct-02 we have had four incidents where the server became non-responsive. On the first three occasions we were unable to determine the cause, so we rebooted to clear the problem. During today's incident we instead shut down the external fax modems, and the system returned to normal operation.

dmesg showed us this:

BUG: soft lockup - CPU#0 stuck for 60s! [watchdog/0:4]

Pid: 4, comm: watchdog/0
EIP: 0060:[<c05606fd>] CPU: 0
EIP is at serial8250_interrupt+0x8b/0xdd
 EFLAGS: 00000246 Not tainted (2.6.18-308.16.1.el5 #1)
EAX: c08555f8 EBX: c08555f8 ECX: 00000246 EDX: 00000200
ESI: c0854dd8 EDI: c0855694 EBP: 00000246 DS: 007b ES: 007b
CR0: 8005003b CR2: 0a354fc8 CR3: 32a64000 CR4: 000006d0
 [<c04507f1>] handle_IRQ_event+0x45/0x8c
 [<c0450900>] __do_IRQ+0xc8/0x118
 [<c0450838>] __do_IRQ+0x0/0x118
 [<c04074d8>] do_IRQ+0x9b/0xc3
 [<c040597a>] common_interrupt+0x1a/0x20
 [<c042e5da>] run_timer_softirq+0x11f/0x1d2
 [<f8d69eab>] death_by_timeout+0x0/0x63 [ip_conntrack]
 [<c042a96d>] __do_softirq+0x87/0x114
 [<c04073f9>] do_softirq+0x4e/0x92
 [<c0450838>] __do_IRQ+0x0/0x118
 [<c04074f4>] do_IRQ+0xb7/0xc3
 [<c040597a>] common_interrupt+0x1a/0x20
 [<c062406b>] schedule+0x9e7/0xa57
 [<c040597a>] common_interrupt+0x1a/0x20
 [<c045040c>] watchdog+0x0/0x58
 [<c045040c>] watchdog+0x0/0x58
 [<c0450457>] watchdog+0x4b/0x58
 [<c0436f22>] kthread+0xc0/0xee
 [<c0436e62>] kthread+0x0/0xee
 [<c0405c87>] kernel_thread_helper+0x7/0x10
 =======================

Linux inet01.hamilton.harte-lyne.ca 2.6.18-308.16.1.el5 #1 SMP Tue Oct 2 22:01:37 EDT 2012 i686 i686 i386 GNU/Linux

The date of the first recorded incident corresponds fairly closely with the most recent kernel update. Coincidence?
(0015975)
Tobias Braeutigam (reporter)
2012-10-22 07:41

I get this bug too. Here's my setup:

Windows 7 pro (V 6.1 Build 7601) with SP1, x64
Virtualbox 4.2.0 r80737 x64, 4096MB RAM, VBOXADDITIONS_4.2.0_80737, .vdi on SATA
Centos 6.3 x64

$ uname -a
Linux orcl.syntegris06.syntegris.de 2.6.32-279.11.1.el6.x86_64 #1 SMP

$ less /var/log/messages
...
Oct 19 12:24:33 localhost kernel: BUG: soft lockup - CPU#1 stuck for 67s! [gnome-panel:2250]
Oct 19 12:24:33 localhost kernel: Modules linked in: nls_utf8 fuse vboxvideo(U) drm vboxsf(U) autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport microcode sg i2c_piix4 i2c_core vboxguest(U) e1000 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Oct 19 12:24:33 localhost kernel: CPU 1
Oct 19 12:24:33 localhost kernel: Modules linked in: nls_utf8 fuse vboxvideo(U) drm vboxsf(U) autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ppdev parport_pc parport microcode sg i2c_piix4 i2c_core vboxguest(U) e1000 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Oct 19 12:24:33 localhost kernel:
Oct 19 12:24:33 localhost kernel: Pid: 2250, comm: gnome-panel Not tainted 2.6.32-279.11.1.el6.x86_64 #1 innotek GmbH VirtualBox/VirtualBox
Oct 19 12:24:33 localhost kernel: RIP: 0010:[<ffffffff815006ee>] [<ffffffff815006ee>] _spin_lock+0x1e/0x30
Oct 19 12:24:33 localhost kernel: RSP: 0018:ffff88011b5f3e28 EFLAGS: 00000202
Oct 19 12:24:33 localhost kernel: RAX: 0000000000000007 RBX: ffff88011b5f3e28 RCX: 000000000000825f
Oct 19 12:24:33 localhost kernel: RDX: 0000000000000008 RSI: ffff88005c63f3d8 RDI: ffff88007defd6f8
Oct 19 12:24:33 localhost kernel: RBP: ffffffff8100bc0e R08: ffff88011b034ae0 R09: 0000000000000000
Oct 19 12:24:33 localhost kernel: R10: 0000000000000000 R11: 0000000000000002 R12: 000000000000825f
Oct 19 12:24:33 localhost kernel: R13: 0000000000060006 R14: ffff88011b034ae0 R15: 0000000000000000
Oct 19 12:24:33 localhost kernel: FS: 00007f70d649d940(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
Oct 19 12:24:33 localhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 19 12:24:33 localhost kernel: CR2: 00007f74b14a6000 CR3: 000000011a843000 CR4: 00000000000006e0
Oct 19 12:24:33 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 19 12:24:33 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 19 12:24:33 localhost kernel: Process gnome-panel (pid: 2250, threadinfo ffff88011b5f2000, task ffff88011a86b500)
Oct 19 12:24:33 localhost kernel: Stack:
Oct 19 12:24:33 localhost kernel: ffff88011b5f3ee8 ffffffff811bd012 0000000000000000 0000000000000000
Oct 19 12:24:33 localhost kernel: <d> ffff88011a86b500 ffff88011ba92780 00000000019097c0 ffff88011b034ad8
Oct 19 12:24:33 localhost kernel: <d> 0000000000000400 ffff88011b034aa8 0000000000000000 ffff88011a86b500
Oct 19 12:24:33 localhost kernel: Call Trace:
Oct 19 12:24:33 localhost kernel: [<ffffffff811bd012>] ? inotify_read+0x1a2/0x330
Oct 19 12:24:33 localhost kernel: [<ffffffff81092160>] ? autoremove_wake_function+0x0/0x40
Oct 19 12:24:33 localhost kernel: [<ffffffff81213476>] ? security_file_permission+0x16/0x20
Oct 19 12:24:33 localhost kernel: [<ffffffff8117bc25>] ? vfs_read+0xb5/0x1a0
Oct 19 12:24:33 localhost kernel: [<ffffffff810d6cf2>] ? audit_syscall_entry+0x272/0x2a0
Oct 19 12:24:33 localhost kernel: [<ffffffff8117bd61>] ? sys_read+0x51/0x90
Oct 19 12:24:33 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Oct 19 12:24:33 localhost kernel: Code: 00 00 00 01 74 05 e8 52 e1 d7 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 <0f> b7 17 eb f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89
Oct 19 12:24:33 localhost kernel: Call Trace:
Oct 19 12:24:33 localhost kernel: [<ffffffff811bd012>] ? inotify_read+0x1a2/0x330
Oct 19 12:24:33 localhost kernel: [<ffffffff81092160>] ? autoremove_wake_function+0x0/0x40
Oct 19 12:24:33 localhost kernel: [<ffffffff81213476>] ? security_file_permission+0x16/0x20
Oct 19 12:24:33 localhost kernel: [<ffffffff8117bc25>] ? vfs_read+0xb5/0x1a0
Oct 19 12:24:33 localhost kernel: [<ffffffff810d6cf2>] ? audit_syscall_entry+0x272/0x2a0
Oct 19 12:24:33 localhost kernel: [<ffffffff8117bd61>] ? sys_read+0x51/0x90
Oct 19 12:24:33 localhost kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Oct 19 12:24:34 localhost abrtd: Directory 'oops-2012-10-19-12:24:34-1837-0' creation detected
Oct 19 12:24:34 localhost abrt-dump-oops: Reported 1 kernel oopses to Abrt
Oct 19 12:24:34 localhost abrtd: Can't open file '/var/spool/abrt/oops-2012-10-19-12:24:34-1837-0/uid': No such file or directory
Oct 19 12:25:57 localhost kernel: BUG: soft lockup - CPU#1 stuck for 67s! [gnome-panel:2250]
...

The lockups occur in an unpredictable pattern.
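
One way to check how irregular the lockups are is to pull the timestamps of the "BUG: soft lockup" lines out of /var/log/messages and look at the gaps between them. The Python sketch below does this for the two lockup lines quoted above. The function name lockup_times and the hard-coded year are assumptions for illustration (syslog timestamps omit the year), and the pattern only covers the message format seen in this report.

```python
import re
from datetime import datetime

# Matches syslog lines of the form quoted in this note, e.g.
# "Oct 19 12:24:33 localhost kernel: BUG: soft lockup - CPU#1 stuck for 67s! [gnome-panel:2250]"
LOCKUP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]:]+):(?P<pid>\d+)\]"
)

def lockup_times(lines, year=2012):
    """Return (timestamp, cpu, task, pid) for every soft-lockup line."""
    events = []
    for line in lines:
        m = LOCKUP_RE.match(line)
        if m:
            # Syslog omits the year, so the caller supplies it (assumption).
            when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
            events.append((when, int(m["cpu"]), m["task"], int(m["pid"])))
    return events

# The two lockup lines from the log excerpt above, plus one unrelated line.
sample = [
    "Oct 19 12:24:33 localhost kernel: BUG: soft lockup - CPU#1 stuck for 67s! [gnome-panel:2250]",
    "Oct 19 12:24:33 localhost kernel: CPU 1",
    "Oct 19 12:25:57 localhost kernel: BUG: soft lockup - CPU#1 stuck for 67s! [gnome-panel:2250]",
]
events = lockup_times(sample)
# Seconds between consecutive lockup reports.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
print(gaps)  # [84.0]
```

Run over a longer stretch of the log, the list of gaps would show whether the lockups cluster or are spread out.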
(0019768)
Jason Laprade (reporter)
2014-05-19 15:30

This particular problem seems to have multiple sources. In younik_1's report, the hardware involved is:
"DELL TL2000 tape library (with SAS cable and serial attached SCSI controller Adaptec ASC-1405 Unified Serial HBA)"

msz59's report has a "SAS controller for an IBM LTO-4 tape drive".

vacantserver and byrnejb do not say whether they have a tape drive.

Tobias Braeutigam is using VirtualBox, so a tape device is unlikely.

In my environment I have seen this issue three times. The factors common to all three cases were:

A) Adaptec ASC-1405 Unified Serial HBA
B) LTO 4 or LTO 5 half-height tape drive (ULT3580-HH5, ULT3580-HH4, or a similar model)
C) Tape drive running Dell firmware A07 or A08
D) CentOS 6.x
E) Kernel call trace messages that say:

kernel: [<ffffffffa0161a39>] ? sas_scsi_recover_host+0x389/0xdd0 [libsas]
kernel: [<ffffffff8136945a>] ? scsi_error_handler+0x13a/0x6d0
kernel: [<ffffffff81369320>] ? scsi_error_handler+0x0/0x6d0

F) Messages that say:

BUG: soft lockup - CPU#1 stuck for 67s! [scsi_eh_8:334]
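
Factors E and F can be checked mechanically against a saved log. Below is a minimal Python sketch, assuming the signature is simply "a soft lockup in a scsi_eh_N thread together with a sas_scsi_recover_host frame from libsas". The function name and that matching rule are illustrative assumptions, not an official diagnostic.

```python
import re

# Factor F: soft lockup reported for a SCSI error-handler thread.
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s! \[scsi_eh_\d+:\d+\]")
# Factor E: the libsas recovery frame in the call trace.
LIBSAS_FRAME = re.compile(r"sas_scsi_recover_host\+0x[0-9a-f]+/0x[0-9a-f]+ \[libsas\]")

def matches_libsas_lockup(log_text):
    """True if the log has both a scsi_eh soft lockup and the libsas recovery frame."""
    return bool(SOFT_LOCKUP.search(log_text)) and bool(LIBSAS_FRAME.search(log_text))

# Lines taken from younik_1's report (syslog prefix shortened).
tape_report = """\
kernel: BUG: soft lockup - CPU#1 stuck for 67s! [scsi_eh_8:334]
kernel: [<ffffffffa0161a39>] ? sas_scsi_recover_host+0x389/0xdd0 [libsas]
kernel: [<ffffffff8136945a>] ? scsi_error_handler+0x13a/0x6d0
"""
# Lines taken from the VirtualBox report (syslog prefix shortened).
virtualbox_report = """\
kernel: BUG: soft lockup - CPU#1 stuck for 67s! [gnome-panel:2250]
kernel: [<ffffffff811bd012>] ? inotify_read+0x1a2/0x330
"""
print(matches_libsas_lockup(tape_report))        # True
print(matches_libsas_lockup(virtualbox_report))  # False
```

This separates the tape/SAS reports from lockups that merely share the same "BUG: soft lockup" headline.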

I have not been able to test in every environment, but in one case I was able to resolve the issue by upgrading the firmware of the tape drive to the latest release. For Dell at least, this firmware is A10 and is marked as Urgent.

* Please note: the Dell A10 label does not match the firmware version string reported in dmesg, so you will need to work out which Dell release corresponds to the version you are actually running.
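
To see which revision a drive actually reports, one option is to read the Rev field that the kernel lists in /proc/scsi/scsi (available when the kernel is built with SCSI procfs support). The Python sketch below parses that layout and keeps only Sequential-Access (tape) devices. The helper name tape_revisions is hypothetical, and the vendor, model and revision strings in the sample are made up for illustration, not taken from any real drive.

```python
import re

# One device entry in /proc/scsi/scsi spans a "Vendor/Model/Rev" line
# followed by a "Type ... ANSI SCSI revision" line.
ENTRY_RE = re.compile(
    r"Vendor:\s*(?P<vendor>.*?)\s+Model:\s*(?P<model>.*?)\s+Rev:\s*(?P<rev>\S*)\s*\n"
    r"\s*Type:\s*(?P<type>.*?)\s+ANSI"
)

def tape_revisions(proc_scsi_text):
    """Return (vendor, model, rev) for every Sequential-Access (tape) device."""
    return [
        (m["vendor"], m["model"], m["rev"])
        for m in ENTRY_RE.finditer(proc_scsi_text)
        if m["type"] == "Sequential-Access"
    ]

# Invented sample in the /proc/scsi/scsi layout (all values are placeholders).
sample = """\
Attached devices:
Host: scsi8 Channel: 00 Id: 00 Lun: 00
  Vendor: EXAMPLE  Model: TAPE-DRIVE-X     Rev: 0000
  Type:   Sequential-Access                ANSI  SCSI revision: 03
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: EXAMPLE  Model: DISK-Y           Rev: 1111
  Type:   Direct-Access                    ANSI  SCSI revision: 05
"""
print(tape_revisions(sample))  # [('EXAMPLE', 'TAPE-DRIVE-X', '0000')]
```

On a live system the input would be the contents of /proc/scsi/scsi. Mapping the reported revision to a Dell release label (A07, A08, A10) still has to be done by hand against the vendor's release notes.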

The other solution has been to replace the SAS card with an LSI model card.

- Issue History
Date Modified | Username | Field | Change
2012-07-12 09:08 | younik_1 | New Issue |
2012-09-04 12:07 | msz59 | Note Added: 0015737 |
2012-09-08 17:38 | vacantserver | Note Added: 0015758 |
2012-09-09 19:41 | tigalch | Relationship added | related to 0005945
2012-10-18 19:26 | byrnejb | Note Added: 0015957 |
2012-10-22 07:41 | Tobias Braeutigam | Note Added: 0015975 |
2014-05-19 15:30 | Jason Laprade | Note Added: 0019768 |

