View Issue Details

ID: 0006675
Project: CentOS-6
Category: kernel
View Status: public
Last Update: 2013-09-20 09:56
Reporter: WebCraft
Priority: high
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Platform: SuperMicro SuperServer 5016I-M6F
OS: CentOS, Oracle Linux
OS Version: 6.4
Product Version: 6.4
Target Version:
Fixed in Version:
Summary: 0006675: MegaRaid volume lock up
Description: During boot, or soon after, writes to the disk become blocked. The system cannot reboot; only a hardware reset helps.

The same bug:
http://bugs.centos.org/view.php?id=5383
Additional Information: SuperServer 5016I-M6F, motherboard X8SI6-F, BIOS v1.2a
http://www.supermicro.com.tw/products/system/1u/5016/sys-5016i-mt.cfm
CPU: Intel Xeon X3470, RAM: 16 GB ECC
RAID: LSI SAS 2008 (integrated Supermicro SMC 2008-iMR with AOC-SAS2-RAID5-KEY), MegaRAID 9240-8i
HDD: 2x SAS Seagate 300 GB (VD0 mirror, sda, system) and 2x SATA Seagate 1000 GB (VD1 mirror, user data); all drives are enterprise-series models.
RAID config: two RAID 1 mirrors (2 SAS + 2 SATA drives) in separate virtual drives (sda, sdb)

CentOS 5 was installed a year and a half ago and has worked fine through every update (5.4 -> 5.5 -> 5.6 -> 5.7 -> 5.8 -> 5.9), both 32-bit and 64-bit (on several servers).

A fresh CentOS 6.4 installation crashed during the package configuration stage.

Initial RAID info:
Product Name : LSI MR-USAS2
BIOS Version : 4.14.00
Preboot CLI Version: 03.01-002:#%00008
WebBIOS Version : 4.0-16-e_5-Rel
NVDATA Version : 3.03.0044
FW Version : 2.40.04-0819
Boot Block Version : 2.01.00.00-0019
FW Package Build: 20.1.2-0003

First, the RAID controller firmware was upgraded (firmware from the official Supermicro FTP site):
Product Name : LSI MR-USAS2
FW Package Build: 20.10.1-0018
BIOS Version : 4.19.00_4.11.05.00_0x0417A000
Preboot CLI Version: 03.02-015:#%00008
WebBIOS Version : 4.0-43-e_31-Rel
NVDATA Version : 3.09.03-0009
FW Version : 2.120.04-1073
Boot Block Version : 2.02.00.00-0001

After this upgrade the installation completed, but some time later the SAS mirror (sda) locked up, with multiple error messages on the tty:

sd 0:2:0:0: rejecting I/O to offline device
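
The console messages identify the failing volume by its SCSI address (host:channel:target:lun), here 0:2:0:0. The sketch below is not part of the original report: it collects those addresses from kernel log text and looks up the block device behind one of them through sysfs. The function names `offline_scsi_addrs` and `scsi_addr_to_disk` are hypothetical, and it assumes the messages actually reach the kernel ring buffer (`dmesg`); with the system volume offline they are unlikely to make it into /var/log/messages.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original report.

# Read kernel log text on stdin and print each SCSI address
# (host:channel:target:lun) that the sd driver reported as offline,
# once per address. Leading timestamps such as "[   42.17] " are ignored.
offline_scsi_addrs() {
    sed -n 's/^.*sd \([0-9]*:[0-9]*:[0-9]*:[0-9]*\): rejecting I\/O to offline device.*$/\1/p' | sort -u
}

# Print the block device name(s) that sysfs lists under a SCSI address,
# e.g. 0:2:0:0 -> sda. Prints nothing if the address does not exist.
scsi_addr_to_disk() {
    ls "/sys/bus/scsi/devices/$1/block" 2>/dev/null
}

# Usage on the affected machine:
#   dmesg | offline_scsi_addrs
#   scsi_addr_to_disk 0:2:0:0
```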

Next, the RAID firmware was upgraded again (firmware provided by Supermicro support):
Product Name : Supermicro SMC2008-iMR
FW Package Build: 20.10.1-0119
BIOS Version : 4.31.00_4.12.05.00_0x05180000
Preboot CLI Version: 03.02-020:#%00009
WebBIOS Version : 4.0-59-e_48-Rel
NVDATA Version : 3.09.03-0043
FW Version : 2.130.364-1847
Boot Block Version : 2.02.00.00-0001

The system became unresponsive within 2 to 20 minutes after reboot.

Activities

WebCraft

2013-09-20 00:20

reporter   ~0018031

# lspci
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 02)

# modinfo megaraid_sas
filename: /lib/modules/2.6.32-358.el6.x86_64/kernel/drivers/scsi/megaraid/megaraid_sas.ko
description: LSI MegaRAID SAS Driver
author: megaraidlinux@lsi.com
version: 06.504.01.00-rh1
license: GPL
srcversion: BAEA569F109C63BD8764796
alias: pci:v00001000d0000005Dsv*sd*bc*sc*i*
alias: pci:v00001000d0000005Bsv*sd*bc*sc*i*
alias: pci:v00001028d00000015sv*sd*bc*sc*i*
alias: pci:v00001000d00000413sv*sd*bc*sc*i*
alias: pci:v00001000d00000071sv*sd*bc*sc*i*
alias: pci:v00001000d00000073sv*sd*bc*sc*i*
alias: pci:v00001000d00000079sv*sd*bc*sc*i*
alias: pci:v00001000d00000078sv*sd*bc*sc*i*
alias: pci:v00001000d0000007Csv*sd*bc*sc*i*
alias: pci:v00001000d00000060sv*sd*bc*sc*i*
alias: pci:v00001000d00000411sv*sd*bc*sc*i*
depends:
vermagic: 2.6.32-358.el6.x86_64 SMP mod_unload modversions
parm: max_sectors:Maximum number of sectors per IO command (int)
parm: msix_disable:Disable MSI-X interrupt handling. Default: 0 (int)
parm: msix_vectors:MSI-X max vector count. Default: Set by FW (int)
parm: throttlequeuedepth:Adapter queue depth when throttled due to I/O timeout. Default: 16 (int)
parm: resetwaittime:Wait time in seconds after I/O timeout before resetting adapter. Default: 180 (int)
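
To confirm that the loaded driver claims this controller, the PCI vendor:device pair from `lspci -nn` can be matched against the `alias:` lines above. The SAS 2008 [Falcon] chip is normally listed in pci.ids as 1000:0073, which does appear in the alias list, but the report shows plain `lspci` output, so this is worth checking on the machine itself. A minimal sketch; `driver_claims_device` is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper, not part of the original report.
# Reads `modinfo <driver>` output on stdin and succeeds if one of its
# "alias:" lines matches PCI vendor:device VENDOR DEVICE (hex digits as
# printed by `lspci -nn`, e.g. "1000 0073").
driver_claims_device() {
    # modinfo prints IDs as 8 upper-case hex digits, e.g. pci:v00001000d00000073sv*...
    vendor=$(printf '%08X' "0x$1")
    device=$(printf '%08X' "0x$2")
    grep -q "^alias:[[:space:]]*pci:v${vendor}d${device}sv"
}

# Usage on the affected server:
#   lspci -nn | grep -i megaraid    # the [vendor:device] pair is at the end of the line
#   modinfo megaraid_sas | driver_claims_device 1000 0073 && echo "claimed by megaraid_sas"
```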
tru

2013-09-20 08:54

administrator   ~0018035

You mention that 5.9 is OK on the same kind of hardware. Can you rule out a hardware-related error (disk/controller/...) by installing 5.9 on that same server?
WebCraft

2013-09-20 09:56

reporter   ~0018036

We have four identical machines, all of which were stable with CentOS 5 (32-bit and 64-bit). Since nothing is logged, we have had very little information to diagnose the problem with. A quick workaround is to disable ASPM manually by adding 'pcie_aspm=performance' to the kernel boot options in grub.conf ('pcie_aspm=off' does not work correctly).

# cat /sys/module/pcie_aspm/parameters/policy
default [performance] powersave

Useful URL:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/ASPM.html
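
The workaround above can be checked with two small helpers (hypothetical names, not from the report): one reads the active policy, shown in brackets in the sysfs file, and the other lists the ASPM options actually passed on the kernel command line. As an assumption worth cross-checking against the kernel documentation: `pcie_aspm=` itself documents only the values `off` and `force`, while the policy is the `policy` parameter of the built-in pcie_aspm code, which can be set at boot as `pcie_aspm.policy=performance`.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original report.

# Print the active ASPM policy: the entry shown in [brackets] in the text of
# /sys/module/pcie_aspm/parameters/policy, e.g. "default [performance] powersave".
active_aspm_policy() {
    sed -n 's/.*\[\([a-z_]*\)\].*/\1/p'
}

# Print every ASPM option found on a kernel command line (one per line),
# covering both "pcie_aspm=..." and the module-parameter form "pcie_aspm.policy=...".
aspm_cmdline_opts() {
    tr ' ' '\n' | grep -e '^pcie_aspm=' -e '^pcie_aspm\.'
}

# Usage on the server (CentOS 6 uses GRUB legacy; boot options go on the
# "kernel" line of /boot/grub/grub.conf):
#   active_aspm_policy < /sys/module/pcie_aspm/parameters/policy
#   aspm_cmdline_opts < /proc/cmdline
# As root, the policy can also be changed at runtime for a quick test; the
# write may be refused (EPERM) if ASPM support has been disabled entirely:
#   echo performance > /sys/module/pcie_aspm/parameters/policy
```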

Issue History

Date Modified     Username  Field       Change
2013-09-19 23:49  WebCraft  New Issue
2013-09-20 00:20  WebCraft  Note Added  0018031
2013-09-20 08:54  tru       Note Added  0018035
2013-09-20 09:56  WebCraft  Note Added  0018036