View Issue Details

ID: 0016239
Project: CentOS-7
Category: xfsprogs
View Status: public
Last Update: 2019-07-03 16:15
Reporter: bgolliher
Priority: urgent
Severity: major
Reproducibility: random
Status: new
Resolution: open
Product Version: 7.6.1810
Target Version:
Fixed in Version:
Summary: 0016239: XFS (sdk1): Corruption detected. Unmount and run xfs_repair
Description: We've had 3 XFS "Corruption detected" events. We have 128TB filesystems (16 of them), and have lost files (we have backups). We've seen this on older RHEL-based distributions (Scientific Linux 6, 2.6.32 kernel) but hoped that new deployments on CentOS 7 with the 3.10 kernel would remove these problems.

[bgolliher@system1 ~]$ xfs_info /dev/sdk1
meta-data=/dev/sdk1              isize=512    agcount=128, agsize=268435440 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=34179644928, imaxpct=1
         =                       sunit=16     swidth=160 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[bgolliher@system1 ~]$ uname -a
Linux system1.box.net 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[bgolliher@system1 ~]$ xfs_info -V
xfs_info version 4.5.0
[bgolliher@system1 ~]$

We have more than 250 systems with XFS filesystems ranging from 25TB to 128TB. The new systems running CentOS 7 all have 128TB filesystems.
Those 128TB filesystems are laid over a RAID volume presented by an AVAGO MegaRAID SAS 9380-8e (LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] from lspci -nn). We're using Driver Name = megaraid_sas, Driver Version = 07.709.08.00, and the cards themselves are on Firmware Package Build = 24.21.0-0091 and Firmware Version = 4.680.00-8446. There are two cards, each with a WDC JBOD attached, for 204 drives across both JBODs. Each RAID set is RAID6 across 14 disks. Each disk is 14TB, WUH721414AL. The system is purpose-built for density.

-----------------------------------------------------------------------------
DG Arr Row EID:Slot DID Type  State BT       Size PDC  PI SED DS3  FSpace TR
-----------------------------------------------------------------------------
 0 -   -   -        -   RAID6 Optl  N  127.329 TB dflt N  N   dflt N      N
 0 0   -   -        -   RAID6 Optl  N  127.329 TB dflt N  N   dflt N      N
 0 0   0   8:0      50  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   1   8:1      46  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   2   8:2      49  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   3   8:3      53  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   4   8:4      80  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   5   8:5      65  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   6   8:6      66  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   7   8:7      67  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   8   8:8      61  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   9   8:9      58  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   10  8:10     69  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
 0 0   11  8:11     76  DRIVE Onln  N   12.732 TB dflt N  N   dflt -      N
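
As a cross-check of the figures above, the stripe geometry and size reported by xfs_info can be compared with the drive group listing. The sketch below is a minimal Python calculation using only numbers copied from this report. It assumes that the controller's "TB" column is in binary units (TiB) and that sunit and swidth are given in filesystem blocks, as the "blks" suffix suggests. The function name geometry is made up for illustration.

```python
# Values copied from the xfs_info output and drive group listing in this report.
BLOCK_SIZE = 4096            # data bsize, in bytes
DATA_BLOCKS = 34179644928    # data blocks
SUNIT_BLKS = 16              # stripe unit, in filesystem blocks (assumed unit)
SWIDTH_BLKS = 160            # stripe width, in filesystem blocks (assumed unit)
AG_COUNT = 128               # agcount
AG_SIZE_BLKS = 268435440     # agsize, in filesystem blocks
RAID6_PARITY_DRIVES = 2      # RAID6 keeps two drives' worth of parity

TIB = 2 ** 40


def geometry():
    """Derive stripe and size figures from the reported xfs_info values."""
    data_drives = SWIDTH_BLKS // SUNIT_BLKS
    return {
        "stripe_unit_kib": SUNIT_BLKS * BLOCK_SIZE // 1024,
        "data_drives": data_drives,
        "total_drives": data_drives + RAID6_PARITY_DRIVES,
        "fs_tib": DATA_BLOCKS * BLOCK_SIZE / TIB,
        "ag_size_tib": AG_SIZE_BLKS * BLOCK_SIZE / TIB,
        # The allocation groups together must span every data block.
        "ags_cover_fs": AG_COUNT * AG_SIZE_BLKS >= DATA_BLOCKS,
    }


if __name__ == "__main__":
    for name, value in geometry().items():
        print(f"{name}: {value}")
```

Under those assumptions the stripe unit works out to 64 KiB, swidth/sunit gives 10 data drives (12 with RAID6 parity, the same as the number of drives listed), and the filesystem size comes to about 127.329 TiB, in line with the size the controller reports for the drive group.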

We know how to recover, and have had to recover a handful of files from backup. We are looking for ways to prevent this from happening.
Steps To Reproduce: It happens sporadically.
Additional Information: We have more than 3,700 filesystems with this kind of setup.
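
With that many filesystems, finding which devices have logged the message could be scripted. The following is only a rough Python sketch: the pattern is built from the single message quoted in the summary of this issue, and the helper name corrupted_devices is hypothetical. It assumes the kernel log lines have already been collected by some other means and does not read any log source itself.

```python
import re

# Pattern built from the message quoted in this issue's summary:
#   "XFS (sdk1): Corruption detected. Unmount and run xfs_repair"
CORRUPTION_RE = re.compile(r"XFS \((?P<dev>[^)]+)\): Corruption detected\.")


def corrupted_devices(log_lines):
    """Return the sorted device names that logged the corruption message."""
    devices = set()
    for line in log_lines:
        match = CORRUPTION_RE.search(line)
        if match:
            devices.add(match.group("dev"))
    return sorted(devices)


if __name__ == "__main__":
    # The only sample line used here is the one from the summary.
    summary = "XFS (sdk1): Corruption detected. Unmount and run xfs_repair"
    print(corrupted_devices([summary]))
```

Matching on the device name in parentheses keeps the result independent of any timestamp or hostname prefix that a log collector may add in front of the message.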
Tags: 3.10.0-957.21.2.el7.x86_64, corruption, xfs
abrt_hash:
URL:

Activities

TrevorH

2019-07-03 16:15

manager   ~0034773

CentOS is a rebuild of the sources used to create RHEL. We do not modify anything except to remove branding and logos. You will need to submit your request to Red Hat via bugzilla.redhat.com. If and when Red Hat accepts it, incorporates it into RHEL, and releases a patched version, CentOS will pick it up and rebuild it.

Issue History

Date Modified     Username   Field                                     Change
2019-07-03 16:06  bgolliher  New Issue
2019-07-03 16:06  bgolliher  Tag Attached: 3.10.0-957.21.2.el7.x86_64
2019-07-03 16:06  bgolliher  Tag Attached: xfs
2019-07-03 16:06  bgolliher  Tag Attached: corruption
2019-07-03 16:15  TrevorH    Note Added: 0034773