View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0015863 | CentOS-7 | lvm2 | public | 2019-02-23 14:54 | 2019-04-27 07:36 |
Reporter | ben_prescott | Assigned To | |
Priority | normal | Severity | major | Reproducibility | sometimes |
Status | new | Resolution | open |
Product Version | 7.6.1810 |
Summary | 0015863: /dev/mdxx volume and related volume group not available to some lvm commands after upgrading to lvm2-2.02.180-10.el7_6.3 |
Description | One of the two volume groups on the system became largely 'masked' following the upgrade from lvm2-2.02.180-10.el7_6.2 to lvm2-2.02.180-10.el7_6.3. Both volume groups are built on MD RAID devices. The one logical volume in the affected VG still works and mounts at boot, but managing the volume group is largely impossible because most LVM commands do not know about it. Running pvscan --cache against the underlying device fixes the issue until the next reboot, and lvmdiskscan also sees the 'missing' physical volume. Investigation and reading around the problem lead me to suspect an issue with lvmetad, or with the mechanism by which it is populated. The issue was drawn to my attention because puppet attempted to recreate the PV and VG.
Steps To Reproduce | Not exactly sure how to reproduce it, but these are the steps I followed to troubleshoot it. I may be able to reproduce it on another machine, and if so, I'll update the case.

When running on the el7_6.3 version:

$ df -h | grep mapper
/dev/mapper/yeagervg-lv00sys00   32G  7.1G   25G  23% /
/dev/mapper/datavg-lv01bkp00    400G  337G   64G  85% /srv/data/backup
/dev/mapper/yeagervg-lv00hom00   10G  1.5G  8.6G  15% /home

# vgs
  VG       #PV #LV #SN Attr   VSize    VFree
  yeagervg   1   3   0 wz--n- <230.76g <156.76g

# vgdisplay datavg
  Volume group "datavg" not found
  Cannot process volume group datavg

# pvdisplay /dev/md10
  Failed to find physical volume "/dev/md10".

# lvmdiskscan
  [..]
  /dev/md2  [ <230.76 GiB] LVM physical volume
  [..]
  /dev/md10 [   <1.82 TiB] LVM physical volume
  [..]
  2 LVM physical volumes

# pvscan --cache /dev/md10

# pvs
  PV         VG       Fmt  Attr PSize    PFree
  /dev/md10  datavg   lvm2 a--    <1.82t   <1.43t
  /dev/md2   yeagervg lvm2 a--  <230.76g <156.76g

# vgs
  VG       #PV #LV #SN Attr   VSize    VFree
  datavg     1   1   0 wz--n-   <1.82t   <1.43t
  yeagervg   1   3   0 wz--n- <230.76g <156.76g
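A sanity check worth adding at this point (my addition, not captured in the session above, and it assumes the stock CentOS 7 lvm.conf): confirm the commands really are consulting lvmetad rather than scanning the disks themselves.

# lvmconfig global/use_lvmetad
# systemctl is-active lvm2-lvmetad.socket lvm2-lvmetad.service

If use_lvmetad=1 and both units report active, the pvs/vgs output above is coming from the daemon's cache, which fits with pvscan --cache "fixing" the problem.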
Additional Information | https://bugzilla.redhat.com/show_bug.cgi?id=1676921

Excerpt: "The lvm2-2.02.180-10.el7_6.3 package introduces a new way of detecting MDRAID and multipath devices."

The only error I found was in the syslog:

lvmetad[2599]: vg_lookup vgid YIIP9N-yQSu-sqh6-yVAB-odLy-kxGT-vpO68G name datavg found incomplete mapping uuid none name none

Probably related, but googling around that error didn't provide any useful clues.

Worked around it by downgrading the lvm2 packages and their dependencies, as I knew when the issue started. Disabling lvmetad might also resolve the issue, but I've not tried that.

# yum downgrade lvm2-libs-2.02.180-10.el7_6.2 \
      lvm2-2.02.180-10.el7_6.2 \
      device-mapper-event-1.02.149-10.el7_6.2 \
      device-mapper-event-libs-1.02.149-10.el7_6.2 \
      device-mapper-1.02.149-10.el7_6.2 \
      device-mapper-libs-1.02.149-10.el7_6.2
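For reference, a rough sketch of the untried "disable lvmetad" route (untested here; the path and unit names are from a stock CentOS 7 install):

# sed -i 's/^\([[:space:]]*\)use_lvmetad = 1/\1use_lvmetad = 0/' /etc/lvm/lvm.conf
# systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service
# systemctl disable lvm2-lvmetad.socket lvm2-lvmetad.service
# dracut -f    ## probably only needed if the initramfs carries its own copy of lvm.conf

With use_lvmetad = 0 every LVM command scans the devices itself, so a stale or incomplete daemon cache can no longer hide a VG.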
Tags | No tags attached. | ||||
abrt_hash | |||||
URL | http://blog.thewatertower.org/2019/02/23/awol-linux-lvm-volume-group-and-physical-volume/ | ||||
Reproduces on the original machine: if I upgrade the packages again, the PV and VG once again stop showing up. Attempting to reproduce on another machine.
Tried to reproduce on another machine. Created RAID5 devices for consistency; I don't suppose this is relevant.

# rpm -qa lvm2
lvm2-2.02.177-4.el7.x86_64
# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

* partitioned two additional disks
* will create three RAID devices, based on three guesses on what's triggering this:
  - md3 for case: not the VG containing root
  - md9 for case: not adjacent to the next lowest device
  - md10 for case: two digit device name

mdadm --create /dev/md3 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdd1 /dev/sdb5
mdadm --create /dev/md9 --level=5 --raid-devices=3 /dev/sdb2 /dev/sdd2 /dev/sdb6
mdadm --create /dev/md10 --level=5 --raid-devices=3 /dev/sdb3 /dev/sdd3 /dev/sdb7

* rest of test set

vgcreate testvg1 /dev/md3
vgcreate testvg2 /dev/md9
vgcreate testvg3 /dev/md10

lvcreate -L 1G -n vol1 testvg1
lvcreate -L 1G -n vol2 testvg2
lvcreate -L 1G -n vol3 testvg3

mkfs.xfs /dev/testvg1/vol1
mkfs.xfs /dev/testvg2/vol2
mkfs.xfs /dev/testvg3/vol3

for i in 1 2 3 ; do
    echo "/dev/mapper/testvg${i}-vol${i} /mnt/${i} xfs defaults 1 1" >> /etc/fstab
    mkdir -p /mnt/${i}
done

mount -a
df -h | grep mapper

* results

# df -h | grep mapper
/dev/mapper/proximavg-lv00sys00   15G  9.4G  4.6G  68% /
/dev/mapper/proximavg-lv00hom00  9.9G  5.3G  4.1G  57% /home
/dev/mapper/testvg1-vol1        1018M   33M  986M   4% /mnt/1
/dev/mapper/testvg2-vol2        1018M   33M  986M   4% /mnt/2
/dev/mapper/testvg3-vol3        1018M   33M  986M   4% /mnt/3

# pvs
  PV         VG        Fmt  Attr PSize   PFree
  /dev/md10  testvg3   lvm2 a--   19.98g  18.98g
  /dev/md2   proximavg lvm2 a--  223.84g 198.84g
  /dev/md3   testvg1   lvm2 a--   19.98g  18.98g
  /dev/md9   testvg2   lvm2 a--   19.98g  18.98g

# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  proximavg   1   2   0 wz--n- 223.84g 198.84g
  testvg1     1   1   0 wz--n-  19.98g  18.98g
  testvg2     1   1   0 wz--n-  19.98g  18.98g
  testvg3     1   1   0 wz--n-  19.98g  18.98g

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md10 : active raid5 sdb7[3] sdd3[1] sdb3[0]
      20953088 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md9 : active raid5 sdb6[3] sdd2[1] sdb2[0]
      20953088 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md3 : active raid5 sdb5[3] sdd1[1] sdb1[0]
      20953088 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md2 : active raid1 sda3[0] sdc3[2]
      234727232 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk

md1 : active raid1 sda1[0] sdc1[2]
      8286144 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda2[0] sdc2[2]
      1048512 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

* reboot; patch OS.
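One step not captured above, and an assumption on my part rather than something from the original note: recording the new arrays in /etc/mdadm.conf before the reboot, so that /dev/md3, /dev/md9 and /dev/md10 keep those names instead of coming back as md127 and friends.

# mdadm --detail --scan | grep -E '/dev/md(3|9|10) ' >> /etc/mdadm.conf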
* reboot

# rpm -qa lvm2
lvm2-2.02.180-10.el7_6.3.x86_64

# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  proximavg   1   2   0 wz--n- 223.84g 198.84g
  testvg1     1   1   0 wz--n-  19.98g  18.98g
  testvg2     1   1   0 wz--n-  19.98g  18.98g
  testvg3     1   1   0 wz--n-  19.98g  18.98g

# lvs
  LV        VG        Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv00hom00 proximavg -wi-ao---- 10.00g
  lv00sys00 proximavg -wi-ao---- 15.00g
  vol1      testvg1   -wi-ao----  1.00g
  vol2      testvg2   -wi-ao----  1.00g
  vol3      testvg3   -wi-ao----  1.00g

# pvs
  PV         VG        Fmt  Attr PSize   PFree
  /dev/md10  testvg3   lvm2 a--   19.98g  18.98g
  /dev/md2   proximavg lvm2 a--  223.84g 198.84g
  /dev/md3   testvg1   lvm2 a--   19.98g  18.98g
  /dev/md9   testvg2   lvm2 a--   19.98g  18.98g

* LVs, VGs, and then the RAID devices removed (signatures wiped from the partitions) for all but /dev/md10
* reboot

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md10 : active raid5 sdb7[3] sdb3[0] sdd3[1]
      20953088 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid1 sda1[0] sdc1[2]
      8286144 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda2[0] sdc2[2]
      1048512 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md2 : active raid1 sda3[0] sdc3[2]
      234727232 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk

unused devices: <none>

# pvs
  PV         VG        Fmt  Attr PSize   PFree
  /dev/md10  testvg3   lvm2 a--   19.98g  18.98g
  /dev/md2   proximavg lvm2 a--  223.84g 198.84g

# vgs
  VG        #PV #LV #SN Attr   VSize   VFree
  proximavg   1   2   0 wz--n- 223.84g 198.84g
  testvg3     1   1   0 wz--n-  19.98g  18.98g
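If testvg3 had gone missing here the way datavg did on the original machine, a comparison I would have added (not something that was run in this test) is the same commands with lvmetad bypassed for a single invocation:

# pvs --config 'global {use_lvmetad=0}'
# vgs --config 'global {use_lvmetad=0}'

These scan the devices directly, so if they list the PV and VG while plain pvs/vgs do not, the metadata is intact on disk and only the daemon's cache is stale.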
Conclusion: I can't easily reproduce the issue from scratch on another machine, and I'm not sure what the missing variables are.
I just had this same lvmetad issue/bug happen on my home server (running CentOS 7.6 with the most recent updates installed), which has LVM on top of MD RAID. Running these commands fixed the issue, and the LVM PVs/VGs/LVs became "visible" again:

systemctl stop lvm2-lvmetad.socket
systemctl stop lvm2-lvmetad.service

Is there already an upstream rhel7 bugzilla open about this issue?
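A follow-up thought on the workaround above (my assumption, not something the commenter reported trying): stopping the two units simply makes the LVM tools fall back to scanning devices themselves. To keep using the cache, restarting the daemon and repopulating it should give the same visibility:

# systemctl start lvm2-lvmetad.socket lvm2-lvmetad.service
# pvscan --cache

pvscan --cache rescans all devices and rebuilds lvmetad's view, which is the whole-system version of the per-device pvscan --cache /dev/md10 workaround from the description.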
I've looked in the last week or so for an upstream bugzilla, and there doesn't seem to be one yet. I've not raised one (yet); I wasn't sure what the correct process was, given that I'm not running RHEL, and I couldn't find any guidance anywhere.

I'm thinking it's an issue that occurs during boot, as my experience is consistent with pasik's: restarting lvm2-lvmetad.service is a workaround.

I rebuilt a second machine as CentOS 7 last week, and it's intermittently demonstrating the issue. That's when I realised it might not just be one machine with the problem. I've also been building and destroying kvm guests over the last few weeks, and I think I saw it there as well, but I didn't keep track of it.

I've enabled lvm2-lvmetad debugging on my machines and have the debug data for a 'bad' reboot; once I've got debug data for a 'good' reboot with the downgraded packages, I'm planning to raise a bugzilla. My reading around suggests lvmetad is fed by udev, so that might need debugging as well, but I wasn't going to hold off on raising a bugzilla for that.
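Since lvmetad is fed from udev, a couple of read-only checks against the affected device may help show which side is dropping the ball (a sketch on my part; the device name and unit naming assume a stock CentOS 7 box):

# udevadm info --name=/dev/md10 | grep -iE 'lvm|systemd'
# systemctl list-units 'lvm2-pvscan@*'

On a good boot the udev properties should mark the device as an LVM2 member and a matching lvm2-pvscan@ instance should have run; if no pvscan unit ever fired for /dev/md10, the problem is upstream of lvmetad itself.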
I suspect I won't get round to raising a bugzilla until the weekend, but will update here when I do.
https://bugzilla.redhat.com/show_bug.cgi?id=1703644
But actually, I think there's already one: https://bugzilla.redhat.com/show_bug.cgi?id=1672336 - I'll find out soon enough, and I'll try the patched packages.
Date Modified | Username | Field | Change |
---|---|---|---|
2019-02-23 14:54 | ben_prescott | New Issue | |
2019-02-23 20:12 | ben_prescott | Note Added: 0033891 | |
2019-02-23 21:00 | ben_prescott | Note Added: 0033892 | |
2019-02-23 21:02 | ben_prescott | Note Added: 0033893 | |
2019-04-24 21:00 | pasik | Note Added: 0034250 | |
2019-04-25 12:54 | ben_prescott | Note Added: 0034257 | |
2019-04-25 12:58 | ben_prescott | Note Added: 0034258 | |
2019-04-27 07:16 | ben_prescott | Note Added: 0034401 | |
2019-04-27 07:36 | ben_prescott | Note Added: 0034402 |