View Issue Details

ID: 0015863
Project: CentOS-7
Category: lvm2
View Status: public
Last Update: 2019-04-27 07:36
Reporter: ben_prescott
Priority: normal
Severity: major
Reproducibility: sometimes
Status: new
Resolution: open
Product Version: 7.6.1810
Target Version:
Fixed in Version:
Summary: 0015863: /dev/mdxx volume and related volume group not available to some lvm commands after upgrading to lvm2-2.02.180-10.el7_6.3
Description

One of the two volume groups on the system became largely 'masked' following the upgrade from lvm2-2.02.180-10.el7_6.2 to lvm2-2.02.180-10.el7_6.3.

Both are built on MD RAID devices.

The one logical volume in the VG still works, and mounts at boot, but manipulation of the volume group is largely impossible because the commands do not know about it.

Running pvscan --cache against the underlying device fixes the issue until the next reboot.
lvmdiskscan also sees the 'missing' physical volume.

Investigation and reading around lead me to suspect it's an issue with lvmetad, or with the mechanism by which it is populated.
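
For reference, whether lvmetad is actually in use can be checked with something like this (assuming the stock CentOS 7 lvm.conf location and unit names):

# grep use_lvmetad /etc/lvm/lvm.conf
# systemctl status lvm2-lvmetad.service lvm2-lvmetad.socket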

The issue was drawn to my attention because puppet attempted to recreate the PV and VG.

Steps To Reproduce

Not exactly sure how to reproduce it, but these are the steps I followed to troubleshoot it. I may be able to reproduce it on another machine, and if so, I'll update the case.

When running on the el7_6.3 version:

$ df -h | grep mapper
/dev/mapper/yeagervg-lv00sys00 32G 7.1G 25G 23% /
/dev/mapper/datavg-lv01bkp00 400G 337G 64G 85% /srv/data/backup
/dev/mapper/yeagervg-lv00hom00 10G 1.5G 8.6G 15% /home

# vgs
  VG #PV #LV #SN Attr VSize VFree
  yeagervg 1 3 0 wz--n- <230.76g <156.76g
# vgdisplay datavg
  Volume group "datavg" not found
  Cannot process volume group datavg
# pvdisplay /dev/md10
  Failed to find physical volume "/dev/md10".
# lvmdiskscan
[..]
  /dev/md2 [ <230.76 GiB] LVM physical volume
[..]
  /dev/md10 [ <1.82 TiB] LVM physical volume
[..]
  2 LVM physical volumes

# pvscan --cache /dev/md10
# pvs
  PV VG Fmt Attr PSize PFree
  /dev/md10 datavg lvm2 a-- <1.82t <1.43t
  /dev/md2 yeagervg lvm2 a-- <230.76g <156.76g
# vgs
  VG #PV #LV #SN Attr VSize VFree
  datavg 1 1 0 wz--n- <1.82t <1.43t
  yeagervg 1 3 0 wz--n- <230.76g <156.76g
Additional Information

https://bugzilla.redhat.com/show_bug.cgi?id=1676921
Excerpt "The lvm2-2.02.180-10.el7_6.3 package introduces a new way of detecting MDRAID and multipath devices."

The only error I found was in the syslog:

lvmetad[2599]: vg_lookup vgid YIIP9N-yQSu-sqh6-yVAB-odLy-kxGT-vpO68G name datavg found incomplete mapping uuid none name none

Probably related, but googling around that error didn't provide any useful clues.
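
For what it's worth, that sort of message can be pulled out of the logs with something along these lines:

# grep lvmetad /var/log/messages
# journalctl -b -u lvm2-lvmetad.service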

Worked around it by downgrading the lvm2 packages and their dependencies, since I knew when the issue started. Disabling lvmetad might also resolve the issue, but I've not tried that (a sketch of what that would involve follows the downgrade command below).

# yum downgrade lvm2-libs-2.02.180-10.el7_6.2 \
                lvm2-2.02.180-10.el7_6.2 \
                device-mapper-event-1.02.149-10.el7_6.2 \
                device-mapper-event-libs-1.02.149-10.el7_6.2 \
                device-mapper-1.02.149-10.el7_6.2 \
                device-mapper-libs-1.02.149-10.el7_6.2
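
For completeness, the untried alternative of disabling lvmetad would presumably look something like this (a sketch only, assuming the stock CentOS 7 lvm2 units and /etc/lvm/lvm.conf):

# systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service
# systemctl disable lvm2-lvmetad.socket lvm2-lvmetad.service

followed by setting use_lvmetad = 0 in the global section of /etc/lvm/lvm.conf, and possibly rebuilding the initramfs (dracut -f) so the change is also seen at early boot.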
Tags: No tags attached.
URL: http://blog.thewatertower.org/2019/02/23/awol-linux-lvm-volume-group-and-physical-volume/

Activities

ben_prescott   2019-02-23 20:12   reporter   ~0033891

Reproduces on the original machine: if I upgrade the packages, the PV and VG disappear again.

Attempting to reproduce on another machine.

ben_prescott   2019-02-23 21:00   reporter   ~0033892

Tried to reproduce on another machine.
Created RAID5 devices for consistency; I don't suppose this is relevant.


# rpm -qa lvm2
lvm2-2.02.177-4.el7.x86_64
# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

* partitioned two additional disks
* will create three RAID devices, based on three guesses on what's triggering this
  - md3 for case: not the VG containing root.
  - md9 for case: not adjacent to the next lowest device
  - md10 for case: two digit device name

mdadm --create /dev/md3 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdd1 /dev/sdb5
mdadm --create /dev/md9 --level=5 --raid-devices=3 /dev/sdb2 /dev/sdd2 /dev/sdb6
mdadm --create /dev/md10 --level=5 --raid-devices=3 /dev/sdb3 /dev/sdd3 /dev/sdb7

* rest of test set

vgcreate testvg1 /dev/md3
vgcreate testvg2 /dev/md9
vgcreate testvg3 /dev/md10
lvcreate -L 1G -n vol1 testvg1
lvcreate -L 1G -n vol2 testvg2
lvcreate -L 1G -n vol3 testvg3
mkfs.xfs /dev/testvg1/vol1
mkfs.xfs /dev/testvg2/vol2
mkfs.xfs /dev/testvg3/vol3

for i in 1 2 3 ; do
echo "/dev/mapper/testvg${i}-vol${i} /mnt/${i} xfs defaults 1 1" >> /etc/fstab
mkdir -p /mnt/${i}
done

mount -a
df -h | grep mapper

* results

# df -h | grep mapper
/dev/mapper/proximavg-lv00sys00 15G 9.4G 4.6G 68% /
/dev/mapper/proximavg-lv00hom00 9.9G 5.3G 4.1G 57% /home
/dev/mapper/testvg1-vol1 1018M 33M 986M 4% /mnt/1
/dev/mapper/testvg2-vol2 1018M 33M 986M 4% /mnt/2
/dev/mapper/testvg3-vol3 1018M 33M 986M 4% /mnt/3

# pvs
  PV VG Fmt Attr PSize PFree
  /dev/md10 testvg3 lvm2 a-- 19.98g 18.98g
  /dev/md2 proximavg lvm2 a-- 223.84g 198.84g
  /dev/md3 testvg1 lvm2 a-- 19.98g 18.98g
  /dev/md9 testvg2 lvm2 a-- 19.98g 18.98g
# vgs
  VG #PV #LV #SN Attr VSize VFree
  proximavg 1 2 0 wz--n- 223.84g 198.84g
  testvg1 1 1 0 wz--n- 19.98g 18.98g
  testvg2 1 1 0 wz--n- 19.98g 18.98g
  testvg3 1 1 0 wz--n- 19.98g 18.98g
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md10 : active raid5 sdb7[3] sdd3[1] sdb3[0]
      20953088 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      
md9 : active raid5 sdb6[3] sdd2[1] sdb2[0]
      20953088 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      
md3 : active raid5 sdb5[3] sdd1[1] sdb1[0]
      20953088 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      
md2 : active raid1 sda3[0] sdc3[2]
      234727232 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk

md1 : active raid1 sda1[0] sdc1[2]
      8286144 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sda2[0] sdc2[2]
      1048512 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>

* reboot; patch OS.
* reboot

# rpm -qa lvm2
lvm2-2.02.180-10.el7_6.3.x86_64
# vgs
  VG #PV #LV #SN Attr VSize VFree
  proximavg 1 2 0 wz--n- 223.84g 198.84g
  testvg1 1 1 0 wz--n- 19.98g 18.98g
  testvg2 1 1 0 wz--n- 19.98g 18.98g
  testvg3 1 1 0 wz--n- 19.98g 18.98g
# lvs
  LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  lv00hom00 proximavg -wi-ao---- 10.00g
  lv00sys00 proximavg -wi-ao---- 15.00g
  vol1 testvg1 -wi-ao---- 1.00g
  vol2 testvg2 -wi-ao---- 1.00g
  vol3 testvg3 -wi-ao---- 1.00g
# pvs
  PV VG Fmt Attr PSize PFree
  /dev/md10 testvg3 lvm2 a-- 19.98g 18.98g
  /dev/md2 proximavg lvm2 a-- 223.84g 198.84g
  /dev/md3 testvg1 lvm2 a-- 19.98g 18.98g
  /dev/md9 testvg2 lvm2 a-- 19.98g 18.98g


* LVs, VGs, and then the RAID devices removed, with signatures wiped from the partitions, for all but /dev/md10 (roughly as sketched below)
* reboot
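
The teardown was roughly as follows, repeated for testvg1/md3 and testvg2/md9 and leaving /dev/md10 alone (the exact commands weren't recorded, so treat this as a sketch):

umount /mnt/1
lvremove /dev/testvg1/vol1
vgremove testvg1
pvremove /dev/md3
mdadm --stop /dev/md3
mdadm --zero-superblock /dev/sdb1 /dev/sdd1 /dev/sdb5

plus removal of the corresponding /etc/fstab entries.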

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md10 : active raid5 sdb7[3] sdb3[0] sdd3[1]
      20953088 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      
md1 : active raid1 sda1[0] sdc1[2]
      8286144 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sda2[0] sdc2[2]
      1048512 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md2 : active raid1 sda3[0] sdc3[2]
      234727232 blocks super 1.2 [2/2] [UU]
      bitmap: 1/2 pages [4KB], 65536KB chunk

unused devices: <none>
# pvs
  PV VG Fmt Attr PSize PFree
  /dev/md10 testvg3 lvm2 a-- 19.98g 18.98g
  /dev/md2 proximavg lvm2 a-- 223.84g 198.84g
# vgs
  VG #PV #LV #SN Attr VSize VFree
  proximavg 1 2 0 wz--n- 223.84g 198.84g
  testvg3 1 1 0 wz--n- 19.98g 18.98g

ben_prescott   2019-02-23 21:02   reporter   ~0033893

Conclusion: can't easily reproduce the issue from scratch on another machine, and not sure what the missing variables are.

pasik   2019-04-24 21:00   ~0034250

I just had this same lvmetad issue/bug happening on my home server (running CentOS 7.6, with most updates installed), which has LVM on top of MDRAID.

Running these commands fixed the issue, and lvm pvs/vgs/lvs became "visible" again:

systemctl stop lvm2-lvmetad.socket
systemctl stop lvm2-lvmetad.service

Is there already an upstream rhel7 bugzilla open about this issue?

ben_prescott   2019-04-25 12:54   reporter   ~0034257

I've looked in the last week or so for an upstream bugzilla, and there doesn't seem to be one yet. I've not raised one (yet). I wasn't sure what the correct process was, given that I'm not running RHEL, and couldn't find any guidance anywhere.

I'm thinking it's an issue that occurs during boot, as my experience is consistent with pasik's; restarting lvm2-lvmetad.service is a workaround.
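In practice that's something along the lines of the following (a sketch):

systemctl restart lvm2-lvmetad.service
pvscan --cache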

I rebuilt a second machine as centos7 last week, and it's intermittently demonstrating the issue. That's when I thought it might not just be one machine with the issue. I've been building and destroying kvm guests over the last few weeks, and I think I saw it there as well, but I didn't keep track of it.

I've enabled lvm2-lvmetad debugging on my machines and have the debug data for a 'bad' reboot; once I've got debug data for a 'good' reboot with the downgraded packages, I'm planning to raise a bugzilla.

My reading around suggests lvmetad is fed by udev, so it might need debugging there as well, but I wasn't going to hold off on raising a bugzilla.
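
If it does come to the udev side, the standard tooling would presumably be the starting point, e.g. (a sketch):

# udevadm info /dev/md10
# udevadm monitor --env --udev
# udevadm control --log-priority=debug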

ben_prescott   2019-04-25 12:58   reporter   ~0034258

I suspect I won't get round to raising a bugzilla until the weekend, but will update here when I do.

ben_prescott   2019-04-27 07:16   reporter   ~0034401

https://bugzilla.redhat.com/show_bug.cgi?id=1703644

ben_prescott   2019-04-27 07:36   reporter   ~0034402

But actually, I think there's already one (https://bugzilla.redhat.com/show_bug.cgi?id=1672336); I'll find out soon enough, and I'll try the patched packages.

Issue History

Date Modified Username Field Change
2019-02-23 14:54 ben_prescott New Issue
2019-02-23 20:12 ben_prescott Note Added: 0033891
2019-02-23 21:00 ben_prescott Note Added: 0033892
2019-02-23 21:02 ben_prescott Note Added: 0033893
2019-04-24 21:00 pasik Note Added: 0034250
2019-04-25 12:54 ben_prescott Note Added: 0034257
2019-04-25 12:58 ben_prescott Note Added: 0034258
2019-04-27 07:16 ben_prescott Note Added: 0034401
2019-04-27 07:36 ben_prescott Note Added: 0034402