View Issue Details

ID: 0010191
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2018-02-07 07:21
Reporter: trnubo
Priority: high
Severity: major
Reproducibility: always
Status: closed
Resolution: not fixable
Product Version: 7.2.1511
Target Version:
Fixed in Version:
Summary: 0010191: trim/discards stopped working on lvm thin volume
Description:

Description of problem:

Since upgrading from CentOS 7.1 to CentOS 7.2, fstrim on an LVM thin volume throws "fstrim: /mnt: the discard operation is not supported" instead of performing a trim and reducing the data usage reported by the "lvs" command.

After booting into an older kernel such as "3.10.0-229.20.1.el7.x86_64", the issue goes away and fstrim works as expected.

How reproducible:

I've been testing on AWS with a 100G EBS volume attached at /dev/xvdb

## Upgrade kernel
# uname -r
3.10.0-327.4.4.el7.x86_64

## Create LV
pvcreate /dev/xvdb
vgcreate data /dev/xvdb
lvcreate -l 99%VG --thinpool pool00 data
lvcreate -T data/pool00 -V 32G --name data
mkfs.xfs /dev/mapper/data-data
mount /dev/mapper/data-data /mnt

## Check lsblk -D output
# lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
xvda 0 0B 0B 0
└─xvda1 0 0B 0B 0
xvdb 0 0B 0B 0
├─data-pool00_tmeta 0 0B 0B 0
│ └─data-pool00-tpool 0 0B 0B 0
│ ├─data-pool00 0 0B 0B 0
│ └─data-data 0 0B 0B 0
└─data-pool00_tdata 0 0B 0B 0
  └─data-pool00-tpool 0 0B 0B 0
    ├─data-pool00 0 0B 0B 0
    └─data-data 0 0B 0B 0
# fstrim -v /mnt
fstrim: /mnt: the discard operation is not supported
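
The DISC-GRAN and DISC-MAX columns above mirror the discard_granularity and discard_max_bytes attributes in each device's sysfs queue directory, so the same check can be scripted. This is a minimal sketch assuming that layout; the function name discard_status is made up for illustration.

```shell
# Report whether a block device queue advertises discard support.
# $1 is a sysfs queue directory such as /sys/block/dm-3/queue
# (dm-3 is only an example name).
discard_status() (
    max_file="$1/discard_max_bytes"
    if [ ! -r "$max_file" ]; then
        # Attribute missing or unreadable: nothing can be concluded.
        echo unknown
    elif [ "$(cat "$max_file")" -gt 0 ] 2>/dev/null; then
        # A non-zero limit means the queue accepts discard requests.
        echo supported
    else
        # 0 corresponds to the "0B" shown by lsblk -D.
        echo unsupported
    fi
)
```

For the thin volume in this report, one way to call it would be discard_status "/sys/block/$(basename "$(readlink -f /dev/mapper/data-data)")/queue", assuming /dev/mapper/data-data is a symlink to a dm-N node.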

Actual results:

Expected results:

By downgrading the kernel to the last CentOS 7.1 kernel the issue goes away.

# grubby --set-default /boot/vmlinuz-3.10.0-229.20.1.el7.x86_64

reboot

# uname -r
3.10.0-229.20.1.el7.x86_64
# lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
xvda 0 0B 0B 0
└─xvda1 0 0B 0B 0
xvdb 0 0B 0B 0
├─data-pool00_tmeta 0 0B 0B 0
│ └─data-pool00-tpool 0 64K 64K 0
│ ├─data-pool00 0 64K 64K 0
│ └─data-data 0 64K 64K 0
└─data-pool00_tdata 0 0B 0B 0
  └─data-pool00-tpool 0 64K 64K 0
    ├─data-pool00 0 64K 64K 0
    └─data-data 0 64K 64K 0
# fstrim -v /mnt
/mnt: 32 GiB (34341306368 bytes) trimmed
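
To put numbers on the expected drop in usage reported by lvs, the trimmed byte count can be pulled out of the fstrim -v line and the pool's Data% read before and after the trim. This is a rough sketch; trimmed_bytes and pool_data_percent are hypothetical helper names, and data/pool00 is the pool from the reproduction steps.

```shell
# Extract the byte count from `fstrim -v` output read on stdin.
# Prints nothing if the line does not report a successful trim.
trimmed_bytes() {
    sed -n 's/.*(\([0-9][0-9]*\) bytes) trimmed$/\1/p'
}

# Print the Data% column for a thin pool or thin volume, e.g. data/pool00.
# Requires the lvm2 tools and root privileges.
pool_data_percent() {
    lvs --noheadings -o data_percent "$1" | tr -d ' '
}
```

A typical sequence would be pool_data_percent data/pool00, then fstrim -v /mnt | trimmed_bytes, then pool_data_percent data/pool00 again to see whether the percentage dropped.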

Additional info:

Both failed and working tests used the same dm and lvm2 packages
# rpm -qa | grep -E '(device-mapper|kernel|lvm2)' | sort
device-mapper-1.02.107-5.el7.x86_64
device-mapper-event-1.02.107-5.el7.x86_64
device-mapper-event-libs-1.02.107-5.el7.x86_64
device-mapper-libs-1.02.107-5.el7.x86_64
device-mapper-persistent-data-0.5.5-1.el7.x86_64
kernel-3.10.0-229.14.1.el7.x86_64
kernel-3.10.0-229.20.1.el7.x86_64
kernel-3.10.0-327.4.4.el7.x86_64
kernel-tools-3.10.0-327.4.4.el7.x86_64
kernel-tools-libs-3.10.0-327.4.4.el7.x86_64
lvm2-2.02.130-5.el7.x86_64
lvm2-libs-2.02.130-5.el7.x86_64

Appears to be similar to https://bugzilla.redhat.com/show_bug.cgi?id=1284174 but that is against a much newer kernel.
Tags: No tags attached.

Activities

toracat

2016-01-21 21:42

manager  

10191cplus.patch (6,294 bytes)
centosplus patch (bug 10191)

commit 993ceab91986e2e737ce9a3e23bebc8cce649240
Author: Joe Thornber <ejt@redhat.com>
Date:   Wed Dec 2 12:24:39 2015 +0000

    dm thin metadata: fix bug in dm_thin_remove_range()
    
    dm_btree_remove_leaves() only unmaps a contiguous region so we need a
    loop, in __remove_range(), to handle ranges that contain multiple
    regions.
    
    A new btree function, dm_btree_lookup_next(), is introduced which is
    more efficiently able to skip over regions of the thin device which
    aren't mapped.  __remove_range() uses dm_btree_lookup_next() for each
    iteration of __remove_range()'s loop.
    
    Also, improve description of dm_btree_remove_leaves().
    
    Fixes: 6550f075 ("dm thin metadata: add dm_thin_remove_range()")
    Signed-off-by: Joe Thornber <ejt@redhat.com>
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    Cc: stable@vger.kernel.org # 4.1+

    Applied by: Akemi Yagi <toracat@centos.org>

diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c
index 1fa4569..67871e7 100644
--- a/drivers/md/dm-thin-metadata.c
+++ b/drivers/md/dm-thin-metadata.c
@@ -1530,7 +1530,7 @@ static int __remove(struct dm_thin_device *td, dm_block_t block)
 static int __remove_range(struct dm_thin_device *td, dm_block_t begin, dm_block_t end)
 {
 	int r;
-	unsigned count;
+	unsigned count, total_count = 0;
 	struct dm_pool_metadata *pmd = td->pmd;
 	dm_block_t keys[1] = { td->id };
 	__le64 value;
@@ -1553,11 +1553,29 @@ static int __remove_range(struct dm_thin_device *td, dm_block_t begin, dm_block_
 	if (r)
 		return r;
 
-	r = dm_btree_remove_leaves(&pmd->bl_info, mapping_root, &begin, end, &mapping_root, &count);
-	if (r)
-		return r;
+	/*
+	 * Remove leaves stops at the first unmapped entry, so we have to
+	 * loop round finding mapped ranges.
+	 */
+	while (begin < end) {
+		r = dm_btree_lookup_next(&pmd->bl_info, mapping_root, &begin, &begin, &value);
+		if (r == -ENODATA)
+			break;
+
+		if (r)
+			return r;
+
+		if (begin >= end)
+			break;
+
+		r = dm_btree_remove_leaves(&pmd->bl_info, mapping_root, &begin, end, &mapping_root, &count);
+		if (r)
+			return r;
+
+		total_count += count;
+	}
 
-	td->mapped_blocks -= count;
+	td->mapped_blocks -= total_count;
 	td->changed = 1;
 
 	/*
diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c
index 0918a7c..7e5b7f1 100644
--- a/drivers/md/persistent-data/dm-btree.c
+++ b/drivers/md/persistent-data/dm-btree.c
@@ -63,6 +63,11 @@ int lower_bound(struct btree_node *n, uint64_t key)
 	return bsearch(n, key, 0);
 }
 
+static int upper_bound(struct btree_node *n, uint64_t key)
+{
+	return bsearch(n, key, 1);
+}
+
 void inc_children(struct dm_transaction_manager *tm, struct btree_node *n,
 		  struct dm_btree_value_type *vt)
 {
@@ -392,6 +397,82 @@ int dm_btree_lookup(struct dm_btree_info *info, dm_block_t root,
 }
 EXPORT_SYMBOL_GPL(dm_btree_lookup);
 
+static int dm_btree_lookup_next_single(struct dm_btree_info *info, dm_block_t root,
+				       uint64_t key, uint64_t *rkey, void *value_le)
+{
+	int r, i;
+	uint32_t flags, nr_entries;
+	struct dm_block *node;
+	struct btree_node *n;
+
+	r = bn_read_lock(info, root, &node);
+	if (r)
+		return r;
+
+	n = dm_block_data(node);
+	flags = le32_to_cpu(n->header.flags);
+	nr_entries = le32_to_cpu(n->header.nr_entries);
+
+	if (flags & INTERNAL_NODE) {
+		i = lower_bound(n, key);
+		if (i < 0 || i >= nr_entries) {
+			r = -ENODATA;
+			goto out;
+		}
+
+		r = dm_btree_lookup_next_single(info, value64(n, i), key, rkey, value_le);
+		if (r == -ENODATA && i < (nr_entries - 1)) {
+			i++;
+			r = dm_btree_lookup_next_single(info, value64(n, i), key, rkey, value_le);
+		}
+
+	} else {
+		i = upper_bound(n, key);
+		if (i < 0 || i >= nr_entries) {
+			r = -ENODATA;
+			goto out;
+		}
+
+		*rkey = le64_to_cpu(n->keys[i]);
+		memcpy(value_le, value_ptr(n, i), info->value_type.size);
+	}
+out:
+	dm_tm_unlock(info->tm, node);
+	return r;
+}
+
+int dm_btree_lookup_next(struct dm_btree_info *info, dm_block_t root,
+			 uint64_t *keys, uint64_t *rkey, void *value_le)
+{
+	unsigned level;
+	int r = -ENODATA;
+	__le64 internal_value_le;
+	struct ro_spine spine;
+
+	init_ro_spine(&spine, info);
+	for (level = 0; level < info->levels - 1u; level++) {
+		r = btree_lookup_raw(&spine, root, keys[level],
+				     lower_bound, rkey,
+				     &internal_value_le, sizeof(uint64_t));
+		if (r)
+			goto out;
+
+		if (*rkey != keys[level]) {
+			r = -ENODATA;
+			goto out;
+		}
+
+		root = le64_to_cpu(internal_value_le);
+	}
+
+	r = dm_btree_lookup_next_single(info, root, keys[level], rkey, value_le);
+out:
+	exit_ro_spine(&spine);
+	return r;
+}
+
+EXPORT_SYMBOL_GPL(dm_btree_lookup_next);
+
 /*
  * Splits a node by creating a sibling node and shifting half the nodes
  * contents across.  Assumes there is a parent node, and it has room for
diff --git a/drivers/md/persistent-data/dm-btree.h b/drivers/md/persistent-data/dm-btree.h
index 11d8cf7..c74301f 100644
--- a/drivers/md/persistent-data/dm-btree.h
+++ b/drivers/md/persistent-data/dm-btree.h
@@ -110,6 +110,13 @@ int dm_btree_lookup(struct dm_btree_info *info, dm_block_t root,
 		    uint64_t *keys, void *value_le);
 
 /*
+ * Tries to find the first key where the bottom level key is >= to that
+ * given.  Useful for skipping empty sections of the btree.
+ */
+int dm_btree_lookup_next(struct dm_btree_info *info, dm_block_t root,
+			 uint64_t *keys, uint64_t *rkey, void *value_le);
+
+/*
  * Insertion (or overwrite an existing value).  O(ln(n))
  */
 int dm_btree_insert(struct dm_btree_info *info, dm_block_t root,
@@ -135,9 +142,10 @@ int dm_btree_remove(struct dm_btree_info *info, dm_block_t root,
 		    uint64_t *keys, dm_block_t *new_root);
 
 /*
- * Removes values between 'keys' and keys2, where keys2 is keys with the
- * final key replaced with 'end_key'.  'end_key' is the one-past-the-end
- * value.  'keys' may be altered.
+ * Removes a _contiguous_ run of values starting from 'keys' and not
+ * reaching keys2 (where keys2 is keys with the final key replaced with
+ * 'end_key').  'end_key' is the one-past-the-end value.  'keys' may be
+ * altered.
  */
 int dm_btree_remove_leaves(struct dm_btree_info *info, dm_block_t root,
 			   uint64_t *keys, uint64_t end_key,
toracat

2016-01-21 21:43

manager   ~0025460

Upstream (kernel.org) commit: 993ceab91986e2e737ce9a3e23bebc8cce649240

Adapted for the plus kernel and uploaded.
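
The loop that the commit adds to __remove_range() can be illustrated with a toy model. The sketch below is not kernel code: it keeps the mapped block numbers in a plain shell variable, and lookup_next, remove_leaves and remove_range are simplified stand-ins that only imitate the behaviour described in the commit message.

```shell
# Toy model: MAPPED is a sorted, space-separated list of mapped blocks.
MAPPED="0 1 2 5 6 9"

is_mapped() {
    case " $MAPPED " in *" $1 "*) return 0 ;; esac
    return 1
}

unmap() {
    new=""
    for b in $MAPPED; do
        [ "$b" -eq "$1" ] || new="$new $b"
    done
    MAPPED=${new# }
}

# Find the first mapped block >= $1 and store it in NEXT.
# Returns 1 when there is none (the kernel code returns -ENODATA).
lookup_next() {
    for b in $MAPPED; do
        if [ "$b" -ge "$1" ]; then NEXT=$b; return 0; fi
    done
    return 1
}

# Unmap one contiguous run starting at $1 and stopping before $2.
# Sets COUNT (blocks removed) and BEGIN (first block not removed).
remove_leaves() {
    BEGIN=$1
    COUNT=0
    while [ "$BEGIN" -lt "$2" ] && is_mapped "$BEGIN"; do
        unmap "$BEGIN"
        COUNT=$((COUNT + 1))
        BEGIN=$((BEGIN + 1))
    done
}

# Patched behaviour: keep skipping to the next mapped block and
# removing runs until the whole range [$1, $2) has been covered.
remove_range() {
    BEGIN=$1
    end=$2
    TOTAL=0
    while [ "$BEGIN" -lt "$end" ]; do
        lookup_next "$BEGIN" || break
        [ "$NEXT" -lt "$end" ] || break
        remove_leaves "$NEXT" "$end"
        TOTAL=$((TOTAL + COUNT))
    done
    echo "$TOTAL"
}

remove_range 0 8   # prints 5: runs 0-2 and 5-6 are removed, 9 stays mapped
```

In this model a single remove_leaves 0 8 call would stop at the gap after block 2 and remove only three blocks, which is the situation the upstream loop is meant to handle.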
toracat

2016-01-22 18:35

manager   ~0025485

A centosplus kernel set with the patch applied is now available from:

http://people.centos.org/toracat/kernel/7/plus/bug10073_10191/

Please test. Note that the packages are unsigned and are provided for testing purposes.
toracat

2016-01-26 17:03

manager   ~0025522

kernel-plus-3.10.0-327.4.5.el7 is out. The patch is now in this update.
gooo

2016-01-29 15:02

reporter   ~0025550

I still get "fstrim: /srv/bricks/testbrick02: the discard operation is not supported"
with kernel-plus-3.10.0-327.4.5.el7:

# uname -r
3.10.0-327.4.5.el7.centos.plus.x86_64

# mount
/dev/mapper/sys_hvmsrv01-testbrick02 on /srv/bricks/testbrick02 type xfs (rw,relatime,seclabel,attr2,inode64,sunit=1024,swidth=1024,noquota)

# lvs -a -olv_name,data_lv,pool_lv,discards,data_percent --segments
  LV Data Pool Discards Data%
  gluster-pool3 [gluster-pool3_tdata] nopassdown 0,73
  [gluster-pool3_tdata]
  [gluster-pool3_tmeta]
  testbrick02 gluster-pool3 nopassdown 0,49

Latest kernel allowing fstrim to work is still 3.10.0-229.20.1.el7.x86_64, at least with my configuration.
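
For readers unsure what the Discards column in the lvs output means, here is a small lookup of the three thin-pool modes as described in lvmthin(7). It is a sketch only; explain_discards is a hypothetical helper. Going by that description, nopassdown should still let the pool process discards, so the mode alone would not obviously explain the error.

```shell
# Describe a thin-pool discards mode as shown by `lvs -o discards`.
explain_discards() {
    case "$1" in
        ignore)
            echo "discards are ignored by the pool" ;;
        nopassdown)
            echo "discards free space in the pool but are not passed to the underlying device" ;;
        passdown)
            echo "discards free space in the pool and are passed to the underlying device" ;;
        *)
            echo "unrecognised mode: $1" ;;
    esac
}
```

It could be combined with the real tool as explain_discards "$(lvs --noheadings -o discards VG/POOL | tr -d ' ')", where VG/POOL is a placeholder for the pool name.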
trnubo

2016-02-02 00:23

reporter   ~0025568

Still getting "the discard operation is not supported" with the kernel "3.10.0-327.4.5.el7.x86_64"
a823275@trbvn.com

2016-02-14 16:12

reporter   ~0025718

I get different output for HDDs and SSDs:

> uname -a
Linux kvm 3.10.0-327.4.5.el7.x86_64 #1 SMP Mon Jan 25 22:07:14 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

# sda is an SSD, sdb is an HDD

> lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 512B 2G 1
├─sda1 0 512B 2G 1
├─sda2 0 512B 2G 1
│ └─luks-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 0 512B 2G 0
│ ├─vg_kvm-root 0 512B 2G 0
│ ├─vg_kvm-swap 0 512B 2G 0
│ └─vg_kvm-vmImages 0 512B 2G 0
└─sda3 0 512B 2G 1
  └─luks-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 0 512B 2G 0
    ├─vg_vmData1-thinpool00_tmeta 0 512B 2G 0
    │ └─vg_vmData1-thinpool00-tpool 0 512B 2G 0
    │ ├─vg_vmData1-thinpool00 0 512B 2G 0
    │ ├─vg_vmData1-vmname123 0 64K 16G 0
[...]
sdb 0 0B 0B 0
└─sdb1 0 0B 0B 0
  └─luks-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 0 0B 0B 0
    ├─vg_vmData2-thinpool01_tmeta 0 0B 0B 0
    │ └─vg_vmData2-thinpool01-tpool 0 0B 0B 0
    │ ├─vg_vmData2-thinpool01 0 0B 0B 0
    │ └─vg_vmData2-vmname456 0 0B 0B 0
[...]
gooo

2016-02-17 13:29

reporter   ~0025760

Still getting the error with the latest kernel, 3.10.0-327.10.1.el7.x86_64:

# fstrim /mnt
fstrim: /mnt: the discard operation is not supported
# mount | grep /mnt
/dev/mapper/sys_hvmsrv01-388a0c4626d744e5b923609ee12916e4_0 on /mnt type xfs (rw,relatime,seclabel,nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)

# lsblk -D
sdc 0 0B 0B 0
├─sdc1 0 0B 0B 0
│ └─md126 0 0B 0B 0
├─sdc2 0 0B 0B 0
│ └─md127 0 0B 0B 0
│ ├─sys_hvmsrv01-gluster--syspool_tmeta 0 0B 0B 0
│ │ └─sys_hvmsrv01-gluster--syspool-tpool 0 0B 0B 0
│ │ ├─sys_hvmsrv01-gluster--syspool 0 0B 0B 0
│ │ ├─sys_hvmsrv01-388a0c4626d744e5b923609ee12916e4_0 0 0B 0B 0
│ │ ├─sys_hvmsrv01-388a0c4626d744e5b923609ee12916e4_1 0 0B 0B 0
│ └─sys_hvmsrv01-gluster--syspool_tdata 0 0B 0B 0
│ └─sys_hvmsrv01-gluster--syspool-tpool 0 0B 0B 0
│ ├─sys_hvmsrv01-gluster--syspool 0 0B 0B 0
│ ├─sys_hvmsrv01-388a0c4626d744e5b923609ee12916e4_0 0 0B 0B 0
│ ├─sys_hvmsrv01-388a0c4626d744e5b923609ee12916e4_1 0 0B 0B 0
├─sdc3 0 0B 0B 0
│ └─md125 0 0B 0B 0
├─sdc4 0 0B 0B 0
│ └─md124 0 0B 0B 0
├─sdc5 0 0B 0B 0
└─sdc6 0 0B 0B 0
sdd 0 0B 0B 0
├─sdd1 0 0B 0B 0
│ └─md126 0 0B 0B 0
├─sdd2 0 0B 0B 0
│ └─md127 0 0B 0B 0
│ ├─sys_hvmsrv01-gluster--syspool_tmeta 0 0B 0B 0
│ │ └─sys_hvmsrv01-gluster--syspool-tpool 0 0B 0B 0
│ │ ├─sys_hvmsrv01-gluster--syspool 0 0B 0B 0
│ │ ├─sys_hvmsrv01-388a0c4626d744e5b923609ee12916e4_0 0 0B 0B 0
│ │ ├─sys_hvmsrv01-388a0c4626d744e5b923609ee12916e4_1 0 0B 0B 0
│ └─sys_hvmsrv01-gluster--syspool_tdata 0 0B 0B 0
│ └─sys_hvmsrv01-gluster--syspool-tpool 0 0B 0B 0
│ ├─sys_hvmsrv01-gluster--syspool 0 0B 0B 0
│ ├─sys_hvmsrv01-388a0c4626d744e5b923609ee12916e4_0 0 0B 0B 0
│ ├─sys_hvmsrv01-388a0c4626d744e5b923609ee12916e4_1 0 0B 0B 0
├─sdd3 0 0B 0B 0
│ └─md125 0 0B 0B 0
├─sdd4 0 0B 0B 0
│ └─md124 0 0B 0B 0
├─sdd5 0 0B 0B 0
└─sdd6 0 0B 0B 0
tuempeltaucher

2016-03-09 19:55

reporter   ~0025970

Any update here?

I have the same issue: fstrim works on my SSDs but not on the HDDs.
trnubo

2016-03-09 22:20

reporter   ~0025971

Does this need to be reported upstream in the Red Hat Bugzilla? If so, is there a process for that, or should I just create the bug there and reference this one?
tuempeltaucher

2016-03-10 05:51

reporter   ~0025977

I just tried the kernel-ml from elrepo

uname -a
4.4.4-1.el7.elrepo.x86_64

It works fine on both my SSD and my HDD.
trnubo

2016-05-16 05:17

reporter   ~0026564

This appears to be fixed in the latest CentOS 7 kernels, kernel-3.10.0-327.13.1.el7.x86_64 and kernel-3.10.0-327.18.2.el7.x86_64:

[root@centos ~]# lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sr0 0 0B 0B 0
vda 0 0B 0B 0
└─vda1 0 0B 0B 0
vdb 0 0B 0B 0
├─data-pool00_tmeta 0 0B 0B 0
│ └─data-pool00-tpool 0 0B 0B 0
│ ├─data-pool00 0 0B 0B 0
│ └─data-data 0 64K 16G 0
└─data-pool00_tdata 0 0B 0B 0
  └─data-pool00-tpool 0 0B 0B 0
    ├─data-pool00 0 0B 0B 0
    └─data-data 0 64K 16G 0
[root@centos ~]# uname -a
Linux centos.localdomain 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@centos ~]# fstrim -v /mnt/data
/mnt/data: 28.1 GiB (30146936832 bytes) trimmed
dbray925

2016-09-13 11:40

reporter   ~0027499

Seems like this issue is either back again, or was not fixed. I have just installed and patched the latest CentOS 7 with kernel:
3.10.0-327.28.3.el7.x86_64

I made sure to update /etc/lvm/lvm.conf, ran dracut -f, and rebooted, but I am still getting:
fstrim: /: the discard operation is not supported
toracat

2017-12-16 17:36

manager   ~0030767

How about now with kernel 3.10.0-693.xxx?
toracat

2018-02-07 07:21

manager   ~0031176

Closing due to inactivity.

Issue History

Date Modified Username Field Change
2016-01-20 03:29 trnubo New Issue
2016-01-21 21:42 toracat File Added: 10191cplus.patch
2016-01-21 21:43 toracat Note Added: 0025460
2016-01-21 21:43 toracat Status new => assigned
2016-01-22 18:35 toracat Note Added: 0025485
2016-01-26 17:03 toracat Note Added: 0025522
2016-01-29 15:02 gooo Note Added: 0025550
2016-02-02 00:23 trnubo Note Added: 0025568
2016-02-14 16:12 a823275@trbvn.com Note Added: 0025718
2016-02-17 13:29 gooo Note Added: 0025760
2016-03-09 19:55 tuempeltaucher Note Added: 0025970
2016-03-09 22:20 trnubo Note Added: 0025971
2016-03-10 05:51 tuempeltaucher Note Added: 0025977
2016-05-16 05:17 trnubo Note Added: 0026564
2016-09-13 11:40 dbray925 Note Added: 0027499
2017-12-16 17:36 toracat Status assigned => feedback
2017-12-16 17:36 toracat Note Added: 0030767
2018-02-07 07:21 toracat Status feedback => closed
2018-02-07 07:21 toracat Resolution open => not fixable
2018-02-07 07:21 toracat Note Added: 0031176