View Issue Details

ID: 0015937
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2019-03-19 19:03
Reporter: jsomervi
Assigned To: (none)
Status: new
Resolution: open
Product Version: 7.6.1810
Summary: 0015937: rbd/ceph/ext4 lockup issue
Description: Install CentOS 7.6. Install the Jewel release of Ceph. Create an rbd device of size 256 MB, format it to ext4, and mount it. Then pound it with writes via fsstress. After a few minutes you will eventually see:

[ 840.074340] INFO: task fsstress:6646 blocked for more than 120 seconds.
[ 840.077420] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 960.081179] INFO: task jbd2/rbd0-8:4468 blocked for more than 120 seconds.
[ 960.086855] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 960.094278] INFO: task fsstress:6646 blocked for more than 120 seconds.
[ 960.100031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.106776] INFO: task jbd2/rbd0-8:4468 blocked for more than 120 seconds.
[ 1080.110165] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

It is stuck hard, not advancing at all.

root 4468 0.1 0.0 0 0 ? D 13:38 0:01 [jbd2/rbd0-8]
root 6646 0.0 0.0 69908 588 ttyS0 D+ 13:49 0:00 ./fsstress -w -n 542 -d /mnt

# cat /proc/4468/stack
[<ffffffffa3bb57a1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa3bb58d1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffa3bb5964>] filemap_fdatawait_range+0x14/0x30
[<ffffffffa3bb59a7>] filemap_fdatawait+0x27/0x30
[<ffffffffc069ab01>] jbd2_journal_commit_transaction+0xa81/0x19b0 [jbd2]
[<ffffffffc06a0e89>] kjournald2+0xc9/0x260 [jbd2]
[<ffffffffa3ac1c71>] kthread+0xd1/0xe0
[<ffffffffa4174c37>] ret_from_fork_nospec_end+0x0/0x39
[<ffffffffffffffff>] 0xffffffffffffffff

# cat /proc/6646/stack
[<ffffffffa3bb57a1>] wait_on_page_bit+0x81/0xa0
[<ffffffffa3bb58d1>] __filemap_fdatawait_range+0x111/0x190
[<ffffffffa3bb5964>] filemap_fdatawait_range+0x14/0x30
[<ffffffffa3bb7dc6>] filemap_write_and_wait_range+0x56/0x90
[<ffffffffc06b65aa>] ext4_sync_file+0xba/0x320 [ext4]
[<ffffffffa3c76227>] do_fsync+0x67/0xb0
[<ffffffffa3c76510>] SyS_fsync+0x10/0x20
[<ffffffffa4174ddb>] system_call_fastpath+0x22/0x27
[<ffffffffffffffff>] 0xffffffffffffffff

There's nothing interesting in dmesg other than the above tracebacks.
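The two wedged processes in the ps snippet above can be picked out mechanically; the helper below is illustrative, not part of the original report:

```shell
# Print PID and name of every task in uninterruptible sleep (state D),
# the state the hung-task detector is warning about.
filter_d_state() {
    awk '$1 == "D" { print $2, $3 }'
}

# Live usage: feed it a headerless ps listing (state, pid, comm).
if command -v ps >/dev/null; then
    ps -eo state=,pid=,comm= | filter_d_state
fi
```

As root, `echo w > /proc/sysrq-trigger` dumps the kernel stacks of all blocked tasks to dmesg, the same information as the /proc/&lt;pid&gt;/stack reads above.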

If you make the size of the rbd device 1 GB or greater, the problem seems to go away. At least we haven't seen it yet.

I tried the older CentOS 7 kernels and they all exhibited the same problem, with one small difference: on the 7.0 and 7.1 kernels only fsstress locked up, without the jbd2 kernel thread also locking up. I also tried the upstream 4.9 kernel, which I built myself on the CentOS 7.6 platform using the config file from 7.6. It ran fine, with no lockups.

Another point of interest: making an xfs filesystem on /dev/rbd0 instead exhibited no problems.

I turned on dynamic debug for both modules rbd and libceph, and saw nothing interesting in the rather voluminous set of logs. I'm no expert in this software however.
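For reference, the dynamic-debug step can be sketched like this (assumes root and debugfs mounted at /sys/kernel/debug; the helper name is mine):

```shell
# Build one dynamic-debug control command; "+p" switches on the
# pr_debug() call sites compiled into the named module.
dyndbg_cmd() {
    printf 'module %s +p' "$1"
}

# Live usage (as root), one command per write:
# for mod in rbd libceph; do
#     dyndbg_cmd "$mod" > /sys/kernel/debug/dynamic_debug/control
# done
# The extra messages show up in dmesg; use "-p" instead to turn them off.
```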

Moving Ceph Jewel ahead from 10.2.6 to 10.2.11 made no difference.
Steps To Reproduce: The problem appears on real hardware, QEMU, and VirtualBox.

I create a qemu session with two qcow disks (160 GB and 10 GB) and at least one network adaptor bridged through to the internet.

# Install CentOS7 Min install ISO
- Language: English
- Date/Time: whatever
- Network/Hostname
  - enp0s3 <- or whatever
    - Turn on
    - General: Automatically connect to this network when it's available
    - IPv4 Manual Config
      - Address:
      - DNS:
  - enp0s8
    - Leave off
  - enp0s9
    - Leave off
  - Hostname: controller-0
- Installation Destination
  - Select /dev/sda
  - Automatic partitioning
- Software Selection
  - Minimal Install
- Root password
  - root/whatever
- Create User
  - user1/another whatever

# Wait for install to complete, then log in as root

# Enable tty console
  - vi /etc/default/grub
  - (update this line)
    - GRUB_CMDLINE_LINUX="crashkernel=auto console=ttyS0,115200n8"
  - grub2-mkconfig --output=/boot/grub2/grub.cfg
  - reboot

# login as root again

# Update
rpm --import
rpm -Uvh
rpm -Uvh
yum -y install epel-release
yum -y repolist && yum -y update

# Install helpful packages
yum -y install yum-plugin-remove-with-leaves
yum -y install yum-versionlock

# Install required packages:
yum -y install ntp ntpdate ntp-doc
yum -y install wget
yum -y install boost-devel boost-thread boost-system boost-program-options cryptsetup gdisk hdparm gperftools python-flask python-requests redhat-lsb-core fuse-libs lttng-ust libbabeltrace fcgi python-setuptools
yum --enablerepo=epel install leveldb-devel

# Setup Ceph repo
rpm --import ''
cat > /etc/yum.repos.d/ceph.repo <<EOF
[ceph]
name=Ceph packages for $basearch

[ceph-noarch]
name=Ceph noarch packages

[ceph-source]
name=Ceph source packages
EOF

# See what versions we have and install 10.2.6
yum list ceph
yum install ceph
yum --showduplicates list ceph
yum -y install librados2-10.2.6
yum versionlock add librados2
yum -y install libradosstriper1-10.2.6
yum versionlock add libradosstriper1
yum -y install ceph-10.2.6

# Setup /etc/hosts and /etc/ceph/ceph.conf
cat <<EOF > /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost4 localhost4.localdomain4
controller-0
EOF

cat <<EOF > /etc/ceph/ceph.conf
        # Unique ID for the cluster.
        fsid = 0a6c1f55-a661-4e2b-8fd1-9c9d61ae0372
        # Public network where the monitor is connected to, i.e,
        #public network =
        # For version 0.55 and beyond, you must explicitly enable
        # or disable authentication with "auth" entries in [global].
        auth_cluster_required = none
        auth_service_required = none
        auth_client_required = none
        osd_journal_size = 1024

        # Uncomment the following line if you are mounting with ext4
        # filestore xattr use omap = true

        # Number of replicas of objects. Write an object 2 times.
        # Cluster cannot reach an active + clean state until there's enough OSDs
        # to handle the number of copies of an object. In this case, it requires
        # at least 2 OSDs
        osd_pool_default_size = 2

        # Allow writing one copy in a degraded state.
        osd_pool_default_min_size = 1

        # Ensure you have a realistic number of placement groups. We recommend
        # approximately 100 per OSD. E.g., total number of OSDs multiplied by 100
        # divided by the number of replicas (i.e., osd pool default size). So for
        # 2 OSDs and osd pool default size = 2, we'd recommend approximately
        # (100 * 2) / 2 = 100.
        osd_pool_default_pg_num = 64
        osd_pool_default_pgp_num = 64
        osd_crush_chooseleaf_type = 1
        setuser match path = /var/lib/ceph/$type/$cluster-$id

        # Override Jewel default of 2 reporters. StarlingX has replication factor 2
        mon_osd_min_down_reporters = 1

        # Use Hammer's report interval default value
        osd_mon_report_interval_max = 120
        mon_initial_members = controller-0
        auth_supported = none
        ms_bind_ipv6 = false

        osd_mkfs_type = xfs
        osd_mkfs_options_xfs = "-f"
        osd_mount_options_xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k"

    mon warn on legacy crush tunables = false
    # Quiet new warnings on move to Hammer
    mon pg warn max per osd = 2048
    mon pg warn max object skew = 0
    mon clock drift allowed = .1

public_addr =
mon_addr =
host = controller-0

public_addr =
EOF

# Enable the monitor
sed -i 's/--setuser ceph --setgroup ceph//g' /usr/lib/systemd/system/ceph-mon@.service
rm -fr /var/lib/ceph/mon/ceph-controller-0
ceph-mon -i controller-0 --pid-file /var/run/ceph/ -c /etc/ceph/ceph.conf --cluster ceph --mkfs
touch /var/lib/ceph/mon/ceph-controller-0/done
systemctl enable ceph-mon@controller-0.service
systemctl start ceph-mon@controller-0.service
systemctl status ceph-mon@controller-0.service
ceph -s

# Enable the OSD
ceph osd tree
sed -i 's/--setuser ceph --setgroup ceph//g' /usr/lib/systemd/system/ceph-osd@.service
umount /var/lib/ceph/osd/ceph-0
/usr/sbin/ceph-disk prepare --cluster ceph --cluster-uuid 0a6c1f55-a661-4e2b-8fd1-9c9d61ae0372 --fs-type xfs --zap-disk /dev/sdb
ceph-disk --setuser root --setgroup root activate /dev/sdb1
systemctl enable ceph-osd@0
systemctl start ceph-osd@0
systemctl status ceph-osd@0
ceph osd tree
ceph -s
ceph osd pool set rbd size 1
ceph -s
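As a sanity check on the pool defaults in ceph.conf above, the placement-group rule of thumb quoted there works out as follows (the helper is illustrative):

```shell
# PGs suggested by the ceph.conf comment: (100 * num_osds) / replicas.
pg_suggest() {
    echo $(( 100 * $1 / $2 ))
}

pg_suggest 2 2   # 2 OSDs, osd_pool_default_size = 2 -> 100
```

The config above settles on osd_pool_default_pg_num = 64, presumably rounding down to a power of two.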

# Run the test

rbd -p rbd create test --size 256M --image-format=1
rbd -p rbd map test
mkfs.ext4 -F -m0 /dev/rbd0
mount /dev/rbd0 /mnt

# Obtain fsstress, either as a prebuilt binary or by building it from source.

for i in {1..1000}; do echo -n $i: ; sudo ./fsstress -w -n $i -d /mnt; done

Lockups usually occur somewhere around iteration 400, or earlier if you're lucky. The longest I personally saw it run before locking up was 750.
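Once an iteration wedges, the hang can be confirmed from a second shell; the task_state helper below is mine, not part of the original recipe:

```shell
# Read a task's scheduler state letter ("D" = uninterruptible sleep)
# from a file in /proc/<pid>/status format.
task_state() {
    awk '/^State:/ { print $2 }' "$1"
}

# Live usage: both fsstress and the jbd2/rbd0-8 kthread should report D,
# and their /proc/<pid>/stack output should stop changing.
# for pid in $(pgrep -x fsstress; pgrep -f 'jbd2/rbd0'); do
#     echo "$pid: $(task_state /proc/$pid/status)"
#     cat /proc/$pid/stack
# done
```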

Tags: No tags attached.


There are no notes attached to this issue.

Issue History

Date Modified Username Field Change
2019-03-19 19:03 jsomervi New Issue