View Issue Details

ID: 0015162
Project: CentOS-7
Category: dlm
View Status: public
Last Update: 2018-10-03 14:39
Reporter: vnick
Priority: normal
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Platform: VMware x86-64
OS: CentOS-7
OS Version: 7.5.1804
Product Version: 7.5.1804
Target Version:
Fixed in Version:
Summary: 0015162: Soft Lockup with DLM over SCTP on Multihomed Host
Description

I'm running a cluster configuration on a pair of multi-homed virtual machines. I've used corosync, pacemaker, and pcs to set this up. Each of the VMs has three network interfaces - two of them are connected directly to each other through separate physical 10GbE links, and the third is a management interface.

When setting up the cluster with pcs I'm using the option to specify multiple network paths for each node. I then create the two resources, dlm and clvmd. I'm able to successfully start dlm_controld, but when I try to start clvmd, I get the "CPU1 Bug Soft Lockup" message and the console or SSH session becomes unresponsive.
Steps To Reproduce

1. Install all of the cluster prereqs and configure networking
2. pcs cluster setup --start --name cluster node1-int1,node1-int2 node2-int1,node2-int2
3. pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s on-fail=fence clone interleave=true ordered=true
4. pcs resource create clvmd ocf:heartbeat:clvm op monitor interval=30s on-fail=fence clone interleave=true ordered=true
5. pcs constraint order start dlm-clone then clvmd-clone
6. pcs constraint colocation add clvmd-clone with dlm-clone
7. (node1) pcs resource debug-start dlm
8. (node2) pcs resource debug-start dlm
9. (node1) pcs resource debug-start clvmd
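
A couple of optional sanity checks between the steps above (assuming the stock corosync and dlm userspace tools are installed):

    # confirm both rings from step 2 are active on each node
    corosync-cfgtool -s

    # after steps 7/8, confirm dlm_controld came up cleanly
    dlm_tool status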
Additional Information

dmesg output:

[106764.704962] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [clvmd:46431]
[106764.705889] Modules linked in: sctp drbd_transport_tcp(OE) drbd(OE) dlm nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter vmw_vsock_vmci_transport vsock sunrpc sb_edac iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw gf128mul glue_helper ablk_helper cryptd vmw_balloon pcspkr joydev i2c_piix4 sg vmw_vmci shpchp parport_pc parport ip_tables xfs libcrc32c sr_mod cdrom
[106764.705930] sd_mod crc_t10dif crct10dif_generic ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm crct10dif_pclmul crct10dif_common crc32c_intel drm mptspi scsi_transport_spi mptscsih ata_piix libata serio_raw mptbase vmxnet3 i2c_core dm_mirror dm_region_hash dm_log dm_mod
[106764.705946] CPU: 0 PID: 46431 Comm: clvmd Tainted: G OEL ------------ 3.10.0-862.9.1.el7.x86_64 #1
[106764.705947] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[106764.705949] task: ffff9660e6f9cf10 ti: ffff96603e448000 task.ti: ffff96603e448000
[106764.705950] RIP: 0010:[<ffffffff8655b1e9>] [<ffffffff8655b1e9>] __write_lock_failed+0x9/0x20
[106764.705957] RSP: 0018:ffff96603e44bd88 EFLAGS: 00000297
[106764.705958] RAX: ffff96603e44bfd8 RBX: ffff96603e44bd30 RCX: 0000000000000000
[106764.705959] RDX: ffff96603e548000 RSI: 0000000000000200 RDI: ffff96603e5481ac
[106764.705960] RBP: ffff96603e44bd88 R08: 0000000000000004 R09: 0000000000000000
[106764.705961] R10: ffff96603eef4180 R11: d2057d5ef9a07b65 R12: ffff96603e44bd58
[106764.705963] R13: ffff96603e44bdcc R14: 0000000000000004 R15: 0000000000000000
[106764.705964] FS: 00007f80a4d17880(0000) GS:ffff9660f9600000(0000) knlGS:0000000000000000
[106764.705966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[106764.705984] CR2: 00007f80a2e0efe0 CR3: 00000000acf84000 CR4: 00000000003607f0
[106764.705989] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[106764.705990] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[106764.705991] Call Trace:
[106764.705997] [<ffffffff869166ba>] _raw_write_lock_bh+0x2a/0x30
[106764.706004] [<ffffffffc06a381e>] save_listen_callbacks.isra.5+0x1e/0x70 [dlm]
[106764.706008] [<ffffffffc06a4ee2>] dlm_lowcomms_start+0x462/0x580 [dlm]
[106764.706012] [<ffffffffc06a033b>] dlm_new_lockspace+0x10b/0x170 [dlm]
[106764.706016] [<ffffffffc06a9f1f>] device_write+0x38f/0x770 [dlm]
[106764.706020] [<ffffffff8641b490>] vfs_write+0xc0/0x1f0
[106764.706022] [<ffffffff8641c2bf>] SyS_write+0x7f/0xf0
[106764.706026] [<ffffffff86920795>] system_call_fastpath+0x1c/0x21
[106764.706027] Code: 00 00 e9 03 00 00 00 41 ff e7 e8 07 00 00 00 f3 90 0f ae e8 eb f9 4c 89 3c 24 c3 90 90 90 90 90 90 90 55 48 89 e5 f0 ff 07 f3 90 <83> 3f 01 75 f9 f0 ff 0f 75 f1 5d c3 90 66 2e 0f 1f 84 00 00 00
Tags: 3.10.0-862.9.1.el7, 7.5, cluster, clvmd

Activities

dragle
2018-09-13 17:08
reporter   ~0032717

I'm thinking this is the same as an issue I'm having; at the very least there are a lot of similarities.

In my case I'm not running VMs, just two physical servers, but like you I'm running DRBD + dlm.

In my case I start with a working cluster. The exact problem you describe starts as soon as I try to add the redundant ring protocol (RRP) to my corosync configuration. That is, prior to my change my corosync.conf looks like this:

totem {
    version: 2
    cluster_name: MyCluster
    secauth: off
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node1.mydomain.com
        nodeid: 1
    }

    node {
        ring0_addr: node2.mydomain.com
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

That configuration works fine with no trouble at all. But when I change it to look like this:

totem {
    version: 2
    cluster_name: MyCluster
    secauth: off
    transport: udpu
    rrp_mode: passive
}

nodelist {
    node {
        ring0_addr: node1.mydomain.com
        ring1_addr: node1lan.mydomain.com
        nodeid: 1
    }

    node {
        ring0_addr: node2.mydomain.com
        ring1_addr: node2lan.mydomain.com
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}

node1/node2 and node1lan/node2lan resolve via /etc/hosts to two different private networks over two different NICs on each machine. Both networks connect and communicate ok, at least until the cluster starts.
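
For illustration, the corresponding /etc/hosts entries look roughly like this (the hostnames match the config above; the addresses are placeholders, not the real networks):

    # ring0 network (placeholder addresses)
    10.0.0.1       node1.mydomain.com      node1
    10.0.0.2       node2.mydomain.com      node2

    # ring1 network (placeholder addresses)
    192.168.100.1  node1lan.mydomain.com   node1lan
    192.168.100.2  node2lan.mydomain.com   node2lan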

After restarting my cluster I start getting the error you describe as soon as my filesystem mounts start (they never complete). DRBD and dlm start and appear to be OK. According to the message the CPU is stuck in mount, and the mounts time out. Over time I can see the load averages on both machines climbing steadily, with at least one CPU spinning continuously at 100%. As soon as it happens I cannot SSH to the other node, and eventually the console on the node I am on becomes completely unresponsive.

The FS is GFS2. So far the only thing I've been able to do when it happens is hard-power-off the machines. When I then remove the RRP stuff and restart the cluster, all is well again.

It's worth noting that several weeks ago I tried setting up a dlm/lvmlockd configuration (with actual clustered volumes) and had the same problem. At the time I thought it was more related to lvmlockd, since my errors were reported as coming from within it (same CPU#n stuck message). Since I didn't technically need the lvmlockd layer (in my situation I'm fine with just basing the DRBD volumes on an existing LVM volume) I reverted all of that, including the RRP which I had tried at the same time. But now I'm wondering if it's something to do with RRP/dlm/GFS2 interactions.
dragle
2018-09-20 15:25
reporter   ~0032765

Not an answer, but FYI, per https://access.redhat.com/articles/3068921 (support contract required):

"dlm with RRP / SCTP: Red Hat does not support the usage of dlm or dlm-using components when dlm is configured to use SCTP communications, also known as "multi-homing". dlm automatically enables SCTP communications if the cluster is configured to use redundant rings (RRP) - meaning DLM is not supported in RRP clusters."
jch2os
2018-09-26 12:55
reporter   ~0032806

I think I have something along these lines as well. I had a two-machine cluster that has been running for over a year. I decided to do an upgrade: I brought the one machine out of cluster control with pcs cluster stop, then did a yum upgrade. I rebooted that machine and tried joining the cluster again; drbd started, then dlm, and then when clvmd tried to start, the machine that was just upgraded crashed. I was tailing the messages log and here is what I saw.


Sep 26 07:41:26 vms03 kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [clvmd:17847]
Sep 26 07:41:26 vms03 kernel: Modules linked in: sctp drbd(OE) dlm mpt3sas mpt2sas raid_class mptctl mptbase xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink dell_rbu bridge stp llc bonding gpio_ich iTCO_wdt iTCO_vendor_support dcdbas coretemp kvm_intel kvm irqbypass joydev ses enclosure scsi_transport_sas sg pcspkr ipmi_si ipmi_devintf ipmi_msghandler lpc_ich i5000_edac i5k_amb shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_common ata_generic pata_acpi amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper
Sep 26 07:41:26 vms03 kernel: syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix drm libata i2c_core e1000e serio_raw megaraid_sas bnx2 ptp pps_core virtio_pci virtio_balloon virtio_scsi virtio_blk virtio_net virtio_ring virtio libcrc32c [last unloaded: drbd]
Sep 26 07:41:26 vms03 kernel: CPU: 4 PID: 17847 Comm: clvmd Kdump: loaded Tainted: G OE ------------ 3.10.0-862.11.6.el7.x86_64 #1
Sep 26 07:41:26 vms03 kernel: Hardware name: Dell Inc. PowerEdge 2950/0X999R, BIOS 2.7.0 10/30/2010
Sep 26 07:41:26 vms03 kernel: task: ffff8a4c75a01fa0 ti: ffff8a5433628000 task.ti: ffff8a5433628000
Sep 26 07:41:26 vms03 kernel: RIP: 0010:[<ffffffffb6d5f26c>] [<ffffffffb6d5f26c>] __write_lock_failed+0xc/0x20
Sep 26 07:41:26 vms03 kernel: RSP: 0018:ffff8a543362bd88 EFLAGS: 00000297
Sep 26 07:41:26 vms03 kernel: RAX: ffff8a543362bfd8 RBX: ffff8a543362bd30 RCX: 0000000000000000
Sep 26 07:41:26 vms03 kernel: RDX: ffff8a5432ba0000 RSI: 0000000000000200 RDI: ffff8a5432ba01ac
Sep 26 07:41:26 vms03 kernel: RBP: ffff8a543362bd88 R08: 0000000000000004 R09: 0000000000000000

Right now I have manually brought up DRBD on this machine just to keep the drives in sync, but I can only run the VMs on the one node. Does anyone know what I need to do to resolve this? If DLM doesn't support RRP clusters, then what other options do I have?
vnick
2018-09-26 13:41
reporter   ~0032807

Regarding the note on the lack of support from Red Hat: the issue I ran into was that on a host with two IP addresses, whether I configured RRP or not, DLM detected multiple interfaces and refused to use the non-SCTP cluster method (a dmesg error saying something like "this host has multiple interfaces, you should use SCTP"). And when I tried SCTP, I got the lock-up. So it seems like either you should be able to force non-SCTP on hosts with multiple interfaces, or the hang should be fixed.
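
If forcing the protocol is the way to go, dlm_controld is supposed to read a protocol setting from /etc/dlm/dlm.conf (not verified here that this actually avoids the lockup on a multi-homed host):

    # /etc/dlm/dlm.conf
    # force dlm lowcomms to TCP instead of auto-detecting SCTP
    protocol=tcp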
jch2os
2018-09-26 13:46
reporter   ~0032808

From my log you can see that SCTP is being used as well. Can I revert to an older DLM?

Sep 26 07:40:59 vms03 stonith-ng[20377]: notice: On loss of CCM Quorum: Ignore
Sep 26 07:40:59 vms03 stonith-ng[20377]: warning: No template/tag named 'clvmd'
Sep 26 07:40:59 vms03 stonith-ng[20377]: error: Constraint 'colocation-clvmd-dlm-clone-INFINITY': Invalid reference to 'clvmd'
Sep 26 07:40:59 vms03 clvm(clvmd)[17705]: INFO: clvmd is not running
Sep 26 07:40:59 vms03 clvm(clvmd)[17705]: INFO: clvmd is not running
Sep 26 07:40:59 vms03 clvm(clvmd)[17705]: INFO: Starting /usr/sbin/clvmd:
Sep 26 07:40:59 vms03 kernel: dlm: Using SCTP for communications
Sep 26 07:40:59 vms03 corosync[2761]: [TOTEM ] Retransmit List: 17a9
Sep 26 07:40:59 vms03 corosync[2761]: [TOTEM ] Retransmit List: 17a9
Sep 26 07:40:59 vms03 corosync[2761]: [TOTEM ] Retransmit List: 17a9
Sep 26 07:40:59 vms03 kernel: sctp: Hash tables configured (bind 512/512)
Sep 26 07:41:03 vms03 corosync[2761]: [TOTEM ] Marking ringid 1 interface 192.168.1.23 FAULTY
Sep 26 07:41:26 vms03 kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [clvmd:17847]
Sep 26 07:41:26 vms03 kernel: Modules linked in: sctp drbd(OE) dlm mpt3sas mpt2sas raid_class mptctl mptbase xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink dell_rbu bridge stp llc bonding gpio_ich iTCO_wdt iTCO_vendor_support dcdbas coretemp kvm_intel kvm irqbypass joydev ses enclosure scsi_transport_sas sg pcspkr ipmi_si ipmi_devintf ipmi_msghandler lpc_ich i5000_edac i5k_amb shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_common ata_generic pata_acpi amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper
Sep 26 07:41:26 vms03 kernel: syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix drm libata i2c_core e1000e serio_raw megaraid_sas bnx2 ptp pps_core virtio_pci virtio_balloon virtio_scsi virtio_blk virtio_net virtio_ring virtio libcrc32c [last unloaded: drbd]
Sep 26 07:41:26 vms03 kernel: CPU: 4 PID: 17847 Comm: clvmd Kdump: loaded Tainted: G OE ------------ 3.10.0-862.11.6.el7.x86_64 #1
Sep 26 07:41:26 vms03 kernel: Hardware name: Dell Inc. PowerEdge 2950/0X999R, BIOS 2.7.0 10/30/2010
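
If reverting to an older DLM is the route, note that the dlm kernel module ships with the kernel package itself, so "reverting" would really mean booting the previous kernel. A rough sketch (untested; assumes the older kernel is still installed and GRUB_DEFAULT=saved, the CentOS 7 default):

    # list installed kernels and the corresponding boot entries
    rpm -q kernel
    awk -F\' '/^menuentry /{print $2}' /etc/grub2.cfg

    # pick an older entry by its 0-based index (or exact title), then reboot into it
    grub2-set-default 1
    reboot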
jch2os
2018-09-26 14:09
reporter   ~0032809

Do you think that if I comment out rrp_mode: passive in my config, it will start working?

totem {
    version: 2
    secauth: off
    cluster_name: cluster01
    transport: udpu
    rrp_mode: passive

    interface {
        ringnumber: 0
        bindnetaddr: 10.10.0.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
        ttl: 1
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.1.0
        mcastaddr: 227.94.1.2
        mcastport: 5407
        ttl: 1
    }
}

nodelist {
    node {
        ring0_addr: vms03
        nodeid: 1
    }
    node {
        ring0_addr: vms04
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}

corosync {
    user: root
    group: root
}
jch2os
2018-09-27 11:44
reporter   ~0032810

Adjusting my config to this didn't seem to help. Any other ideas?

totem {
    version: 2
    secauth: off
    cluster_name: cluster01
    transport: udpu

    interface {
        ringnumber: 0
        bindnetaddr: 10.10.0.0
        ttl: 1
    }
}

nodelist {
    node {
        ring0_addr: vms03
        nodeid: 1
    }
    node {
        ring0_addr: vms04
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}

corosync {
    user: root
    group: root
}
jch2os
2018-10-03 14:39
reporter   ~0032853

I think my issue was that I did not reboot the machines after taking the multiple rings out. I did a pcs cluster stop --all, modified the corosync.conf, and then rebooted both machines. Then when I did a pcs cluster start --all, the systems seemed to work!
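
Spelled out, the sequence described above is roughly this (corosync.conf was edited by hand on each node; pcs cluster sync could presumably push it out instead, though that wasn't tested here):

    pcs cluster stop --all
    # on each node: edit /etc/corosync/corosync.conf, removing rrp_mode and the ring1/second-interface entries
    reboot            # both nodes
    pcs cluster start --all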

Issue History

Date Modified Username Field Change
2018-08-12 02:01 vnick New Issue
2018-08-12 02:01 vnick Tag Attached: 3.10.0-862.9.1.el7
2018-08-12 02:01 vnick Tag Attached: 7.5
2018-08-12 02:01 vnick Tag Attached: clvmd
2018-08-12 02:01 vnick Tag Attached: cluster
2018-09-13 17:08 dragle Note Added: 0032717
2018-09-20 15:25 dragle Note Added: 0032765
2018-09-26 12:55 jch2os Note Added: 0032806
2018-09-26 13:41 vnick Note Added: 0032807
2018-09-26 13:46 jch2os Note Added: 0032808
2018-09-26 14:09 jch2os Note Added: 0032809
2018-09-27 11:44 jch2os Note Added: 0032810
2018-10-03 14:39 jch2os Note Added: 0032853