View Issue Details

ID: 0017610
Project: CentOS-7
Category: iscsi-initiator-utils
View Status: public
Last Update: 2020-07-21 17:43
Reporter: anandgbhat
Priority: urgent
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Product Version: 7.8-2003
Target Version:
Fixed in Version:
Summary: 0017610: iscsid stuck in D state causing all the iscsi traffic to stall
Description: iscsid gets stuck in D state (uninterruptible sleep) when an iSCSI target flips. A single server is configured with multiple iSCSI targets, with multiple LUNs (>100) exposed to the host. We occasionally see iscsid get stuck in D state when an iSCSI target toggles. This typically happens when multiple iSCSI targets toggle at the same time.

Here is the state of iscsid:

[root@system-test-01-bqkp70202642-node-2 cohesity]# ps afx | grep iscsid
 4326 pts/0 S+ 0:00 \_ grep --color=auto iscsid
19795 ? D<Ls 0:37 /sbin/iscsid -f
24288 ? D< 0:00 \_ /sbin/iscsid -f
[root@system-test-01-bqkp70202642-node-2 cohesity]# ps afx | less
[root@system-test-01-bqkp70202642-node-2 cohesity]# ps aux | grep iscsid
root 19795 0.0 0.0 61044 9932 ? D<Ls Jul10 0:37 /sbin/iscsid -f
root 20846 0.0 0.0 112812 968 pts/0 S+ 03:54 0:00 grep --color=auto iscsid
root 24288 0.0 0.0 61044 3436 ? D< Jul12 0:00 /sbin/iscsid -f
[root@system-test-01-bqkp70202642-node-2 cohesity]# cat /proc/24288/stack
[<ffffffff8395aa0b>] blk_execute_rq+0xab/0x150
[<ffffffff83ae89d3>] scsi_execute+0xd3/0x170
[<ffffffff83aea8ae>] scsi_execute_req_flags+0x8e/0x100
[<ffffffff83aee1b3>] scsi_probe_and_add_lun+0x243/0xe50
[<ffffffff83aef172>] scsi_report_lun_scan+0x3b2/0x540
[<ffffffff83aef731>] __scsi_scan_target+0x121/0x260
[<ffffffff83aef988>] scsi_scan_target+0x118/0x130
[<ffffffffc089132b>] iscsi_user_scan_session.part.13+0xdb/0x110 [scsi_transport_iscsi]
[<ffffffffc0891381>] iscsi_user_scan_session+0x21/0x30 [scsi_transport_iscsi]
[<ffffffff83ab4c45>] device_for_each_child+0x55/0x90
[<ffffffffc088efb3>] iscsi_user_scan+0x43/0x60 [scsi_transport_iscsi]
[<ffffffff83af1918>] store_scan+0xa8/0x100
[<ffffffff83ab413b>] dev_attr_store+0x1b/0x30
[<ffffffff838da472>] sysfs_kf_write+0x42/0x50
[<ffffffff838d9a5b>] kernfs_fop_write+0xeb/0x160
[<ffffffff8384d1b0>] vfs_write+0xc0/0x1f0
[<ffffffff8384df7f>] SyS_write+0x7f/0xf0
[<ffffffff83d92ed2>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
[root@system-test-01-bqkp70202642-node-2 cohesity]# cat /proc/19795/stack
[<ffffffff836bd6cd>] flush_workqueue+0x13d/0x5e0
[<ffffffff83ae35cd>] scsi_flush_work+0x1d/0x50
[<ffffffffc0890705>] iscsi_remove_session+0xd5/0x1c0 [scsi_transport_iscsi]
[<ffffffffc0890a32>] iscsi_destroy_session+0x12/0x50 [scsi_transport_iscsi]
[<ffffffffc0d2e6f8>] iscsi_session_teardown+0xd8/0x100 [libiscsi]
[<ffffffffc09bffa0>] iscsi_sw_tcp_session_destroy+0x50/0x70 [iscsi_tcp]
[<ffffffffc0892301>] iscsi_if_recv_msg+0xc81/0x14f0 [scsi_transport_iscsi]
[<ffffffffc0892c3b>] iscsi_if_rx+0xcb/0x230 [scsi_transport_iscsi]
[<ffffffff83c90ce0>] netlink_unicast+0x170/0x210
[<ffffffff83c91088>] netlink_sendmsg+0x308/0x420
[<ffffffff83c333a6>] sock_sendmsg+0xb6/0xf0
[<ffffffff83c34269>] ___sys_sendmsg+0x3e9/0x400
[<ffffffff83c35921>] __sys_sendmsg+0x51/0x90
[<ffffffff83c35972>] SyS_sendmsg+0x12/0x20
[<ffffffff83d92ed2>] system_call_fastpath+0x25/0x2a
[<ffffffffffffffff>] 0xffffffffffffffff
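
The two stacks above are easier to compare once each frame is split into symbol, offset, and module. The following is a minimal sketch of such a parser, assuming the `[<address>] symbol+0xoff/0xsize [module]` frame format shown in this report. The names `Frame`, `FRAME_RE`, and `parse_stack` are hypothetical helpers introduced for illustration, and the sample text is copied from the traces above. It only parses the output; it does not diagnose the hang.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    """One frame of a /proc/<pid>/stack dump (hypothetical helper type)."""
    address: str
    symbol: str
    offset: int
    size: int
    module: Optional[str]  # None for frames in the core kernel image


# Matches lines such as:
#   [<ffffffff8395aa0b>] blk_execute_rq+0xab/0x150
#   [<ffffffffc0891381>] iscsi_user_scan_session+0x21/0x30 [scsi_transport_iscsi]
FRAME_RE = re.compile(
    r"^\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>[^\s+]+)"
    r"(?:\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+))?"
    r"(?:\s+\[(?P<mod>[^\]]+)\])?\s*$"
)


def parse_stack(text: str) -> List[Frame]:
    """Parse kernel stack text into frames, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None or match.group("off") is None:
            # Skip shell prompts, blank lines, and the trailing
            # "[<ffffffffffffffff>] 0xffffffffffffffff" terminator.
            continue
        frames.append(
            Frame(
                address=match.group("addr"),
                symbol=match.group("sym"),
                offset=int(match.group("off"), 16),
                size=int(match.group("size"), 16),
                module=match.group("mod"),
            )
        )
    return frames


# Sample lines copied from the stack of PID 24288 in this report.
SAMPLE = """\
[<ffffffff8395aa0b>] blk_execute_rq+0xab/0x150
[<ffffffffc089132b>] iscsi_user_scan_session.part.13+0xdb/0x110 [scsi_transport_iscsi]
[<ffffffffffffffff>] 0xffffffffffffffff
"""

frames = parse_stack(SAMPLE)

if __name__ == "__main__":
    for frame in frames:
        print(frame.symbol, frame.module or "(core kernel)")
```

Applied to the full dumps, the first frame of each list is where the task is blocked: `blk_execute_rq` for PID 24288 and `flush_workqueue` for PID 19795.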

Here are the iscsid and kernel versions:

$ uname -a
Linux system-test-01-bqkp80201813-node-4 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ iscsid --version
iscsid version 6.2.0.874-17

We crashed the kernel via sysrq to collect a crash dump and logs. The vmcore dmesg is attached.

Steps To Reproduce:
Map multiple LUNs from multiple iSCSI targets.
Continuously run I/O on each of the LUNs while flipping the iSCSI targets.
iscsid gets into D state.
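
While running the reproduction steps, the moment iscsid enters D state can be detected without watching `ps` by hand. The sketch below reads the state field from `/proc/<pid>/stat`, as an illustration rather than part of the original report. The helper names `parse_stat` and `find_blocked` are hypothetical, and the script assumes a Linux host with `/proc` mounted.

```python
import os
from typing import List, Optional, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first '(' and the last ')'.
    """
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def find_blocked(proc_root: str = "/proc",
                 name: Optional[str] = None) -> List[Tuple[int, str]]:
    """List (pid, comm) of processes currently in state 'D'."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked  # not a Linux host, or /proc is not mounted
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or the entry is unreadable
        if state == "D" and (name is None or comm == name):
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in find_blocked(name="iscsid"):
        print(f"{comm} (pid {pid}) is in D state; see /proc/{pid}/stack")
```

A single sample in D state is not conclusive, since processes pass through that state briefly during normal I/O. Polling in a loop and flagging only PIDs that stay blocked across several samples would match the hang described here more closely. Reading `/proc/<pid>/stack` afterwards generally requires root.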
Tags: 3.10.0.-1127
abrt_hash:
URL:

Activities

anandgbhat

2020-07-21 13:57

reporter  

vmcore-dmesg.webarchive (1,041,719 bytes)

ManuelWolfshant

2020-07-21 17:43

manager   ~0037387

CentOS is a rebuild of the sources used to create RHEL and aims to reproduce RHEL bug for bug and feature for feature. Please submit your request to Red Hat via bugzilla.redhat.com. If and when Red Hat accepts it, incorporates it into RHEL, and releases a patched version, CentOS will pick it up automatically.
For easier tracking, please crosslink this bug with the one opened at bugzilla.redhat.com.

Issue History

Date Modified Username Field Change
2020-07-21 13:57 anandgbhat New Issue
2020-07-21 13:57 anandgbhat File Added: vmcore-dmesg.webarchive
2020-07-21 13:57 anandgbhat Tag Attached: 3.10.0.-1127
2020-07-21 17:43 ManuelWolfshant Note Added: 0037387