View Issue Details

ID: 0015433
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2018-11-07 21:52
Reporter: r0mr0m
Priority: urgent
Severity: crash
Reproducibility: random
Status: new
Resolution: open
Product Version: 7.5.1804
Target Version:
Fixed in Version:
Summary: 0015433: Kernel crash with null pointer dereference
Description:
Hey,
during the last couple of weeks I keep getting kernel panics on my production servers with IP: [<ffffffff8300ac1e>] iommu_flush_dev_iotlb+0x2e/0xa0

The rest of the stack trace varies, but it always involves the network stack.

A few examples:

[52016.801572] BUG: unable to handle kernel NULL pointer dereference at 0000000000000308
[52016.802076] IP: [<ffffffff8300ac1e>] iommu_flush_dev_iotlb+0x2e/0xa0
[52016.802524] PGD 8000007e0cb17067 PUD 7e0cb18067 PMD 0
[52016.802956] Oops: 0000 [#1] SMP

[52016.808007] CPU: 86 PID: 183191 Comm: kworker/86:48 Kdump: loaded Tainted: P OE ------------ 3.10.0-862.3.3.el7.strato0005.07278de50e2c.x86_64 #1
[52016.808765] Hardware name: Dell Inc. PowerEdge FC640/05YC4P, BIOS 1.4.8 05/22/2018
[52016.809511] Workqueue: i40e i40e_service_task [i40e]
[52016.810256] task: ffff89a8afb3bf40 ti: ffff89abd76b8000 task.ti: ffff89abd76b8000
[52016.811015] RIP: 0010:[<ffffffff8300ac1e>] [<ffffffff8300ac1e>] iommu_flush_dev_iotlb+0x2e/0xa0
[52016.811790] RSP: 0018:ffff89adbe4c3dd8 EFLAGS: 00010046
[52016.812565] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffff8a2db3f09c68
[52016.813338] RDX: 0000000000000001 RSI: 0000000000001000 RDI: ffffffff83c325b0
[52016.814109] RBP: ffff89adbe4c3e00 R08: 0000000000000000 R09: ffff8930ce8c8400
[52016.814914] R10: ffff89adbe4dbb20 R11: ffffcec8f04ae200 R12: 0000000000000308
[52016.815680] R13: 000000000000003f R14: 0000000000001000 R15: 00000000000f5708
[52016.816468] FS: 0000000000000000(0000) GS:ffff89adbe4c0000(0000) knlGS:0000000000000000
[52016.817257] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[52016.818054] CR2: 0000000000000308 CR3: 0000007e0c6be000 CR4: 00000000007627e0
[52016.818918] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[52016.819733] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[52016.820504] PKRU: 00000000
[52016.821274] Call Trace:
[52016.822047] <IRQ>
[52016.822056] [<ffffffff8300aea0>] flush_unmaps_timeout+0xd0/0x1c0
[52016.823707] [<ffffffff8300add0>] ? dma_free_pagelist+0x50/0x50
[52016.824556] [<ffffffff82aa1ae8>] call_timer_fn+0x38/0x110
[52016.825405] [<ffffffff8300add0>] ? dma_free_pagelist+0x50/0x50
[52016.826259] [<ffffffff82aa3fdd>] run_timer_softirq+0x22d/0x310
[52016.827124] [<ffffffff82a9ac05>] __do_softirq+0xf5/0x280
[52016.827973] [<ffffffff831890dc>] call_softirq+0x1c/0x30
[52016.828821] [<ffffffff82a2d625>] do_softirq+0x65/0xa0
[52016.829704] [<ffffffff82a9af85>] irq_exit+0x105/0x110
[52016.830591] [<ffffffff8318a508>] smp_apic_timer_interrupt+0x48/0x60
[52016.831488] [<ffffffff8318689c>] apic_timer_interrupt+0x17c/0x190
[52016.832390] <EOI>
[52016.832400] [<ffffffff82b9cfc1>] ? free_hot_cold_page+0x101/0x160
[52016.834220] [<ffffffff82b9d917>] __page_frag_cache_drain+0x37/0x40
[52016.835183] [<ffffffffc0444c3a>] i40e_clean_rx_ring+0x11a/0x1f0 [i40e]
[52016.836130] [<ffffffffc0427348>] i40e_down+0x168/0x1b0 [i40e]
[52016.837082] [<ffffffffc0427435>] i40e_vsi_close+0xa5/0xb0 [i40e]
[52016.838042] [<ffffffffc0427455>] i40e_close+0x15/0x20 [i40e]
[52016.839007] [<ffffffffc04274a3>] i40e_quiesce_vsi.part.74+0x43/0x50 [i40e]
[52016.840134] [<ffffffffc04274fd>] i40e_pf_quiesce_all_vsi.isra.75+0x4d/0x60 [i40e]
[52016.841155] [<ffffffffc042c18a>] i40e_handle_lldp_event+0x27a/0x580 [i40e]
[52016.842156] [<ffffffffc042d9ce>] i40e_service_task+0xd1e/0x1420 [i40e]
[52016.843252] [<ffffffff82ab312f>] process_one_work+0x17f/0x440
[52016.844270] [<ffffffff82ab3df6>] worker_thread+0x126/0x3c0
[52016.845268] [<ffffffff82ab3cd0>] ? manage_workers.isra.24+0x2a0/0x2a0
[52016.846273] [<ffffffff82abb161>] kthread+0xd1/0xe0
[52016.847284] [<ffffffff82abb090>] ? insert_kthread_work+0x40/0x40
[52016.848306] [<ffffffff8318565d>] ret_from_fork_nospec_begin+0x7/0x21
[52016.849334] [<ffffffff82abb090>] ? insert_kthread_work+0x40/0x40
[52016.850426] Code: 00 00 55 48 89 e5 41 57 41 56 49 89 f6 41 55 41 89 d5 41 54 49 89 fc 48 c7 c7 b0 25 c3 83 49 81 c4 08 03 00 00 53 e8 a2 0b 17 00 <49> 8b 1c 24 49 89 c7 4c 39 e3 75 0e eb 3d 0f 1f 40 00 48 8b 1b
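
A note on reading the `Code:` line above: the byte in angle brackets is the first byte of the faulting instruction. The Python sketch below splits the dump at that marker; the helper name `split_code_line` is made up for illustration. In this trace the faulting bytes `49 8b 1c 24` decode to `mov (%r12),%rbx`, and the earlier `49 81 c4 08 03 00 00` decode to `add $0x308,%r12`, with R12 and CR2 both equal to 0x308. That is consistent with a NULL pointer plus a 0x308 field offset, although that reading is an inference and not something the log states.

```python
def split_code_line(line):
    """Split an oops 'Code:' line into (bytes_before_fault, bytes_from_fault).

    The kernel wraps the first byte of the faulting instruction in <..>.
    Raises ValueError if the line has no 'Code:' prefix or no marker.
    """
    if "Code:" not in line:
        raise ValueError("not a Code: line")
    tokens = line.split("Code:", 1)[1].split()
    before, after, seen_marker = [], [], False
    for tok in tokens:
        if tok.startswith("<") and tok.endswith(">"):
            seen_marker = True
            tok = tok[1:-1]  # strip the angle brackets
        (after if seen_marker else before).append(int(tok, 16))
    if not seen_marker:
        raise ValueError("no <..> fault marker found")
    return bytes(before), bytes(after)
```

Feeding the second element to a disassembler (for example `objdump -D -b binary -m i386:x86-64` on the raw bytes) gives the instruction that faulted.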


Another one, on slightly different hardware:

[847820.291079] BUG: unable to handle kernel NULL pointer dereference at 0000000000000308
[847820.291088] IP: [<ffffffffa01aaa8e>] iommu_flush_dev_iotlb+0x2e/0xa0
[847820.291090] PGD 0
[847820.291092] Oops: 0000 [#1] SMP
[847820.291130] Modules linked in: xt_REDIRECT nf_nat_redirect ip6table_mangle xt_nat xt_mark xt_connmark xt_CHECKSUM iptable_mangle xt_set ip_set_hash_net ip_set vhost_net vhost macvtap macvlan tun ip6table_raw xt_CT xt_mac xt_physdev veth vport_vxlan vxlan ip6_udp_tunnel udp_tunnel iptable_raw rbd libceph dns_resolver ebtable_filter ebtables ip6table_filter ip6_tables xt_comment xt_multiport ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc xfs openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack mlx5_core(OE) mlxfw(OE) mlx4_ib(OE) mlx4_en(OE) mlx4_core(OE) mlx_compat(OE) ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp
[847820.291169] scsi_tgt ib_ipoib rpcrdma ib_iser libiscsi rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass pcspkr bnxt_re ib_core ses enclosure sg joydev mei_me hpilo hpwdt mei lpc_ich shpchp ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter dm_multipath ip_tables ext4 mbcache jbd2 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio sd_mod crc_t10dif crct10dif_generic uas usb_storage mgag200 i2c_algo_bit drm_kms_helper crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel syscopyarea sysfillrect ghash_clmulni_intel bnx2x sysimgblt fb_sys_fops ttm aesni_intel lrw gf128mul drm glue_helper ablk_helper cryptd serio_raw scsi_transport_iscsi smartpqi bnxt_en tg3 scsi_transport_sas mdio libcrc32c devlink ptp i2c_core
[847820.291173] pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
[847820.291177] CPU: 11 PID: 19820 Comm: msgr-worker-0 Kdump: loaded Tainted: G OE ------------ 3.10.0-862.11.6.el7.x86_64 #1
[847820.291178] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/15/2018
[847820.291180] task: ffff8af7a472bf40 ti: ffff8b0767ea8000 task.ti: ffff8b0767ea8000
[847820.291182] RIP: 0010:[<ffffffffa01aaa8e>] [<ffffffffa01aaa8e>] iommu_flush_dev_iotlb+0x2e/0xa0
[847820.291184] RSP: 0018:ffff8af8401c3c58 EFLAGS: 00010046
[847820.291184] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffff8b083ac0daf8
[847820.291185] RDX: 0000000000000001 RSI: 0000000000001000 RDI: ffffffffa0e1ea50
[847820.291186] RBP: ffff8af8401c3c80 R08: 0000000000000000 R09: ffff8af967f20500
[847820.291187] R10: ffffffffa019f212 R11: fffffc383ff28bc0 R12: 0000000000000308
[847820.291188] R13: 000000000000003f R14: 0000000000001000 R15: 00000000000fd5b0
[847820.291190] FS: 00007f04ba320700(0000) GS:ffff8af8401c0000(0000) knlGS:0000000000000000
[847820.291191] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[847820.291192] CR2: 0000000000000308 CR3: 0000000f287d2000 CR4: 00000000007627e0
[847820.291193] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[847820.291194] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[847820.291194] PKRU: 55555554
[847820.291195] Call Trace:
[847820.291200] <IRQ>
[847820.291200] [<ffffffffa01aad10>] flush_unmaps_timeout+0xd0/0x1c0
[847820.291205] [<ffffffffa01af0b8>] intel_unmap+0x1c8/0x230
[847820.291207] [<ffffffffa01af1be>] intel_unmap_page+0xe/0x10
[847820.291227] [<ffffffffc0a2efd5>] bnx2x_rx_int+0x4c5/0x1880 [bnx2x]
[847820.291231] [<ffffffffa01db634>] ? consume_skb+0x34/0x90
[847820.291243] [<ffffffffc0a20008>] ? bnx2x_link_initialize+0x68/0x6d0 [bnx2x]
[847820.291256] [<ffffffffc0a322dd>] bnx2x_poll+0x1dd/0x260 [bnx2x]
[847820.291258] [<ffffffffa01f152f>] net_rx_action+0x26f/0x390
[847820.291262] [<ffffffff9fc9dba5>] __do_softirq+0xf5/0x280
[847820.291266] [<ffffffffa0328cec>] call_softirq+0x1c/0x30
[847820.291269] [<ffffffff9fc2e625>] do_softirq+0x65/0xa0
[847820.291271] [<ffffffff9fc9df25>] irq_exit+0x105/0x110
[847820.291273] [<ffffffffa0329fa6>] do_IRQ+0x56/0xf0
[847820.291277] [<ffffffffa031c362>] common_interrupt+0x162/0x162
[847820.291282] <EOI>
[847820.291282] [<ffffffffa0231e00>] ? rt_acct_proc_show+0xd0/0xd0
[847820.291284] [<ffffffffa0231e06>] ? ipv4_mtu+0x6/0x70
[847820.291287] [<ffffffffa0254192>] ? tcp_current_mss+0x42/0xa0
[847820.291289] [<ffffffffa024433c>] tcp_send_mss+0x1c/0x120
[847820.291291] [<ffffffffa0247d21>] tcp_sendmsg+0x121/0xc80
[847820.291296] [<ffffffffa02739a9>] inet_sendmsg+0x69/0xb0
[847820.291300] [<ffffffff9fed80d3>] ? selinux_socket_sendmsg+0x23/0x30
[847820.291303] [<ffffffffa01d1396>] sock_sendmsg+0xb6/0xf0
[847820.291306] [<ffffffffa031b46e>] ? _raw_spin_unlock_bh+0x1e/0x20
[847820.291308] [<ffffffffa01d6790>] ? release_sock+0x120/0x170
[847820.291311] [<ffffffffa01d21a9>] ___sys_sendmsg+0x3a9/0x3c0
[847820.291313] [<ffffffffa01d0a01>] ? sock_aio_read+0x21/0x30
[847820.291316] [<ffffffff9fe1e5c3>] ? do_sync_read+0x93/0xe0
[847820.291319] [<ffffffffa01d37b1>] __sys_sendmsg+0x51/0x90
[847820.291321] [<ffffffffa01d3802>] SyS_sendmsg+0x12/0x20
[847820.291323] [<ffffffffa032579b>] system_call_fastpath+0x22/0x27
[847820.291345] Code: 00 00 55 48 89 e5 41 57 41 56 49 89 f6 41 55 41 89 d5 41 54 49 89 fc 48 c7 c7 50 ea e1 a0 49 81 c4 08 03 00 00 53 e8 92 0b 17 00 <49> 8b 1c 24 49 89 c7 4c 39 e3 75 0e eb 3d 0f 1f 40 00 48 8b 1b
[847820.291347] RIP [<ffffffffa01aaa8e>] iommu_flush_dev_iotlb+0x2e/0xa0
[847820.291348] RSP <ffff8af8401c3c58>




Another example:

[71725.386570] BUG: unable to handle kernel NULL pointer dereference at 0000000000000308
[71725.387216] IP: [<ffffffff8c5aaa8e>] iommu_flush_dev_iotlb+0x2e/0xa0
[71725.387813] PGD 0
[71725.388427] Oops: 0000 [#1] SMP
[71725.389181] Modules linked in: xt_CT xt_mac xt_physdev xt_set ip_set_hash_net ip_set vhost_net vhost macvtap macvlan tun xt_CHECKSUM xt_REDIRECT nf_nat_redirect ip6table_mangle xt_nat xt_mark xt_connmark iptable_mangle ip6table_raw veth vport_vxlan vxlan ip6_udp_tunnel udp_tunnel iptable_raw rbd libceph dns_resolver ebtable_filter ebtables ip6table_filter ip6_tables xt_comment xt_multiport ipt_REJECT nf_reject_ipv4 nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc xfs openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack mlx5_core(OE) mlxfw(OE) mlx4_ib(OE) mlx4_en(OE) mlx4_core(OE) mlx_compat(OE) ib_isert iscsi_target_mod rpcrdma ib_iser ib_srpt libiscsi target_core_mod
[71725.394798] ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm skx_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass pcspkr ses enclosure bnxt_re ib_core sg joydev mei_me hpilo hpwdt lpc_ich mei shpchp ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter dm_multipath ip_tables ext4 mbcache jbd2 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio sd_mod crc_t10dif crct10dif_generic uas usb_storage mgag200 i2c_algo_bit drm_kms_helper crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel syscopyarea bnx2x sysfillrect sysimgblt fb_sys_fops ttm aesni_intel lrw gf128mul glue_helper ablk_helper drm cryptd smartpqi scsi_transport_iscsi serio_raw bnxt_en tg3 scsi_transport_sas mdio libcrc32c devlink i2c_core
[71725.401517] ptp pps_core wmi sunrpc dm_mirror dm_region_hash dm_log dm_mod
[71725.403856] CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded Tainted: G OE ------------ 3.10.0-862.11.6.el7.x86_64 #1
[71725.406321] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/15/2018
[71725.407599] task: ffff926f69b9cf10 ti: ffff926f68c00000 task.ti: ffff926f68c00000
[71725.408879] RIP: 0010:[<ffffffff8c5aaa8e>] [<ffffffff8c5aaa8e>] iommu_flush_dev_iotlb+0x2e/0xa0
[71725.410174] RSP: 0018:ffff927e00183c58 EFLAGS: 00010046
[71725.411453] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffff928dfaf0daf8
[71725.412735] RDX: 0000000000000001 RSI: 0000000000001000 RDI: ffffffff8d21ea50
[71725.414003] RBP: ffff927e00183c80 R08: 0000000000000000 R09: ffff927ea9e6e500
[71725.415281] R10: ffffffff8c59f212 R11: ffffebfd39719cc0 R12: 0000000000000308
[71725.416574] R13: 000000000000003f R14: 0000000000001000 R15: 00000000000fe80f
[71725.417877] FS: 0000000000000000(0000) GS:ffff927e00180000(0000) knlGS:0000000000000000
[71725.419195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[71725.420521] CR2: 0000000000000308 CR3: 0000001a45f2c000 CR4: 00000000007627e0
[71725.421851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[71725.423170] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[71725.424487] PKRU: 00000000
[71725.425783] Call Trace:
[71725.427063] <IRQ>
[71725.427076] [<ffffffff8c5aad10>] flush_unmaps_timeout+0xd0/0x1c0
[71725.429600] [<ffffffff8c5af0b8>] intel_unmap+0x1c8/0x230
[71725.430858] [<ffffffff8c5af1be>] intel_unmap_page+0xe/0x10
[71725.432151] [<ffffffffc09e4fd5>] bnx2x_rx_int+0x4c5/0x1880 [bnx2x]
[71725.433428] [<ffffffff8c5db402>] ? skb_release_data+0x62/0x140
[71725.434727] [<ffffffffc09e82dd>] bnx2x_poll+0x1dd/0x260 [bnx2x]
[71725.436015] [<ffffffff8c5f152f>] net_rx_action+0x26f/0x390
[71725.437313] [<ffffffff8c09dba5>] __do_softirq+0xf5/0x280
[71725.438601] [<ffffffff8c728cec>] call_softirq+0x1c/0x30
[71725.439873] [<ffffffff8c02e625>] do_softirq+0x65/0xa0
[71725.441147] [<ffffffff8c09df25>] irq_exit+0x105/0x110
[71725.442402] [<ffffffff8c729fa6>] do_IRQ+0x56/0xf0
[71725.443639] [<ffffffff8c71c362>] common_interrupt+0x162/0x162
[71725.444874] <EOI>
[71725.444888] [<ffffffff8c56e5d7>] ? cpuidle_enter_state+0x57/0xd0
[71725.447334] [<ffffffff8c56e72e>] cpuidle_idle_call+0xde/0x230
[71725.448532] [<ffffffff8c0366ce>] arch_cpu_idle+0xe/0xb0
[71725.449697] [<ffffffff8c0f5dba>] cpu_startup_entry+0x14a/0x1e0
[71725.450836] [<ffffffff8c057187>] start_secondary+0x1f7/0x270
[71725.451943] [<ffffffff8c0000d5>] start_cpu+0x5/0x14
[71725.453013] Code: 00 00 55 48 89 e5 41 57 41 56 49 89 f6 41 55 41 89 d5 41 54 49 89 fc 48 c7 c7 50 ea 21 8d 49 81 c4 08 03 00 00 53 e8 92 0b 17 00 <49> 8b 1c 24 49 89 c7 4c 39 e3 75 0e eb 3d 0f 1f 40 00 48 8b 1b
[71725.455256] RIP [<ffffffff8c5aaa8e>] iommu_flush_dev_iotlb+0x2e/0xa0
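
All three traces above report the same fault address and the same faulting symbol and offset. A minimal Python sketch for pulling that signature out of a saved oops so that crashes can be compared quickly; the function name `oops_signature` is hypothetical and the regular expressions only assume the line formats shown in these logs:

```python
import re

# e.g. "BUG: unable to handle kernel NULL pointer dereference at 0000000000000308"
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
# e.g. "IP: [<ffffffff8300ac1e>] iommu_flush_dev_iotlb+0x2e/0xa0"
IP_RE = re.compile(r"\bIP: \[<[0-9a-f]+>\] (\S+)")


def oops_signature(log_text):
    """Return (fault_address, 'symbol+offset/size') for the first oops, or None."""
    fault = FAULT_RE.search(log_text)
    ip = IP_RE.search(log_text)
    if not fault or not ip:
        return None
    return int(fault.group(1), 16), ip.group(1)
```

If `oops_signature` returns the same tuple for every saved crash, the panics very likely share one root cause even when the rest of the call trace differs.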
Tags: No tags attached.
abrt_hash:
URL:

Activities

TrevorH

2018-11-02 12:32

manager   ~0033049

You are at least two kernel versions behind. `rpm -q --changelog kernel-3.10.0-862.14.4.el7.x86_64 | less` shows 447 lines of changelog between your kernel and the current one, and among the fixes listed is this one:

- [iommu] amd: Add NULL sanity check for struct irq_2_irte.ir_data (Suravee Suthikulpanit) [1600661 1542697]

I can see that yours is an Intel machine, not AMD, but...

There is also at least one fix to the i40e driver since your current kernel, and i40e appears quite liberally in your stack trace.
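
To narrow a long changelog down to the entries that matter here, something like the following Python sketch could help. It assumes the output of the `rpm -q --changelog` command above has been saved as text; the function name `matching_entries` and the keyword list are illustrative choices, not part of any tool:

```python
def matching_entries(changelog_text, keywords):
    """Return changelog entries ('- ...' lines) that mention any keyword."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for line in changelog_text.splitlines():
        entry = line.strip()
        # Entries start with "- "; date/author header lines start with "*".
        if entry.startswith("- ") and any(k in entry.lower() for k in lowered):
            hits.append(entry)
    return hits
```

Calling it with keywords such as `["iommu", "i40e", "bnx2x"]` lists only the fixes touching the code that shows up in the traces.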

r0mr0m

2018-11-02 12:37

reporter   ~0033050

Sure, but in another stack trace I get bnx2x_rx_int as part of the traceback, so it seems there is nothing specific to i40e.
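
One way to check which frames the crashes actually share is to intersect the call traces. A rough Python sketch, assuming the frame format shown in the logs in this report; the helper names are made up. Frames printed with a leading `?` are skipped because the kernel marks them as unverified stack leftovers:

```python
import re

# A frame looks like "[<ffffffff8300aea0>] flush_unmaps_timeout+0xd0/0x1c0",
# optionally with "? " before the symbol for unreliable entries.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (\?\s+)?([A-Za-z0-9_.]+)\+0x")


def reliable_frames(trace_text):
    """Return symbol names of frames not marked with '?', in log order."""
    return [m.group(2) for m in FRAME_RE.finditer(trace_text) if m.group(1) is None]


def common_frames(*traces):
    """Return the set of reliable symbols present in every trace."""
    sets = [set(reliable_frames(t)) for t in traces]
    return set.intersection(*sets) if sets else set()
```

Applied to the i40e and bnx2x traces, the driver-specific frames drop out of the result while `flush_unmaps_timeout` and `iommu_flush_dev_iotlb` remain.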

r0mr0m

2018-11-07 21:52

reporter   ~0033091

@TrevorH any ideas? I have the vmcore if it can help.

Issue History

Date Modified Username Field Change
2018-11-02 12:14 r0mr0m New Issue
2018-11-02 12:32 TrevorH Note Added: 0033049
2018-11-02 12:37 r0mr0m Note Added: 0033050
2018-11-07 21:52 r0mr0m Note Added: 0033091