View Issue Details

ID: 0014413
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2018-07-14 18:25
Reporter: agoodm
Priority: normal
Severity: major
Reproducibility: random
Status: assigned
Resolution: open
Platform: Intel X86-64
OS: CentOS
OS Version: 7.4
Product Version: (not set)
Target Version: (not set)
Fixed in Version: (not set)
Summary: 0014413: Kernel bug 109581
Description:
Seemingly at random, an extremely large volume of kernel error messages is generated, which overwhelms the CPU on affected boxes. It seems to be hardware dependent.

It can be worked around by not using FQ_Codel. The kernel errors are very similar to those in kernel bug 109581, for which the fix was apparently backported by Red Hat to kernel-3.10.0-774.el7.

Could someone please provide the source code from the devel branch for this or a newer kernel, or alternatively compile a kernel for me to test?

Steps To Reproduce:
1. Attach the qdisc, filter and fq_codel to a physical interface:

tc qdisc add dev eth1 root handle 1:0 estimator 1sec 4sec hfsc default 14
tc class add dev eth1 parent 1:0 classid 1:1 hfsc sc rate 98mbit
tc class add dev eth1 parent 1:1 classid 1:11 estimator 1sec 4sec hfsc sc m1 25kbit d 20ms m2 100kbit
tc class add dev eth1 parent 1:1 classid 1:14 estimator 1sec 4sec hfsc sc rate 98mbit

tc qdisc add dev eth1 parent 1:11 handle 1011: fq_codel noecn
tc filter add dev eth1 parent 1011: protocol all handle 11 flow hash keys src,nfct-src divisor 1024

tc filter add dev eth1 parent 1:0 protocol all handle 11 fw flowid 1:11

(above is a rough example)

2. Send some traffic through from the box
Additional Information:
Jan 20 19:02:20 gateway-accommagency kernel: WARNING: CPU: 2 PID: 18 at net/sched/sch_hfsc.c:1426 hfsc_dequeue+0x332/0x350 [sch_hfsc]
Jan 20 19:02:20 gateway-accommagency kernel: Modules linked in: cls_fw cls_flow sch_fq_codel sch_hfsc bsd_comp ppp_synctty ppp_async crc_ccitt ppp_generic slhc xt_CHECKSUM ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables bridge 8021q garp mrp stp llc ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_mark iptable_mangle iptable_security iptable_raw nf_log_ipv4 nf_log_common xt_LOG xt_mac xt_pkttype xt_length ts_bm xt_connbytes nf_conntrack_ipv4 nf_defrag_ipv4 xt_recent xt_string xt_conntrack nf_conntrack libcrc32c iptable_filter w83627ehf hwmon_vid dm_mirror dm_region_hash dm_log dm_mod intel_powerclamp coretemp snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl snd_hda_codec_generic kvm_intel kvm irqbypass crc32_pclmul snd_hda_intel snd_intel_sst_acpi snd_intel_sst_core ghash_clmulni_intel
Jan 20 19:02:20 gateway-accommagency kernel: snd_soc_sst_mfld_platform cryptd snd_hda_codec snd_soc_sst_match snd_soc_core snd_hda_core snd_compress snd_hwdep snd_seq snd_seq_device sg pcspkr snd_pcm i2c_i801 shpchp snd_timer snd iosf_mbi soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 raid1 sd_mod crc_t10dif crct10dif_generic i915 drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul crct10dif_common igb libata drm crc32c_intel ptp pps_core dca i2c_algo_bit serio_raw i2c_core video
Jan 20 19:02:20 gateway-accommagency kernel: CPU: 2 PID: 18 Comm: ksoftirqd/2 Tainted: G W ------------ 3.10.0-693.el7.x86_64 #1
Jan 20 19:02:20 gateway-accommagency kernel: Hardware name: Supermicro X10SBA/X10SBA, BIOS 1.2b 10/17/2016
Jan 20 19:02:20 gateway-accommagency kernel: 0000000000000000 00000000a28adacc ffff88027fd03dc8 ffffffff816a3d91
Jan 20 19:02:20 gateway-accommagency kernel: ffff88027fd03e08 ffffffff810879c8 0000059287d08457 ffff880268302148
Jan 20 19:02:20 gateway-accommagency kernel: 0000000287d0908e ffff880268302000 ffff880268302490 ffff880268302000
Jan 20 19:02:20 gateway-accommagency kernel: Call Trace:
Jan 20 19:02:20 gateway-accommagency kernel: <IRQ> [<ffffffff816a3d91>] dump_stack+0x19/0x1b
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810879c8>] __warn+0xd8/0x100
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff81087b0d>] warn_slowpath_null+0x1d/0x20
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffffc08607d2>] hfsc_dequeue+0x332/0x350 [sch_hfsc]
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff815af677>] __qdisc_run+0x47/0x1b0
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff81585c68>] net_tx_action+0x1c8/0x230
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff81090b3f>] __do_softirq+0xef/0x280
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816b6a5c>] call_softirq+0x1c/0x30
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff8102d3c5>] do_softirq+0x65/0xa0
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff81090ec5>] irq_exit+0x105/0x110
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816b76c2>] smp_apic_timer_interrupt+0x42/0x50
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816b5c1d>] apic_timer_interrupt+0x6d/0x80
Jan 20 19:02:20 gateway-accommagency kernel: <EOI> [<ffffffff816a94c9>] ? schedule+0x29/0x70
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816aba39>] ? _raw_spin_lock_irq+0x9/0x30
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816a8c8d>] __schedule+0x9d/0x8b0
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816a94c9>] schedule+0x29/0x70
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b9045>] smpboot_thread_fn+0xd5/0x180
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b8f70>] ? lg_double_unlock+0x40/0x40
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Jan 20 19:02:20 gateway-accommagency kernel: ---[ end trace 4a0b94612ca1f2f1 ]---
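To gauge how badly a box is affected, the warnings shown above can be counted in the system log. This is only a minimal sketch: the function name count_hfsc_warnings is made up for illustration, and the log path in the usage line is an assumption (the usual syslog file on CentOS 7).

```shell
# count_hfsc_warnings FILE: print how many hfsc_dequeue WARNING lines
# appear in a syslog-style file. (Hypothetical helper, not part of any tool.)
count_hfsc_warnings() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are none, so "|| true" keeps the function usable under "set -e".
    # Only the WARNING header of each trace is counted, not the Call Trace
    # entries that also mention hfsc_dequeue.
    grep -c 'WARNING: .*hfsc_dequeue' "$1" || true
}
```

Usage, assuming kernel messages go to /var/log/messages: count_hfsc_warnings /var/log/messages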
Tags: No tags attached.
abrt_hash: (not set)
URL: (not set)

Activities

TrevorH

2018-01-25 18:58

manager   ~0031043

Any kernel newer than 3.10.0-693* is an internal, RH-only kernel. They just released the 7.5 beta upstream, and that has an -830 kernel, so the one you're after will appear when 7.5 goes GA in 2-4 months' time (based on previous releases, no guarantees).
agoodm

2018-01-25 21:40

reporter   ~0031046

What method should I be using to complete the following tasks then:

Update 7.3 with current software updates (I am holding all of the dozens of systems at 7.3 at present).

Fix the systems that I have updated to 7.4 so that I can use HFSC+FQ_Codel without creating random outages for my clients?

My product works with HFSC+PFIFO or HFSC+BFIFO but with noticeable performance degradation.
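For reference, falling back from fq_codel to a plain FIFO leaf under the HFSC class might look roughly like this. It is a sketch under assumptions: the class and handle numbers are taken from the rough example in the report, the function name use_pfifo_leaf is made up, and the leaf is deleted and re-added because swapping the qdisc type in place under the same handle may be rejected by the kernel. Deleting the fq_codel qdisc also drops the flow filter attached to it.

```shell
TC=${TC:-tc}   # set TC=echo to preview the commands without applying them

# use_pfifo_leaf DEV: replace the fq_codel leaf under class 1:11 with pfifo.
# (Hypothetical helper; 1:11 and 1011: come from the example in the report.)
use_pfifo_leaf() {
    dev=$1
    # Remove the existing leaf qdisc, then attach a packet FIFO in its place.
    "$TC" qdisc del dev "$dev" parent 1:11 handle 1011: &&
    "$TC" qdisc add dev "$dev" parent 1:11 handle 1011: pfifo
}
```

Usage (as root): use_pfifo_leaf eth1. With TC=echo the two tc invocations are only printed.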
TrevorH

2018-01-25 21:45

manager   ~0031047

CentOS cannot release that kernel because we do not have it. Only RH has that kernel, and to get access to it you have to be part of RH or be a customer with a support entitlement who can open a paid issue.

You can build your own kernel from the latest kernel source RPM available on vault.centos.org and incorporate the patch(es) that RH applied to it, which is not easy, especially if you don't know which patch it was. Or you can buy a RHEL support entitlement, reproduce the issue on RHEL, raise a support ticket and get the kernel that way. Or you can do what the rest of us do and wait for 7.5 to drop and be rebuilt.
agoodm

2018-07-12 14:44

reporter   ~0032233

Upstream claims that this was fixed in 3.10.0-693; however, the fault is still occurring for me with 3.10.0-862.6.3.el7.x86_64.

I did compile my own kernel modules for some older versions, which does resolve my issue, but this is very awkward and long-winded since I haven't worked out how to compile just the module I need. I end up compiling the entire kernel, which takes some time even on my most powerful machines.
toracat

2018-07-12 19:00

manager   ~0032234

If you must build your own kernel module, then follow the instructions in this CentOS wiki article:

https://wiki.centos.org/HowTos/BuildingKernelModules
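
As a rough illustration of building only the net/sched modules (which include sch_hfsc) rather than the whole kernel, the steps might be scripted as below. This is a sketch, not the wiki's exact procedure: the helper names run and build_sched_modules are made up, SRCDIR is assumed to be an already unpacked and patched kernel source tree matching the running kernel, and the wiki article remains the reference for installing and loading the result.

```shell
# run CMD...: execute the command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_sched_modules SRCDIR: build only the modules under net/sched.
# Runs in a subshell so the caller's working directory is left unchanged.
build_sched_modules() (
    cd "$1" || exit 1
    # Start from the configuration of the running kernel.
    run cp "/boot/config-$(uname -r)" .config
    run make oldconfig
    run make prepare
    run make modules_prepare
    # Restrict the build to one directory instead of the full tree.
    run make M=net/sched modules
)
```

Usage: DRY_RUN=1 build_sched_modules /path/to/kernel/source prints the commands first; drop DRY_RUN to run them.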
toracat

2018-07-13 04:42

manager   ~0032236

@agoodm

Assuming the patch that fixes the issue is this one:

https://patchwork.ozlabs.org/patch/803885/

I have built a test centosplus kernel that has the above patch applied:

https://people.centos.org/toracat/kernel/7/plus/bug14413/

Can you test to see if this kernel resolves the issue?
toracat

2018-07-14 11:59

manager   ~0032256

Kernel bug 109581 ( https://bugzilla.kernel.org/show_bug.cgi?id=109581 ) has just been updated with a new piece of info:

================================
This warning is resolved by:

commit 35b42da69e35536da603a50e40aa6c41b2f7b0f8
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date: Fri Jun 22 14:33:16 2018 -0700

    net_sched: remove a bogus warning in hfsc
================================

I will try rebuilding the plus kernel with the patch above.
toracat

2018-07-14 18:25

manager   ~0032257

@agoodm

A new set of kernel-plus packages is now available at:

https://people.centos.org/toracat/kernel/7/plus/bug14413v3/
(kernel-plus-3.10.0-862.6.3.bug14413_3.el7)

This has the patch from comment 32256 above.
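
Before re-running the reproduction steps, it is worth confirming that the box actually booted into the test build. A small sketch: the function name is_bug14413_kernel is made up, and it assumes the running kernel's release string contains the "bug14413" tag seen in the package name above.

```shell
# is_bug14413_kernel [RELEASE]: succeed if the release string names one of
# the bug14413 test builds. RELEASE defaults to the running kernel.
is_bug14413_kernel() {
    release=${1:-$(uname -r)}
    case $release in
        *bug14413*) return 0 ;;
        *)          return 1 ;;
    esac
}
```

Usage: is_bug14413_kernel && echo "running a bug14413 test kernel"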

Issue History

Date Modified Username Field Change
2018-01-25 18:34 agoodm New Issue
2018-01-25 18:58 TrevorH Note Added: 0031043
2018-01-25 21:40 agoodm Note Added: 0031046
2018-01-25 21:45 TrevorH Note Added: 0031047
2018-07-12 14:44 agoodm Note Added: 0032233
2018-07-12 19:00 toracat Note Added: 0032234
2018-07-13 04:42 toracat Note Added: 0032236
2018-07-13 04:43 toracat Status new => feedback
2018-07-14 11:59 toracat Note Added: 0032256
2018-07-14 11:59 toracat Status feedback => assigned
2018-07-14 18:25 toracat Note Added: 0032257