View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0014413 | CentOS-7 | kernel | public | 2018-01-25 18:34 | 2020-05-12 23:16 |
Reporter | agoodm | Assigned To | |||
Priority | normal | Severity | major | Reproducibility | random |
Status | resolved | Resolution | fixed | ||
Platform | Intel X86-64 | OS | CentOS | OS Version | 7.4 |
Summary | 0014413: Kernel bug 109581 | ||||
Description | Seemingly at random, an extremely large number of kernel error messages is generated, which overwhelms the CPU on affected boxes. The problem seems to be hardware dependent and can be worked around by not using FQ_Codel. The kernel errors are very similar to those in Kernel Bug 109581, for which a fix was apparently backported by Red Hat to kernel-3.10.0-774.el7. Could someone please provide the source code from the devel branch for this or a newer kernel, or alternatively compile a kernel for me to test? | ||||
Steps To Reproduce | 1. Attach Qdisc/filter/fq_codel to physical interface:
tc qdisc add dev eth1 root handle 1:0 estimator 1sec 4sec hfsc default 14
tc class add dev eth1 parent 1:0 classid 1:1 hfsc sc rate 98mbit
tc class add dev eth1 parent 1:1 classid 1:11 estimator 1sec 4sec hfsc sc m1 25kbit d 20ms m2 100kbit
tc class add dev eth1 parent 1:1 classid 1:14 estimator 1sec 4sec hfsc sc rate 98mbit
tc qdisc add dev eth1 parent 1:11 handle 1011: fq_codel noecn
tc filter add dev eth1 parent 1011: protocol all handle 11 flow hash keys src,nfct-src divisor 1024
tc filter add dev eth1 parent 1:0 protocol all handle 11 fw flowid 1:11
(above is a rough example)
2. Send some traffic through from the box | ||||
Additional Information | Jan 20 19:02:20 gateway-accommagency kernel: WARNING: CPU: 2 PID: 18 at net/sched/sch_hfsc.c:1426 hfsc_dequeue+0x332/0x350 [sch_hfsc]
Jan 20 19:02:20 gateway-accommagency kernel: Modules linked in: cls_fw cls_flow sch_fq_codel sch_hfsc bsd_comp ppp_synctty ppp_async crc_ccitt ppp_generic slhc xt_CHECKSUM ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables bridge 8021q garp mrp stp llc ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat xt_mark iptable_mangle iptable_security iptable_raw nf_log_ipv4 nf_log_common xt_LOG xt_mac xt_pkttype xt_length ts_bm xt_connbytes nf_conntrack_ipv4 nf_defrag_ipv4 xt_recent xt_string xt_conntrack nf_conntrack libcrc32c iptable_filter w83627ehf hwmon_vid dm_mirror dm_region_hash dm_log dm_mod intel_powerclamp coretemp snd_hda_codec_hdmi snd_hda_codec_realtek intel_rapl snd_hda_codec_generic kvm_intel kvm irqbypass crc32_pclmul snd_hda_intel snd_intel_sst_acpi snd_intel_sst_core ghash_clmulni_intel
Jan 20 19:02:20 gateway-accommagency kernel: snd_soc_sst_mfld_platform cryptd snd_hda_codec snd_soc_sst_match snd_soc_core snd_hda_core snd_compress snd_hwdep snd_seq snd_seq_device sg pcspkr snd_pcm i2c_i801 shpchp snd_timer snd iosf_mbi soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 raid1 sd_mod crc_t10dif crct10dif_generic i915 drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul crct10dif_common igb libata drm crc32c_intel ptp pps_core dca i2c_algo_bit serio_raw i2c_core video
Jan 20 19:02:20 gateway-accommagency kernel: CPU: 2 PID: 18 Comm: ksoftirqd/2 Tainted: G W ------------ 3.10.0-693.el7.x86_64 #1
Jan 20 19:02:20 gateway-accommagency kernel: Hardware name: Supermicro X10SBA/X10SBA, BIOS 1.2b 10/17/2016
Jan 20 19:02:20 gateway-accommagency kernel: 0000000000000000 00000000a28adacc ffff88027fd03dc8 ffffffff816a3d91
Jan 20 19:02:20 gateway-accommagency kernel: ffff88027fd03e08 ffffffff810879c8 0000059287d08457 ffff880268302148
Jan 20 19:02:20 gateway-accommagency kernel: 0000000287d0908e ffff880268302000 ffff880268302490 ffff880268302000
Jan 20 19:02:20 gateway-accommagency kernel: Call Trace:
Jan 20 19:02:20 gateway-accommagency kernel: <IRQ> [<ffffffff816a3d91>] dump_stack+0x19/0x1b
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810879c8>] __warn+0xd8/0x100
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff81087b0d>] warn_slowpath_null+0x1d/0x20
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffffc08607d2>] hfsc_dequeue+0x332/0x350 [sch_hfsc]
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff815af677>] __qdisc_run+0x47/0x1b0
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff81585c68>] net_tx_action+0x1c8/0x230
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff81090b3f>] __do_softirq+0xef/0x280
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816b6a5c>] call_softirq+0x1c/0x30
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff8102d3c5>] do_softirq+0x65/0xa0
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff81090ec5>] irq_exit+0x105/0x110
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816b76c2>] smp_apic_timer_interrupt+0x42/0x50
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816b5c1d>] apic_timer_interrupt+0x6d/0x80
Jan 20 19:02:20 gateway-accommagency kernel: <EOI> [<ffffffff816a94c9>] ? schedule+0x29/0x70
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816aba39>] ? _raw_spin_lock_irq+0x9/0x30
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816a8c8d>] __schedule+0x9d/0x8b0
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816a94c9>] schedule+0x29/0x70
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b9045>] smpboot_thread_fn+0xd5/0x180
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b8f70>] ? lg_double_unlock+0x40/0x40
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff816b4f18>] ret_from_fork+0x58/0x90
Jan 20 19:02:20 gateway-accommagency kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Jan 20 19:02:20 gateway-accommagency kernel: ---[ end trace 4a0b94612ca1f2f1 ]--- | ||||
Tags | No tags attached. | ||||
abrt_hash | |||||
URL | |||||
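The flood of warnings quoted under Additional Information can be quantified by counting WARNING header lines in a syslog file. The following is a minimal Python sketch and not part of the original report: the function name `count_warnings` and the regular expression are assumptions made for illustration, and the pattern is only known to match the header format shown in the log above.

```python
import re
from collections import Counter

# Matches a WARNING header such as:
# "kernel: WARNING: CPU: 2 PID: 18 at net/sched/sch_hfsc.c:1426 hfsc_dequeue+0x332/0x350 [sch_hfsc]"
# The pattern is an assumption based on the single log excerpt in this report.
WARNING_RE = re.compile(
    r"kernel: WARNING: .* at (?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\+0x"
)


def count_warnings(lines):
    """Count kernel WARNING headers per (source file, line number, function)."""
    counts = Counter()
    for text in lines:
        match = WARNING_RE.search(text)
        if match:
            key = (match.group("file"), int(match.group("line")), match.group("func"))
            counts[key] += 1
    return counts
```

A possible use is to open the system log (for example `/var/log/messages`, which is an assumption about where the messages end up on a given box), pass the file object to `count_warnings`, and check how many entries point at `net/sched/sch_hfsc.c`.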
Any kernel newer than 3.10.0-693* is an internal, RH-only kernel. The 7.5 beta was just released upstream with an -830 kernel, so the one you're after will appear when 7.5 goes GA in 2-4 months' time (based on previous releases, no guarantees). | |
What method should I use to complete the following tasks, then? (1) Update 7.3 with current software updates (I am holding all of the dozens of systems at 7.3 at present). (2) Fix the systems that I have already updated to 7.4 so that I can use HFSC+FQ_Codel without creating random outages for my clients. My product works with HFSC+PFIFO or HFSC+BFIFO, but with noticeable performance degradation. |
|
CentOS cannot release that kernel because we do not have it. Only RH has that kernel, and you have to be part of RH to get it, or be a customer with a support entitlement so that you can open a paid issue to get access to it. Your choices are: build your own kernel from the latest kernel source RPM available on vault.centos.org, incorporating the patch(es) that RH applied to it (not easy, especially if you don't know which patch it was); buy a RHEL support entitlement, reproduce the issue on RHEL, and raise a support ticket to get the kernel that way; or do what the rest of us do and wait for 7.5 to drop and be rebuilt. |
|
Upstream claims that this was fixed in 3.10.0-693; however, the fault is still occurring for me with 3.10.0-862.6.3.el7.x86_64. I compiled my own kernel modules for some older versions, which does resolve my issue, but this is very awkward and long-winded: I haven't worked out how to compile just the module I need, so I end up compiling the entire kernel, which takes some time even on my most powerful machines. |
|
If you must build your own kernel module, then follow the instructions in this CentOS wiki article: https://wiki.centos.org/HowTos/BuildingKernelModules |
|
@agoodm Assuming the patch that fixes the issue is this one: https://patchwork.ozlabs.org/patch/803885/ I have built a test centosplus kernel that has the above patch applied: https://people.centos.org/toracat/kernel/7/plus/bug14413/ Can you test to see if this kernel resolves the issue? |
|
Kernel bug 109581 ( https://bugzilla.kernel.org/show_bug.cgi?id=109581 ) has just been updated with a new piece of info:
================================
This warning is resolved by:
commit 35b42da69e35536da603a50e40aa6c41b2f7b0f8
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date: Fri Jun 22 14:33:16 2018 -0700
net_sched: remove a bogus warning in hfsc
================================
I will try rebuilding the plus kernel with the patch above. |
|
@agoodm A new set of kernel-plus packages is now available at: https://people.centos.org/toracat/kernel/7/plus/bug14413v3/ (kernel-plus-3.10.0-862.6.3.bug14413_3.el7). It has the patch from comment 32256 above. |
|
Just for info, the referenced patch was discussed in this mailing list thread: https://www.mail-archive.com/netdev@vger.kernel.org/msg240430.html |
|
kernel-plus-3.10.0-862.11.6.el7.centos.plus has the patch. | |
This patch (centos-linux-3.10-net_sched_codel-bug14413v3.patch) is no longer needed as of kernel-1127.8.2.el7. | |
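Several notes above hinge on whether a given kernel release is older or newer than a named one (3.10.0-693, 3.10.0-862.6.3, 3.10.0-862.11.6, 1127.8.2). The following is a minimal Python sketch of such a comparison. It is a simplification and not RPM's own version comparison: the helper names `release_key` and `at_least` are hypothetical, only the leading numeric fields are compared, and suffixes such as `.el7`, `.centos.plus` or `.x86_64` are ignored.

```python
def _leading_ints(text):
    """Return the leading dot-separated numeric fields of text as a tuple of ints."""
    values = []
    for field in text.split("."):
        if not field.isdigit():
            break  # stop at the first non-numeric field, e.g. "el7"
        values.append(int(field))
    return tuple(values)


def release_key(release):
    """Turn a release such as '3.10.0-862.6.3.el7.x86_64' into a comparable key."""
    version, _, build = release.partition("-")
    return _leading_ints(version), _leading_ints(build)


def at_least(release, minimum):
    """True if release is the same as or newer than minimum (numeric fields only)."""
    return release_key(release) >= release_key(minimum)
```

For example, `at_least(platform.release(), "3.10.0-1127.8.2.el7")` would check the running kernel against the threshold in the last note, assuming that "kernel-1127.8.2.el7" refers to the 3.10.0 series.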
Date Modified | Username | Field | Change |
---|---|---|---|
2018-01-25 18:34 | agoodm | New Issue | |
2018-01-25 18:58 | TrevorH | Note Added: 0031043 | |
2018-01-25 21:40 | agoodm | Note Added: 0031046 | |
2018-01-25 21:45 | TrevorH | Note Added: 0031047 | |
2018-07-12 14:44 | agoodm | Note Added: 0032233 | |
2018-07-12 19:00 | toracat | Note Added: 0032234 | |
2018-07-13 04:42 | toracat | Note Added: 0032236 | |
2018-07-13 04:43 | toracat | Status | new => feedback |
2018-07-14 11:59 | toracat | Note Added: 0032256 | |
2018-07-14 11:59 | toracat | Status | feedback => assigned |
2018-07-14 18:25 | toracat | Note Added: 0032257 | |
2018-08-02 16:40 | toracat | Note Added: 0032421 | |
2018-08-16 05:18 | toracat | Note Added: 0032499 | |
2018-08-16 05:21 | toracat | Status | assigned => resolved |
2018-08-16 05:21 | toracat | Resolution | open => fixed |
2020-05-12 23:16 | toracat | Note Added: 0036924 |