View Issue Details

ID: 0014779
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2018-10-30 21:54
Reporter: newton
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: Q1900B-ITX
OS: CentOS
OS Version: 7.5.1804
Product Version: 7.5.1804
Target Version:
Fixed in Version:
Summary: 0014779: BUG: unable to handle kernel NULL pointer dereference at (null) in snd-hdmi-lpe-audio
Description: System works without issues up to and including kernel 3.10.0-693.21.1.el7.x86_64; the crash happens with 3.10.0-862.el7.x86_64 and 3.10.0-862.2.3.el7.x86_64.

Sorry, this is a root server in a data center and I only had a network KVM available (and even that was a goodwill gesture by the provider), so I can only provide a screenshot of the oops. Hope that helps.
Steps To Reproduce: Boot 3.10.0-862.el7.x86_64 or 3.10.0-862.2.3.el7.x86_64 on affected hardware.
Tags: kerneloops
abrt_hash:
URL:

Activities

newton

2018-05-12 10:08

reporter  

mikerotec

2018-05-15 23:26

reporter   ~0031835

The kernel update to 3.10.0-862.2.3.el7 killed my HP ProLiant G4. The error indicated that the BIOS had corrupted the boot something or other (I did not get that recorded).

I switched to the backup BIOS and booted the previous kernel (all is fine again running kernel 3.10.0-693.21.1.el7).
mikerotec

2018-05-22 20:54

reporter   ~0031893

Exact same issue with kernel-3.10.0-862.3.2.el7.x86_64

(I had to roll back to kernel 3.10.0-693.21.1.el7.)
tomkep

2018-05-24 17:08

reporter   ~0031912

Same here. I was able to work around the crash and boot by adding:

blacklist snd-soc-hdac-hdmi
blacklist snd-hdmi-lpe-audio
blacklist snd-hda-codec-hdmi

to /etc/modprobe.d/snd.conf (the second line is likely the one that makes the difference; I didn't have time to narrow it down).
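
For anyone applying the same workaround, the three lines can be written in one step. This is only a minimal sketch: `CONF_DIR` is a variable name made up for this example, and it defaults to a scratch directory so that nothing under /etc is touched unless you set it explicitly (for example `CONF_DIR=/etc/modprobe.d`, as root).

```shell
#!/bin/sh
# Sketch: write the three blacklist lines from this note to a modprobe.d file.
# CONF_DIR is a hypothetical variable; by default it points at a temporary
# directory so the script can be tried without changing the system.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
CONF_FILE="$CONF_DIR/snd.conf"

cat > "$CONF_FILE" <<'EOF'
blacklist snd-soc-hdac-hdmi
blacklist snd-hdmi-lpe-audio
blacklist snd-hda-codec-hdmi
EOF

echo "wrote $CONF_FILE"
```

Keep in mind that a `blacklist` entry only stops the module from being auto-loaded through its aliases; loading it explicitly by name with modprobe still works.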

After adding the above lines and booting into the new kernel, you can trigger the crash with:

modprobe snd-hdmi-lpe-audio

and get the crashdump.

The end of vmcore-dmesg.txt has:

[ 27.822264] traps: addconn[2542] trap stack segment ip:7f9de9e7262c sp:7ffc17807c10 error:0 in libc-2.17.so[7f9de9df2000+1c3000]
[ 67.421336] input: Intel HDMI/DP LPE Audio HDMI/DP,pcm=0 as /devices/pci0000:00/0000:00:02.0/hdmi-lpe-audio/sound/card0/input6
[ 67.426297] input: Intel HDMI/DP LPE Audio HDMI/DP,pcm=1 as /devices/pci0000:00/0000:00:02.0/hdmi-lpe-audio/sound/card0/input7
[ 67.434041] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 67.438513] IP: [<ffffffffa03686ab>] __list_add+0x1b/0xc0
[ 67.442870] PGD 800000003f438067 PUD b30f1067 PMD 0
[ 67.447251] Oops: 0000 [#1] SMP
[ 67.451589] Modules linked in: snd_hdmi_lpe_audio snd_hda_codec_hdmi snd_hda_codec snd_hda_core snd_hwdep drbg ansi_cprng rmd160 crypto_null ip_vti af_key ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp cmac camellia_generic camellia_x86_64 nf_log_ipv4 nf_log_common xt_LOG ip_set_hash_ip cast6_generic cast5_generic cast_common deflate cts gcm ccm serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common twofish_generic twofish_x86_64_3way xts twofish_x86_64 twofish_common xcbc sha512_ssse3 sha512_generic mcryptd des_generic lrw gf128mul glue_helper ablk_helper tun ip_gre gre 8021q garp mrp stp llc bonding nf_conntrack_ipv6
[ 67.467198] nf_defrag_ipv6 sit tunnel4 ip_tunnel ip6table_filter xt_TCPMSS xt_set ip6table_mangle ip6_tables nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp xt_REDIRECT nf_nat_redirect xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack nf_conntrack libcrc32c iptable_filter ip_set_hash_netiface ip_set_hash_netport ip_set nfnetlink vfat fat intel_powerclamp coretemp intel_rapl kvm_intel kvm joydev iTCO_wdt ppdev irqbypass iTCO_vendor_support snd_soc_rt5670 crc32_pclmul snd_soc_rt5645 snd_intel_sst_acpi ghash_clmulni_intel snd_intel_sst_core snd_soc_rt5640 cryptd snd_soc_sst_atom_hifi2_platform snd_soc_rl6231 snd_soc_sst_match snd_soc_core sg hid_logitech_dj snd_compress pcspkr snd_pcm lpc_ich shpchp i2c_i801
[ 67.484440] parport_pc snd_timer parport snd soundcore regmap_i2c i2c_designware_platform i2c_designware_core pwm_lpss auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci drm libahci e1000e libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw ptp pps_core sdhci_acpi sdhci mmc_core video i2c_hid i2c_core iosf_mbi dm_mirror dm_region_hash dm_log dm_mod
[ 67.503184] CPU: 2 PID: 59 Comm: kworker/2:1 Kdump: loaded Not tainted 3.10.0-862.3.2.el7.x86_64 #1
[ 67.509635] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/09/2016
[ 67.516138] Workqueue: events had_audio_wq [snd_hdmi_lpe_audio]
[ 67.522634] task: ffff947034920fd0 ti: ffff947034954000 task.ti: ffff947034954000
[ 67.529167] RIP: 0010:[<ffffffffa03686ab>] [<ffffffffa03686ab>] __list_add+0x1b/0xc0
[ 67.535739] RSP: 0018:ffff947034957d48 EFLAGS: 00010246
[ 67.542273] RAX: 00000000ffffffff RBX: ffff947034957d70 RCX: 0000000000000000
[ 67.548831] RDX: ffff9470316fc908 RSI: 0000000000000000 RDI: ffff947034957d70
[ 67.555265] RBP: ffff947034957d60 R08: 0000000000000000 R09: ae3eceb10defc8e0
[ 67.561484] R10: ae3eceb10defc8e0 R11: 0000000000000001 R12: ffff9470316fc908
[ 67.567577] R13: 0000000000000000 R14: 00000000ffffffff R15: ffff9470316fc908
[ 67.573620] FS: 0000000000000000(0000) GS:ffff94703fd00000(0000) knlGS:0000000000000000
[ 67.579746] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 67.585865] CR2: 0000000000000000 CR3: 00000000b304a000 CR4: 00000000001007e0
[ 67.592060] Call Trace:
[ 67.598274] [<ffffffffa0712c36>] __mutex_lock_slowpath+0xa6/0x1d0
[ 67.604598] [<ffffffffa071203f>] mutex_lock+0x1f/0x2f
[ 67.610936] [<ffffffffc0d1736c>] had_audio_wq+0x5c/0x738 [snd_hdmi_lpe_audio]
[ 67.617343] [<ffffffffa00b312f>] process_one_work+0x17f/0x440
[ 67.623791] [<ffffffffa00b3df6>] worker_thread+0x126/0x3c0
[ 67.630276] [<ffffffffa00b3cd0>] ? manage_workers.isra.24+0x2a0/0x2a0
[ 67.636801] [<ffffffffa00bb161>] kthread+0xd1/0xe0
[ 67.643339] [<ffffffffa00bb090>] ? insert_kthread_work+0x40/0x40
[ 67.649904] [<ffffffffa0720677>] ret_from_fork_nospec_begin+0x21/0x21
[ 67.656520] [<ffffffffa00bb090>] ? insert_kthread_work+0x40/0x40
[ 67.663001] Code: ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 4c 8b 42 08 48 89 fb 49 39 f0 75 2a <4d> 8b 45 00 4d 39 c4 75 68 4c 39 e3 74 3e 4c 39 eb 74 39 49 89
[ 67.670173] RIP [<ffffffffa03686ab>] __list_add+0x1b/0xc0
[ 67.676876] RSP <ffff947034957d48>
[ 67.683552] CR2: 0000000000000000

I have the crashdump and can send it to a specific person for analysis (preferably encrypted), but I'd rather not post it for public viewing...
mikerotec

2018-06-04 20:39

reporter   ~0032006

And... after a recent microcode update, I now have random crashes with 3.10.0-693.21.1.el7 as well.

Rolled back to "emergency kernel" 3.10.0.327 ...
jsmith

2018-06-06 16:14

reporter   ~0032023

I'm seeing this as well.
toracat

2018-06-10 15:58

manager   ~0032050

Can someone do a test install of ELRepo's kernel-ml [1]? The current version is kernel-ml-4.17.0-1.el7.elrepo. This is to find out whether the issue reported in this tracker has been fixed in the latest mainline kernel.

[1] https://elrepo.org/tiki/kernel-ml
newton

2018-06-11 23:28

reporter   ~0032058

I'm currently running 3.10.0-862.3.2.el7.x86_64 and I have several snd modules blacklisted. When I "modprobe snd-hdmi-lpe-audio" with this kernel the machine crashes.

However, the kernel-ml kernel 4.17.0-1 does not have the module snd-hdmi-lpe-audio, and if I boot that kernel without any blacklisted snd modules, no snd modules get autoloaded. So I doubt this kernel has the relevant options enabled, which, of course, would render this test useless.
toracat

2018-06-12 00:39

manager   ~0032059

Hmm, you are right. kernel-ml does not have this module. Will see if it can be enabled.
toracat

2018-06-12 04:59

manager   ~0032060

I have rebuilt kernel-ml with CONFIG_HDMI_LPE_AUDIO=m (kernel-ml-4.17.0-1.ay1.el7.x86_64.rpm) and uploaded it to:

http://elrepo.org/people/akemi/testing/el7/kernel/

Please note that the packages are not signed.
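
To confirm that a given kernel really was built with this option before testing, something like the following could be used. It is a rough sketch: `has_lpe_audio` is a helper name invented here, and the script assumes the kernel config is installed as /boot/config-<release>, which may not hold on every system.

```shell
#!/bin/sh
# has_lpe_audio is a hypothetical helper: it succeeds if the given kernel
# config file enables CONFIG_HDMI_LPE_AUDIO, built-in (=y) or as a module (=m).
has_lpe_audio() {
    grep -Eq '^CONFIG_HDMI_LPE_AUDIO=[ym]$' "$1"
}

# Assumption: the running kernel's config lives at /boot/config-<release>.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ] && has_lpe_audio "$cfg"; then
    echo "CONFIG_HDMI_LPE_AUDIO is enabled in $cfg"
else
    echo "CONFIG_HDMI_LPE_AUDIO is not enabled (or $cfg is not readable)"
fi
```

If the option is missing, the kernel cannot load snd-hdmi-lpe-audio at all, so a boot without a crash proves nothing about the bug.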
madhatta

2018-06-12 07:52

reporter   ~0032062

I can confirm that I, too, am having this problem, with 3.10.0-862.3.2.el7.x86_64, following two fresh installs on new hardware yesterday. I'm sorry all the evidence I have is a photo. Going back to the install kernel (3.10.0-693.el7.x86_64) is a workaround.

I have installed kernel-ml-4.17.0-1.ay1.el7.x86_64.rpm as requested and can confirm the issue is not present there, at least for me.

IMG_4136.JPG (1,168,486 bytes)
toracat

2018-06-12 14:40

manager   ~0032067

@madhatta

Thanks for reporting the result with kernel-ml. Good to know that the fix is in the latest mainline kernel. Hopefully we can identify the patch that takes care of the current problem.
toracat

2018-06-13 06:00

manager   ~0032074

The following upstream patch (commit c77a6edb6d4d35204673cad7389c317bfb17492e) is a likely candidate for fixing the issue:

https://patchwork.kernel.org/patch/10246971/

I built a kernel-plus package using the above patch (kernel-plus-3.10.0-862.3.2.el7.bug14779.centos.plus.x86_64.rpm). It is available for testing:

https://people.centos.org/toracat/kernel/7/plus/bug14779/
madhatta

2018-06-13 06:58

reporter   ~0032075

Unfortunately, the decision was taken to put C6 on the hardware I was working on, for delivery-date reasons. I am expecting more of the same hardware shortly, and will attempt to continue testing and feedback at that time.
newton

2018-06-13 07:19

reporter   ~0032076

[root@tux ~]# uname -a
Linux tux.leun.net 3.10.0-862.3.2.el7.bug14779.centos.plus.x86_64 #1 SMP Tue Jun 12 22:32:06 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@tux ~]# modprobe snd-hdmi-lpe-audio

System survived that - crashes without the fix. So, yup, can confirm. Thanks.
toracat

2018-06-13 16:16

manager   ~0032079

@newton

Thanks for the test result and the confirmation that the patch worked. The next official update to kernel-plus will include the patch.

Now, the next step is to file a bug report with Red Hat at http://bugzilla.redhat.com to get this patch into the RHEL kernel. Then the CentOS kernel will inherit it.
toracat

2018-06-16 05:50

manager   ~0032099

kernel-plus-3.10.0-862.3.3.el7.centos.plus has been released. It has the patch from this bug report.
mikerotec

2018-06-18 22:59

reporter   ~0032108

Well, this didn't solve it for me (the latest 3.10.0-862.3.3.el7 kernel still crashes my system). I guess I'll have to figure out how to get the crash log and make a new bug report...
TrevorH

2018-06-18 23:12

manager   ~0032109

You need the centosplus kernel, not the main distro one.
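
One way to tell the two apart is the release string: the plus builds in this report all carry "centos.plus" in their name. The check below is a sketch built on that observation; `is_plus_kernel` is a name made up for this example.

```shell
#!/bin/sh
# is_plus_kernel is a hypothetical helper: it succeeds if a kernel release
# string looks like a centosplus build, based on the "centos.plus" marker
# seen in the package names in this report.
is_plus_kernel() {
    case "$1" in
        *.centos.plus*) return 0 ;;
        *) return 1 ;;
    esac
}

rel="$(uname -r)"
if is_plus_kernel "$rel"; then
    echo "$rel looks like a centosplus kernel"
else
    echo "$rel is not a centosplus kernel"
    # The plus kernel is normally installed from the centosplus repository,
    # e.g.: yum --enablerepo=centosplus install kernel-plus
    # (repository setup on your system is an assumption here).
fi
```

Remember to actually boot into the plus kernel after installing it; the check looks at the running kernel, not at what is installed.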
mikerotec

2018-06-19 00:03

reporter   ~0032110

Found and installed centosplus kernel, but it too crashes on boot. My system must have a slightly different bug than the one fixed...

20180618_164727_001.jpg (106,861 bytes)
01_20180618_164347.jpg (568,344 bytes)
02_20180618_164450.jpg (279,896 bytes)
toracat

2018-07-05 18:35

manager   ~0032183

@mikerotec

If the provided plus kernel does not fix the crash, your problem is most likely different. You might want to file a new bug report with all the details.
toracat

2018-08-10 17:23

manager   ~0032461

RHBZ status:

Originally opened as #1598592 (private) but closed as a duplicate of #1551742 (private).
Currently marked "Verified".
toracat

2018-10-10 15:15

manager   ~0032902

I am planning to build a new test version of kernel-plus that might fix other alsa-related issues.
toracat

2018-10-13 01:03

manager   ~0032918

The following two patches have been added:

commit 1967158fff819b38f4e46763ca8df067b4b69f59
"ALSA: x86: fix error return code in hdmi_lpe_audio_probe()"

commit 7229b12f5da33d5c376ee264f063703844b8092d
"ALSA: x86: hdmi: Add single_port option for compatible behavior"

A patched set of kernel-plus is available for testing:

https://people.centos.org/toracat/kernel/7/plus/bug14779_2/
(kernel-plus-3.10.0-862.14.4.el7.centos.plus.6)

Feedback appreciated.
toracat

2018-10-30 21:54

manager   ~0033023

RHEL 7.6 is out. The distro kernel (3.10.0-957.el7) now has all three patches in this bug report.

Closing as 'resolved'.
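
For the distro kernel, a quick check is to compare the build number in the release string against 957. The following is a sketch under narrow assumptions: `distro_kernel_has_fix` is a hypothetical helper that only understands stock EL7 names of the form 3.10.0-<build>..., and it says nothing about centosplus builds, which carried the patches earlier.

```shell
#!/bin/sh
# distro_kernel_has_fix is a hypothetical helper.
# Returns 0 if the release is 3.10.0-<build> with build >= 957,
#         1 if it is an older 3.10.0 build,
#         2 if the string is not in the 3.10.0-<build> form.
distro_kernel_has_fix() {
    case "$1" in
        3.10.0-*) ;;
        *) return 2 ;;
    esac
    rest="${1#3.10.0-}"     # e.g. "862.3.2.el7.x86_64"
    build="${rest%%.*}"     # e.g. "862"
    case "$build" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$build" -ge 957 ]
}

rel="$(uname -r)"
rc=0
distro_kernel_has_fix "$rel" || rc=$?
case "$rc" in
    0) echo "$rel is at or past 3.10.0-957" ;;
    1) echo "$rel predates 3.10.0-957" ;;
    *) echo "$rel is not a 3.10.0 EL7 release string; no verdict" ;;
esac
```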

Issue History

Date Modified | Username | Field | Change
2018-05-12 10:08 | newton | New Issue |
2018-05-12 10:08 | newton | File Added: Screenshot_20180512_110722a.png |
2018-05-12 10:08 | newton | Tag Attached: kerneloops |
2018-05-15 23:26 | mikerotec | Note Added: 0031835 |
2018-05-22 20:54 | mikerotec | Note Added: 0031893 |
2018-05-24 17:08 | tomkep | Note Added: 0031912 |
2018-06-04 20:39 | mikerotec | Note Added: 0032006 |
2018-06-06 16:14 | jsmith | Note Added: 0032023 |
2018-06-10 15:58 | toracat | Note Added: 0032050 |
2018-06-11 22:54 | toracat | Status | new => feedback
2018-06-11 23:28 | newton | Note Added: 0032058 |
2018-06-11 23:28 | newton | Status | feedback => assigned
2018-06-12 00:39 | toracat | Note Added: 0032059 |
2018-06-12 04:59 | toracat | Note Added: 0032060 |
2018-06-12 07:52 | madhatta | File Added: IMG_4136.JPG |
2018-06-12 07:52 | madhatta | Note Added: 0032062 |
2018-06-12 14:40 | toracat | Note Added: 0032067 |
2018-06-13 06:00 | toracat | Note Added: 0032074 |
2018-06-13 06:58 | madhatta | Note Added: 0032075 |
2018-06-13 07:19 | newton | Note Added: 0032076 |
2018-06-13 16:16 | toracat | Note Added: 0032079 |
2018-06-16 05:50 | toracat | Note Added: 0032099 |
2018-06-18 22:59 | mikerotec | Note Added: 0032108 |
2018-06-18 23:12 | TrevorH | Note Added: 0032109 |
2018-06-19 00:03 | mikerotec | File Added: 20180618_164727_001.jpg |
2018-06-19 00:03 | mikerotec | File Added: 01_20180618_164347.jpg |
2018-06-19 00:03 | mikerotec | File Added: 02_20180618_164450.jpg |
2018-06-19 00:03 | mikerotec | Note Added: 0032110 |
2018-07-05 18:35 | toracat | Note Added: 0032183 |
2018-08-10 17:23 | toracat | Note Added: 0032461 |
2018-10-10 15:15 | toracat | Note Added: 0032902 |
2018-10-13 01:03 | toracat | Note Added: 0032918 |
2018-10-30 21:54 | toracat | Status | assigned => resolved
2018-10-30 21:54 | toracat | Resolution | open => fixed
2018-10-30 21:54 | toracat | Note Added: 0033023 |