View Issue Details

ID: 0007645
Project: CentOS-7
Category: CentOS-7-Plus
View Status: public
Last Update: 2016-11-21 18:08
Reporter: jimj
Priority: normal
Severity: crash
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: x86_64
OS: CentOS
OS Version: 7
Product Version: 7.0-1406
Target Version:
Fixed in Version:
Summary: 0007645: Backport patch into the kernel-plus kernel to fix crash after AMD CPU suspend/resume
Description: My CentOS 7 AMD-based desktop becomes unstable (GNOME crashes) after I suspend or hibernate. I believe this patch will resolve the problem:
http://marc.info/?l=linux-kernel&m=138979791121554

This appears to be fixed in kernel 3.10.28 per this change log (search for "Fix waking up from S3 for AMD family 10h"):
https://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.10.28

Since this isn't a security patch, I assume that Red Hat isn't going to backport it into their kernel. Any chance this patch could be included in the kernel-plus kernel?
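
The upstream fix referenced above is specific to AMD family 10h, which /proc/cpuinfo reports in decimal as `cpu family : 16`. The report does not state the CPU model, so a rough check along the following lines could tell whether a machine falls in the affected family. This is only a sketch: the function names are hypothetical, and the input is assumed to follow the usual /proc/cpuinfo layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value of the first "cpu family" field in text laid out like
 * /proc/cpuinfo, or -1 if no such field is found.
 */
static int parse_cpu_family(const char *cpuinfo)
{
    const char *line;
    int family;

    assert(cpuinfo != NULL);

    for (line = cpuinfo; line && *line; ) {
        /* Whitespace in the format matches the tab before the colon. */
        if (sscanf(line, "cpu family : %d", &family) == 1)
            return family;
        /* Move on to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Return 1 if the text describes an AMD family 10h CPU, else 0. */
static int is_amd_family_10h(const char *cpuinfo)
{
    return strstr(cpuinfo, "AuthenticAMD") != NULL &&
           parse_cpu_family(cpuinfo) == 0x10;
}
```

Reading /proc/cpuinfo into a buffer and passing it to `is_amd_family_10h()` is left to the caller.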
Steps To Reproduce: Suspend and then resume the desktop.
Tags: kernel

Activities

toracat

2014-09-28 21:08

manager   ~0021015

Sure, as long as the patch applies cleanly. Let me check on that.
toracat

2014-09-28 21:14

manager  

7645.patch (4,017 bytes)
commit bee09ed91cacdbffdbcd3b05de8409c77ec9fcd6
Author: Robert Richter <rric@kernel.org>
Date:   Wed Jan 15 15:57:29 2014 +0100

    perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h
    
    On AMD family 10h we see the following error messages while waking up from
    S3 for all non-boot CPUs leading to a failed IBS initialization:
    
     Enabling non-boot CPUs ...
     smpboot: Booting Node 0 Processor 1 APIC 0x1
     [Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
     perf: IBS APIC setup failed on cpu #1
     process: Switch to broadcast mode on CPU1
     CPU1 is up
     ...
     ACPI: Waking up from system sleep state S3
    
    Reason for this is that during suspend the LVT offset for the IBS
    vector gets lost and needs to be reinitialized while resuming.
    
    The offset is read from the IBSCTL msr. On family 10h the offset needs
    to be 1 as offset 0 is used for the MCE threshold interrupt, but
    firmware assigns it for IBS to 0 too. The kernel needs to reprogram
    the vector. The msr is a readonly node msr, but a new value can be
    written via pci config space access. The reinitialization is
    implemented for family 10h in setup_ibs_ctl() which is forced during
    IBS setup.
    
    This patch fixes IBS setup after waking up from S3 by adding
    resume/suspend hooks for the boot cpu which does the offset
    reinitialization.
    
    Marking it as stable to let distros pick up this fix.
    
    Signed-off-by: Robert Richter <rric@kernel.org>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Cc: <stable@vger.kernel.org> v3.2..
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/1389797849-5565-1-git-send-email-rric.net@gmail.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index e09f0bf..4b8e4d3 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -10,6 +10,7 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/ptrace.h>
+#include <linux/syscore_ops.h>
 
 #include <asm/apic.h>
 
@@ -816,6 +817,18 @@ out:
 	return ret;
 }
 
+static void ibs_eilvt_setup(void)
+{
+	/*
+	 * Force LVT offset assignment for family 10h: The offsets are
+	 * not assigned by the BIOS for this family, so the OS is
+	 * responsible for doing it. If the OS assignment fails, fall
+	 * back to BIOS settings and try to setup this.
+	 */
+	if (boot_cpu_data.x86 == 0x10)
+		force_ibs_eilvt_setup();
+}
+
 static inline int get_ibs_lvt_offset(void)
 {
 	u64 val;
@@ -851,6 +864,36 @@ static void clear_APIC_ibs(void *dummy)
 		setup_APIC_eilvt(offset, 0, APIC_EILVT_MSG_FIX, 1);
 }
 
+#ifdef CONFIG_PM
+
+static int perf_ibs_suspend(void)
+{
+	clear_APIC_ibs(NULL);
+	return 0;
+}
+
+static void perf_ibs_resume(void)
+{
+	ibs_eilvt_setup();
+	setup_APIC_ibs(NULL);
+}
+
+static struct syscore_ops perf_ibs_syscore_ops = {
+	.resume		= perf_ibs_resume,
+	.suspend	= perf_ibs_suspend,
+};
+
+static void perf_ibs_pm_init(void)
+{
+	register_syscore_ops(&perf_ibs_syscore_ops);
+}
+
+#else
+
+static inline void perf_ibs_pm_init(void) { }
+
+#endif
+
 static int
 perf_ibs_cpu_notifier(struct notifier_block *self, unsigned long action, void *hcpu)
 {
@@ -877,18 +920,12 @@ static __init int amd_ibs_init(void)
 	if (!caps)
 		return -ENODEV;	/* ibs not supported by the cpu */
 
-	/*
-	 * Force LVT offset assignment for family 10h: The offsets are
-	 * not assigned by the BIOS for this family, so the OS is
-	 * responsible for doing it. If the OS assignment fails, fall
-	 * back to BIOS settings and try to setup this.
-	 */
-	if (boot_cpu_data.x86 == 0x10)
-		force_ibs_eilvt_setup();
+	ibs_eilvt_setup();
 
 	if (!ibs_eilvt_valid())
 		goto out;
 
+	perf_ibs_pm_init();
 	get_online_cpus();
 	ibs_caps = caps;
 	/* make ibs_caps visible to other cpus: */
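
The commit message in the patch describes the LVT offset being read from the IBSCTL MSR, and explains that on family 10h an offset of 0 collides with the MCE threshold interrupt. As a user-space illustration of that decoding, here is a minimal sketch. The bit layout (offset in bits 3:0, valid flag in bit 8) is my recollection of the kernel's `IBSCTL_LVT_OFFSET_MASK` and `IBSCTL_LVT_OFFSET_VALID` definitions and should be treated as an assumption to verify; the function names are hypothetical, and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: LVT offset in bits 3:0, "offset valid" flag in bit 8. */
#define IBSCTL_LVT_OFFSET_VALID (1ULL << 8)
#define IBSCTL_LVT_OFFSET_MASK  0x0FULL

/* Return the LVT offset held in an IBSCTL value, or -1 if not marked valid. */
static int ibsctl_lvt_offset(uint64_t ibsctl)
{
    if (!(ibsctl & IBSCTL_LVT_OFFSET_VALID))
        return -1;
    return (int)(ibsctl & IBSCTL_LVT_OFFSET_MASK);
}

/* Build an IBSCTL value that marks the given offset as valid. */
static uint64_t ibsctl_encode(unsigned int offset)
{
    assert(offset <= IBSCTL_LVT_OFFSET_MASK);
    return (uint64_t)offset | IBSCTL_LVT_OFFSET_VALID;
}

/*
 * On family 10h, offset 0 is used by the MCE threshold interrupt, so a
 * valid IBS offset of 0 is the conflicting case the commit describes.
 */
static int ibs_offset_conflicts_on_fam10h(uint64_t ibsctl)
{
    return ibsctl_lvt_offset(ibsctl) == 0;
}
```

With these helpers, a firmware-style value of `0x100` (valid, offset 0) is flagged as conflicting, while `ibsctl_encode(1)` is not.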
toracat

2014-09-28 21:16

manager   ~0021016

The patch (commit bee09ed91cacdbffdbcd3b05de8409c77ec9fcd6) uploaded. Will be applied to the plus kernel in the next update.
jimj

2014-09-29 01:14

reporter   ~0021018

Wow, thanks for the amazingly fast response!
jimj

2014-10-31 05:03

reporter   ~0021482

I'm still seeing the same symptoms with kernel 3.10.0-123.9.2.el7.centos.plus.x86_64. I assume this patch didn't make it into the 9.2 kernel update?
toracat

2014-10-31 07:08

manager   ~0021485

My apologies. It was supposed to be added in the last update, 3.10.0-123.9.2.el7.centos.plus, but it was missed. It now has to wait for the next update.
jimj

2014-11-01 03:22

reporter   ~0021498

Thanks for the update. I'll try it again with the next release.
toracat

2014-11-06 21:34

manager   ~0021573

Kernel update 3.10.0-123.9.3.el7 is out. kernel-plus now has the patch from this bug report and will be released shortly.
jimj

2014-11-08 04:24

reporter   ~0021612

3.10.0-123.9.3.el7.centos.plus.x86_64 has resolved this problem, thanks! Now I only have one AMD resume issue left:
http://bugs.centos.org/view.php?id=7852
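
One way to double-check a result like this is to look at the kernel log after a resume for the failure line quoted in the patch's commit message ("perf: IBS APIC setup failed on cpu #1"). The thread does not say whether the reporter saw that message, so the following is only a sketch of such a check. The function name is hypothetical, and the log text is assumed to have been captured separately, for example from dmesg output.

```c
#include <assert.h>
#include <string.h>

/*
 * Count occurrences of the IBS setup failure message in captured kernel
 * log text. A count of zero after resume suggests the IBS vector was
 * set up again without error.
 */
static int count_ibs_setup_failures(const char *log)
{
    static const char needle[] = "perf: IBS APIC setup failed on cpu #";
    const char *p;
    int count = 0;

    assert(log != NULL);

    /* Step past each match so the same occurrence is not counted twice. */
    for (p = strstr(log, needle); p; p = strstr(p + sizeof(needle) - 1, needle))
        count++;
    return count;
}
```

A non-zero count on a kernel that is supposed to carry the fix would be a hint that the patch is missing from that build.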
toracat

2014-11-08 05:27

manager   ~0021615

Thanks for reporting back. Now closing as resolved.
toracat

2016-11-21 18:08

manager   ~0027957

Just a short note to add that the patch applied to the plus kernel is now in the 7.3 distro kernel.

Issue History

Date Modified Username Field Change
2014-09-28 20:32 jimj New Issue
2014-09-28 21:08 toracat Note Added: 0021015
2014-09-28 21:14 toracat File Added: 7645.patch
2014-09-28 21:16 toracat Note Added: 0021016
2014-09-28 21:16 toracat Status new => assigned
2014-09-29 01:14 jimj Note Added: 0021018
2014-10-31 05:03 jimj Note Added: 0021482
2014-10-31 07:08 toracat Note Added: 0021485
2014-11-01 03:22 jimj Note Added: 0021498
2014-11-06 21:34 toracat Note Added: 0021573
2014-11-08 04:24 jimj Note Added: 0021612
2014-11-08 04:28 jimj Tag Attached: kernel
2014-11-08 05:27 toracat Note Added: 0021615
2014-11-08 05:27 toracat Status assigned => resolved
2014-11-08 05:27 toracat Resolution open => fixed
2016-11-21 18:08 toracat Note Added: 0027957