View Issue Details

ID: 0017195    Project: CentOS-8    Category: kernel-plus    View Status: public    Last Update: 2020-08-16 17:42
Reporter: kabe
Priority: normal    Severity: feature    Reproducibility: always
Status: acknowledged    Resolution: open
Platform: i686    OS: CentOS    OS Version: 8.1
Product Version: 8.1.1911
Target Version:    Fixed in Version:
Summary: 0017195: Patches needed for i686 compile of kernel-4.18.0-147.5.1.centos.plus
Description: I know there isn't a CentOS 8 i686 build yet, but I have tried and succeeded in compiling and booting an
anaconda installer (with around 290 .src.rpm packages recompiled).

These are the patches needed to create a working i686 kernel, based on kernel-plus.
Tested on Hyper-V, a Fujitsu C601, and a ThinkPad R51.
Steps To Reproduce:
rpmbuild --target=i686 -v -bb \
--without debug --without debuginfo --with baseonly \
--without kabidupchk --without kabidwchk --without kabidw_base \
--without bpftool \
SPECS/kernel.spec
Additional Information: Toracat, you do not need to act on this report;
this is just an FYI report in case the AltArch i686 SIG becomes interested in CentOS 8 i686
(perhaps after 7.x has gone EOL).

I wish koji.mbox.centos.org had made the compiled .rpm results public, just as buildlogs.centos.org did;
most of the packages would not have needed a recompile.
Tags: i386

Relationships

related to 0017674 (acknowledged, toracat): Patches needed for i686 compile of kernel-4.18.0-193.6.3.centos.plus

Activities

kabe (reporter)   2020-03-29 06:16   ~0036599

This is the only patch needed to make the compile succeed, but alone it doesn't result in a working kernel.
More patches follow...

i686-netlink_callback-s64.patch (572 bytes)
Newer kernels redefine args[6] as a union with a u8[48] array,
which assumes each args[] element is 8 bytes wide; on i686 a long is only 4 bytes, so use s64.

diff -up ./include/linux/netlink.h.netlink ./include/linux/netlink.h
--- ./include/linux/netlink.h.netlink	2020-03-07 20:47:11.179127920 +0900
+++ ./include/linux/netlink.h	2020-03-07 20:47:52.349395715 +0900
@@ -189,7 +189,7 @@ struct netlink_callback {
 	u16			family;
 	u16			min_dump_alloc;
 	unsigned int		prev_seq, seq;
-	long			args[6];
+	s64			args[6];
 	RH_KABI_EXTEND(struct netlink_ext_ack *extack)
 	RH_KABI_EXTEND(bool strict_check)
 	RH_KABI_EXTEND(u16 answer_flags)
kabe (reporter)   2020-03-29 06:17   ~0036600

Cherry-picked some patches from upstream...

patch-PROPERTY_ENTRY_STRING.patch (1,551 bytes)
commit c835b4417c18fc2868a38d4689274e3daed5c32b
Author: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Date:   Wed Jan 23 17:44:16 2019 +0300

    device property: Fix the length used in PROPERTY_ENTRY_STRING()
    
    commit 2b6e492467c78183bb629bb0a100ea3509b615a5 upstream.
    
    With string type property entries we need to use
    sizeof(const char *) instead of the number of characters as
    the length of the entry.
    
    If the string was shorter then sizeof(const char *),
    attempts to read it would have failed with -EOVERFLOW. The
    problem has been hidden because all build-in string
    properties have had a string longer then 8 characters until
    now.
    
    Fixes: a85f42047533 ("device property: helper macros for property entry creation")
    Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
    Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

diff --git a/include/linux/property.h b/include/linux/property.h
index ac8a1ebc4c1b..1a12364050d8 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -258,7 +258,7 @@ struct property_entry {
 #define PROPERTY_ENTRY_STRING(_name_, _val_)		\
 (struct property_entry) {				\
 	.name = _name_,					\
-	.length = sizeof(_val_),			\
+	.length = sizeof(const char *),			\
 	.type = DEV_PROP_STRING,			\
 	{ .value = { .str = _val_ } },			\
 }
kabe (reporter)   2020-03-29 06:18   ~0036601

Cherry-picked some patches from upstream...

patch-sock-sk_stamp.patch (7,500 bytes)
commit 60f05dddf1eb5db3595e011f293eefa37cefae2e
Author: Deepa Dinamani <deepa.kernel@gmail.com>
Date:   Thu Dec 27 18:55:09 2018 -0800

    sock: Make sock->sk_stamp thread-safe
    
    [ Upstream commit 3a0ed3e9619738067214871e9cb826fa23b2ddb9 ]
    
    Al Viro mentioned (Message-ID
    <20170626041334.GZ10672@ZenIV.linux.org.uk>)
    that there is probably a race condition
    lurking in accesses of sk_stamp on 32-bit machines.
    
    sock->sk_stamp is of type ktime_t which is always an s64.
    On a 32 bit architecture, we might run into situations of
    unsafe access as the access to the field becomes non atomic.
    
    Use seqlocks for synchronization.
    This allows us to avoid using spinlocks for readers as
    readers do not need mutual exclusion.
    
    Another approach to solve this is to require sk_lock for all
    modifications of the timestamps. The current approach allows
    for timestamps to have their own lock: sk_stamp_lock.
    This allows for the patch to not compete with already
    existing critical sections, and side effects are limited
    to the paths in the patch.
    
    The addition of the new field maintains the data locality
    optimizations from
    commit 9115e8cd2a0c ("net: reorganize struct sock for better data
    locality")
    
    Note that all the instances of the sk_stamp accesses
    are either through the ioctl or the syscall recvmsg.
    
    Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

(Ported to linux-4.18.0-80.11.2.el8_0.centos.plus)

diff -up ./include/net/sock.h.60f05dddf1eb5 ./include/net/sock.h
--- ./include/net/sock.h.60f05dddf1eb5	2019-09-15 19:14:11.000000000 +0900
+++ ./include/net/sock.h	2020-01-12 20:38:11.884065686 +0900
@@ -300,6 +300,7 @@ struct sock_common {
   *	@sk_filter: socket filtering instructions
   *	@sk_timer: sock cleanup timer
   *	@sk_stamp: time stamp of last packet received
+  *	@sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
   *	@sk_tsflags: SO_TIMESTAMPING socket options
   *	@sk_tskey: counter to disambiguate concurrent tstamp requests
   *	@sk_zckey: counter to order MSG_ZEROCOPY notifications
@@ -476,6 +477,9 @@ struct sock {
 	const struct cred	*sk_peer_cred;
 	long			sk_rcvtimeo;
 	ktime_t			sk_stamp;
+#if BITS_PER_LONG==32
+	seqlock_t		sk_stamp_seq;
+#endif
 	u16			sk_tsflags;
 	u8			sk_shutdown;
 	u32			sk_tskey;
@@ -2321,6 +2325,34 @@ static inline void sk_drops_add(struct s
 	atomic_add(segs, &sk->sk_drops);
 }
 
+static inline ktime_t sock_read_timestamp(struct sock *sk)
+{
+#if BITS_PER_LONG==32
+	unsigned int seq;
+	ktime_t kt;
+
+	do {
+		seq = read_seqbegin(&sk->sk_stamp_seq);
+		kt = sk->sk_stamp;
+	} while (read_seqretry(&sk->sk_stamp_seq, seq));
+
+	return kt;
+#else
+	return sk->sk_stamp;
+#endif
+}
+
+static inline void sock_write_timestamp(struct sock *sk, ktime_t kt)
+{
+#if BITS_PER_LONG==32
+	write_seqlock(&sk->sk_stamp_seq);
+	sk->sk_stamp = kt;
+	write_sequnlock(&sk->sk_stamp_seq);
+#else
+	sk->sk_stamp = kt;
+#endif
+}
+
 void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 			   struct sk_buff *skb);
 void __sock_recv_wifi_status(struct msghdr *msg, struct sock *sk,
@@ -2345,7 +2377,7 @@ sock_recv_timestamp(struct msghdr *msg,
 	     (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
 		__sock_recv_timestamp(msg, sk, skb);
 	else
-		sk->sk_stamp = kt;
+		sock_write_timestamp(sk, kt);
 
 	if (sock_flag(sk, SOCK_WIFI_STATUS) && skb->wifi_acked_valid)
 		__sock_recv_wifi_status(msg, sk, skb);
@@ -2366,9 +2398,9 @@ static inline void sock_recv_ts_and_drop
 	if (sk->sk_flags & FLAGS_TS_OR_DROPS || sk->sk_tsflags & TSFLAGS_ANY)
 		__sock_recv_ts_and_drops(msg, sk, skb);
 	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
-		sk->sk_stamp = skb->tstamp;
+		sock_write_timestamp(sk, skb->tstamp);
 	else if (unlikely(sk->sk_stamp == SK_DEFAULT_STAMP))
-		sk->sk_stamp = 0;
+		sock_write_timestamp(sk, 0);
 }
 
 void __sock_tx_timestamp(__u16 tsflags, __u8 *tx_flags);
diff -up ./net/compat.c.60f05dddf1eb5 ./net/compat.c
--- ./net/compat.c.60f05dddf1eb5	2020-01-12 20:38:11.904065691 +0900
+++ ./net/compat.c	2020-01-12 20:40:46.817107258 +0900
@@ -468,12 +468,13 @@ int compat_sock_get_timestamp(struct soc
 	err = -ENOENT;
 	if (!sock_flag(sk, SOCK_TIMESTAMP))
 		sock_enable_timestamp(sk, SOCK_TIMESTAMP);
-	tv = ktime_to_timeval(sk->sk_stamp);
+	tv = ktime_to_timeval(sock_read_timestamp(sk));
 	if (tv.tv_sec == -1)
 		return err;
 	if (tv.tv_sec == 0) {
-		sk->sk_stamp = ktime_get_real();
-		tv = ktime_to_timeval(sk->sk_stamp);
+		ktime_t kt = ktime_get_real();
+		sock_write_timestamp(sk, kt);
+		tv = ktime_to_timeval(kt);
 	}
 	err = 0;
 	if (put_user(tv.tv_sec, &ctv->tv_sec) ||
@@ -496,12 +497,13 @@ int compat_sock_get_timestampns(struct s
 	err = -ENOENT;
 	if (!sock_flag(sk, SOCK_TIMESTAMP))
 		sock_enable_timestamp(sk, SOCK_TIMESTAMP);
-	ts = ktime_to_timespec(sk->sk_stamp);
+	ts = ktime_to_timespec(sock_read_timestamp(sk));
 	if (ts.tv_sec == -1)
 		return err;
 	if (ts.tv_sec == 0) {
-		sk->sk_stamp = ktime_get_real();
-		ts = ktime_to_timespec(sk->sk_stamp);
+		ktime_t kt = ktime_get_real();
+		sock_write_timestamp(sk, kt);
+		ts = ktime_to_timespec(kt);
 	}
 	err = 0;
 	if (put_user(ts.tv_sec, &ctv->tv_sec) ||
diff -up ./net/core/sock.c.60f05dddf1eb5 ./net/core/sock.c
--- ./net/core/sock.c.60f05dddf1eb5	2019-09-15 19:14:11.000000000 +0900
+++ ./net/core/sock.c	2020-01-12 20:42:49.167140088 +0900
@@ -2846,6 +2846,9 @@ void sock_init_data(struct socket *sock,
 	sk->sk_sndtimeo		=	MAX_SCHEDULE_TIMEOUT;
 
 	sk->sk_stamp = SK_DEFAULT_STAMP;
+#if BITS_PER_LONG==32
+	seqlock_init(&sk->sk_stamp_seq);
+#endif
 	atomic_set(&sk->sk_zckey, 0);
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
@@ -2945,12 +2948,13 @@ int sock_get_timestamp(struct sock *sk,
 	struct timeval tv;
 	if (!sock_flag(sk, SOCK_TIMESTAMP))
 		sock_enable_timestamp(sk, SOCK_TIMESTAMP);
-	tv = ktime_to_timeval(sk->sk_stamp);
+	tv = ktime_to_timeval(sock_read_timestamp(sk));
 	if (tv.tv_sec == -1)
 		return -ENOENT;
 	if (tv.tv_sec == 0) {
-		sk->sk_stamp = ktime_get_real();
-		tv = ktime_to_timeval(sk->sk_stamp);
+		ktime_t kt = ktime_get_real();
+		sock_write_timestamp(sk, kt);
+		tv = ktime_to_timeval(kt);
 	}
 	return copy_to_user(userstamp, &tv, sizeof(tv)) ? -EFAULT : 0;
 }
@@ -2961,11 +2965,12 @@ int sock_get_timestampns(struct sock *sk
 	struct timespec ts;
 	if (!sock_flag(sk, SOCK_TIMESTAMP))
 		sock_enable_timestamp(sk, SOCK_TIMESTAMP);
-	ts = ktime_to_timespec(sk->sk_stamp);
+	ts = ktime_to_timespec(sock_read_timestamp(sk));
 	if (ts.tv_sec == -1)
 		return -ENOENT;
 	if (ts.tv_sec == 0) {
-		sk->sk_stamp = ktime_get_real();
+		ktime_t kt = ktime_get_real();
+		sock_write_timestamp(sk, kt);
 		ts = ktime_to_timespec(sk->sk_stamp);
 	}
 	return copy_to_user(userstamp, &ts, sizeof(ts)) ? -EFAULT : 0;
diff -up ./net/sunrpc/svcsock.c.60f05dddf1eb5 ./net/sunrpc/svcsock.c
--- ./net/sunrpc/svcsock.c.60f05dddf1eb5	2019-09-15 19:14:11.000000000 +0900
+++ ./net/sunrpc/svcsock.c	2020-01-12 20:38:11.960065706 +0900
@@ -574,7 +574,7 @@ static int svc_udp_recvfrom(struct svc_r
 		/* Don't enable netstamp, sunrpc doesn't
 		   need that much accuracy */
 	}
-	svsk->sk_sk->sk_stamp = skb->tstamp;
+	sock_write_timestamp(svsk->sk_sk, skb->tstamp);
 	set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); /* there may be more data... */
 
 	len  = skb->len;
kabe (reporter)   2020-03-29 06:20   ~0036602

Cherry-picked some patches from upstream...

patch-zero-out-vma.patch (4,374 bytes)
commit a670468f5e0b5fad4db6e4d195f15915dc2a35c1
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Tue Aug 21 21:53:06 2018 -0700

    mm: zero out the vma in vma_init()
    
    Rather than in vm_area_alloc().  To ensure that the various oddball
    stack-based vmas are in a good state.  Some of the callers were zeroing
    them out, others were not.
    
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Russell King <rmk+kernel@arm.linux.org.uk>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index d9c299133111..82ab015bf42b 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -330,16 +330,15 @@ unsigned long arch_randomize_brk(struct mm_struct *mm)
  * atomic helpers. Insert it into the gate_vma so that it is visible
  * through ptrace and /proc/<pid>/mem.
  */
-static struct vm_area_struct gate_vma = {
-	.vm_start	= 0xffff0000,
-	.vm_end		= 0xffff0000 + PAGE_SIZE,
-	.vm_flags	= VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC,
-};
+static struct vm_area_struct gate_vma;
 
 static int __init gate_vma_init(void)
 {
 	vma_init(&gate_vma, NULL);
 	gate_vma.vm_page_prot = PAGE_READONLY_EXEC;
+	gate_vma.vm_start = 0xffff0000;
+	gate_vma.vm_end	= 0xffff0000 + PAGE_SIZE;
+	gate_vma.vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC;
 	return 0;
 }
 arch_initcall(gate_vma_init);
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 346a146c7617..32920a10100e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -410,7 +410,6 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 	int i, freed = 0;
 	bool truncate_op = (lend == LLONG_MAX);
 
-	memset(&pseudo_vma, 0, sizeof(struct vm_area_struct));
 	vma_init(&pseudo_vma, current->mm);
 	pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
 	pagevec_init(&pvec);
@@ -595,7 +594,6 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 	 * allocation routines.  If NUMA is configured, use page index
 	 * as input to create an allocation policy.
 	 */
-	memset(&pseudo_vma, 0, sizeof(struct vm_area_struct));
 	vma_init(&pseudo_vma, mm);
 	pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
 	pseudo_vma.vm_file = file;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a3cae495f9ce..3a4b87d1a59a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -456,6 +456,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	static const struct vm_operations_struct dummy_vm_ops = {};
 
+	memset(vma, 0, sizeof(*vma));
 	vma->vm_mm = mm;
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
diff --git a/kernel/fork.c b/kernel/fork.c
index 5ee74c113381..8c760effa42e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -310,8 +310,9 @@ static struct kmem_cache *mm_cachep;
 
 struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 {
-	struct vm_area_struct *vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
+	struct vm_area_struct *vma;
 
+	vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 	if (vma)
 		vma_init(vma, mm);
 	return vma;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 01f1a14facc4..4861ba738d6f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2504,7 +2504,6 @@ void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol)
 			goto put_new;
 
 		/* Create pseudo-vma that contains just the policy */
-		memset(&pvma, 0, sizeof(struct vm_area_struct));
 		vma_init(&pvma, NULL);
 		pvma.vm_end = TASK_SIZE;	/* policy covers entire file */
 		mpol_set_shared_policy(sp, &pvma, new); /* adds ref */
diff --git a/mm/shmem.c b/mm/shmem.c
index c48c79018a7c..fb04baacc9fa 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1421,7 +1421,6 @@ static void shmem_pseudo_vma_init(struct vm_area_struct *vma,
 		struct shmem_inode_info *info, pgoff_t index)
 {
 	/* Create a pseudo vma that just contains the policy */
-	memset(vma, 0, sizeof(*vma));
 	vma_init(vma, NULL);
 	/* Bias interleave by inode number to distribute better across nodes */
 	vma->vm_pgoff = index + info->vfs_inode.i_ino;
kabe (reporter)   2020-03-29 06:21   ~0036603

Cherry-picked some patches from upstream... this one prevents 32-bit arithmetic overflow.

patch-DIV_ROUND_UP_ULL.patch (1,663 bytes)
commit 2656ee5a5ad59300bbe183d0833867a582910dcc
Author: Vinod Koul <vkoul@kernel.org>
Date:   Fri Jun 28 12:07:21 2019 -0700

    linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL
    
    [ Upstream commit 8f9fab480c7a87b10bb5440b5555f370272a5d59 ]
    
    DIV_ROUND_UP_ULL adds the two arguments and then invokes
    DIV_ROUND_DOWN_ULL.  But on a 32bit system the addition of two 32 bit
    values can overflow.  DIV_ROUND_DOWN_ULL does it correctly and stashes
    the addition into a unsigned long long so cast the result to unsigned
    long long here to avoid the overflow condition.
    
    [akpm@linux-foundation.org: DIV_ROUND_UP_ULL must be an rval]
    Link: http://lkml.kernel.org/r/20190625100518.30753-1-vkoul@kernel.org
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 3d83ebb302cf..f6f94e54ab96 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -118,7 +118,8 @@
 #define DIV_ROUND_DOWN_ULL(ll, d) \
 	({ unsigned long long _tmp = (ll); do_div(_tmp, d); _tmp; })
 
-#define DIV_ROUND_UP_ULL(ll, d)		DIV_ROUND_DOWN_ULL((ll) + (d) - 1, (d))
+#define DIV_ROUND_UP_ULL(ll, d) \
+	DIV_ROUND_DOWN_ULL((unsigned long long)(ll) + (d) - 1, (d))
 
 #if BITS_PER_LONG == 32
 # define DIV_ROUND_UP_SECTOR_T(ll,d) DIV_ROUND_UP_ULL(ll, d)
kabe (reporter)   2020-03-29 06:22   ~0036604

Cherry-picked some patches from upstream... needed for bare-metal machines to boot

patch-BSS_MAIN.patch (2,009 bytes)
commit 3b51d71365e0801e19fe81b66b34f2f19935a9ed
Author: Sami Tolvanen <samitolvanen@google.com>
Date:   Mon Apr 15 09:49:56 2019 -0700

    x86/build/lto: Fix truncated .bss with -fdata-sections
    
    [ Upstream commit 6a03469a1edc94da52b65478f1e00837add869a3 ]
    
    With CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y, we compile the kernel with
    -fdata-sections, which also splits the .bss section.
    
    The new section, with a new .bss.* name, which pattern gets missed by the
    main x86 linker script which only expects the '.bss' name. This results
    in the discarding of the second part and a too small, truncated .bss
    section and an unhappy, non-working kernel.
    
    Use the common BSS_MAIN macro in the linker script to properly capture
    and merge all the generated BSS sections.
    
    Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
    Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Nicholas Piggin <npiggin@gmail.com>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/20190415164956.124067-1-samitolvanen@google.com
    [ Extended the changelog. ]
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

(ported to linux-4.18.0-147.5.1.el8_1.centos.plus)

diff -up ./arch/x86/kernel/vmlinux.lds.S.bssmain ./arch/x86/kernel/vmlinux.lds.S
--- ./arch/x86/kernel/vmlinux.lds.S.bssmain	2020-03-07 20:53:20.744531788 +0900
+++ ./arch/x86/kernel/vmlinux.lds.S	2020-03-07 20:55:07.842228414 +0900
@@ -359,7 +359,7 @@ SECTIONS
 	.bss : AT(ADDR(.bss) - LOAD_OFFSET) {
 		__bss_start = .;
 		*(.bss..page_aligned)
-		*(.bss)
+		*(BSS_MAIN)
 		BSS_DECRYPTED
 		. = ALIGN(PAGE_SIZE);
 		__bss_stop = .;
kabe (reporter)   2020-03-29 06:23   ~0036605

Cherry-picked some patches from upstream...

patch-__end_rodata_aligned.patch (4,358 bytes)
commit 39d668e04edad25abe184fb329ce35a131146ee5
Author: Joerg Roedel <jroedel@suse.de>
Date:   Wed Jul 18 11:41:04 2018 +0200

    x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit
    
    The pti_clone_kernel_text() function references __end_rodata_hpage_align,
    which is only present on x86-64.  This makes sense as the end of the rodata
    section is not huge-page aligned on 32 bit.
    
    Nevertheless a symbol is required for the function that points at the right
    address for both 32 and 64 bit. Introduce __end_rodata_aligned for that
    purpose and use it in pti_clone_kernel_text().
    
    Signed-off-by: Joerg Roedel <jroedel@suse.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Pavel Machek <pavel@ucw.cz>
    Cc: "H . Peter Anvin" <hpa@zytor.com>
    Cc: linux-mm@kvack.org
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Josh Poimboeuf <jpoimboe@redhat.com>
    Cc: Juergen Gross <jgross@suse.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Jiri Kosina <jkosina@suse.cz>
    Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: David Laight <David.Laight@aculab.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: Eduardo Valentin <eduval@amazon.com>
    Cc: Greg KH <gregkh@linuxfoundation.org>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Waiman Long <llong@redhat.com>
    Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
    Cc: joro@8bytes.org
    Link: https://lkml.kernel.org/r/1531906876-13451-28-git-send-email-joro@8bytes.org

(ported to 4.18.0-147.5.1.el8.centos.plus)

diff -up ./arch/x86/include/asm/sections.h.rodat ./arch/x86/include/asm/sections.h
--- ./arch/x86/include/asm/sections.h.rodat	2020-01-14 23:54:17.000000000 +0900
+++ ./arch/x86/include/asm/sections.h	2020-03-07 21:01:14.842615598 +0900
@@ -7,6 +7,7 @@
 
 extern char __brk_base[], __brk_limit[];
 extern struct exception_table_entry __stop___ex_table[];
+extern char __end_rodata_aligned[];
 
 #if defined(CONFIG_X86_64)
 extern char __end_rodata_hpage_align[];
diff -up ./arch/x86/kernel/vmlinux.lds.S.rodat ./arch/x86/kernel/vmlinux.lds.S
--- ./arch/x86/kernel/vmlinux.lds.S.rodat	2020-03-07 21:01:14.832615533 +0900
+++ ./arch/x86/kernel/vmlinux.lds.S	2020-03-07 21:05:25.579167858 +0900
@@ -55,11 +55,12 @@ jiffies_64 = jiffies;
  * so we can enable protection checks as well as retain 2MB large page
  * mappings for kernel text.
  */
-#define X64_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
+#define X86_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
 
-#define X64_ALIGN_RODATA_END					\
+#define X86_ALIGN_RODATA_END					\
 		. = ALIGN(HPAGE_SIZE);				\
-		__end_rodata_hpage_align = .;
+		__end_rodata_hpage_align = .;			\
+		__end_rodata_aligned = .;
 
 #define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
 #define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
@@ -83,8 +84,10 @@ jiffies_64 = jiffies;
 
 #else
 
-#define X64_ALIGN_RODATA_BEGIN
-#define X64_ALIGN_RODATA_END
+#define X86_ALIGN_RODATA_BEGIN
+#define X86_ALIGN_RODATA_END					\
+		. = ALIGN(PAGE_SIZE);				\
+		__end_rodata_aligned = .;
 
 #define ALIGN_ENTRY_TEXT_BEGIN
 #define ALIGN_ENTRY_TEXT_END
@@ -149,9 +152,9 @@ SECTIONS
 
 	/* .text should occupy whole number of pages */
 	. = ALIGN(PAGE_SIZE);
-	X64_ALIGN_RODATA_BEGIN
+	X86_ALIGN_RODATA_BEGIN
 	RO_DATA(PAGE_SIZE)
-	X64_ALIGN_RODATA_END
+	X86_ALIGN_RODATA_END
 
 	/* Data */
 	.data : AT(ADDR(.data) - LOAD_OFFSET) {
diff -up ./arch/x86/mm/pti.c.rodat ./arch/x86/mm/pti.c
--- ./arch/x86/mm/pti.c.rodat	2020-01-14 23:54:17.000000000 +0900
+++ ./arch/x86/mm/pti.c	2020-03-07 21:01:14.843615604 +0900
@@ -502,7 +502,7 @@ void pti_clone_kernel_text(void)
 	 * clone the areas past rodata, they might contain secrets.
 	 */
 	unsigned long start = PFN_ALIGN(_text);
-	unsigned long end_clone  = (unsigned long)__end_rodata_hpage_align;
+	unsigned long end_clone  = (unsigned long)__end_rodata_aligned;
 	unsigned long end_global = PFN_ALIGN((unsigned long)__stop___ex_table);
 
 	if (!pti_kernel_image_global_ok())
kabe (reporter)   2020-03-29 06:25   ~0036606

Reported as https://bugzilla.kernel.org/show_bug.cgi?id=206181#c12 .
Without this patch, the kernel panics on Hyper-V.

patch-hv_balloon-hotadd-panic.patch (1,369 bytes)
This fixes a panic on Hyper-V that occurs around 66 seconds after boot,
when there is memory pressure and the Hyper-V host tries to hot-add memory
to the guest.

Workaround: "hv_balloon.hot_add=0" kernel command line.

Since we're hot-adding then online-ing the page,
we shouldn't free the page.

Posted as https://bugzilla.kernel.org/show_bug.cgi?id=206181#c12

diff -up ./drivers/hv/hv_balloon.c.ha00 ./drivers/hv/hv_balloon.c
--- ./drivers/hv/hv_balloon.c.ha00	2020-01-14 23:54:17.000000000 +0900
+++ ./drivers/hv/hv_balloon.c	2020-03-07 21:12:44.446718500 +0900
@@ -692,7 +692,7 @@ static void hv_page_online_one(struct hv
 	/* This frame is currently backed; online the page. */
 	__online_page_set_limits(pg);
 	__online_page_increment_counters(pg);
-	__online_page_free(pg);
+	/*__online_page_free(pg);*/
 
 	lockdep_assert_held(&dm_device.ha_lock);
 	dm_device.num_pages_onlined++;
@@ -740,6 +740,8 @@ static void hv_mem_hot_add(unsigned long
 		dm_device.ha_waiting = !memhp_auto_online;
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
+
+		pr_debug("%s: calling add_memory(nid=%d, ((start_pfn=0x%lx) << PAGE_SHIFT)=0x%llx, (HA_CHUNK << PAGE_SHIFT)=%lu)\n", __func__, nid, start_pfn, ((unsigned long long)start_pfn << PAGE_SHIFT), ((unsigned long)HA_CHUNK << PAGE_SHIFT));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
 				(HA_CHUNK << PAGE_SHIFT));
 
kabe (reporter)   2020-03-29 06:26   ~0036607

Reported and suggested in https://bugzilla.kernel.org/show_bug.cgi?id=206401 .
With this patch, memory hot-add will work properly under Hyper-V.

patch-bhe-hyperv-hotplug.patch (1,944 bytes)
See 

Bug 206401 Summary: kernel panic on Hyper-V after 5 minutes due to memory hot-add 
https://bugzilla.kernel.org/show_bug.cgi?id=206401

for rationale of this patch.

diff -up ./mm/memory_hotplug.c.ha00 ./mm/memory_hotplug.c
--- ./mm/memory_hotplug.c.ha00	2019-09-15 19:14:11.000000000 +0900
+++ ./mm/memory_hotplug.c	2020-02-21 11:15:28.372889966 +0900
@@ -820,15 +820,19 @@ static struct zone *default_kernel_zone_
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
 	int zid;
+	enum  zone_type default_zone = ZONE_NORMAL; /*9faf47bd*/
 
-	for (zid = 0; zid <= ZONE_NORMAL; zid++) {
+#ifdef CONFIG_HIGHMEM				/*9faf47bd*/
+	default_zone = ZONE_HIGHMEM;		/*9faf47bd*/
+#endif						/*9faf47bd*/
+	for (zid = 0; zid <= default_zone; zid++) { /*9faf47bd*/
 		struct zone *zone = &pgdat->node_zones[zid];
 
 		if (zone_intersects(zone, start_pfn, nr_pages))
 			return zone;
 	}
 
-	return &pgdat->node_zones[ZONE_NORMAL];
+	return &pgdat->node_zones[default_zone]; /*9faf47bd*/
 }
 
 static inline struct zone *default_zone_for_pfn(int nid, unsigned long start_pfn,
diff -up ./mm/sparse.c.ha00 ./mm/sparse.c
--- ./mm/sparse.c.ha00	2019-09-15 19:14:11.000000000 +0900
+++ ./mm/sparse.c	2020-02-21 11:18:04.615247199 +0900
@@ -594,16 +594,21 @@ static struct page *__kmalloc_section_me
 	page = alloc_pages(GFP_KERNEL|__GFP_NOWARN, get_order(memmap_size));
 	if (page)
 		goto got_map_page;
+	pr_debug("%s: alloc_pages() returned 0x%px (should be 0), reverting to vmalloc(memmap_size=%lu)\n", __func__, page, memmap_size);
+	BUG_ON(page != 0);
 
 	ret = vmalloc(memmap_size);
+	pr_debug("%s: vmalloc(%lu) returned 0x%px\n", __func__, memmap_size, ret);
 	if (ret)
 		goto got_map_ptr;
 
 	return NULL;
 got_map_page:
 	ret = (struct page *)pfn_to_kaddr(page_to_pfn(page));
+	pr_debug("%s: allocated struct page *page=0x%px\n", __func__, page);
 got_map_ptr:
 
+	pr_debug("%s: returning struct page * =0x%px\n", __func__, ret);
 	return ret;
 }
 
kabe (reporter)   2020-03-29 06:28   ~0036608

Patch for i915.ko VGA devices. Reported and fixed upstream as https://gitlab.freedesktop.org/drm/intel/issues/1027
(I don't know when it will be merged into the kernel.org codebase).

drm-i915-Wean-off-drm_pci_alloc-drm_pci_free-el8.patch (7,497 bytes)
commit c6790dc22312f592c1434577258b31c48c72d52a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Feb 2 15:39:34 2020 +0000

    drm/i915: Wean off drm_pci_alloc/drm_pci_free
    
    drm_pci_alloc and drm_pci_free are just very thin wrappers around
    dma_alloc_coherent, with a note that we should be removing them.
    Furthermore since
    
    commit de09d31dd38a50fdce106c15abd68432eebbd014
    Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Date:   Fri Jan 15 16:51:42 2016 -0800
    
        page-flags: define PG_reserved behavior on compound pages
    
        As far as I can see there's no users of PG_reserved on compound pages.
        Let's use PF_NO_COMPOUND here.
    
    drm_pci_alloc has been declared broken since it mixes GFP_COMP and
    SetPageReserved. Avoid this conflict by weaning ourselves off using the
    abstraction and using the dma functions directly.
    
    Reported-by: Taketo Kabe
    Closes: https://gitlab.freedesktop.org/drm/intel/issues/1027
    Fixes: de09d31dd38a ("page-flags: define PG_reserved behavior on compound pages")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: <stable@vger.kernel.org> # v4.5+
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Link: https://patchwork.freedesktop.org/patch/msgid/20200202153934.3899472-1-chris@chris-wilson.co.uk


Ported to CentOS 8 kernel by T.Kabe

diff -up ./drivers/gpu/drm/i915/i915_gem.c.915 ./drivers/gpu/drm/i915/i915_gem.c
--- ./drivers/gpu/drm/i915/i915_gem.c.915	2020-01-14 23:54:17.000000000 +0900
+++ ./drivers/gpu/drm/i915/i915_gem.c	2020-03-07 21:15:26.779661958 +0900
@@ -278,77 +278,73 @@ i915_gem_get_aperture_ioctl(struct drm_d
 static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
 {
 	struct address_space *mapping = obj->base.filp->f_mapping;
-	drm_dma_handle_t *phys;
-	struct sg_table *st;
 	struct scatterlist *sg;
-	char *vaddr;
+	struct sg_table *st;
+	dma_addr_t dma;
+	void *vaddr;
+	void *dst;
 	int i;
-	int err;
 
 	if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
 		return -EINVAL;
 
-	/* Always aligning to the object size, allows a single allocation
+	/*
+	 * Always aligning to the object size, allows a single allocation
 	 * to handle all possible callers, and given typical object sizes,
 	 * the alignment of the buddy allocation will naturally match.
 	 */
-	phys = drm_pci_alloc(obj->base.dev,
-			     roundup_pow_of_two(obj->base.size),
-			     roundup_pow_of_two(obj->base.size));
-	if (!phys)
+	vaddr = dma_alloc_coherent(&obj->base.dev->pdev->dev,
+				   roundup_pow_of_two(obj->base.size),
+				   &dma, GFP_KERNEL);
+	if (!vaddr)
 		return -ENOMEM;
 
-	vaddr = phys->vaddr;
+	st = kmalloc(sizeof(*st), GFP_KERNEL);
+	if (!st)
+		goto err_pci;
+
+	if (sg_alloc_table(st, 1, GFP_KERNEL))
+		goto err_st;
+
+	sg = st->sgl;
+	sg->offset = 0;
+	sg->length = obj->base.size;
+
+	sg_assign_page(sg, (struct page *)vaddr);
+	sg_dma_address(sg) = dma;
+	sg_dma_len(sg) = obj->base.size;
+
+	dst = vaddr;
 	for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
 		struct page *page;
-		char *src;
+		void *src;
 
 		page = shmem_read_mapping_page(mapping, i);
-		if (IS_ERR(page)) {
-			err = PTR_ERR(page);
-			goto err_phys;
-		}
+		if (IS_ERR(page))
+			goto err_st;
 
 		src = kmap_atomic(page);
-		memcpy(vaddr, src, PAGE_SIZE);
-		drm_clflush_virt_range(vaddr, PAGE_SIZE);
+		memcpy(dst, src, PAGE_SIZE);
+		drm_clflush_virt_range(dst, PAGE_SIZE);
 		kunmap_atomic(src);
 
 		put_page(page);
-		vaddr += PAGE_SIZE;
+		dst += PAGE_SIZE;
 	}
 
 	i915_gem_chipset_flush(to_i915(obj->base.dev));
 
-	st = kmalloc(sizeof(*st), GFP_KERNEL);
-	if (!st) {
-		err = -ENOMEM;
-		goto err_phys;
-	}
-
-	if (sg_alloc_table(st, 1, GFP_KERNEL)) {
-		kfree(st);
-		err = -ENOMEM;
-		goto err_phys;
-	}
-
-	sg = st->sgl;
-	sg->offset = 0;
-	sg->length = obj->base.size;
-
-	sg_dma_address(sg) = phys->busaddr;
-	sg_dma_len(sg) = obj->base.size;
-
-	obj->phys_handle = phys;
-
 	__i915_gem_object_set_pages(obj, st, sg->length);
 
 	return 0;
 
-err_phys:
-	drm_pci_free(obj->base.dev, phys);
-
-	return err;
+err_st:
+	kfree(st);
+err_pci:
+	dma_free_coherent(&obj->base.dev->pdev->dev,
+			  roundup_pow_of_two(obj->base.size),
+			  vaddr, dma);
+	return -ENOMEM;
 }
 
 static void __start_cpu_write(struct drm_i915_gem_object *obj)
@@ -381,11 +377,14 @@ static void
 i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj,
 			       struct sg_table *pages)
 {
+	dma_addr_t dma = sg_dma_address(pages->sgl);
+	void *vaddr = sg_page(pages->sgl);
+
 	__i915_gem_object_release_shmem(obj, pages, false);
 
 	if (obj->mm.dirty) {
 		struct address_space *mapping = obj->base.filp->f_mapping;
-		char *vaddr = obj->phys_handle->vaddr;
+		void *src = vaddr;
 		int i;
 
 		for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
@@ -397,15 +396,16 @@ i915_gem_object_put_pages_phys(struct dr
 				continue;
 
 			dst = kmap_atomic(page);
-			drm_clflush_virt_range(vaddr, PAGE_SIZE);
-			memcpy(dst, vaddr, PAGE_SIZE);
+			drm_clflush_virt_range(src, PAGE_SIZE);
+			memcpy(dst, src, PAGE_SIZE);
 			kunmap_atomic(dst);
 
 			set_page_dirty(page);
 			if (obj->mm.madv == I915_MADV_WILLNEED)
 				mark_page_accessed(page);
 			put_page(page);
-			vaddr += PAGE_SIZE;
+
+			src += PAGE_SIZE;
 		}
 		obj->mm.dirty = false;
 	}
@@ -413,7 +413,9 @@ i915_gem_object_put_pages_phys(struct dr
 	sg_free_table(pages);
 	kfree(pages);
 
-	drm_pci_free(obj->base.dev, obj->phys_handle);
+	dma_free_coherent(&obj->base.dev->pdev->dev,
+			  roundup_pow_of_two(obj->base.size),
+			  vaddr, dma);
 }
 
 static void
@@ -689,7 +691,7 @@ i915_gem_phys_pwrite(struct drm_i915_gem
 		     struct drm_i915_gem_pwrite *args,
 		     struct drm_file *file)
 {
-	void *vaddr = obj->phys_handle->vaddr + args->offset;
+	void *vaddr = sg_page(obj->mm.pages->sgl) + args->offset;
 	char __user *user_data = u64_to_user_ptr(args->data_ptr);
 
 	/* We manually control the domain here and pretend that it
@@ -1553,10 +1555,10 @@ i915_gem_pwrite_ioctl(struct drm_device
 		ret = i915_gem_gtt_pwrite_fast(obj, args);
 
 	if (ret == -EFAULT || ret == -ENOSPC) {
-		if (obj->phys_handle)
-			ret = i915_gem_phys_pwrite(obj, args, file);
-		else
+		if (i915_gem_object_has_struct_page(obj))
 			ret = i915_gem_shmem_pwrite(obj, args);
+		else
+			ret = i915_gem_phys_pwrite(obj, args, file);
 	}
 
 	i915_gem_object_unpin_pages(obj);
diff -up ./drivers/gpu/drm/i915/i915_gem_object.h.915 ./drivers/gpu/drm/i915/i915_gem_object.h
--- ./drivers/gpu/drm/i915/i915_gem_object.h.915	2020-01-14 23:54:17.000000000 +0900
+++ ./drivers/gpu/drm/i915/i915_gem_object.h	2020-03-07 21:15:26.779661958 +0900
@@ -289,9 +289,6 @@ struct drm_i915_gem_object {
 		void *gvt_info;
 	};
 
-	/** for phys allocated objects */
-	struct drm_dma_handle *phys_handle;
-
 	struct reservation_object __builtin_resv;
 };
 
diff -up ./drivers/gpu/drm/i915/intel_display.c.915 ./drivers/gpu/drm/i915/intel_display.c
--- ./drivers/gpu/drm/i915/intel_display.c.915	2020-03-07 21:15:26.789662016 +0900
+++ ./drivers/gpu/drm/i915/intel_display.c	2020-03-07 21:16:57.082186784 +0900
@@ -10311,7 +10311,7 @@ static u32 intel_cursor_base(const struc
 	u32 base;
 
 	if (INTEL_INFO(dev_priv)->display.cursor_needs_physical)
-		base = obj->phys_handle->busaddr;
+		base = sg_dma_address(obj->mm.pages->sgl);
 	else
 		base = intel_plane_ggtt_offset(plane_state);
 
diff -up ./drivers/gpu/drm/i915/intel_overlay.c.915 ./drivers/gpu/drm/i915/intel_overlay.c

kabe

2020-03-29 06:29

reporter   ~0036609

The .config file for i686. Drop it in as SOURCES/kernel-i686.config.

kernel-i686.config (183,317 bytes)
kabe

2020-03-29 06:32

reporter   ~0036610

The SPECS/kernel.spec file I'm using, based on kernel-plus.

That's all. Thank you for your attention.

kernel.spec (1,689,520 bytes)
toracat

2020-03-29 17:31

manager   ~0036621

@kabe,

Thanks, as always, for your superb work. We will certainly make use of your patches if/when the 32-bit kernel for C8 is to be developed.
kabe

2020-04-20 13:24

reporter   ~0036728

This kernel still seems to have a problem:
it spits out exactly 60 instances of the BUG below (with an incrementing pfn) when there is more than 80 MiB of HighMem (for instance, when booting with the kernel option highmem=81M).
Vanilla kernel.org 4.19.0 and 4.19.116 do not have this problem, so I'm looking at the difference, but I'm running out of clues.
...
[ 0.000000] 143MB HIGHMEM available.
[ 0.000000] 879MB LOWMEM available.
[ 0.000000] mapped low ram: 0 - 36ffe000
[ 0.000000] low ram: 0 - 36ffe000
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] Normal [mem 0x0000000001000000-0x0000000036ffdfff]
[ 0.000000] HighMem [mem 0x0000000036ffe000-0x000000003ffeffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x000000003ffeffff]
...
[ 0.000000] Initializing CPU#0
[ 0.000000] Initializing HighMem for node 0 (00036ffe:0003fff0)
[ 0.000000] BUG: Bad page state in process swapper pfn:3331a800
[ 0.000000] page:f2b24000 count:0 mapcount:1 mapping:00000000 index:0x0
[ 0.000000] flags: 0x0()
[ 0.000000] raw: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 0.000000] raw: 00000000 00000000
[ 0.000000] page dumped because: nonzero mapcount
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-147.5.1.el8_1.centos.plus.v2.i586 #17
[ 0.000000] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
[ 0.000000] Call Trace:
[ 0.000000] dump_stack+0x58/0x7e
[ 0.000000] bad_page.cold.143+0x8d/0xc4
[ 0.000000] free_pages_check_bad+0x40/0x90
[ 0.000000] free_unref_page+0x15f/0x1a0
[ 0.000000] __free_pages+0x1e/0x40
[ 0.000000] free_highmem_page+0x33/0x80
[ 0.000000] add_highpages_with_active_regions+0xd7/0x105
[ 0.000000] set_highmem_pages_init+0x60/0x76
[ 0.000000] mem_init+0x30/0x1f6
[ 0.000000] start_kernel+0x1f7/0x4b5
[ 0.000000] ? early_idt_handler_common+0x50/0x50
[ 0.000000] i386_start_kernel+0xac/0xb0
[ 0.000000] startup_32_smp+0x164/0x170
[ 0.000000] Disabling lock debugging due to kernel taint
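Incidentally, the HighMem zone range in the log is self-consistent with the reported size; a quick recomputation (my own arithmetic, assuming the usual 4 KiB pages on i686) reproduces the "143MB HIGHMEM available." line:

```python
# Recompute the HighMem size from the zone range printed in the boot log:
#   HighMem [mem 0x0000000036ffe000-0x000000003ffeffff]
# Assumes 4 KiB pages, as on i686.
start, end = 0x36ffe000, 0x3ffeffff
pages = (end + 1 - start) // 4096
print(pages * 4096 // (1024 * 1024), "MiB")   # -> 143 MiB, matching the log
```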
kabe

2020-04-25 08:42

reporter   ~0036766

Resolved: CONFIG_DEFERRED_STRUCT_PAGE_INIT is not supported on 32-bit.
The upstream kernel.org patch below disables CONFIG_DEFERRED_STRUCT_PAGE_INIT on 32-bit, since
no code exists to support that configuration.

patch-889c695d-no-DEFERRED_STRUCT_PAGE_INIT-32bit.patch (3,904 bytes)
commit 889c695d419f19e5db52592dafbaf26143c36d1f
Author: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
Date:   Thu Sep 20 12:22:30 2018 -0700

    mm: disable deferred struct page for 32-bit arches
    
    Deferred struct page init is needed only on systems with large amount of
    physical memory to improve boot performance.  32-bit systems do not
    benefit from this feature.
    
    Jiri reported a problem where deferred struct pages do not work well with
    x86-32:
    
    [    0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
    [    0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
    [    0.036269] Initializing CPU#0
    [    0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
    [    0.038459] page:f6780000 is uninitialized and poisoned
    [    0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
    [    0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
    [    0.040038] ------------[ cut here ]------------
    [    0.040399] kernel BUG at include/linux/page-flags.h:293!
    [    0.040823] invalid opcode: 0000 [#1] SMP PTI
    [    0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
    [    0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
    [    0.042496] EIP: free_highmem_page+0x64/0x80
    [    0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
    [    0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
    [    0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
    [    0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
    [    0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
    [    0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    [    0.046913] DR6: fffe0ff0 DR7: 00000400
    [    0.047220] Call Trace:
    [    0.047419]  add_highpages_with_active_regions+0xbd/0x10d
    [    0.047854]  set_highmem_pages_init+0x5b/0x71
    [    0.048202]  mem_init+0x2b/0x1e8
    [    0.048460]  start_kernel+0x1d2/0x425
    [    0.048757]  i386_start_kernel+0x93/0x97
    [    0.049073]  startup_32_smp+0x164/0x168
    [    0.049379] Modules linked in:
    [    0.049626] ---[ end trace 337949378db0abbb ]---
    
    We free highmem pages before their struct pages are initialized:
    
    mem_init()
     set_highmem_pages_init()
      add_highpages_with_active_regions()
       free_highmem_page()
        .. Access uninitialized struct page here..
    
    Because there is no reason to have this feature on 32-bit systems, just
    disable it.
    
    Link: http://lkml.kernel.org/r/20180831150506.31246-1-pavel.tatashin@microsoft.com
    Fixes: 2e3ca40f03bb ("mm: relax deferred struct page requirements")
    Signed-off-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
    Reported-by: Jiri Slaby <jslaby@suse.cz>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

(ported to linux-4.18.0-147.5.1.el8_1)
diff -up ./mm/Kconfig.889c695d ./mm/Kconfig
--- ./mm/Kconfig.889c695d	2020-04-25 14:33:40.177165049 +0900
+++ ./mm/Kconfig	2020-04-25 14:34:39.256262894 +0900
@@ -631,8 +631,9 @@ config DEFERRED_STRUCT_PAGE_INIT
 	bool "Defer initialisation of struct pages to kthreads"
 	default n
 	depends on NO_BOOTMEM
-	depends on !FLATMEM
+	depends on SPARSEMEM
 	depends on !NEED_PER_CPU_KM
+	depends on 64BIT
 	help
 	  Ordinarily all struct pages are initialised during early boot in a
 	  single thread. On very large machines this can take a considerable
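With the added "depends on 64BIT", Kconfig simply drops the symbol from any i686 configuration. A minimal sketch of how to verify that in a generated config (the file path here is hypothetical, created only for the demonstration):

```shell
# Hypothetical config file standing in for a freshly regenerated i686 .config.
# After the 64BIT dependency lands, the symbol is either absent or "not set";
# a grep for an enabled setting should find nothing.
cfg=/tmp/demo-i686.config
printf '# CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set\n' > "$cfg"
if grep -q '^CONFIG_DEFERRED_STRUCT_PAGE_INIT=y' "$cfg"; then
    echo 'still enabled'
else
    echo 'disabled or absent: OK for 32-bit'
fi
```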
kabe

2020-04-25 08:45

reporter   ~0036767

Updated .config; the only change is the removal of CONFIG_DEFERRED_STRUCT_PAGE_INIT.

kernel-i686-2.config (183,282 bytes)
kabe

2020-04-25 08:47

reporter   ~0036768

Updated SPECS/kernel.spec

kernel-2.spec (1,690,100 bytes)

Issue History

Date Modified Username Field Change
2020-03-29 06:14 kabe New Issue
2020-03-29 06:14 kabe Tag Attached: i386
2020-03-29 06:16 kabe File Added: i686-netlink_callback-s64.patch
2020-03-29 06:16 kabe Note Added: 0036599
2020-03-29 06:17 kabe File Added: patch-PROPERTY_ENTRY_STRING.patch
2020-03-29 06:17 kabe Note Added: 0036600
2020-03-29 06:18 kabe File Added: patch-sock-sk_stamp.patch
2020-03-29 06:18 kabe Note Added: 0036601
2020-03-29 06:20 kabe File Added: patch-zero-out-vma.patch
2020-03-29 06:20 kabe Note Added: 0036602
2020-03-29 06:21 kabe File Added: patch-DIV_ROUND_UP_ULL.patch
2020-03-29 06:21 kabe Note Added: 0036603
2020-03-29 06:22 kabe File Added: patch-BSS_MAIN.patch
2020-03-29 06:22 kabe Note Added: 0036604
2020-03-29 06:23 kabe File Added: patch-__end_rodata_aligned.patch
2020-03-29 06:23 kabe Note Added: 0036605
2020-03-29 06:25 kabe File Added: patch-hv_balloon-hotadd-panic.patch
2020-03-29 06:25 kabe Note Added: 0036606
2020-03-29 06:26 kabe File Added: patch-bhe-hyperv-hotplug.patch
2020-03-29 06:26 kabe Note Added: 0036607
2020-03-29 06:28 kabe File Added: drm-i915-Wean-off-drm_pci_alloc-drm_pci_free-el8.patch
2020-03-29 06:28 kabe Note Added: 0036608
2020-03-29 06:29 kabe File Added: kernel-i686.config
2020-03-29 06:29 kabe Note Added: 0036609
2020-03-29 06:32 kabe File Added: kernel.spec
2020-03-29 06:32 kabe Note Added: 0036610
2020-03-29 17:31 toracat Status new => acknowledged
2020-03-29 17:31 toracat Note Added: 0036621
2020-04-20 13:24 kabe Note Added: 0036728
2020-04-25 08:42 kabe File Added: patch-889c695d-no-DEFERRED_STRUCT_PAGE_INIT-32bit.patch
2020-04-25 08:42 kabe Note Added: 0036766
2020-04-25 08:45 kabe File Added: kernel-i686-2.config
2020-04-25 08:45 kabe Note Added: 0036767
2020-04-25 08:47 kabe File Added: kernel-2.spec
2020-04-25 08:47 kabe Note Added: 0036768
2020-08-16 17:42 toracat Relationship added related to 0017674