View Issue Details

ID: 0017584
Project: CentOS-7
Category: kernel
View Status: public
Last Update: 2020-07-09 22:36
Reporter: mattmix
Priority: normal
Severity: minor
Reproducibility: sometimes
Status: new
Resolution: open
Platform:
OS: CentOS Linux 7 (Core)
OS Version: 7.8.2003
Product Version: 7.8-2003
Target Version:
Fixed in Version:
Summary: 0017584: squashfs stuck after OOM
Description: After an OOM event caused by cgroup memory limits, processes running in a Singularity container may become unrecoverably stuck in the squashfs code of the kernel. The issue does not occur every time, but we have enough nodes and users that several processes end up stuck this way every day.

An example stack from procfs for a process that is stuck:

# cat /proc/21627/stack
[<ffffffffc07ce725>] squashfs_cache_get+0x105/0x3c0 [squashfs]
[<ffffffffc07ceff1>] squashfs_get_datablock+0x21/0x30 [squashfs]
[<ffffffffc07d0272>] squashfs_readpage+0x8a2/0xc30 [squashfs]
[<ffffffffadbcb6f8>] __do_page_cache_readahead+0x248/0x260
[<ffffffffadbcbce1>] ra_submit+0x21/0x30
[<ffffffffadbc0e25>] filemap_fault+0x105/0x420
[<ffffffffadbedf4a>] __do_fault.isra.61+0x8a/0x100
[<ffffffffadbee4fc>] do_read_fault.isra.63+0x4c/0x1b0
[<ffffffffadbf5d60>] handle_mm_fault+0xa20/0xfb0
[<ffffffffae18d653>] __do_page_fault+0x213/0x500
[<ffffffffae18d975>] do_page_fault+0x35/0x90
[<ffffffffae189778>] page_fault+0x28/0x30
[<ffffffffffffffff>] 0xffffffffffffffff

A process in this state never recovers, even when left for several days, and it cannot be killed.

Additionally, reading /proc/21627/cmdline hangs in uninterruptible sleep. I believe this is because the stuck page fault still holds the lock on that process's memory map (mmap_sem), which the cmdline read also needs.
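Because affected tasks are unkillable and several appear per day, it may help to find them node-wide without touching /proc/<pid>/cmdline, which hangs as described above. The following is a minimal Python sketch, not part of the original report: it walks every thread under /proc, keeps those whose state in /proc/<pid>/task/<tid>/stat is D (uninterruptible sleep), and reports the ones whose kernel stack contains a [squashfs] frame. The function names are hypothetical, reading the stack files requires root, and the sketch assumes that reading stat does not block on the stuck task's memory-map lock.

```python
import os
from typing import List, Tuple


def task_state(stat_text: str) -> str:
    """Return the one-letter state field (e.g. 'R', 'S', 'D') from a stat file."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the last ')' and take the first field after it.
    return stat_text.rsplit(")", 1)[1].split()[0]


def has_squashfs_frame(stack_text: str) -> bool:
    """True if any frame in a /proc/.../stack dump belongs to the squashfs module."""
    return any(line.rstrip().endswith("[squashfs]") for line in stack_text.splitlines())


def find_squashfs_stuck(proc: str = "/proc") -> List[Tuple[int, int, str]]:
    """Return (pid, tid, top_frame) for every D-state thread blocked in squashfs.

    Deliberately never opens /proc/<pid>/cmdline, which can hang for these tasks.
    """
    hits = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue  # skip entries such as 'self', 'meminfo', ...
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            base = os.path.join(task_dir, tid)
            try:
                with open(os.path.join(base, "stat")) as f:
                    if task_state(f.read()) != "D":
                        continue
                with open(os.path.join(base, "stack")) as f:  # readable by root only
                    stack = f.read()
            except (OSError, IndexError):
                continue  # thread exited, permission denied, or unexpected format
            if has_squashfs_frame(stack):
                frames = stack.splitlines()
                hits.append((int(pid), int(tid), frames[0].strip() if frames else ""))
    return hits
```

Run as root on an affected node, printing the result of find_squashfs_stuck() should list the stuck task from the example above (PID 21627, top frame squashfs_cache_get), assuming it is in state D. Scanning per thread rather than per process is a deliberate choice: /proc/<pid>/stack only shows the main thread.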
Steps To Reproduce: I can reproduce this fairly reliably with a script that reads a file out of a Singularity image from many concurrent processes. The script also deliberately leaks memory to trigger an OOM. I run it in a memory-constrained cgroup, and after the OOM one or more of the reading processes may be stuck as shown above. It is not uncommon, though, for several OOM events to occur before this happens.
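The script itself is not attached to this report. The following Python sketch is a hypothetical reconstruction of the workload described above, under these assumptions: the readers mmap the file and touch one byte per page, so that re-reads after reclaim go through the page-fault path (filemap_fault, squashfs_readpage) seen in the stack above, whereas the original script may use plain reads; a separate process leaks memory until the cgroup OOM killer ends it; readers that are still alive after SIGKILL and a grace period are reported as likely stuck. It does not create the cgroup or the container, so only run it inside the Singularity image under a memory-limited cgroup. All names and defaults are illustrative, and Process.kill() needs Python 3.7 or later.

```python
import mmap
import multiprocessing as mp
import os
from typing import List, Optional


def touch_mapped_pages(path: str, passes: int) -> int:
    """Map `path` read-only and read one byte from every page, `passes` times.

    Pages evicted under memory pressure are faulted back in through the
    filesystem's readpage path. Returns the number of page touches.
    """
    touches = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # mmap() refuses to map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for _ in range(passes):
                for offset in range(0, size, mmap.PAGESIZE):
                    _ = m[offset]  # faults the page in if it is not resident
                    touches += 1
    return touches


def leak(chunk_bytes: int, max_chunks: Optional[int] = None) -> List[bytes]:
    """Allocate and keep chunks of memory; with max_chunks=None it never stops."""
    hoard: List[bytes] = []
    while max_chunks is None or len(hoard) < max_chunks:
        # Repeating a byte string writes every byte, so the memory is committed.
        hoard.append(b"\xa5" * chunk_bytes)
    return hoard


def run_reproducer(path: str, workers: int = 64, passes: int = 100_000,
                   chunk_bytes: int = 16 << 20, grace_s: float = 30.0) -> List[int]:
    """Run concurrent readers plus a leaker; return PIDs of readers that survive SIGKILL."""
    ctx = mp.get_context("fork")  # Linux-only workload; avoids pickling the targets
    readers = [ctx.Process(target=touch_mapped_pages, args=(path, passes))
               for _ in range(workers)]
    for p in readers:
        p.start()
    leaker = ctx.Process(target=leak, args=(chunk_bytes,))
    leaker.start()
    leaker.join()  # normally returns once the cgroup OOM killer kills the leaker

    for p in readers:
        if p.is_alive():
            p.kill()  # SIGKILL; a task blocked in squashfs will not act on it
    stuck = []
    for p in readers:
        p.join(timeout=grace_s)
        if p.is_alive():
            stuck.append(p.pid)
    return stuck
```

For example, calling print(run_reproducer(path)) with a hypothetical path to a large file inside the image prints an empty list when every reader exited; any PIDs it prints can be inspected with cat /proc/<pid>/stack as above. Since several OOM events are sometimes needed, it may have to be run in a loop.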
Additional Information: Singularity uses squashfs and namespaces under the hood. I do not believe this is a Singularity issue, since Singularity only sets up the environment, and that environment normally runs fine.
Tags: No tags attached.
abrt_hash:
URL:

Activities

There are no notes attached to this issue.

Issue History

Date Modified     Username  Field      Change
2020-07-09 22:36  mattmix   New Issue