View Issue Details

IDProjectCategoryView StatusLast Update
0017140CentOS-8plymouthpublic2020-05-23 19:50
Reporterolahaye74 
PrioritynormalSeveritymajorReproducibilityalways
Status newResolutionopen 
Product Version8.1.1911 
Target VersionFixed in Version 
Summary0017140: Plymouth crash in initrd
DescriptionBug similar to https://bugs.centos.org/view.php?id=9644 but for CentOS-8.1

Hitting this issue on CentOS-8.1(CentOS Linux release 8.1.1911 (Core) )
plymouth-0.9.3-15.el8.x86_64

Crash:
access("/run/udev/control", F_OK) = 0
socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, NETLINK_KOBJECT_UEVENT) = 7
setsockopt(7, SOL_SOCKET, SO_ATTACH_FILTER, {len=18, filter=0x7fffa210cc00}, 16) = 0
bind(7, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=0x000002}, 12) = 0
getsockname(7, {sa_family=AF_NETLINK, nl_pid=4681, nl_groups=0x000002}, [12]) = 0
setsockopt(7, SOL_SOCKET, SO_PASSCRED, [1], 4) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLERR|EPOLLHUP, {u32=952875488, u64=94412923976160}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 7, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=952875488, u64=94412923976160}}) = 0
getpid() = 4681
gettid() = 4681
getrandom("\xfa\xd7\x9b\xf5\x0d\xec\xb1\x40\xd5\x66\xbc\x19\xdf\xa2\xf0\x55", 16, GRND_NONBLOCK) = 16
access("/sys/sub = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 9
fstat(9, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(9, /* 34 entries */, 32768) = 952
getdents64(9, /* 0 entries */, 32768) = 0
close(9) = 0
openat(AT_FDCWD, "/sys/class", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 9
fstat(9, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(9, /* 50 entries */, 32768) = 1448
getdents64(9, /* 0 entries */, 32768) = 0
close(9) = 0
write(2, "plymouthd: ply-event-loop.c:913: ply_event_loop_watch_for_timeout: Assertion `seconds > 0.0' failed.\n", 101) = 101
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6dcee12000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid() = 4681
gettid() = 4681
tgkill(4681, 4681, SIGABRT) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=4681, si_uid=0} ---
openat(AT_FDCWD, "/dev/tty1", O_RDWR|O_NOCTTY) = 9
ioctl(9, KDSETMODE, 0) = 0
write(9, "\33[?25h\0", 7) = 7
ioctl(9, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(9, SNDCTL_TMR_CONTINUE or TCSETSF, {B38400 opost isig icanon echo ...}) = 0
close(9) = 0
unlink("/var/run/plymouth/pid") = 0
Steps To ReproduceUse those kernels and ignited.img. run them using Qemu. Plymouth will crash
when you reach the shell, you'll have many commands to debug including strace.
http://olivier.lahaye1.free.fr/SystemImager/plymouth_crash/
TagsNo tags attached.

Activities

olahaye74

olahaye74

2020-05-18 14:38

reporter   ~0036960

Plymouth strace and dmesg information.
Bug reported upstream as well: https://bugzilla.redhat.com/show_bug.cgi?id=1836828
Relavant info in strace:
access("/sys/devices/virtual/drm/ttm/uevent", F_OK) = 0
readlinkat(AT_FDCWD, "/sys/devices/virtual/drm/ttm/subsystem", "../../../../class/drm", 99) = 21
openat(AT_FDCWD, "/sys/devices/virtual/drm/ttm/uevent", O_RDONLY|O_CLOEXEC) = 10
fstat(10, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
fstat(10, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
read(10, "DEVTYPE=ttm\n", 4096) = 12
read(10, "", 4096) = 0
close(10) = 0
openat(AT_FDCWD, "/run/udev/data/+drm:ttm", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
write(2, "@lymouthd: ply-event-loop.c:913: ply_event_loop_watch_for_timeout: Assertion `seconds > 0.0' failed.\n", 101) = 101
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1f31b20000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
getpid() = 5844
gettid() = 5844
tgkill(5844, 5844, SIGABRT) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=5844, si_uid=0} ---

dmesg.log (34,215 bytes)
Plymouth_udev.log (70,507 bytes)
olahaye74

olahaye74

2020-05-23 13:10

reporter   ~0036976

I've reported the bug to Plymouth bugzilla as well.
https://gitlab.freedesktop.org/plymouth/plymouth/-/issues/112
All material needed to reproduce the problem has been made available. Read the ticket description that tells what to download and how to test.
olahaye74

olahaye74

2020-05-23 15:17

reporter   ~0036977

When doing: plymouth --debug
The Plymouth log is:
00:16:52.991 main.c:854:plymouth_should_show_default_splash : checking if plymouth should show default splash
00:16:52.992 main.c:882:plymouth_should_show_default_splash : using default splash because kernel command line has option "splash"
00:16:52.994 ply-device-manager.c:886:create_devices_from_terminals : checking for consoles
00:16:52.994 ply-device-manager.c:664:add_consoles_from_file : opening /sys/class/tty/console/active
00:16:52.995 ply-device-manager.c:672:add_consoles_from_file : reading file
00:16:52.996 ply-device-manager.c:710:add_consoles_from_file : console /dev/tty7 found!
00:16:52.997 ply-device-manager.c:486:watch_for_udev_events : watching for udev graphics device add and remove events
00:16:52.998 ply-device-manager.c:346:create_devices_for_subsystem : creating objects for drm devices

Then:
@lymouthd: ply-event-loop.c:913: ply_event_loop_watch_for_timeout: Assertion `seconds > 0.0' failed.

When looking at dmesg:
access("/sys/devices/virtual/drm/ttm/uevent", F_OK) = 0
openat(AT_FDCWD, "/sys/devices/virtual/drm/ttm/uevent", O_RDONLY|O_CLOEXEC) = 10
fstat(10, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
fstat(10, {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
read(10, "DEVTYPE=ttm\n", 4096) = 12
read(10, "", 4096) = 0
close(10) = 0
readlinkat(AT_FDCWD, "/sys/devices/virtual/drm/ttm/subsystem", "../../../../class/drm", 99) = 21
openat(AT_FDCWD, "/run/udev/data/+drm:ttm", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

Can't it be a missing dev rule?
olahaye74

olahaye74

2020-05-23 19:50

reporter   ~0036978

Bug tracked down; please fix.
Looking at where the problem seems to occur, I suspected that the problem was here:
in src/libply-splash-core/ply-device-manager.c at line 974 ply_event_loop_watch_for_timeout (in ply_device_manager_watch_devices() function))
Indeed, the console is correctly setup and this occurs at line 954 of that file, and there after, there is a specific ifdef HAVE_UDEV
According to the ABRT error, the device_timout variable which is a double is incorrect (not > 0.0). So either it is equal 0.0 (not initialized) or NaN or corrupted and seen as negative.

ply_device_manager_watch_devices() is called from main.c:1093 in load_devices(). The timeout value comes from state->device_timeout.
BUT at line 2241 in main(), state.device_timeout is set to NAN and at line 2278, the load_devices() is called with &state parameter!!!!
So, between line 2241 where state.device_timeout is set to NAN and line 2278 where it is used, the state.device_timout MUST be set to something > 0.0...

At line 2257, find_distribution_default_splash(&state) is called
in find_distribution_default_splash() line 428, load_settings is called
In my settings file initrd:/usr/share/plymouth/plymouthd.defaults I have the following content:
# SystemImager is the default theme
[Daemon]
Theme=systemimager
ShowDelay=0
# DeviceTimeout=5

in main.c:310, we have: timeout_string = ply_key_file_get_value (key_file, "Daemon", "DeviceTimeout");
Below, timeout_string is checked against NULL.
As the trace "Device timout is set to ...." is not in the log, This means that timeout _string is NULL.
And IT IS as ply_key_file_get_value() returns NULL if ply_key_file_get_raw_value() returns NULL, and ply_key_file_get_raw_value() returns NULL if group ("Daemon" here) is not found. (and it is not found as it is commented out)

THUS, state->device_timeout is still NAN value!!! BUG #1 (no default value AFAICT)

IMHO, there should be a else statement at line 317 that would set state->device_timeout to a default value (maybe 5.0) ?

BUG #2: at line 313 the result of atof() is not checked against value of 0.0 while the ply_event_loop_watch_for_timeout() require NON-NULL double.

WARNING #3: at line 313: strtod() should be used instead of atof() according to atof manual. More over, you don't take care of locale LC_NU MERIC! (use ply_key_file_get_double() which does instead)

BUG #4: at line 310: you should use state->device_timeout = ply_key_file_get_double (key_file, "Daemon", "DeviceTimeout", 5.0);
 By that way, a default value would exists. Be careful, if strtod() fails in ply_key_file_get_double(), a value of 0.0 is returned. A test against this invalid value for DeviceTimeout MUST exists!
 (also you can have NaN in config file that atof() know how to deal with. So MAKE TESTS!

BUG #5: Same problem for ShowDelay (though, I assume value of 0 is acceptable)

BUG #6: Same problem for DeviceScale (but there is no ply_key_file_get_unsigned_long() and surtout() provides 0 as default value)

Updating settings with this content allows Plymouthd to start:
# SystemImager is the default theme
[Daemon]
Theme=systemimager
ShowDelay=0.1
DeviceTimeout=2.0

IMHO, in order to fix things:

1 - replace lines 296 to 304 with:
    state->splash_delay = ply_key_file_get_double (key_file, "Daemon", "DeviceTimeout", 5.0); /* replace 5.0 with define PLYMOUTH_DEFAULT_SPLASH_DELAY (hardcoded value is BAD) */
    if (!(state->splash_delay > 0.0)) {
        ply_trace ("Splash delay is invalid. defaulting to %d", PLYMOUTH_DEFAULT_SPLASH_DELAY);
    state->splash_delay= PLYMOUTH_DEFAULT_SPLASH_DELAY;
    }

2 - same operation for lines 308 to 317 regarding DeviceTimeout test must use ">= instead of > if zero is acceptable)

3 - Maybe check that what surtout() line 323 returns something that Plymouth can handle.

Please fix.

Issue History

Date Modified Username Field Change
2020-03-12 09:20 olahaye74 New Issue
2020-05-18 14:38 olahaye74 File Added: dmesg.log
2020-05-18 14:38 olahaye74 File Added: Plymouth_udev.log
2020-05-18 14:38 olahaye74 Note Added: 0036960
2020-05-23 13:10 olahaye74 Note Added: 0036976
2020-05-23 15:17 olahaye74 Note Added: 0036977
2020-05-23 19:50 olahaye74 Note Added: 0036978