View Issue Details

ID: 0015695
Project: CentOS-7
Category: systemd
View Status: public
Last Update: 2019-03-28 08:04
Reporter: bingshen
Priority: high
Severity: major
Reproducibility: random
Status: new
Resolution: open
Platform: alibaba cloud
OS: centos
OS Version: 7.4
Product Version: 7.4.1708
Target Version:
Fixed in Version:
Summary: 0015695: runc hangs on systemd D-Bus calls when using the systemd cgroup driver
Description: Yesterday one node of my Kubernetes cluster became NotReady. ps -ef showed that some docker-runc processes had been running for many days:

```
root 26579 1303 0 2018 ? 00:00:00 docker-runc --systemd-cgroup=true events --stats c29996ea9566f16616505e7118315635582714308564ba0d9a70f8fb8cf73f0a
root 27841 2913 0 2018 ? 00:00:00 docker-runc --systemd-cgroup=true kill --all 8561b78c9cb19c0d883e30eafc8ff41ddf3007043985271386ffdbafc24d4376 SIGKILL
root 28293 1303 0 2018 ? 00:00:00 docker-runc --systemd-cgroup=true delete 25660e4c1f66593ec33ae57823def641a4c4a9ae1a7c6840afd081961b66e66e
```
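
A minimal sketch of how such stuck processes could be spotted automatically, assuming a procps-ng `ps` that supports the `etimes` (elapsed seconds) column. The function name `filter_stale_runc` and the one-day threshold are illustrative choices, not part of the original report.

```shell
# Hypothetical helper: print "pid etimes args..." lines for docker-runc
# processes that have been running longer than a threshold (in seconds).
# Input is read from stdin so the filter can be tried on saved ps output.
filter_stale_runc() {
    threshold="${1:-3600}"
    awk -v t="$threshold" '($2 + 0) > (t + 0) && $3 ~ /docker-runc$/ { print }'
}

# Usage on a live node (assumes procps-ng ps; threshold of one day):
#   ps -eo pid=,etimes=,args= | filter_stale_runc 86400
```

runc invocations such as `events`, `kill`, or `delete` normally finish quickly, so any match older than a day is worth a closer look.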

After some investigation, I found that docker-runc hangs when calling systemd.UseSystemd. Below is the stack.

In fact, no D-Bus method call sent to org.freedesktop.systemd1 received a response. For example, the command below would wait forever:

dbus-send --system --dest=org.freedesktop.systemd1 --type=method_call --print-reply /org/freedesktop/systemd1 org.freedesktop.DBus.Introspectable.Introspect
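
Because the command above blocks when systemd is unresponsive, a probe is easier to script if it is wrapped in a time limit. The following is a sketch under the assumption that GNU coreutils `timeout` is available (it exits with status 124 when the limit is hit). The helper name `probe_with_timeout` and the 5-second limit are hypothetical.

```shell
# Hypothetical helper: run a command under a time limit.
# Returns 0 if the command succeeded in time, 1 if it failed,
# and 2 if it was still running when the limit expired.
probe_with_timeout() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0)   return 0 ;;
        124) return 2 ;;   # GNU timeout reports an expired limit as 124
        *)   return 1 ;;
    esac
}

# Usage on an affected node (same dbus-send call as in the report):
#   probe_with_timeout 5 dbus-send --system --dest=org.freedesktop.systemd1 \
#       --type=method_call --print-reply /org/freedesktop/systemd1 \
#       org.freedesktop.DBus.Introspectable.Introspect
#   [ $? -eq 2 ] && echo "systemd did not answer on D-Bus within 5s"
```

A return value of 2 corresponds to the hang described here, while 1 points to a different failure such as a missing bus socket.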

Also there were many systemd errors in /var/log/messages:
Jan 4 11:56:31 host-k8s-node001 systemd: Failed to propagate agent release message: Operation not supported
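
To gauge how often this error occurs, the matching log lines can be counted. This is a small sketch. The function name `count_agent_release_errors` is illustrative, and the default path simply mirrors the log file mentioned above.

```shell
# Hypothetical helper: count log lines containing the systemd error text
# quoted in this report. The log path defaults to /var/log/messages.
count_agent_release_errors() {
    grep -F -c 'Failed to propagate agent release message' "${1:-/var/log/messages}"
}

# Usage:
#   count_agent_release_errors                  # default log file
#   count_agent_release_errors /path/to/copy    # a saved copy of the log
```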

busctl tree reported: Failed to introspect object / of service org.freedesktop.systemd1: Connection timed out

Resolved by re-executing systemd: systemctl daemon-reexec

For more stack information, see: https://github.com/opencontainers/runc/issues/1959
Steps To Reproduce: I cannot reproduce it on demand, even with many runc operations, but I have hit this issue several times in my production environment.
Tags: No tags attached.
abrt_hash
URL

Activities

bingshen

2019-03-28 06:46

reporter   ~0034122

This issue was fixed upstream in systemd by https://github.com/systemd/systemd/pull/11818.

Will the systemd package shipped with CentOS cherry-pick this fix? If so, which version will include it?
TrevorH

2019-03-28 07:25

manager   ~0034123

*CentOS* doesn't cherry-pick fixes at all. Red Hat will need to do that for RHEL and, once they have released the patched version, CentOS will rebuild and release it. I would suggest raising a ticket on bugzilla.redhat.com to see if they will backport it to the RHEL systemd.

Also, 7.4 is 2 point releases and nearly two years out of date. yum update.
bingshen

2019-03-28 08:04

reporter   ~0034124

@TrevorH Thanks, I have submitted an issue to the RHEL bug tracker.

Issue History

Date Modified Username Field Change
2019-01-11 06:33 bingshen New Issue
2019-03-28 06:46 bingshen Note Added: 0034122
2019-03-28 07:25 TrevorH Note Added: 0034123
2019-03-28 08:04 bingshen Note Added: 0034124