View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0018075 | CentOS-7 | libvirt | public | 2021-02-18 16:49 | 2021-02-18 16:52 |
Reporter | ThomasLef | Assigned To | |||
Priority | high | Severity | major | Reproducibility | random |
Status | new | Resolution | open | ||
OS | CentOS 7 RT | OS Version | 7.6 | ||
Product Version | 7.6.1810 | ||||
Summary | 0018075: when starting VMs, sometimes one VM immediately jumps to 100% CPU usage and the libvirtd service freezes. | ||||
Description | I have 9 VMs that I start with some sleep time in between. Most of the time they all start and work just fine, but sometimes, at start, one VM's CPU usage ramps up to 100% in virt-manager and stays stuck there, and every subsequent `virsh start` fails:

```
Domain VM1 started
Domain VM2 started
error: Failed to start domain VM3
error: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
error: failed to connect to the hypervisor
error: error from service: CheckAuthorization: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
```

The same `failed to connect to the hypervisor` / `CheckAuthorization: Did not receive a reply` pair then repeats for every remaining VM.
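The error text itself hints at the failure path: virsh asks libvirtd, which calls polkit's `CheckAuthorization` over D-Bus, and that call times out. When the errors appear, a bounded probe of each daemon can show which one has stopped answering. This is only a diagnostic sketch, not something already tried in the report; `timeout` caps each check so a hung daemon cannot also hang the probing shell:

```shell
# Probe a command with a hard 5-second cap and report the result,
# so a hung daemon cannot hang this shell as well.
probe() {
    if timeout 5 "$@" > /dev/null 2>&1; then
        echo "OK: $*"
    else
        echo "NO REPLY / FAILED: $*"
    fi
}

probe virsh -c qemu:///system version            # is libvirtd answering RPC?
probe busctl status org.freedesktop.PolicyKit1   # is polkitd alive on D-Bus?
probe systemctl is-active libvirtd               # does systemd still respond for the unit?
```

If the `virsh` probe hangs but the polkit probe answers, the stall is inside libvirtd itself rather than in the authorization service.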
The script I use to reproduce this does the following:

```sh
for i in $(seq 1 100)
do
    virsh start vm0
    sleep 1
    virsh start vm1
    sleep 20
    virsh start vm2
    sleep 1
    virsh start vm3
    sleep 1
    virsh start vm4
    sleep 1
    virsh start vm5
    sleep 1
    virsh start vm6
    sleep 1
    virsh start vm7
    sleep 1
    virsh start vm8

    vm_count=$(pgrep qemu-kvm | wc -l)
    echo "AFTER START : VM COUNT = $vm_count"
    if [ "$vm_count" -ne "9" ]; then
        echo "failed start at iteration $i"
        echo "only $vm_count/9 vm started"
        exit -1
    fi

    sleep 60  # waiting for all VMs to be fully started

    virsh shutdown vm0
    sleep 1
    virsh shutdown vm2
    virsh shutdown vm3
    virsh shutdown vm4
    virsh shutdown vm5
    virsh shutdown vm6
    virsh shutdown vm7
    virsh shutdown vm8
    sleep 15
    virsh shutdown vm1
    sleep 15

    vm_count=$(pgrep qemu-kvm | wc -l)
    echo "AFTER STOP : VM COUNT = $vm_count"
    if [ "$vm_count" -ne "0" ]; then
        echo "failed stop at iteration $i"
        echo "$vm_count/9 not stopped"
        exit -1
    fi
done
```

=> The most recent run failed on the 23rd iteration of this loop.

The CPU-maxed VM as seen in top:

```
# top
top - 16:49:05 up 2 days, 4:34, 2 users, load average: 79.24, 78.90, 76.41
Tasks: 2605 total, 68 running, 2527 sleeping, 0 stopped, 10 zombie
%Cpu(s): 9.9 us, 0.0 sy, 0.0 ni, 90.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 79204140+total, 76074483+free, 23655132 used, 7641428 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 76729036+avail Mem

  PID USER  PR  NI   VIRT   RES    SHR S  %CPU %MEM    TIME+ COMMAND
76846 qemu  20   0  32.1g  1.0g  11436 R  1824  0.1  1076:48 qemu-kvm
```

I've tried attaching gdb to this PID to run "thread apply all bt", to no avail: gdb itself hangs once attached. Same result when attaching gdb to the libvirtd process. Switching to root and querying libvirtd.service also hangs without displaying any info:

```
# su -
# systemctl status libvirtd.service
...
```

System details:

```
# rpm -qa | grep qemu
libvirt-daemon-driver-qemu-4.5.0-33.el7_8.1.x86_64
qemu-system-moxie-2.0.0-1.el7.6.x86_64
qemu-system-m68k-2.0.0-1.el7.6.x86_64
ipxe-roms-qemu-20180825-2.git133f4c.el7.noarch
qemu-system-alpha-2.0.0-1.el7.6.x86_64
qemu-system-arm-2.0.0-1.el7.6.x86_64
qemu-system-microblaze-2.0.0-1.el7.6.x86_64
qemu-system-x86-2.0.0-1.el7.6.x86_64
qemu-system-s390x-2.0.0-1.el7.6.x86_64
qemu-system-xtensa-2.0.0-1.el7.6.x86_64
qemu-2.0.0-1.el7.6.x86_64
qemu-img-1.5.3-173.el7_8.3.x86_64
qemu-kvm-1.5.3-173.el7_8.3.x86_64
qemu-common-2.0.0-1.el7.6.x86_64
qemu-system-unicore32-2.0.0-1.el7.6.x86_64
qemu-system-cris-2.0.0-1.el7.6.x86_64
qemu-system-lm32-2.0.0-1.el7.6.x86_64
qemu-user-2.0.0-1.el7.6.x86_64
qemu-guest-agent-2.12.0-3.el7.x86_64
qemu-system-sh4-2.0.0-1.el7.6.x86_64
qemu-kvm-tools-1.5.3-173.el7_8.3.x86_64
qemu-kvm-common-1.5.3-173.el7_8.3.x86_64
qemu-system-or32-2.0.0-1.el7.6.x86_64
qemu-system-mips-2.0.0-1.el7.6.x86_64
```

```
# rpm -qa | grep libvirt
libvirt-client-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-qemu-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-storage-4.5.0-33.el7_8.1.x86_64
libvirt-glib-1.0.0-1.el7.x86_64
libvirt-daemon-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-storage-disk-4.5.0-33.el7_8.1.x86_64
libvirt-python-4.5.0-1.el7.x86_64
libvirt-daemon-config-nwfilter-4.5.0-33.el7_8.1.x86_64
libvirt-4.5.0-33.el7_8.1.x86_64
libvirt-gobject-1.0.0-1.el7.x86_64
libvirt-daemon-driver-storage-core-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-storage-logical-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-nwfilter-4.5.0-33.el7_8.1.x86_64
libvirt-bash-completion-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-lxc-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-network-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-storage-iscsi-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-storage-mpath-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-secret-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-kvm-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-storage-scsi-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-interface-4.5.0-33.el7_8.1.x86_64
libvirt-gconfig-1.0.0-1.el7.x86_64
libvirt-daemon-config-network-4.5.0-33.el7_8.1.x86_64
libvirt-libs-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-storage-rbd-4.5.0-33.el7_8.1.x86_64
libvirt-daemon-driver-nodedev-4.5.0-33.el7_8.1.x86_64
```

```
# uname -r
3.10.0-1127.rt56.1093.el7.x86_64
```

```
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                192
On-line CPU(s) list:   0-191
Thread(s) per core:    2
Core(s) per socket:    24
Socket(s):             4
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz
Stepping:              7
CPU MHz:               1572.052
CPU max MHz:           3700.0000
CPU min MHz:           1000.0000
BogoMIPS:              4200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              36608K
NUMA node0 CPU(s):     0-23,96-119
NUMA node1 CPU(s):     24-47,120-143
NUMA node2 CPU(s):     48-71,144-167
NUMA node3 CPU(s):     72-95,168-191
```

Any idea on this issue, and/or how to pin down the root cause? | ||||
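On pinning down the root cause: since the interactive gdb attach itself hangs, two lower-impact captures may still produce a stack trace. This is a diagnostic sketch, not something already tried in the report: batch-mode gdb under a hard timeout (so a hung attach returns control to the shell), and the kernel-side per-thread stacks in `/proc`, which do not use ptrace at all. The PID used as the example is the stuck qemu-kvm process from the `top` output above.

```shell
# Capture what we can from a process whose interactive gdb attach hangs.
collect_stacks() {
    pid=$1
    # 1) Batch-mode gdb under a hard timeout: even if the attach hangs,
    #    the shell gets control back after 15 seconds.
    if timeout 15 gdb -p "$pid" -batch -ex "thread apply all bt" \
            > "gdb-bt-$pid.txt" 2>&1; then
        echo "backtrace written to gdb-bt-$pid.txt"
    else
        echo "gdb attach failed or timed out for pid $pid"
    fi
    # 2) Kernel-side stacks from /proc need no ptrace, so they often
    #    still work when gdb hangs (reading them requires root).
    for t in /proc/"$pid"/task/*; do
        echo "=== $t ==="
        cat "$t/stack" 2>/dev/null || echo "(stack unreadable; run as root)"
    done
}

# 76846 is the stuck qemu-kvm PID from the top output above.
collect_stacks 76846
```

If the gdb attach times out too, the `/proc` stacks alone can show whether the spinning threads are stuck in a kernel path (e.g. a KVM or futex wait) rather than in userspace.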
Tags | No tags attached. | ||||
abrt_hash | |||||
URL | |||||