View Issue Details

ID: 0017385
Project: CentOS-7
Category: qemu-kvm
View Status: public
Last Update: 2020-05-26 05:40
Reporter: cliff.chen
Priority: normal
Severity: major
Reproducibility: always
Status: new
Resolution: open
Product Version: 7.5.1804
Target Version:
Fixed in Version:
Summary: 0017385: VM hangs until it is reset from the OpenStack GUI because "CPU 0/KVM" uses more than 99% CPU
Description: Hi Expert,
The VM hangs until it is reset from the OpenStack GUI because the "CPU 0/KVM" thread uses more than 99% CPU.

Do you have any suggestions other than trying a newer version of qemu-kvm?
I think this is a qemu-kvm issue because it shouldn't keep using more than 99% CPU.

Steps To Reproduce: 1) Use the guest VM flavor below to create the VM on OpenStack.
Guest VM:
  OS: RHEL 8.1
  CPU: 1
  MEM: 3G
  disk: 16G

2) After 2, 3, or 5 hours (10 hours at the longest), the VM hangs.
No input is accepted by the VM, even on the VM console.

3) Reset the VM from the OpenStack dashboard.
However, it hangs again after several hours.
Additional Information: See below for the details found so far on the host (compute node).
  1) Kernel version
Linux overcloud-us30ovscompute-4 3.10.0-862.9.1.el7.x86_64
  2) QEMU version
     /usr/libexec/qemu-kvm -version
     QEMU emulator version 2.10.0(qemu-kvm-ev-2.10.0-21.el7_5.4.1)
  3) Once the issue occurs, the "CPU 0/KVM" thread uses more than 99% CPU, as shown by "top -p VM_pro_ID":
       PID USER PR NI RES S %CPU %MEM TIME+ COMMAND
      872067 qemu 20 0 1.6g R 99.9 0.6 37:08.87 CPU 0/KVM
  4) Run "pstack 493307"; the function trace is below:
Thread 1 (Thread 0x7f2572e73040 (LWP 872067)):
#0 0x00007f256cad8fcf in ppoll () from /lib64/libc.so.6
#1 0x000055ff34bdf4a9 in qemu_poll_ns ()
#2 0x000055ff34be02a8 in main_loop_wait ()
#3 0x000055ff348bfb1a in main ()

  5) Run "strace -tt -ff -p 872067 -o cfx"; the log below keeps repeating:
21:24:02.977833 ppoll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=80, events=POLLIN}, {fd=82, events=POLLIN}, {fd=84, events=POLLIN}, {fd=115, events=POLLIN}, {fd=121, events=POLLIN}], 9, {0, 0}, NULL, 8) = 0 (Timeout)
21:24:02.977918 ppoll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=80, events=POLLIN}, {fd=82, events=POLLIN}, {fd=84, events=POLLIN}, {fd=115, events=POLLIN}, {fd=121, events=POLLIN}], 9, {0, 911447}, NULL, 8) = 0 (Timeout)
21:24:02.978945 ppoll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=80, events=POLLIN}, {fd=82, events=POLLIN}, {fd=84, events=POLLIN}, {fd=115, events=POLLIN}, {fd=121, events=POLLIN}], 9, {0, 0}, NULL, 8) = 0 (Timeout)

.....
The event file descriptors are as follows:
qemu-kvm 493227 qemu 4u a_inode 0,10 0 7542 [signalfd]
qemu-kvm 493227 qemu 5u a_inode 0,10 0 7542 [eventpoll]
qemu-kvm 493227 qemu 6u a_inode 0,10 0 7542 [eventfd]
qemu-kvm 493227 qemu 7u a_inode 0,10 0 7542 [eventpoll]
qemu-kvm 493227 qemu 8u a_inode 0,10 0 7542 [eventfd]
qemu-kvm 493227 qemu 9u a_inode 0,10 0 7542 [eventfd]
.....
qemu-kvm 493227 qemu 115u a_inode 0,10 0 7542 [eventfd]
qemu-kvm 493227 qemu 116u a_inode 0,10 0 7542 [eventfd]
qemu-kvm 493227 qemu 117u a_inode 0,10 0 7542 [eventfd]
qemu-kvm 493227 qemu 118u a_inode 0,10 0 7542 [eventfd]
qemu-kvm 493227 qemu 119u a_inode 0,10 0 7542 [eventfd]
qemu-kvm 493227 qemu 120u a_inode 0,10 0 7542 [eventfd]
qemu-kvm 493227 qemu 121u a_inode 0,10 0 7542 [eventfd]


Therefore, I think the "CPU 0/KVM" thread is in a tight loop; as shown above, ppoll is called about once per millisecond.
Also, the PIDs and thread IDs above are not consistent because the outputs come from different reproductions.

  6) A reset recovers the VM; however, the issue occurs again.

The current workaround is to add one more CPU to this VM; after that, the issue is gone.
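
As a rough way to check the observation that ppoll is called about once per millisecond, the Python sketch below computes the gap between consecutive ppoll calls from "strace -tt" lines. The helper name ppoll_gaps and the regular expression are illustrative assumptions, not part of strace or QEMU, and the three input lines are rebuilt from the log quoted in step 5. It does not handle timestamps that wrap past midnight.

```python
import re
from datetime import datetime

# Hypothetical helper (not part of strace or QEMU): parses "strace -tt" lines.
# Captures the HH:MM:SS.microseconds timestamp and ppoll's {sec, nsec} timeout.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) ppoll\(.*\], \d+, \{(\d+), (\d+)\}")


def ppoll_gaps(lines):
    """Return (gap_to_next_call_s, requested_timeout_s) for each ppoll call but the last."""
    calls = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not ppoll calls
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        timeout_s = int(m.group(2)) + int(m.group(3)) / 1e9
        calls.append((ts, timeout_s))
    return [((nxt[0] - cur[0]).total_seconds(), cur[1])
            for cur, nxt in zip(calls, calls[1:])]


# Rebuild the three strace lines quoted in this report.
fds = ", ".join("{fd=%d, events=POLLIN}" % fd
                for fd in (4, 6, 8, 9, 80, 82, 84, 115, 121))
sample = [
    "21:24:02.977833 ppoll([%s], 9, {0, 0}, NULL, 8) = 0 (Timeout)" % fds,
    "21:24:02.977918 ppoll([%s], 9, {0, 911447}, NULL, 8) = 0 (Timeout)" % fds,
    "21:24:02.978945 ppoll([%s], 9, {0, 0}, NULL, 8) = 0 (Timeout)" % fds,
]

gaps = ppoll_gaps(sample)
for gap_s, timeout_s in gaps:
    print("gap %.6f s, requested timeout %.9f s" % (gap_s, timeout_s))
```

For the three quoted lines, the gaps are about 0.085 ms and 1.027 ms, next to requested timeouts of 0 ns and 911447 ns. Running the same function over the full cfx.* output file would show whether that pattern holds for the whole capture.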
Tags: qemu-kvm KVM
abrt_hash:
URL:

Activities

cliff.chen

2020-05-22 07:34

reporter  

hostcpu.PNG (83,723 bytes)
cliff.chen

2020-05-22 07:46

reporter   ~0036974

The only strange thing observed on the guest VM before it hangs is that wa (I/O wait) is sometimes high. See the output of "top -d 1 -b -H" below:
.....
%Cpu(s): 2.0 us, 2.0 sy, 0.0 ni, 70.6 id, 23.5 wa, 1.0 hi, 0.0 si, 1.0 st
%Cpu(s): 0.0 us, 2.0 sy, 0.0 ni, 0.0 id, 96.9 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu(s): 2.0 us, 3.0 sy, 1.0 ni, 0.0 id, 93.1 wa, 1.0 hi, 0.0 si, 0.0 st
%Cpu(s): 1.0 us, 1.0 sy, 0.0 ni, 0.0 id, 97.0 wa, 1.0 hi, 0.0 si, 0.0 st
%Cpu(s): 1.0 us, 2.0 sy, 0.0 ni, 29.7 id, 65.3 wa, 1.0 hi, 1.0 si, 0.0 st
......
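
To pick out the high-wa samples from a longer capture, a small Python sketch like the one below could be used. The function name high_iowait and the 90% threshold are assumptions chosen for illustration; the input is the five "%Cpu(s)" lines quoted in this note.

```python
import re

# Matches the number immediately before the "wa" field of a top summary line.
WA_RE = re.compile(r"([\d.]+)\s+wa\b")


def high_iowait(lines, threshold=90.0):
    """Return the wa percentages at or above threshold (threshold is an assumed cutoff)."""
    values = []
    for line in lines:
        if not line.startswith("%Cpu(s):"):
            continue  # only the CPU summary lines carry the wa field
        m = WA_RE.search(line)
        if m and float(m.group(1)) >= threshold:
            values.append(float(m.group(1)))
    return values


sample = [
    "%Cpu(s): 2.0 us, 2.0 sy, 0.0 ni, 70.6 id, 23.5 wa, 1.0 hi, 0.0 si, 1.0 st",
    "%Cpu(s): 0.0 us, 2.0 sy, 0.0 ni, 0.0 id, 96.9 wa, 0.0 hi, 1.0 si, 0.0 st",
    "%Cpu(s): 2.0 us, 3.0 sy, 1.0 ni, 0.0 id, 93.1 wa, 1.0 hi, 0.0 si, 0.0 st",
    "%Cpu(s): 1.0 us, 1.0 sy, 0.0 ni, 0.0 id, 97.0 wa, 1.0 hi, 0.0 si, 0.0 st",
    "%Cpu(s): 1.0 us, 2.0 sy, 0.0 ni, 29.7 id, 65.3 wa, 1.0 hi, 1.0 si, 0.0 st",
]

print(high_iowait(sample))
```

With the assumed 90% cutoff, three of the five quoted samples qualify (96.9, 93.1 and 97.0).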
cliff.chen

2020-05-25 02:42

reporter   ~0036981

Hi Expert,
Any suggestions are really appreciated!
Thanks
Cliff
tigalch

2020-05-25 15:10

manager   ~0036990

Please update your CentOS installation to the latest available packages. We only support the current version, which is 7.8.
cliff.chen

2020-05-26 01:08

reporter   ~0036991

Hi Tigalch,
OK. Can you confirm that this is a qemu-kvm bug?
There are a lot of HP C7K boxes here, so it isn't easy to update them quickly.
Thank you very much in advance!
Cliff
tigalch

2020-05-26 05:22

manager   ~0036992

No, I can't. Nonetheless, that release is around 2 years old, and we only support the current release, which is 7.8.
cliff.chen

2020-05-26 05:28

reporter   ~0036993

OK, thank you very much!
7.8 means CentOS 7.8, right?
tigalch

2020-05-26 05:28

manager   ~0036994

right
cliff.chen

2020-05-26 05:40

reporter   ~0036995

Got it. I appreciate your reply!
Cliff

Issue History

Date Modified Username Field Change
2020-05-22 07:34 cliff.chen New Issue
2020-05-22 07:34 cliff.chen File Added: hostcpu.PNG
2020-05-22 07:34 cliff.chen Tag Attached: qemu-kvm KVM
2020-05-22 07:46 cliff.chen Note Added: 0036974
2020-05-25 02:42 cliff.chen Note Added: 0036981
2020-05-25 15:10 tigalch Note Added: 0036990
2020-05-26 01:08 cliff.chen Note Added: 0036991
2020-05-26 05:22 tigalch Note Added: 0036992
2020-05-26 05:28 cliff.chen Note Added: 0036993
2020-05-26 05:28 tigalch Note Added: 0036994
2020-05-26 05:40 cliff.chen Note Added: 0036995