0014170  CentOS-7  kernel  public  2017-11-23 07:00
Summary 0014170: During shutdown or reboot stress tests, the server hangs in the syscall reboot(RB_POWER_OFF) or reboot(RB_AUTOBOOT)
Description
Dear CentOS engineers,
   Recently, while running a cold reboot (shutdown, then wake-up) stress test on our server, CentOS 7.3 hangs during the shutdown process after about 4 to 24 hours, as you can see in the attachment. In addition, our test engineers report that during a warm reset stress test the server can also hang in the syscall reboot(RB_AUTOBOOT). The same happens with RHEL 7.3. I have also provided the sosreport.
   To locate the problem, I added some kernel debugging code: printk calls, plus writes of special values to I/O port 0x80 with outb(xx, 0x80) at certain steps. The BMC captures the values written to port 0x80 and prints them on screen or to a log file.
   As you know, when the system powers off the machine, systemctl first calls the syscall reboot(RB_ENABLE_CAD), then prints "Powering off.", and after that calls the syscall reboot(RB_POWER_OFF). So I added some log output to SYSCALL_DEFINE4(reboot, ...) in the file "kernel/sys.c".
   The last picture on the screen shows that the system hangs in the syscall reboot(RB_POWER_OFF) issued from "systemctl.c". As mentioned above, I write special values to port 0x80 at certain steps, and the BMC reads and prints them. From these values I am sure the OS is stuck in the function "device_shutdown" in the file "drivers/base/core.c". More precisely, the OS is stuck in the "while(...)" loop of "device_shutdown", where the bus's shutdown or the driver's shutdown callback is called for every device registered under "/sys".
   The warm reboot stress test behaves the same way: it hangs at "Rebooting.". The cold and warm reboot paths share the function "device_shutdown", so I believe it is the same problem.
   However, from the software side I do not know how to debug this further. In a normal shutdown, the "while(...)" loop in "device_shutdown" iterates more than 2200 times, so adding debug output to every iteration is impractical (I tried this before, but it did not work well).
   I suspect that one device hangs while its shutdown function is being called. Could you please help me, or advise how to debug this? Thank you in advance.
Steps To Reproduce
After the OS boots, a bash script runs automatically. Its main steps are shown below.
Step 1: Sleep for 120 s
Step 2: Set the wake-up time to 90 s (/sys/class/rtc/rtc0/wakealarm)
Step 3: Run init 0
The cycle then repeats after the machine wakes up.
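
The steps above can be sketched as a small script. This is a hypothetical reconstruction, not the script actually used in the test: the variable names, the `DRY_RUN` switch, the step that clears a pending alarm by writing 0, and the use of the relative `+N` wakealarm format are all assumptions. `DRY_RUN` defaults to 1 so that the sketch only prints what it would do.

```shell
#!/bin/bash
# Sketch of one cold-reboot stress cycle (assumed reconstruction).
# All names below are illustrative; only the 120 s sleep, the 90 s wake-up
# time, the wakealarm path and "init 0" come from the report.

SETTLE_SECONDS="${SETTLE_SECONDS:-120}"        # Step 1: wait after boot
WAKE_AFTER_SECONDS="${WAKE_AFTER_SECONDS:-90}" # Step 2: RTC wake-up delay
WAKEALARM="${WAKEALARM:-/sys/class/rtc/rtc0/wakealarm}"
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print only, 0 = execute

# Run a command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY_RUN: $*"
    else
        "$@"
    fi
}

# Write a value to the RTC wakealarm file (or print the write in dry-run mode).
write_wakealarm() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY_RUN: echo $1 > $WAKEALARM"
    else
        echo "$1" > "$WAKEALARM"
    fi
}

stress_cycle() {
    run sleep "$SETTLE_SECONDS"
    # Assumption: clear any pending alarm before arming a new one.
    write_wakealarm 0
    # Assumption: a leading "+" means "seconds from now".
    write_wakealarm "+$WAKE_AFTER_SECONDS"
    run init 0
}

stress_cycle
```

With `DRY_RUN=0` the script must run as root and be started automatically at every boot so that the cycle repeats; how the original script was started is not stated in this report.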
info.rar (755,954 bytes)

