View Issue Details

ID: 0014258
Project: CentOS-7
Category: general
View Status: public
Last Update: 2018-06-21 15:00
Reporter: jazaman
Assigned To:
Priority: urgent
Severity: block
Reproducibility: random
Status: assigned
Resolution: open
Platform: Centos
OS: 7
OS Version: 7.4.1708
Summary: 0014258: xen domu freezes with kernel 4.9.63-29.el7.x86_64
Description: The Dom-U system simply freezes. The logs stop at some point and nothing more is written until the next reboot. Sometimes it is still possible to log on to the system, but nothing really works. It is as if all I/O to the virtual block devices is suspended indefinitely.
Until this happens, the system seems to work without issues.

Something like 'ls' on a previously listed directory still returns a result,
but anything 'new', e.g. 'vim somefile', causes the shell to stall. sar -u reveals high I/O wait.
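
For readers who want to watch the I/O wait figure without sar, the following is a minimal sketch, not part of the original report, that derives the same kind of percentage from two samples of the aggregate cpu line in /proc/stat. The function names are illustrative assumptions; the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented in proc(5).

```python
import time


def parse_cpu_line(line):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first eight counters are summed: guest time is already
    # included in user time, so adding it again would double count.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total > 0 else 0.0


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the iowait share."""
    def read():
        with open(path) as stat:
            return parse_cpu_line(stat.readline())

    before = read()
    time.sleep(interval)
    return iowait_percent(before, read())
```

Calling sample_iowait() periodically inside an affected Dom-U and printing the result to the console might show the value climbing as block I/O stalls; whether such a loop keeps running during a freeze has not been tested here.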

A similar problem is reported for Xen with another kernel [https://www.novell.com/support/kb/doc.php?id=7018590], and following their suggestion I raised gnttab_max_frames to 256. It was stable for 1 week and then one of the Dom-Us hung.
Steps To Reproduce: It happens randomly but very frequently on the production system.
Additional Information: xl info:

release : 4.9.63-29.el7.x86_64
version : #1 SMP Mon Nov 20 14:39:22 UTC 2017
machine : x86_64
nr_cpus : 32
max_cpu_id : 191
nr_nodes : 2
cores_per_socket : 8
threads_per_core : 2
cpu_mhz : 2100
hw_caps : bfebfbff:2c100800:00000000:00007f00:77fefbff:00000000:00000121:021cbfbb
virt_caps : hvm hvm_directio
total_memory : 130978
free_memory : 68109
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 6
xen_extra : .6-6.el7
xen_version : 4.6.6-6.el7
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : Fri Nov 17 18:32:23 2017 +0000 git:a559dc3-dirty
xen_commandline : placeholder dom0_mem=2048M,max:2048M cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all gnttab_max_frames=256
cc_compiler : gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
cc_compile_by : mockbuild
cc_compile_domain : centos.org
cc_compile_date : Mon Nov 20 12:28:41 UTC 2017
xend_config_format : 4
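
The listing above can be checked mechanically to confirm that the gnttab_max_frames setting really reached the hypervisor. The sketch below is an illustration only: the helper names are assumptions, and the sample text is a few lines copied from the output above.

```python
def parse_xl_info(text):
    """Turn 'key : value' lines of xl info output into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as hw_caps or
        # dom0_mem=2048M,max:2048M contain further colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def hypervisor_option(info, name):
    """Return the value of name=value on xen_commandline, or None if absent."""
    for token in info.get("xen_commandline", "").split():
        key, sep, value = token.partition("=")
        if sep and key == name:
            return value
    return None


# A few lines copied from the report above.
SAMPLE = """\
release : 4.9.63-29.el7.x86_64
xen_version : 4.6.6-6.el7
xen_commandline : placeholder dom0_mem=2048M,max:2048M cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all gnttab_max_frames=256
"""

info = parse_xl_info(SAMPLE)
print(info["xen_version"], hypervisor_option(info, "gnttab_max_frames"))
```

On a live dom0 the text could instead be obtained with subprocess.run(["xl", "info"], capture_output=True, text=True, check=True).stdout, assuming the script runs with sufficient privileges.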
Tags: xen, Xen4CentOS
abrt_hash
URL

Activities

deltadarren

2018-04-13 10:21

reporter   ~0031614

I can confirm the same issue. We have been migrating hypervisors to CentOS 7 with Xen and have been experiencing DomUs locking up. Across different environments we have different numbers of VMs doing a variety of jobs. Some are fairly light (such as a Salt master) and idle most of the time; others are backup database hosts that consistently use a lot of CPU and I/O. The DomUs all run Linux, on differing CentOS versions.

We have tried increasing gnttab_max_frames to 256 as per the original poster's change (Gentoo and Novell both advise this too). All was fine for around a week, and then we started seeing the DomUs lock up. We're unable to log in at all: sometimes we can get as far as typing a username, but no password prompt appears; other times we can't even do that. We've tried changing the vm_dirty settings, but to no avail. I've tried increasing debug levels, but nothing is shown prior to the lockup and there is no unusual behaviour; the DomU just stops.

Hypervisor is running kernel 4.9.75-29.el7.x86_64 and xl info is as follows:
release : 4.9.75-29.el7.x86_64
version : #1 SMP Fri Jan 5 19:42:28 UTC 2018
machine : x86_64
nr_cpus : 40
max_cpu_id : 191
nr_nodes : 2
cores_per_socket : 10
threads_per_core : 2
cpu_mhz : 2197
hw_caps : bfebfbff:2c100800:00000000:00007f00:77fefbff:00000000:00000121:021cbfbb
virt_caps : hvm hvm_directio
total_memory : 81826
free_memory : 13777
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 6
xen_extra : .6-10.el7
xen_version : 4.6.6-10.el7
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : Thu Mar 1 17:24:01 2018 -0600 git:2a1e1e0-dirty
xen_commandline : placeholder dom0_mem=4096M,max:4096M cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all gnttab_max_frames=256
cc_compiler : gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
cc_compile_by : mockbuild
cc_compile_domain : centos.org
cc_compile_date : Mon Mar 5 18:00:43 UTC 2018
xend_config_format : 4

I'm struggling massively to find anyone else still having issues after making the gnttab_max_frames change, and can't believe there are only 2 of us still seeing this error. If any additional debug output is required, please let me know and I'll be happy to provide it.

deltadarren

2018-06-19 11:16

reporter   ~0032112

Just an update on this problem. We eventually upgraded Xen to 4.8 and changed the kernel that the VMs use to the LTS kernel from elrepo. Since doing this, we haven't had any lockups or freezes in over a month. It might be that the 4.8 upgrade alone is enough, but we tried a newer kernel first because the initial attempt to update to 4.8 failed.

toracat

2018-06-19 14:45

manager   ~0032114

@deltadarren

"LTS kernel from elrepo" = kernel-lt ? So it is kernel 4.4.x that worked.

jazaman

2018-06-21 15:00

reporter   ~0032126

@toracat and @deltadarren I am kind of new to all this, so some more information would be really helpful. How did you upgrade to Xen 4.8? My dom0 kernel is the CentOS 7 one, and yum check-update shows that only 4.6.6-12.el7 is available.

Issue History

Date Modified Username Field Change
2017-12-09 06:37 jazaman New Issue
2017-12-09 06:37 jazaman Tag Attached: xen
2017-12-09 06:37 jazaman Tag Attached: Xen4CentOS
2017-12-09 20:16 avij Project Xen4 => CentOS-7
2018-04-13 10:21 deltadarren Note Added: 0031614
2018-06-19 11:16 deltadarren Note Added: 0032112
2018-06-19 14:45 toracat Note Added: 0032114
2018-06-19 14:46 toracat Status new => assigned
2018-06-21 15:00 jazaman Note Added: 0032126