View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0014258 | CentOS-7 | general | public | 2017-12-09 06:37 | 2018-06-21 15:00 |
Reporter | jazaman | Assigned To | |||
Priority | urgent | Severity | block | Reproducibility | random |
Status | assigned | Resolution | open | ||
Platform | Centos | OS | 7 | OS Version | 7.4.1708 |
Summary | 0014258: xen domu freezes with kernel 4.9.63-29.el7.x86_64 | ||||
Description | The Dom-U system simply freezes. The logs stop at some point and resume only after the next reboot. Sometimes it's still possible to log on to the system, but nothing really works. It is as if all I/O to the virtual block devices is suspended indefinitely. Until this happens, the system seems to work without issues. Something like 'ls' on a directory that was listed before still returns a result, but anything 'new', e.g. 'vim somefile', causes the shell to stall. sar -u reveals high I/O wait. A similar problem is reported for Xen with another kernel [https://www.novell.com/support/kb/doc.php?id=7018590], and following their suggestion I raised gnttab_max_frames to 256. It was stable for one week, and then one of the Dom-Us hung. | ||||
Steps To Reproduce | It happens randomly but very frequently on the production system. | ||||
Additional Information | xl -info : release : 4.9.63-29.el7.x86_64 version : #1 SMP Mon Nov 20 14:39:22 UTC 2017 machine : x86_64 nr_cpus : 32 max_cpu_id : 191 nr_nodes : 2 cores_per_socket : 8 threads_per_core : 2 cpu_mhz : 2100 hw_caps : bfebfbff:2c100800:00000000:00007f00:77fefbff:00000000:00000121:021cbfbb virt_caps : hvm hvm_directio total_memory : 130978 free_memory : 68109 sharing_freed_memory : 0 sharing_used_memory : 0 outstanding_claims : 0 free_cpus : 0 xen_major : 4 xen_minor : 6 xen_extra : .6-6.el7 xen_version : 4.6.6-6.el7 xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 xen_scheduler : credit xen_pagesize : 4096 platform_params : virt_start=0xffff800000000000 xen_changeset : Fri Nov 17 18:32:23 2017 +0000 git:a559dc3-dirty xen_commandline : placeholder dom0_mem=2048M,max:2048M cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all gnttab_max_frames=256 cc_compiler : gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16) cc_compile_by : mockbuild cc_compile_domain : centos.org cc_compile_date : Mon Nov 20 12:28:41 UTC 2017 xend_config_format : 4 | ||||
Tags | xen, Xen4CentOS | ||||
abrt_hash | |||||
URL | |||||
I can confirm the same issue. We have been migrating hypervisors to CentOS 7 using Xen and have been experiencing DomUs locking up. Across different environments we have different numbers of VMs, all doing a variety of jobs. Some are fairly light (such as a Salt master) and don't do much most of the time; others are backup database hosts that consistently use a lot of CPU and I/O. The DomUs are all Linux, running differing CentOS versions. We have tried increasing ''gnttab_max_frames'' to 256 as per the original poster's change (Gentoo and Novell both advise this too). All was fine for around a week, and then we started seeing the DomUs lock up. We're unable to log in at all. Sometimes we can get as far as typing a username, but no password prompt appears; other times we can't even do that. We've tried changing the vm_dirty settings, but to no avail. I've tried increasing debug levels, but nothing is shown prior to the lockup and there is no unusual behaviour; the DomU just stops.
The hypervisor is running kernel 4.9.75-29.el7.x86_64 and xl info is as follows: release : 4.9.75-29.el7.x86_64 version : #1 SMP Fri Jan 5 19:42:28 UTC 2018 machine : x86_64 nr_cpus : 40 max_cpu_id : 191 nr_nodes : 2 cores_per_socket : 10 threads_per_core : 2 cpu_mhz : 2197 hw_caps : bfebfbff:2c100800:00000000:00007f00:77fefbff:00000000:00000121:021cbfbb virt_caps : hvm hvm_directio total_memory : 81826 free_memory : 13777 sharing_freed_memory : 0 sharing_used_memory : 0 outstanding_claims : 0 free_cpus : 0 xen_major : 4 xen_minor : 6 xen_extra : .6-10.el7 xen_version : 4.6.6-10.el7 xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 xen_scheduler : credit xen_pagesize : 4096 platform_params : virt_start=0xffff800000000000 xen_changeset : Thu Mar 1 17:24:01 2018 -0600 git:2a1e1e0-dirty xen_commandline : placeholder dom0_mem=4096M,max:4096M cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all gnttab_max_frames=256 cc_compiler : gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16) cc_compile_by : mockbuild cc_compile_domain : centos.org cc_compile_date : Mon Mar 5 18:00:43 UTC 2018 xend_config_format : 4 I'm struggling massively to find anyone else still having issues after making the ''gnttab_max_frames'' change and can't believe there are only two of us still seeing this error. If any additional debug output is required, please let me know and I'll be happy to provide it. |
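For anyone comparing hosts, the following is a minimal sketch (not part of the original reports) of how one might confirm that the running hypervisor was actually booted with the ''gnttab_max_frames'' option. It assumes the usual one-per-line "key : value" layout of xl info output. The function names and the SAMPLE text are illustrative only; SAMPLE is trimmed from the output quoted above.

```python
def parse_xl_info(text):
    """Parse 'key : value' lines (as printed by `xl info`) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # 'dom0_mem=4096M,max:4096M' keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def gnttab_max_frames(info):
    """Return gnttab_max_frames from xen_commandline, or None if unset."""
    for token in info.get("xen_commandline", "").split():
        name, sep, value = token.partition("=")
        if name == "gnttab_max_frames" and sep:
            return int(value)
    return None


# Illustrative input, trimmed from the xl info output quoted in this issue.
SAMPLE = """\
xen_version            : 4.6.6-10.el7
xen_commandline        : placeholder dom0_mem=4096M,max:4096M cpuinfo com1=115200,8n1 console=com1,tty loglvl=all guest_loglvl=all gnttab_max_frames=256
"""

if __name__ == "__main__":
    print(gnttab_max_frames(parse_xl_info(SAMPLE)))
```

On a dom0, the text could instead be taken from `subprocess.run(["xl", "info"], capture_output=True, text=True).stdout`. A result of None would suggest the option never reached the hypervisor command line, which is worth ruling out before concluding that the workaround itself is ineffective.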
Just an update on this problem. We eventually upgraded Xen to 4.8 and changed the kernel that the VMs use to the LTS kernel from elrepo. Since doing this, we haven't had any lock ups or freezes in over a month. It might be that the 4.8 upgrade is enough, but we tried a newer kernel first as the initial attempt to update to 4.8 failed. | |
@deltadarren "LTS kernel from elrepo" = kernel-lt ? So it is kernel 4.4.x that worked. |
@toracat and @deltadarren I am kind of new to all this, so some more information would be really helpful. How did you upgrade to Xen 4.8? My dom0 runs CentOS 7, and yum check-update shows that only 4.6.6-12.el7 is available. | |
Date Modified | Username | Field | Change |
---|---|---|---|
2017-12-09 06:37 | jazaman | New Issue | |
2017-12-09 06:37 | jazaman | Tag Attached: xen | |
2017-12-09 06:37 | jazaman | Tag Attached: Xen4CentOS | |
2017-12-09 20:16 | avij | Project | Xen4 => CentOS-7 |
2018-04-13 10:21 | deltadarren | Note Added: 0031614 | |
2018-06-19 11:16 | deltadarren | Note Added: 0032112 | |
2018-06-19 14:45 | toracat | Note Added: 0032114 | |
2018-06-19 14:46 | toracat | Status | new => assigned |
2018-06-21 15:00 | jazaman | Note Added: 0032126 |