View Issue Details

ID: 0002968
Project: CentOS-5
Category: xen
View Status: public
Last Update: 2013-05-14 17:01
Reporter: simonb
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Product Version: 5.2
Target Version:
Fixed in Version: 5.5
Summary: 0002968: xm save fails with 32-bit domU on 64-bit dom0 and locks out further xm commands
Description: Dom0 kernel is kernel-xen-2.6.18-92.1.6.el5 x86_64
DomU kernel is kernel-xen-2.6.18-92.1.6.el5 i686

Both Dom0 and DomU are updated as of 9th July 2008.

xm save results in a very small saved state file and no error message reported to the console. No further xm command works.
For example,

[root@xen1 xen]# xm save template32 /var/lib/xen/save/template32.save
[root@xen1 xen]# xm list
Error: Device 0 not connected
Usage: xm list [options] [Domain, ...]

List information about all/some domains.
  -l, --long Output all VM details in SXP
  --label Include security labels

Additional Information: See attached file for xend.log during the minute of the failed xm save attempt.
Tags: No tags attached.

Activities

2008-07-09 10:58

xend.log (124,378 bytes)
herrold

2008-09-02 17:56

reporter   ~0007915

Looks valid here

[root@centos-5 ~]# ls -al /var/lib/xen/save
ls: /var/lib/xen/save: No such file or directory
[root@centos-5 ~]# mkdir -p /var/lib/xen/save
[root@centos-5 ~]# xm list
Name ID Mem(MiB) VCPUs State Time(s)
32bitcentos5 81 255 1 -b---- 1098.7
64bitcentos5 78 255 1 -b---- 970.0
Domain-0 0 2904 2 r----- 533363.2
win2000 83 519 1 ------ 168.1
[root@centos-5 ~]# xm save 32bitcentos5 /var/lib/xen/save/32bitcentos5.save
[root@centos-5 ~]# xm list
Error: Device 0 not connected
Usage: xm list [options] [Domain, ...]

List information about all/some domains.
  -l, --long Output all VM details in SXP
  --label Include security labels

[root@centos-5 ~]# xm list -l
Error: Device 0 not connected
Usage: xm list [options] [Domain, ...]

List information about all/some domains.
  -l, --long Output all VM details in SXP
  --label Include security labels

[root@centos-5 ~]# ls -al /var/lib/xen/save
total 24
drwxr-xr-x 2 root root 4096 Sep 2 13:50 .
drwxr-xr-x 8 root root 4096 Sep 2 13:49 ..
-rwxr-xr-x 1 root root 960 Sep 2 13:50 32bitcentos5.save
[root@centos-5 ~]# ls -al /var/lib/xen/save/32bitcentos5.save
-rwxr-xr-x 1 root root 960 Sep 2 13:50 /var/lib/xen/save/32bitcentos5.save
[root@centos-5 ~]# less /var/lib/xen/save/32bitcentos5.save
"/var/lib/xen/save/32bitcentos5.save" may be a binary file. See it anyway?
[root@centos-5 ~]# man xm
[root@centos-5 ~]# xm restore /var/lib/xen/save/32bitcentos5.save
Error: Restore failed
Usage: xm restore <CheckpointFile>

Restore a domain from a saved state.
[root@centos-5 ~]#


and a tonne of these in the xend.log

[2008-09-02 13:54:33 xend.XendDomainInfo 8886] INFO (XendDomainInfo:947) Domain has shutdown: name=32bitcentos5 id=81 reason=suspend.
[2008-09-02 13:54:33 xend.XendDomainInfo 8886] INFO (XendDomainInfo:947) Domain has shutdown: name=32bitcentos5 id=81 reason=suspend.
[2008-09-02 13:54:33 xend.XendDomainInfo 8886] INFO (XendDomainInfo:947) Domain has shutdown: name=32bitcentos5 id=81 reason=suspend.


AND it is continuously adding new entries to that effect on a runaway basis

Wonder if this is upstream?
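
A quick way to confirm the runaway logging described above is to look for long runs of identical messages in xend.log. The following is a minimal sketch, not part of xend: the helper name `longest_repeat` is hypothetical, and the regular expression assumes every line starts with a bracketed `[timestamp logger pid]` prefix, as in the three sample lines quoted in this note.

```python
import re
from itertools import groupby

# Strips the leading "[timestamp logger pid]" prefix so that lines logged at
# different times still compare equal. The prefix shape is an assumption
# taken from the sample lines quoted in this note.
LINE_RE = re.compile(r"^\[[^\]]*\]\s+(?P<msg>.*)$")


def longest_repeat(lines):
    """Return (message, count) for the longest run of identical consecutive messages."""
    messages = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            messages.append(match.group("msg"))

    best = ("", 0)
    for msg, run in groupby(messages):
        count = sum(1 for _ in run)
        if count > best[1]:
            best = (msg, count)
    return best


if __name__ == "__main__":
    # Reads the log path used by CentOS 5 xend; adjust if yours differs.
    with open("/var/log/xen/xend.log") as handle:
        print(longest_repeat(handle))
```

A very large count for the "Domain has shutdown ... reason=suspend" message would match the spew reported here.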
herrold

2008-09-02 18:17

reporter   ~0007916

restarting the xend ends the message spew, and permits control with xm again.
arrfab

2008-09-02 19:04

administrator   ~0007917

I tested the same thing (Dom0 x86_64 and DomU i386) and got the same results (though I tested with kernel 2.6.18-92.1.10.el5xen on both the dom0 and domU).
xm save / xm restore works OK with an x86_64 domU but not with an i386 domU.
It's true that some xm commands don't work (unless you restart the xend service), but xm top, for example, still works and reports data (as does virt-manager).
Other domUs are not affected by the bug: they continue to work as expected.
It may be worth testing the same thing on an i386 dom0 to see the results.
cgs

2008-09-11 19:37

reporter   ~0007950

Were the ``xm save'' and ``xm list'' commands in quick succession? I've run into a lot of problems with ``xm list'' after certain operations, including ``xm shutdown''. There seems to be a race condition that I haven't yet pinned down. Try delaying or not using ``xm list'' after other VM state commands.
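
The delay suggested above could be tried with something like the sketch below. It is only an illustration: the function names are hypothetical, and it assumes `xm list` exits non-zero when it prints an error such as "Device 0 not connected", which this report does not confirm.

```python
import subprocess
import time


def run_xm(args):
    """Run `xm <args>` and return (exit_code, combined_output)."""
    proc = subprocess.Popen(["xm"] + list(args),
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8", "replace")


def xm_list_with_retry(runner=run_xm, attempts=5, delay=2.0, sleep=time.sleep):
    """Wait `delay` seconds before each `xm list` try; stop at the first success.

    Returns (succeeded, output_of_last_attempt).
    """
    output = ""
    for _ in range(attempts):
        sleep(delay)  # give xend time to settle after the previous command
        code, output = runner(["list"])
        if code == 0:
            return True, output
    return False, output
```

If every attempt still fails after several seconds, the problem is probably not just a timing race between consecutive xm commands.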
herrold

2008-09-11 20:49

reporter   ~0007951

progress notes:

since running the test, the domU in question has refused to start

In trying to track this down, I have installed and uninstalled xen and its component pieces, and the /var/run files, etc., many times

Today, in manually running the xendomains initscript, I am presented with:

[root@centos-5 init.d]# service xendomains restart
Shutting down Xen domains:Restoring Xen domains: 32bitcentos5.saveError: Restore failed
Usage: xm restore <CheckpointFile>

Restore a domain from a saved state.
!.
[root@centos-5 init.d]#

So there is SOME state file still hanging around ... digging further
herrold

2008-09-11 20:55

reporter   ~0007952

ok -- it is leaving state here:

[root@centos-5 subsys]# service xendomains restart
Shutting down Xen domains:Restoring Xen domains: 32bitcentos5.saveError: Restore failed
Usage: xm restore <CheckpointFile>

Restore a domain from a saved state.
!.
[root@centos-5 subsys]# locate 32bitcentos5.save
/var/lib/xen/save/32bitcentos5.save
[root@centos-5 subsys]# ls -al /var/lib/xen/save/
total 24
drwxr-xr-x 2 root root 4096 Sep 2 13:50 .
drwxr-xr-x 8 root root 4096 Sep 11 16:42 ..
-rwxr-xr-x 1 root root 960 Sep 2 13:50 32bitcentos5.save
[root@centos-5 subsys]# cd /var/lib/xen/save/
[root@centos-5 save]# cat 32bitcentos5.save
LinuxGuestRecord�(domain (domid 81) (uuid 39bad8b9-3f5f-31b1-51a4-9e7fbb15573e) (vcpus 1) (vcpu_avail 1) (cpu_weight 1.0) (memory 256) (shadow_memory 0) (maxmem 256) (bootloader /usr/bin/pygrub) (features ) (name 32bitcentos5) (on_poweroff destroy) (on_reboot restart) (on_crash restart) (image (linux (ramdisk /var/lib/xen/boot_ramdisk.3xDU5w) (kernel /var/lib/xen/boot_kernel.6JOEU2) (args 'ro root=LABEL=/ console=xvc0'))) (device (vif (backend 0) (script vif-bridge) (bridge xenbr0) (mac 00:16:3e:1f:f0:ea))) (device (vkbd (backend 0))) (device (vfb (backend 0) (type vnc) (vncdisplay 2) (vnclisten 0.0.0.0) (vncpasswd 123456) (display :0.0) (xauthority /root/.xauthoLVxmg))) (device (tap (backend 0) (dev xvda:disk) (uname tap:aio:/opt/xen/32bitcentos5.img) (mode w))) (state -b----) (shutdown_reason poweroff) (cpu_time 1098.76967484) (online_vcpus 1) (up_time 426867.958503) (start_time 1219950954.86) (store_mfn 532247) (console_mfn 532246)[root@centos-5 save]#
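
The cat output above shows the LinuxGuestRecord signature and the domain's SXP config, with nothing visible after it in a file of only 960 bytes. A small script can check whether any memory image follows the config. This is a sketch under an assumed layout (16-byte signature, 4-byte big-endian config length, config text, then the memory image); that layout reflects how xend is understood to write checkpoints and is not stated in this report. The name `inspect_save` is hypothetical.

```python
import struct

SIGNATURE = b"LinuxGuestRecord"  # 16 bytes, visible at the start of the cat output


def inspect_save(data):
    """Split a save image into its config text and the number of bytes after it.

    Assumed layout: signature, 4-byte big-endian config length, config,
    then the guest memory image.
    """
    if not data.startswith(SIGNATURE):
        raise ValueError("bad signature: not a LinuxGuestRecord file")
    offset = len(SIGNATURE)
    if len(data) < offset + 4:
        raise ValueError("file ends before the config length field")

    (config_len,) = struct.unpack("!I", data[offset:offset + 4])
    offset += 4
    config = data[offset:offset + config_len]
    if len(config) < config_len:
        raise ValueError("file ends inside the config")

    payload_bytes = len(data) - offset - config_len
    return config.decode("utf-8", "replace"), payload_bytes


if __name__ == "__main__":
    with open("/var/lib/xen/save/32bitcentos5.save", "rb") as handle:
        config_text, payload = inspect_save(handle.read())
    print("config bytes: %d, bytes after config: %d" % (len(config_text), payload))
```

If the count of bytes after the config is zero, the save stopped before any guest memory was written, which would explain why xm restore fails on this file.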
herrold

2008-09-11 21:26

reporter   ~0007953

definite problems in xendomains script ...
[root@centos-5 ~]# service xendomains stop
Shutting down Xen domains: 32bitcentos5(save)../etc/init.d/xendomains: line 181: 7911 Terminated watchdog_xm save
 64bitcentos5(save)/etc/init.d/xendomains: line 299: 7911 Terminated watchdog_xm save
.../etc/init.d/xendomains: line 181: 8078 Terminated watchdog_xm save
/etc/init.d/xendomains: line 262: 8078 Terminated watchdog_xm save
Error: Device 0 not connected
/etc/init.d/xendomains: line 181: 8078 Terminated watchdog_xm save
 SHUTDOWN_ALL .Domain Zombie-migrating-64bitcentos5 terminated
herrold

2008-09-11 22:05

reporter   ~0007955

possibly related upstream report to the same effect:
   http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1064

and mentioned on our ML:
   http://lists.centos.org/pipermail/centos-virt/2008-March/000283.html

and that Debian closed it without ever expressly running down why it was happening:
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=488284

I think the outreference there to paravirt ops does not apply, as I am in Full virt HW
   http://wiki.xensource.com/xenwiki/XenParavirtOps
dustinblack

2008-09-12 13:44

reporter   ~0007958

Related Red Hat bug:
https://bugzilla.redhat.com/show_bug.cgi?id=425411
herrold

2008-09-12 17:53

reporter   ~0007963

sadly takeaway is:

http://lists.xensource.com/archives/html/xen-devel/2008-02/msg00425.html

and

  ------- Comment #9 From Don Domingo (ddomingo@redhat.com) 2008-03-09 20:17:52 EDT -------

revised note:

<quote>
You can now run 32-bit guests on 64-bit hosts. This capability is now included
as a technology preview. Note, that the ability to save, restore, and migrate
32-bit guests on 64-bit hosts is not functional, and as such should not be
attempted.
</quote>

please advise if any further revisions are required. thanks!

Not much more to say until 5.3 issues, I guess
herrold

2008-09-12 17:54

reporter   ~0007964

set into confirmed status
herrold

2008-09-12 17:54

reporter   ~0007965

really set to confirmed this time
dustinblack

2009-02-02 20:18

reporter   ~0008666

Patch posted for Red Hat bug 425211

https://bugzilla.redhat.com/show_bug.cgi?id=425411

I've successfully tested this patch with 32-bit CentOS 5.2 guests live-migrated between 64-bit CentOS 5.2 hosts.
dustinblack

2009-02-02 20:20

reporter   ~0008667

Mistype: 425211 should have been 425411, but the link is correct.
tigalch

2013-05-14 17:01

manager   ~0017420

Marked as SOLVED upstream with ERRATA http://rhn.redhat.com/errata/RHBA-2009-1328.html

Issue History

Date Modified Username Field Change
2008-07-09 10:58 simonb New Issue
2008-07-09 10:58 simonb File Added: xend.log
2008-09-02 17:56 herrold Note Added: 0007915
2008-09-02 18:17 herrold Note Added: 0007916
2008-09-02 19:04 arrfab Note Added: 0007917
2008-09-11 19:37 cgs Note Added: 0007950
2008-09-11 20:49 herrold Note Added: 0007951
2008-09-11 20:55 herrold Note Added: 0007952
2008-09-11 21:26 herrold Note Added: 0007953
2008-09-11 22:05 herrold Note Added: 0007955
2008-09-12 13:44 dustinblack Note Added: 0007958
2008-09-12 17:53 herrold Note Added: 0007963
2008-09-12 17:54 herrold Note Added: 0007964
2008-09-12 17:54 herrold Note Added: 0007965
2008-09-12 17:54 herrold Status new => confirmed
2009-02-02 20:18 dustinblack Note Added: 0008666
2009-02-02 20:20 dustinblack Note Added: 0008667
2013-05-14 17:01 tigalch Note Added: 0017420
2013-05-14 17:01 tigalch Status confirmed => resolved
2013-05-14 17:01 tigalch Fixed in Version => 5.5
2013-05-14 17:01 tigalch Resolution open => fixed