View Issue Details

ID: 0002706
Project: CentOS-5
Category: yum
View Status: public
Last Update: 2009-01-07 17:33
Reporter: oroboros
Priority: normal
Severity: minor
Reproducibility: random
Status: feedback
Resolution: open
Product Version: 5.1
Target Version:
Fixed in Version:
Summary: 0002706: Yum randomly fails with floating point exception
Description:
[root@dnvr-xcos-2 ~]# yum update
Floating point exception
[root@dnvr-xcos-2 ~]# uname -a
Linux dnvr-xcos-2.rockynet.com 2.6.18-53.1.4.el5xen #1 SMP Fri Nov 30 01:53:35 EST 2007 i686 i686 i386 GNU/Linux
[root@dnvr-xcos-2 ~]#
Additional Information:
This is from a very default 5.1 install where the only additional package selected was "virtualization". Xen kernel is running and it doesn't seem to matter whether there are VMs running or not. Also LDAP authentication was configured.

From moment to moment yum will randomly work or not. Often even invoking "yum" with no arguments to get the help will also produce this same floating point exception.

Have reproduced across several different hardware platforms (all x86) using different sets of installation media (another co-worker in another city has also run into it).
Tags: No tags attached.

Activities

kbsingh@karan.org

2008-02-23 23:10

administrator   ~0006932

You need to provide more details on how to reproduce this issue. Attach the anaconda install files (all 3 from /root, and also the anaconda logs from syslog), and include any kickstarts you might have used as well as specific system information.
range

2008-02-24 13:16

administrator   ~0006933

Did you update the machine from CentOS 4 via yum?
schlisko

2008-03-10 11:38

reporter   ~0007011

I see the same issue on two servers of one of our customers. It is a fresh install of 5.0 updated to 5.1.

kernel-xen-2.6.18-53.1.4.el5
xen-libs-3.0.3-41.el5
xen-3.0.3-41.el5

strace yum install unison

ends up with

...
open("/etc/rpm/platform", O_RDONLY|O_LARGEFILE) = 6
fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=18, ...}) = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7c83000
poll([{fd=6, events=POLLIN, revents=POLLIN}], 1, 1000) = 1
gettimeofday({1205139784, 308562}, NULL) = 0
nanosleep({0, 20000000}, {3085993304, 4818720}) = 0
gettimeofday({1205139784, 308562}, NULL) = 0
--- SIGFPE (Floating point exception) @ 0 (0) ---
+++ killed by SIGFPE +++
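
A minimal sketch of why the trace above could end in SIGFPE, assuming (the trace alone does not prove this) that the program divides by the time elapsed between the two gettimeofday() calls. Both calls return the same timestamp even though a 20 ms nanosleep() ran between them, so the elapsed time is zero; on x86, an integer division by zero in C is delivered as SIGFPE. The function names and the cycle count below are hypothetical and are not taken from rpm or yum.

```python
# Illustrative sketch only; this is NOT rpm's source code.
# Timestamps are (seconds, microseconds) pairs, as strace prints them
# for gettimeofday().

def elapsed_usecs(start, end):
    """Microseconds between two (sec, usec) timestamps."""
    return (end[0] - start[0]) * 1_000_000 + (end[1] - start[1])


def cycles_per_usec(cycles, start, end):
    """Hypothetical calibration step: cycles counted / elapsed microseconds.

    Returns None instead of dividing when the clock did not advance.
    """
    usecs = elapsed_usecs(start, end)
    if usecs <= 0:
        return None  # clock stalled or stepped backwards: nothing to divide by
    return cycles // usecs


# The two gettimeofday() results from the trace, taken around nanosleep(20ms).
before = (1205139784, 308562)
after = (1205139784, 308562)

print(elapsed_usecs(before, after))                # 0
print(cycles_per_usec(40_000_000, before, after))  # None
```

Without the `usecs <= 0` guard, the same inputs raise ZeroDivisionError in Python, which is the analogue of the SIGFPE seen here.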
dsdee

2008-03-26 20:06

reporter   ~0007068

I'm seeing the same thing on two of my systems: one is a base Dom0 system running CentOS 5.1 on an Intel Quad Core 2 Duo, and the other is a DomU on the same Dom0. It is sporadic -- sometimes it works fine, sometimes I get the FPE repeatedly until I let it sit for a bit, then it is fine.
senomoto

2008-04-03 15:27

reporter   ~0007081

I have the same problem with CentOS 5.0 updated to 5.1.

kernel-xen-2.6.18-53.1.13.el5
xen-3.0.3-41.el5
xen-libs-3.0.3-41.el5
2x Dual-Core AMD Opteron(tm) Processor 2210, 4GB RAM.

I saw this problem using rpm, but at the moment I can't reproduce it.
oroboros

2008-04-03 16:01

reporter   ~0007087

Preliminary evidence suggests this may be related to a clock synchronization problem as configuring ntp may resolve it.

Our installs where this failed were fresh installs of 5.1 from DVD onto newly formatted filesystems.
dsdee

2008-04-03 16:04

reporter   ~0007088

I'm not sure I could agree with that; I do run ntpdate hourly on both my DomU and Dom0 machines that I am receiving this error on. Some DomU's get this inconsistently, and some don't get it at all.

Also, I should have stated earlier that usually when I get the error with YUM, I also can get it with RPM.

Next time I see it, I will try to remember to capture an STRACE of the execution.
Jeff_S

2008-04-04 21:24

developer   ~0007098

This seems to be a Xen bug. Here's the end of a big thread on a similar crash:
http://osdir.com/ml/emulators.xen.devel/2004/msg02546.html

If you go to one of the older messages in the thread, there's a test program provided which seems to be able to reproduce the crash more consistently.
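
For readers who cannot reach that thread, here is a rough sketch of a probe in the same spirit. It is not the test program from the thread. It assumes the symptom to look for is a wall clock that fails to advance across a short sleep, as in the strace output earlier in this report. The function name and its parameters are hypothetical.

```python
import time


def count_stalled_intervals(samples=50, sleep_s=0.02,
                            clock=time.time, sleep=time.sleep):
    """Sleep `samples` times; count sleeps during which `clock` did not advance."""
    stalled = 0
    for _ in range(samples):
        start = clock()
        sleep(sleep_s)
        if clock() <= start:  # no forward progress despite sleeping
            stalled += 1
    return stalled


if __name__ == "__main__":
    n = count_stalled_intervals()
    print(f"{n} of 50 sleeps saw no forward progress in the wall clock")
```

On a host with a healthy clock the count should be zero. The `clock` and `sleep` parameters exist only so the logic can be exercised with fake timestamps.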
pileofrogs

2008-12-24 20:03

reporter   ~0008506

I also have this problem, but I seem to have found the solution:

This issue is a problem in RPM having something to do with sleeping and counting cycles.

According to this post:

http://lists.xensource.com/archives/html/xen-devel/2008-07/msg00609.html

More recent versions of RPM have fixed this problem. So, if CentOS were to upgrade, this problem would be fixed.

If you don't want to go that route, here is a message containing a patch, which I have applied to my own SRPM and am using now. I haven't been running it long enough to say whether or not it actually fixes the problem.

http://lists.xensource.com/archives/html/xen-devel/2008-07/msg00605.html

Note that this patch is a quick hack, not the change applied to later versions of RPM. So it might be better to compare older and newer versions of RPM, create a patch from that, and add that to CentOS instead.

Thanks
-Dylan
herrold

2009-01-06 23:04

reporter   ~0008545

I note that the last prior comment's link states:

> That sum_cycles stuff looks like it was added on 04-07-2003 and it was ripped
> out on 4-21-2008. I haven't updated my server since probably January and I bet
> if I update my rpm to the latest version, the problem is gone.

Coincidence and change are neither correlation nor causation. From what is cited, I see no reason to place any confidence in a change in rpm being anything other than voodoo.

The second quote, reading:

> it uses gettimeofday() to measure the time taken by nanosleep(20ms)

may be bogus in hindsight (I do not agree) ... but it is certainly sour grapes for the second post to have so characterized it.

Xen in a fully updated CentOS 5 does NOT meet LSB specs for readability of the system timer. There are ample upstream bugs noting that time skew issues abound in upstream's version of the Xen kernel, and they are long overdue for an update. See, e.g.,

https://bugzilla.redhat.com/show_bug.cgi?id=426861
Reported: 2007-12-27 11:28 EDT by Rik van Riel (riel@redhat.com)
Modified: 2009-01-05 05:10 EDT (History)

https://bugzilla.redhat.com/show_bug.cgi?id=472393
https://bugzilla.redhat.com/show_bug.cgi?id=464455

-- Russ herrold
pileofrogs

2009-01-07 17:33

reporter   ~0008548

It's been two weeks since I applied this patch (the same one referenced in my earlier post) to the SRPM for rpm and installed the result on my system. I haven't seen the floating point exception since.

Here's a link to the SRPM containing that fix:

http://seattlecentral.edu/~dmartin/dist/rpm-4.4.2-48.1.src.rpm

Usual disclaimers, caveat-emptor, this will probably erase your hard drive, eat your cat, increase global warming etc.. etc..

Issue History

Date Modified Username Field Change
2008-02-22 20:29 oroboros New Issue
2008-02-23 23:10 kbsingh@karan.org Note Added: 0006932
2008-02-23 23:10 kbsingh@karan.org Status new => feedback
2008-02-24 13:16 range Note Added: 0006933
2008-03-10 11:38 schlisko Note Added: 0007011
2008-03-26 20:06 dsdee Note Added: 0007068
2008-04-03 15:27 senomoto Note Added: 0007081
2008-04-03 16:01 oroboros Note Added: 0007087
2008-04-03 16:04 dsdee Note Added: 0007088
2008-04-04 21:24 Jeff_S Note Added: 0007098
2008-12-24 20:03 pileofrogs Note Added: 0008506
2009-01-06 23:04 herrold Note Added: 0008545
2009-01-07 17:33 pileofrogs Note Added: 0008548