2017-08-23 02:11 UTC

View Issue Details
ID: 0004465
Project: CentOS-5
Category: glibc
View Status: public
Last Update: 2010-07-30 06:00
Reporter: samiam
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open
Product Version: 5.5
Target Version:
Fixed in Version:
Summary: 0004465: select()'s timeout breaks POSIX spec in CentOS 5.5
Description: Let's look at the following program:

/* Tiny program to show CentOS 5.5 bug. Public domain */

#include <sys/select.h>
#include <stdio.h>

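int main(void) {
        struct timeval tv;
        tv.tv_sec = 1;  /* ask for a one-second timeout */
        tv.tv_usec = 0;
        /* No descriptors are watched, so select() only waits for the timeout. */
        select(0,NULL,NULL,NULL,&tv);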
        return 0;
}

The above program should, as per the POSIX specification, take a little over a second to run. And, indeed, it did up through CentOS 5.4.

CentOS 5.5, however, violates POSIX by having the above program take a little over *two* seconds to run.

To reproduce this bug:

The above small program, when compiled as select.delay using 'cc -o select.delay select.delay.c', and invoked with the time command, has the following output in CentOS 5.4:

$ time ./select.delay
real 0m1.098s
user 0m0.000s
sys 0m0.000s

And the following output in CentOS 5.5:

$ time ./select.delay
real 0m2.002s
user 0m0.000s
sys 0m0.005s

This is a bug because the POSIX specification for "select()" at http://www.opengroup.org/onlinepubs/000095399/functions/select.html states that "the timeout period is given in seconds and microseconds" and the POSIX specification for "time.h" at http://www.opengroup.org/onlinepubs/009695399/basedefs/time.h.html points out the tv_sec is seconds, e.g. "time_t tv_sec Seconds".
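
For anyone who would rather measure the delay from inside the process than with the time command, here is a minimal sketch. The helper name timed_select is made up for illustration, and using clock_gettime(CLOCK_MONOTONIC) is my assumption, not part of the original test program; on older glibc releases it may need linking with -lrt.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical helper: wait in select() for the requested timeout and
 * return the elapsed monotonic time in seconds, or -1.0 on error. */
double timed_select(long sec, long usec)
{
    struct timespec start, end;
    struct timeval tv;

    tv.tv_sec = sec;
    tv.tv_usec = usec;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    /* No descriptors are watched, so only the timeout (or a signal,
     * which is reported as an error here) can end the call. */
    if (select(0, NULL, NULL, NULL, &tv) < 0)
        return -1.0;

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling timed_select(1, 0) and printing the result with printf("%.3f\n", ...) should give a figure comparable to the "real" line reported by time, minus process startup overhead.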
Additional Information: I am claiming this is a glibc issue, but this could either be a glibc issue or a kernel issue.

Knowing how kernel devs have broken POSIX before (I have to do a "fcntl(socket_number,F_SETFL,O_NONBLOCK)" before running select on incoming UDP connections because Linux incorrectly reports a socket as being non-blocking and ready for input when, in truth, doing a read() from the socket in question blocks), I have a feeling this is a kernel issue.
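
The workaround mentioned above can be sketched roughly as follows. This is an illustration, not the reporter's actual code: the helper name recv_when_ready and its return convention are assumptions. The Linux select(2) man page does document that a descriptor can be reported as readable while a later read still blocks (for example when a datagram is discarded for a bad checksum), which is why O_NONBLOCK is set first.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_sec for a datagram on fd, then
 * read it without risking a blocked recv(). Returns the number of bytes
 * received, or -1 with errno set (EAGAIN when nothing was available). */
ssize_t recv_when_ready(int fd, void *buf, size_t len, long timeout_sec)
{
    fd_set readable;
    struct timeval tv;
    int flags, ready;

    /* Add O_NONBLOCK while preserving the other file status flags. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0) {
        errno = EAGAIN;  /* timed out with nothing to read */
        return -1;
    }

    /* If select() reported readiness but no datagram is actually there,
     * the non-blocking recv() fails with EAGAIN/EWOULDBLOCK instead of
     * hanging, and the caller can simply retry. */
    return recv(fd, buf, len, 0);
}
```

A caller would treat a -1 return with errno set to EAGAIN or EWOULDBLOCK as "try again later" and anything else as a real error.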
Tags: No tags attached.

Notes

~0011693

herrold (reporter)

The standard provides for a minimum wait, but does not require any particular maximum:

     The timeout parameter controls how long the pselect() or select()
     function shall take before timing out

How is there a bug?

Second, this is the wrong venue if you want this formally addressed in code: we would not patch that, but would rather direct you upstream to file your concern.

-- Russ herrold

~0011694

samiam (reporter)

>How is there a bug?

I use CentOS 5.5 to develop open-source software. Part of the development is SQA testing. I have a number of automated tests; one of the tests makes sure that things time out when they are supposed to time out. Changing something as fundamental as how long select() waits is a bug because it breaks programs that expect select() to have a particular wait time.

In addition, the select() man page states the following: "Some code calls select() with all three sets empty, n zero, and a non-NULL timeout as a fairly portable way to sleep with subsecond precision." Unfortunately, CentOS 5.5, which was supposed to be a bugfix-only update to CentOS 5.4, breaks this behavior.
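
For reference, the idiom the man page describes looks roughly like this. It is only a sketch: the name sleep_ms and the millisecond interface are made up for illustration.

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: pause for roughly ms milliseconds using select().
 * Returns 0 when the timeout expired, -1 on error or invalid input. */
int sleep_ms(long ms)
{
    struct timeval tv;

    if (ms < 0)
        return -1;

    tv.tv_sec = ms / 1000;            /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;  /* remainder, in microseconds */

    /* n is zero and all three sets are NULL, so nothing can become
     * ready; the call returns 0 once the timeout has passed. */
    return select(0, NULL, NULL, NULL, &tv) == 0 ? 0 : -1;
}
```

Note that a signal can end the call early, in which case select() fails with EINTR and this sketch returns -1 without retrying.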

Also, I didn't see the word "minimum" anywhere in the above quoted POSIX specs. Could you please show me the POSIX spec that states the wait time in question is only a suggested minimum wait time?

Finally, while this is probably an issue upstream should fix, it would benefit upstream to at least know about the bug, so they can decide whether to fix it.

Looks like I'll just have to figure out how to make the SQA test in question work with CentOS 5.4's working and CentOS 5.5's broken select() timeout.
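
One way a timing test could accept both behaviors is to check a lower bound plus a configurable overshoot. This is a sketch under my own assumptions, not the actual SQA test: the function name timeout_ok and the idea of a slack parameter are hypothetical.

```c
/* Hypothetical check: accept an elapsed time that is no shorter than the
 * requested timeout and overshoots it by at most slack seconds. */
int timeout_ok(double requested, double elapsed, double slack)
{
    return elapsed >= requested && elapsed <= requested + slack;
}
```

With a requested timeout of 1.0 second, the 1.098s measured on CentOS 5.4 passes with a slack of 0.5, while the 2.002s measured on CentOS 5.5 only passes once the slack is raised above one second, for example to 1.5.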

~0011698

samiam (reporter)

I have updated my SQA regression test, which exposed this bug, to work with both the older and newer select() timeout behavior. I have also posted about the issue here:

http://woodlane.webconquest.com/pipermail/list/2010-July/000629.html

As an aside, do people feel upstream listens to reports of these kinds of issues? Would it be worth my time to join an upstream mailing list (or what not) and report this problem?

~0011701

Evolution (administrator)

Upstream listens quite well. They even have an additional area in bugzilla for reports coming from CentOS.

Now whether they choose to act on the bug is entirely something else. I have had cases that get resolved very quickly and others that languish for a year or more. To me, things like this are always worth posting to the bugzilla, but not worth joining the mailing list.

~0011702

samiam (reporter)

Thank you for the information. I will report this upstream.

~0011703

samiam (reporter)

I’ve filed it upstream: https://bugzilla.redhat.com/show_bug.cgi?id=619664

Issue History
Date Modified Username Field Change
2010-07-29 21:29 samiam New Issue
2010-07-29 21:57 herrold Note Added: 0011693
2010-07-29 22:12 samiam Note Added: 0011694
2010-07-29 22:59 samiam Note Added: 0011698
2010-07-30 01:04 Evolution Note Added: 0011701
2010-07-30 04:31 samiam Note Added: 0011702
2010-07-30 06:00 samiam Note Added: 0011703