View Issue Details

ID: 0016306
Project/Category: Ecosystem Testing
View Status: public
Last Update: 2019-07-31 13:52
Status: new
Resolution: open
Summary: 0016306: instances fail qemu-img
Description: We are currently getting weird test failures [1] on some of our tests. This is the root cause, as seen in `oc rsh centosci-tasks-6lx8h`:

$ ls -l fedora-30-caf27ed0e34e2019cbf8db0b02d97a006f9df21fbe32b3c90a95c5833acf1496.qcow2
-rw----r--. 1 nobody nobody 1954721792 Jul 19 02:33 fedora-30-caf27ed0e34e2019cbf8db0b02d97a006f9df21fbe32b3c90a95c5833acf1496.qcow2

$ sha256sum fedora-30-caf27ed0e34e2019cbf8db0b02d97a006f9df21fbe32b3c90a95c5833acf1496.qcow2
caf27ed0e34e2019cbf8db0b02d97a006f9df21fbe32b3c90a95c5833acf1496 fedora-30-caf27ed0e34e2019cbf8db0b02d97a006f9df21fbe32b3c90a95c5833acf1496.qcow2

$ qemu-img info fedora-30-caf27ed0e34e2019cbf8db0b02d97a006f9df21fbe32b3c90a95c5833acf1496.qcow2
qemu-img: Could not open '/build/images/fedora-30-caf27ed0e34e2019cbf8db0b02d97a006f9df21fbe32b3c90a95c5833acf1496.qcow2': Could not read qcow2 header: Input/output error

In other words, reading the file is fine (the SHA sum is correct when comparing it to other pods), but qemu-img is upset.
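
To narrow this down without qemu-img, the header bytes can also be read with plain tools. A minimal sketch, assuming only `head`, `od` and `tr` are available in the pod; `qcow2_magic_ok` is a hypothetical helper name, not part of any existing tooling. It checks that the first four bytes are the qcow2 magic `QFI\xfb`:

```shell
# Hypothetical helper: succeed if the file starts with the qcow2 magic "QFI\xfb".
qcow2_magic_ok() {
    # Read the first four bytes and print them as one hex string, e.g. 514649fb
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "514649fb" ]
}
```

If `qcow2_magic_ok /build/images/<image>.qcow2` succeeds in the broken pod while `qemu-img info` still reports an I/O error, the problem is more likely in how qemu-img opens or reads the file on that node than in the file contents.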

However, the image lives on a shared volume, and all other pods are fine with it. I validated this with:
for pod in $(oc get -o name -l infra=cockpit-tasks pods); do echo "===== $pod ===="; oc describe $pod | grep Node; oc rsh $pod qemu-img info /build/images/fedora-30-caf27ed0e34e2019cbf8db0b02d97a006f9df21fbe32b3c90a95c5833acf1496.qcow2; done
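
The same check can be condensed into one line per pod, which makes the odd node easier to spot. This is only a sketch: `check_pods` is a hypothetical function name, and it assumes that `oc rsh` passes through the exit status of the remote command and that `.spec.nodeName` is populated for every pod.

```shell
# Image path as seen inside the pods (taken from the qemu-img error above).
IMAGE=${IMAGE:-/build/images/fedora-30-caf27ed0e34e2019cbf8db0b02d97a006f9df21fbe32b3c90a95c5833acf1496.qcow2}

# Hypothetical helper: print "pod <TAB> node <TAB> ok|FAIL" for each cockpit-tasks pod.
check_pods() {
    for pod in $(oc get -o name -l infra=cockpit-tasks pods); do
        # Node the pod is scheduled on
        node=$(oc get "$pod" -o jsonpath='{.spec.nodeName}')
        # Assumption: oc rsh returns the exit status of qemu-img
        if oc rsh "$pod" qemu-img info "$IMAGE" >/dev/null 2>&1; then
            result=ok
        else
            result=FAIL
        fi
        printf '%s\t%s\t%s\n' "$pod" "$node" "$result"
    done
}
```

Running `check_pods` prints one tab-separated line per pod, so a single FAIL line next to a single node name stands out immediately.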

The broken pod is the only one that runs on the n22.kempty node, the others seem fine. As `strace` does not work in docker containers, I can't think of a further way to examine this. Does that node have something funky in its journal, or could you try to run `strace` on the qemu-img on the node?
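
For whoever has root on the node, the trace I have in mind would look roughly like this. It is a sketch under assumptions: `trace_qemu_img` is a hypothetical helper, the log location is arbitrary, and the image path on the host (wherever the shared volume is mounted there) has to be supplied by the admin, since I do not know it.

```shell
# Hypothetical helper: trace qemu-img on the node and show the syscalls that failed with EIO.
# $1 = image path as seen on the node (placeholder, must be filled in by the admin)
# $2 = optional log file for the trace
trace_qemu_img() {
    image=$1
    log=${2:-/tmp/qemu-img.strace}
    # -f follows threads and child processes, -o writes the trace to a file
    strace -f -o "$log" qemu-img info "$image"
    # Print the matching trace lines with their line numbers
    grep -n 'EIO' "$log"
}
```

The kernel side can be checked separately with `journalctl -k`, looking for I/O errors around the time of the failure.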

Perhaps it just needs a reboot?

Note, I will kill this pod now until I catch one that runs somewhere else.

Tags: No tags attached.




2019-07-31 09:28

reporter   ~0034890

FTR, I started a new pod; it landed on n22 again and has the same error. So it's not some weird state inside the pod.


2019-07-31 13:52

reporter   ~0034891

It seems to be getting worse -- creating new pods on n22 now says "Error syncing pod (8 times in the last minute)".

Issue History

Date Modified     Username     Field                Change
2019-07-31 09:26  Martin.Pitt  New Issue
2019-07-31 09:28  Martin.Pitt  Note Added: 0034890
2019-07-31 13:52  Martin.Pitt  Note Added: 0034891