View Issue Details

ID:              0016208
Project:         Buildsys
Category:        Ci.centos.org Ecosystem Testing
View Status:     public
Last Update:     2019-09-04 14:51
Reporter:        Martin.Pitt
Priority:        normal
Severity:        major
Reproducibility: always
Status:          new
Resolution:      open
Summary:         0016208: cockpit-images volume (os-pv-100gi-00000002) inexplicably full

Description

Hello!

In the "cockpit" OpenShift project we use a persistent volume (https://console.apps.ci.centos.org:8443/console/project/cockpit/browse/persistentvolumeclaims/cockpit-images) as VM image cache. This is now shown as full:

    172.22.6.19:/exports/os-pv-100gi-00000002 187G 187G 1.0M 100% /cache/images

and regularly leads to ENOSPC. However, the actual files on this volume add up to only 30 GB:

$ du -hsc /cache/images/
30G /cache/images/
30G total

Yesterday I already tried to shut down all containers that mount this volume, just in case there are processes holding open fds to deleted files, but that didn't help at all. There are no hidden files on this device either. At this point I don't know what I could do on my end.
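One way to check for such deleted-but-still-open files directly (a sketch, assuming lsof is available wherever the volume is mounted) would be:

    # list open files on the mount with link count 0, i.e. deleted but still held open
    $ sudo lsof +L1 /cache/images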

Is there something obviously wrong on the NFS server side?
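One plausible explanation (an assumption on my part): df on an NFS client reports the statistics of the server's whole backing file system, so if several exports share one file system on 172.22.6.19, data from the other exports would show up as "used" here while du only sees this export's files. A comparison on the server itself should tell; a sketch, assuming shell access to the server and that the export path matches the volume name:

    $ df -h /exports/os-pv-100gi-00000002
    $ du -hsc /exports/os-pv-100gi-00000002
    # does the export share a backing file system with other exports?
    $ findmnt -T /exports/os-pv-100gi-00000002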

In case repairing the volume isn't easy: it wouldn't be a problem to wipe it entirely, or to remove and re-add the PV, and then re-populate it. I just copied all files to our other (Red Hat internal) store.
Tags: No tags attached.

Activities

Martin.Pitt    2019-06-26 08:54    (reporter, ~0034724)

Odd, today this looks different even though nothing substantial changed on our end since I filed this 4 days ago (I had shut down all pods that work with the images):

   172.22.6.19:/exports/os-pv-100gi-00000002 187G 143G 44G 77% /cache/images

I have re-enabled the pods for now, as the volume no longer seems dangerously close to ENOSPC. However, our files still only account for ~30 GB, not 143.

Martin.Pitt    2019-08-29 05:47    (reporter, ~0035029)

This is still an issue today:

187G 171G 16G 92% /cache/images

$ du -hs /cache/images/
38G /cache/images/

Martin.Pitt    2019-09-04 14:51    (reporter, ~0035066)

I have now emptied the volume's file system completely:

$ ls -la /cache/images/
total 0
drwxrwxrwx. 2 nobody nobody 6 Sep 4 14:45 .
drwxrwsrwx. 4 root 1000190000 34 Sep 4 09:13 ..

$ du -hs /cache/images/
0 /cache/images/

$ df /cache/images/
Filesystem 1K-blocks Used Available Use% Mounted on
172.22.6.19:/exports/os-pv-100gi-00000002 195265536 142974976 52290560 74% /cache/images
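Converting the df figures (1K-blocks, i.e. KiB) to GiB for reference:

    $ echo $((142974976 / 1048576)) GiB used, $((52290560 / 1048576)) GiB available, $((195265536 / 1048576)) GiB total
    136 GiB used, 49 GiB available, 186 GiB total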

I then shut down all pods that use that volume, just in case they hold open deleted fds there. But that didn't help either.

So there really is some inexplicable ~137 GiB of usage that I just can't get rid of. The fun thing is that this is supposed to be a 100 GiB PV, not 187 G: 100 GiB would be enough for us, but the ~50 GiB that actually remains available is just too small. We need some 70 GiB for our stuff.
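For completeness, the size the PV/PVC claims can be compared against what the file system reports; a sketch, assuming the oc client is logged into the cluster, the PVC is named cockpit-images as in the console URL above, and the PV is named after the export:

    # bound capacity of the claim
    $ oc -n cockpit get pvc cockpit-images -o jsonpath='{.status.capacity.storage}'
    # declared capacity of the underlying PV
    $ oc get pv os-pv-100gi-00000002 -o jsonpath='{.spec.capacity.storage}'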

Issue History

Date Modified       Username       Field        Change
2019-06-22 14:16    Martin.Pitt    New Issue
2019-06-26 08:54    Martin.Pitt    Note Added   0034724
2019-08-29 05:47    Martin.Pitt    Note Added   0035029
2019-09-04 14:51    Martin.Pitt    Note Added   0035066