View Issue Details

ID: 0017574
Project: CentOS CI
Category: [All Projects] general
View Status: public
Last Update: 2020-07-08 16:43
Reporter: astepano
Priority: high
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Summary: 0017574: Cannot use storage on apps.ocp.ci.centos.org
Description:

Hello,

Could you please give me a hint?

How is storage supposed to be used on apps.ocp.ci.centos.org?

I have run into two issues:

1. Restarting the app is not possible.
2. Redeploying the app is not possible.


The previous run had UID 1000590000.
The current run has:

```
$ id
uid=1000610000(1000610000) gid=0(root) groups=0(root),1000610000
```

The storage still contains files from the previous run:

```
$ mount | grep jenkins
nfs02.ci.centos.org:/exports/ocp-prod/pv-100gi-ef62c8ce-b537-5b04-b943-8ddf890c172c on /var/jenkins_home type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.19.0.137,local_lock=none,addr=172.19.0.22)
/dev/mapper/coreos-luks-root-nocrypt on /var/jenkins_config type xfs (ro,relatime,seclabel,attr2,inode64,prjquota)
/dev/mapper/coreos-luks-root-nocrypt on /var/jenkins_plugins type xfs (rw,relatime,seclabel,attr2,inode64,prjquota)
/dev/mapper/coreos-luks-root-nocrypt on /usr/share/jenkins/ref/plugins type xfs (rw,relatime,seclabel,attr2,inode64,prjquota)
$ ls -la /var/jenkins_home
total 640
drwxrwxrwx. 16 nobody nogroup 12288 Jul 7 16:51 .
drwxr-xr-x. 1 root root 51 Jul 7 17:43 ..
drwxr-xr-x. 3 1000590000 root 4096 Jun 25 12:35 .cache
drwxr-xr-x. 3 1000590000 root 4096 Jun 25 12:35 .groovy
drwxr-xr-x. 3 1000590000 root 4096 Jun 25 12:32 .java
-rw-r--r--. 1 1000590000 root 0 Jun 25 12:35 .lastStarted
-rw-r--r--. 1 1000590000 root 59 Jul 7 13:39 .owner
-rw-r--r--. 1 1000590000 root 129 Jul 5 14:49 atomic133849868953798244tmp

```

When I try to specify runAsUser, pod creation fails with:

```
  Warning FailedCreate 39s (x15 over 118s) replicaset-controller Error creating: pods "osci-jenkins-2-579b8758c8-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{1000510000}: 1000510000 is not an allowed group spec.initContainers[0].securityContext.securityContext.runAsUser: Invalid value: 1000510000: must be in the ranges: [1000610000, 1000619999] spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 1000510000: must be in the ranges: [1000610000, 1000619999] spec.containers[1].securityContext.securityContext.runAsUser: Invalid value: 1000510000: must be in the ranges: [1000610000, 1000619999]]
```
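For reference, here is a minimal sketch of the range check behind this error. It assumes the namespace UID range is written as `<start>/<size>`, the format OpenShift uses in the `openshift.io/sa.scc.uid-range` namespace annotation. The helper name `uid_in_range` is hypothetical, and the value `1000610000/10000` is derived from the range `[1000610000, 1000619999]` in the message above, not read from the cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or OpenShift): succeed when a UID lies
# inside a range written as "<start>/<size>".
uid_in_range() {
    range=$1
    uid=$2
    start=${range%/*}            # text before the slash
    size=${range#*/}             # text after the slash
    end=$((start + size - 1))    # last UID that is still allowed
    [ "$uid" -ge "$start" ] && [ "$uid" -le "$end" ]
}

# Assumption: [1000610000, 1000619999] from the error, written as start/size.
RANGE="1000610000/10000"

# UIDs taken from the ticket: the rejected runAsUser value, the owner of the
# old files, and the UID of the current run.
for uid in 1000510000 1000590000 1000610000; do
    if uid_in_range "$RANGE" "$uid"; then
        echo "$uid: allowed"
    else
        echo "$uid: rejected"
    fi
done
```

Only the UID of the current run passes this check, which matches the behaviour described here: a runAsUser value outside the namespace range is refused before the pod is created.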

The other issue is:

When I redeploy, I first run `oc delete pvc XXX`.

In the next step I deploy the whole project from YAML files.
But on the second round the PVC stays in Pending.
Tags: No tags attached.

Activities

astepano

2020-07-08 10:46

reporter   ~0037336

Hello, an update on the topic.
We had a discussion with @bstinson.
He suggested using the native Jenkins templates.
I understand that the official Jenkins templates are well tested on the OpenShift platform.
But using templates is not optimal for us, for several reasons:

1. On EKS we use the official Helm chart, stable/jenkins. It has great community support.
2. Helm makes it easy to update already deployed apps.
3. With Helm you can see what is going to change at each step.
4. We have issues with the current approach based on OpenShift templates: https://jenkins-continuous-infra.apps.ci.centos.org. Simply put, it is outdated, and we cannot update Jenkins core through the template functionality. Helm allows this.
5. Unification: we can use the same chart on different clusters.

As I understand it, for OpenShift Jenkins the storage permission fix happens at S2I build time:

https://github.com/openshift/jenkins/blob/6d8fcdf822f8f5ed07ef50d5fb4c6198024d6be2/2/Dockerfile.localdev#L91
https://github.com/openshift/jenkins/blob/master/2/contrib/jenkins/fix-permissions

I also found this document: https://github.com/ibm-cloud-docs/openshift/blob/master/openshift_troubleshoot_storage.md

It suggests two approaches:

1. Modify the Dockerfile and fix permissions at build time. This sounds a bit strange to me: it forces us to modify every Dockerfile that does not come from OpenShift.
2. Use an init container.

I added an init container, but it fails. (This is the official IBM way.)

Quote from the document:

"Is the init container failing? Because OpenShift sets restricted security context constraints, you might see an error such as chown: /opt/ibm-ucd/server/ext_lib: Operation not permitted. For your deployment, use an SCC that allows chown operations and try again."

I tried to modify the SCC for the init container, but it failed with:

Warning FailedCreate 5s (x14 over 46s) replicaset-controller Error creating: pods "osci-jenkins-2-648cb66464-" is forbidden: unable to validate against any security context constraint: [spec.initContainers[0].securityContext.securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000610000, 1000619999] spec.initContainers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed]

May I ask you to enable the privileged SCC for init containers?
This is the official suggestion from IBM for OpenShift:

https://github.com/ibm-cloud-docs/openshift/blob/master/openshift_troubleshoot_storage.md#file-storage-app-fails-when-a-non-root-user-owns-the-nfs-file-storage-mount-path

The current approach (one deployment, one storage, one permission, no redeployment) is somewhat inflexible.
astepano

2020-07-08 16:10

reporter   ~0037339

Hello, an update on the ticket.

The UID issue was caused by re-claiming a free PV that still held data from a different namespace.
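A quick way to spot this situation from inside the pod is to look for files on the volume that belong to some other UID. The following is only a sketch: the helper name `foreign_owned` is hypothetical, `/var/jenkins_home` is the mount point from the description, and it assumes a `find` that supports `-mindepth` and `-maxdepth` (GNU, BSD and busybox do).

```shell
#!/bin/sh
# Hypothetical helper: print the top-level entries of a directory that are
# NOT owned by the UID the current process runs as.
foreign_owned() {
    find "$1" -mindepth 1 -maxdepth 1 ! -user "$(id -u)"
}

# Assumption: the volume is mounted at /var/jenkins_home, as in this ticket.
# Pass another directory as the first argument to check a different path.
dir=${1:-/var/jenkins_home}

if [ -d "$dir" ]; then
    leftovers=$(foreign_owned "$dir")
    if [ -n "$leftovers" ]; then
        echo "Entries owned by another UID (possibly left over from another namespace):"
        echo "$leftovers"
    else
        echo "All top-level entries in $dir belong to UID $(id -u)."
    fi
fi
```

If the volume lists entries such as `.cache` or `.owner` under the old UID, the PV was most likely bound earlier in a namespace with a different UID range.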

For a full redeploy, we remove the PVC and ask on #fedora-ci to delete the old PV and create a new PV without data.

The ticket can be closed. Thank you @bstinson and @siddharthvipul for your support.

Issue History

Date Modified Username Field Change
2020-07-07 18:08 astepano New Issue
2020-07-07 18:08 astepano Status new => assigned
2020-07-08 10:46 astepano Note Added: 0037336
2020-07-08 16:10 astepano Note Added: 0037339
2020-07-08 16:43 siddharthvipul1 Status assigned => resolved
2020-07-08 16:43 siddharthvipul1 Resolution open => fixed