View Issue Details

ID: 0015720
Project: CentOS CI
Category: [All Projects] general
View Status: public
Last Update: 2019-01-18 08:20
Status: assigned
Resolution: open
Summary: 0015720: SSH key generation during a Jenkins job fails with permission denied error
Description: We have a Jenkins job that uses Ansible to configure multiple hosts. This job has been failing for some time now, and the last successful build was on Jan 4th. However, since the job runs for every patch, some runs are expected to fail for reasons unrelated to the infrastructure.

The exact step that fails is this piece of code:
- hosts: localhost
  tasks:
    - name: Create an ssh keypair
      shell: ssh-keygen -b 2048 -t rsa -f $GLUSTO_WORKSPACE/glusto -q -N ""
      args:
        creates: "{{ lookup('env', 'GLUSTO_WORKSPACE') }}/glusto"
Link to code:
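For readers less familiar with the "creates" argument, here is a rough Python 3 equivalent of what the task does. It is a sketch only, not the job's actual code, and the function name create_keypair is hypothetical. What it illustrates is that Ansible skips the command entirely when the "creates" path already exists, so a key that is already in place turns this step into a no-op.

```python
import os
import subprocess


def create_keypair(workspace: str) -> bool:
    """Hypothetical sketch of the Ansible task: skip if the key already exists
    (the `creates` argument), otherwise run ssh-keygen.

    Returns True if ssh-keygen was executed, False if the step was skipped.
    """
    key_path = os.path.join(workspace, "glusto")
    # `creates` semantics: if the file is already there, the command never runs.
    if os.path.exists(key_path):
        return False
    # Same flags as the task: 2048-bit RSA, quiet, empty passphrase (-N "").
    subprocess.run(
        ["ssh-keygen", "-b", "2048", "-t", "rsa", "-f", key_path, "-q", "-N", ""],
        check=True,  # a non-zero exit raises CalledProcessError, like Ansible's "rc": 1
        capture_output=True,
    )
    return True
```

For example, create_keypair("/home/gluster/workspace/gluster_glusto-patch-check/centosci") would return False without running anything if the glusto key file were already present in that directory.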

The failure looks like this:
TASK [Create an ssh keypair] ***************************************************
task path: /home/gluster/workspace/gluster_glusto-patch-check/centosci/jobs/scripts/glusto/setup-glusto.yml:5
Using module file /home/gluster/env/lib/python2.7/site-packages/ansible/modules/commands/
<localhost> EXEC /bin/sh -c 'echo ~gluster && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/gluster/.ansible/tmp/ansible-tmp-1547791810.26-138934045274774 `" && echo ansible-tmp-1547791810.26-138934045274774="` echo /home/gluster/.ansible/tmp/ansible-tmp-1547791810.26-138934045274774 `" ) && sleep 0'
<localhost> PUT /home/gluster/.ansible/tmp/ansible-local-26290Zo3BcY/tmpmU0A7q TO /home/gluster/.ansible/tmp/ansible-tmp-1547791810.26-138934045274774/
<localhost> EXEC /bin/sh -c 'chmod u+x /home/gluster/.ansible/tmp/ansible-tmp-1547791810.26-138934045274774/ /home/gluster/.ansible/tmp/ansible-tmp-1547791810.26-138934045274774/ && sleep 0'
<localhost> EXEC /bin/sh -c '/usr/bin/python /home/gluster/.ansible/tmp/ansible-tmp-1547791810.26-138934045274774/ && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /home/gluster/.ansible/tmp/ansible-tmp-1547791810.26-138934045274774/ > /dev/null 2>&1 && sleep 0'
fatal: [localhost]: FAILED! => {
    "changed": true,
    "cmd": "ssh-keygen -b 2048 -t rsa -f $GLUSTO_WORKSPACE/glusto -q -N \"\"",
    "delta": "0:00:00.246300",
    "end": "2019-01-18 06:10:10.750425",
    "invocation": {
        "module_args": {
            "_raw_params": "ssh-keygen -b 2048 -t rsa -f $GLUSTO_WORKSPACE/glusto -q -N \"\"",
            "_uses_shell": true,
            "chdir": null,
            "creates": "/home/gluster/workspace/gluster_glusto-patch-check/centosci/glusto",
            "executable": null,
            "removes": null,
            "stdin": null,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2019-01-18 06:10:10.504125",
    "stderr": "Saving key \"/home/gluster/workspace/gluster_glusto-patch-check/centosci/glusto\" failed: Permission denied",
    "stderr_lines": [
        "Saving key \"/home/gluster/workspace/gluster_glusto-patch-check/centosci/glusto\" failed: Permission denied"
    ],
    "stdout": "",
    "stdout_lines": []
}

This playbook runs successfully when I SSH into the node and execute it manually. I can consistently reproduce both the failure via Jenkins and the success via SSH.
Additional Information: My suspicion is that it has something to do with SELinux labels and how the Jenkins process is started on slave07.
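Since the same playbook behaves differently under Jenkins and under an SSH login, one way to narrow this down is to record the process attributes in both environments and compare them. The following Python 3 sketch uses hypothetical helper names (process_context and diff_contexts, not part of the job). It collects the uid, groups, umask, HOME, working directory and, on SELinux-enabled Linux, the process's own SELinux context from /proc/self/attr/current.

```python
import os


def process_context() -> dict:
    """Collect attributes that commonly differ between a Jenkins agent
    and an interactive SSH session (a diagnostic sketch)."""
    # The umask can only be read by setting it, so set it and restore it immediately.
    current_umask = os.umask(0)
    os.umask(current_umask)
    try:
        # On SELinux-enabled Linux this holds e.g. "system_u:system_r:initrc_t:s0".
        with open("/proc/self/attr/current") as f:
            selinux = f.read().strip("\x00\n")
    except OSError:
        selinux = None  # no SELinux, or not Linux
    return {
        "uid": os.getuid(),
        "gid": os.getgid(),
        "groups": sorted(os.getgroups()),
        "umask": oct(current_umask),
        "home": os.environ.get("HOME"),
        "cwd": os.getcwd(),
        "selinux": selinux,
    }


def diff_contexts(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every key whose value differs."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```

The idea is to run process_context() once from a Jenkins build step and once from an SSH session, then pass both results to diff_contexts. Given the ps output later in this report, the selinux entry would likely differ (initrc_t under Jenkins versus unconfined_t for an SSH login).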

[gluster@slave07 workspace]$ ls -lZt
drwxr-xr-x. gluster gluster system_u:object_r:user_home_t:s0 gluster_glusto-patch-check
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_csi-driver-smoke
drwxrwxr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_libgfapi-python
drwxr-xr-x. gluster gluster system_u:object_r:user_home_t:s0 gluster_run-tests-in-vagrant@2
drwxr-xr-x. gluster gluster system_u:object_r:user_home_t:s0 gluster_run-tests-in-vagrant
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_ansible-infra
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_build-rpms@4
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_build-rpms@3
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_build-rpms
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_build-rpms@2

[gluster@slave07 workspace]$ ps -Zaux | grep java
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 gluster 26561 0.0 0.0 112708 976 pts/3 S+ 06:20 0:00 grep --color=auto java
system_u:system_r:initrc_t:s0 gluster 31866 0.8 5.1 5589424 389380 ? Ssl Jan08 114:58 java -jar /usr/local/bin/slave.jar -jnlpUrl
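To make the pattern in the listing easier to see, the sketch below groups the workspace directories by the SELinux user field of their labels and extracts the domain from a ps -Z line. It only parses the text pasted above (the sample strings are copied from it), and the helper names are hypothetical. On this output, gluster_glusto-patch-check and both gluster_run-tests-in-vagrant workspaces carry system_u, plausibly because they were created by the Jenkins agent running as system_u:system_r:initrc_t, while the remaining workspaces carry unconfined_u.

```python
from collections import defaultdict

# Sample input: the `ls -lZt` lines pasted in this report.
LS_OUTPUT = """\
drwxr-xr-x. gluster gluster system_u:object_r:user_home_t:s0 gluster_glusto-patch-check
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_csi-driver-smoke
drwxrwxr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_libgfapi-python
drwxr-xr-x. gluster gluster system_u:object_r:user_home_t:s0 gluster_run-tests-in-vagrant@2
drwxr-xr-x. gluster gluster system_u:object_r:user_home_t:s0 gluster_run-tests-in-vagrant
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_ansible-infra
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_build-rpms@4
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_build-rpms@3
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_build-rpms
drwxr-xr-x. gluster gluster unconfined_u:object_r:user_home_t:s0 gluster_build-rpms@2
"""


def split_context(context: str) -> tuple:
    """Split 'user:role:type:level' into four parts; the MLS level may itself contain ':'."""
    user, role, type_, level = context.split(":", 3)
    return user, role, type_, level


def labels_by_selinux_user(ls_lz: str) -> dict:
    """Group EL7-style `ls -lZ` lines (mode, owner, group, context, name) by SELinux user."""
    groups = defaultdict(list)
    for line in ls_lz.splitlines():
        fields = line.split(None, 4)  # maxsplit keeps names containing spaces intact
        if len(fields) < 5:
            continue  # blank or unexpected line
        user = split_context(fields[3])[0]
        groups[user].append(fields[4])
    return dict(groups)


def process_domain(ps_z_line: str) -> str:
    """Return the SELinux type (domain) from the first column of a `ps -Z` line."""
    return split_context(ps_z_line.split()[0])[2]


print(labels_by_selinux_user(LS_OUTPUT))
print(process_domain("system_u:system_r:initrc_t:s0 gluster 31866 0.8 5.1 5589424 389380 ? "
                     "Ssl Jan08 114:58 java -jar /usr/local/bin/slave.jar -jnlpUrl"))  # prints: initrc_t
```

Splitting with a maximum of three colons is deliberate: levels such as s0-s0:c0.c1023 contain a colon of their own.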

My original belief was that adding an SELinux fcontext rule labelling ~/workspace as unconfined_u:object_r:user_home_t:s0 and running restorecon on the folder would fix our troubles, but testing that theory with chcon did not improve the job. At the moment, I'm a bit lost as to what's going on. Without root access to read the logs, my ability to debug this is quite limited, and I'd appreciate some help.
Tags: No tags attached.




2019-01-18 06:57

reporter   ~0033633

Adding a note that "touch foo" works from Ansible, but ssh-keygen fails, which is strange. strace is not installed on the machine, so I can't debug further. Would it be possible to install strace on slave07 so I can dig into this some more?
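Without strace, a lower-tech comparison is to run both commands from the same process against the same directory and capture their exit codes and stderr. This Python 3 sketch is hypothetical (compare_writers and the probe file names are made up for illustration). Run from inside the Jenkins job, it would show whether the failure is specific to ssh-keygen rather than to the directory. stdin is redirected from /dev/null so that ssh-keygen cannot block on an overwrite prompt.

```python
import os
import subprocess


def compare_writers(directory: str) -> dict:
    """Run `touch` and `ssh-keygen` against the same directory, from the same
    process, and return {command: (returncode, stderr)} for each."""
    touch_target = os.path.join(directory, "probe-touch")
    key_target = os.path.join(directory, "probe-key")
    # Remove leftovers from a previous run so ssh-keygen has nothing to overwrite.
    for path in (touch_target, key_target, key_target + ".pub"):
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
    commands = {
        "touch": ["touch", touch_target],
        "ssh-keygen": ["ssh-keygen", "-b", "2048", "-t", "rsa", "-q", "-N", "", "-f", key_target],
    }
    results = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(
                argv,
                stdin=subprocess.DEVNULL,  # never block on an interactive prompt
                capture_output=True,
                text=True,
            )
            results[name] = (proc.returncode, proc.stderr.strip())
        except FileNotFoundError as exc:  # the binary is not installed
            results[name] = (None, str(exc))
    return results
```

Called as compare_writers(os.environ["GLUSTO_WORKSPACE"]) from a job step, the behaviour described in this note would presumably show up as a 0 for touch and a 1 with "Permission denied" for ssh-keygen.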

Also adding a note that Fabian did not see an AVC denial yesterday, so this is looking less like an SELinux issue. However, I'm also at a loss as to what else could be broken.


2019-01-18 08:20

reporter   ~0033634

Alright, I've made this a non-blocker by removing the step that generates SSH keys from our test script and generating them on the machine beforehand. This can be re-prioritized to low.

Issue History

Date Modified Username Field Change
2019-01-18 06:30 nigelb New Issue
2019-01-18 06:30 nigelb Status new => assigned
2019-01-18 06:57 nigelb Note Added: 0033633
2019-01-18 08:20 nigelb Note Added: 0033634