Can not apply cluster, "sudo: a password is required" #1209

sunshine69 · 2020-04-25T11:09:26Z

Hi team,

I am trying out the Docker image 0.6.0, and also the latest develop branch built as a Docker image; both lead to the same error message.

Here are the steps I took:

Create 1 master and 2 nodes running Ubuntu 18.04. The master node has 2 GB of RAM and 2 cores, enough to make kubeadm happy. The nodes have 1.5 GB of RAM each.
Set up the admin user ubuntu with an SSH key so the user can SSH in using the key. The user can also sudo without a password.
Start the epiphany Docker container with the docker option --net host so it is on the same network as the three nodes.
Run:

epicli init -p any newcluster1
cd newcluster1
vim newcluster1.yaml
## Set all component counts to 0 except the kubernetes master and the two nodes (I just want to experiment with this first), then save and exit vim
epicli apply -f newcluster1.yaml

It runs briefly and then fails with this error:

10:57:53 INFO cli.engine.ansible.AnsibleCommand - TASK [preflight_facts : Store preflight facts] *************************************************************
10:57:53 INFO cli.engine.ansible.AnsibleCommand - fatal: [master1]: FAILED! => {"msg": "Failed to get information on remote file (/shared/build/newcluster1/vault//../preflight_facts.yml): sudo: a password is required\n"}
10:57:53 INFO cli.engine.ansible.AnsibleCommand - 
10:57:53 INFO cli.engine.ansible.AnsibleCommand - NO MORE HOSTS LEFT *****************************************************************************************
10:57:53 INFO cli.engine.ansible.AnsibleCommand - 
10:57:53 INFO cli.engine.ansible.AnsibleCommand - PLAY RECAP *************************************************************************************************
10:57:53 INFO cli.engine.ansible.AnsibleCommand - master1                    : ok=14   changed=0    unreachable=0    failed=1    skipped=8    rescued=0    ignored=0   
10:57:53 INFO cli.engine.ansible.AnsibleCommand - node1                      : ok=12   changed=0    unreachable=0    failed=0    skipped=10   rescued=0    ignored=0   
10:57:53 INFO cli.engine.ansible.AnsibleCommand - node2                      : ok=12   changed=0    unreachable=0    failed=0    skipped=10   rescued=0    ignored=0   
10:57:53 INFO cli.engine.ansible.AnsibleCommand - 
10:57:53 ERROR cli.engine.ansible.AnsibleCommand - Error running: "ansible-playbook -i /shared/build/newcluster1/inventory /shared/build/newcluster1/ansible/preflight.yml"
10:57:53 INFO cli.engine.ansible.AnsibleCommand - Retry running playbook: 1/1
10:58:03 INFO cli.engine.ansible.AnsibleRunner - Run done in 28570ms
10:58:03 ERROR epicli - Failed running playbook after 1 retries
10:58:09 INFO dump_debug_info - Error dump has been written to: /shared/build/newcluster1/epicli_error_20200425-105803.dump
10:58:09 WARNING dump_debug_info - This dump might contain sensitive information. Check before sharing.

I wonder what I did wrong, or is this a known bug?

Please note that the Docker image tagged 0.4.2 does not have this issue; it builds the cluster just fine.

I will share the dump file if requested.

Thanks team.


sunshine69 · 2020-04-25T11:11:13Z

The reason I want to run the latest version is that it has the apply --skip-config option which, as I understand it, lets me experiment with the Ansible files generated by the first apply without epicli overwriting them again.

Also, at some stage we need to upgrade anyway.

sk4zuzu · 2020-04-25T17:32:59Z

Hi @sunshine69!

So far I have failed to reproduce the issue :( I tried the steps below (following the docs here):

$ docker pull epiphanyplatform/epicli:0.6.0
$ docker run -it -v `pwd`:/shared --rm epiphanyplatform/epicli:0.6.0
epiuser@(redacted):/shared$ epicli apply -f any1.yml

where any1.yml is just a standard config file which looks like:

kind: epiphany-cluster
title: "Epiphany cluster Config"
provider: any
name: "any1"
specification:
  name: any1
  admin_user:
    name: ubuntu
    key_path: /shared/id_rsa
  components:
    kubernetes_master:
      count: 1
      machines:
        - default-k8s-master1
    kubernetes_node:
      count: 2
      machines:
        - default-k8s-node1
        - default-k8s-node2
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 0
    load_balancer:
      count: 0
    rabbitmq:
      count: 0
---
kind: configuration/shared-config
title: Shared configuration that will be visible to all roles
name: default
specification:
  use_ha_control_plane: false
  promote_to_ha: false
provider: any
---
kind: infrastructure/machine
provider: any
name: default-k8s-master1
specification:
  hostname: x1a1
  ip: 10.20.2.10
---
kind: infrastructure/machine
provider: any
name: default-k8s-node1
specification:
  hostname: x1b1
  ip: 10.20.2.20
---
kind: infrastructure/machine
provider: any
name: default-k8s-node2
specification:
  hostname: x1b2
  ip: 10.20.2.21

The --net=host argument makes no difference in my environment.

Could you provide more info about which operating system you run that Docker container on, and maybe the exact steps you use to enter it? Do you modify the image in any way?

Thanks for reporting the issue!

sunshine69 · 2020-04-26T02:26:29Z

Right, it might be related to epiuser being the default user. To retain ownership of my current shared volume, I create a user with the same uid inside the image, and when I run the container I don't use epiuser but that newly created user. I do not think I enabled sudo for that user.

Let me look at that and repeat the process. I will post an update here.

Thanks a lot for looking into this.

Kind regards

sunshine69 · 2020-04-26T02:37:16Z

Yes, confirmed. If the user running epicli inside the epiphany container has passwordless sudo, or if I just use the default user epiuser (I had to edit its uid and gid to match my current user, because on my Ubuntu system the first user defaulted to 1001, not 1000), then the issue is gone.
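For anyone hitting the same thing, a quick way to confirm the diagnosis above might be a small pre-check play run inside the container as the same user that runs epicli. This is only a sketch and not part of epicli: the play, the variable name sudo_check, and the message text are all made up for illustration. It relies on sudo -n, which exits non-zero instead of prompting when a password would be needed.

```yaml
# Hypothetical pre-check play; not part of epicli or Epiphany.
- hosts: localhost
  connection: local
  gather_facts: false
  become: false          # do not use sudo for the check itself
  tasks:
    - name: Check whether the local user can sudo without a password
      command: sudo -n true
      register: sudo_check      # illustrative variable name
      changed_when: false       # a read-only check never changes anything
      failed_when: false        # evaluate the result in the next task instead

    - name: Fail early with a clear message if passwordless sudo is missing
      fail:
        msg: "The user running epicli in the container needs passwordless sudo (or use the default epiuser)."
      when: sudo_check.rc | default(1) != 0
```

Saved under a file name of your choice, it could be run with ansible-playbook before epicli apply; if the second task fires, the container user is the likely cause of the "sudo: a password is required" message.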

However, I got another error: gpg-agent is missing on the Ubuntu system. I might create a PR to add such missing packages to the Ansible code later, once I have the full list.

The Ubuntu system I have is very minimal.

sk4zuzu · 2020-04-28T07:04:02Z

Hi @sunshine69!

I carefully reviewed the code and found that sudo really does not need to be required for Ansible tasks of the delegate_to: localhost type.

Here's the pull request that is going to fix that: #1217. :)
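To illustrate the idea (this is not the actual change in #1217, whose contents are not shown in this thread), a task delegated to localhost can opt out of privilege escalation with become: false, so that Ansible does not call sudo on the machine running epicli. The task below is a minimal sketch; example_facts and example_facts_path are hypothetical variables, not names from the Epiphany code base.

```yaml
# Illustrative only; variable names are assumptions, not Epiphany's.
- name: Store facts in a file on the control machine
  copy:
    content: "{{ example_facts | to_nice_yaml }}"   # hypothetical variable
    dest: "{{ example_facts_path }}"                # hypothetical path variable
  delegate_to: localhost   # run on the machine where ansible-playbook runs
  become: false            # no sudo needed to write a local, user-owned file
  run_once: true           # write the file once, not once per inventory host
```

With become left enabled by the play, the same task would try to use sudo inside the container, which matches the failure in the log above when the container user has no passwordless sudo.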

Thanks.

sunshine69 · 2020-04-28T07:29:13Z

Thanks, I will test it tomorrow.

sunshine69 closed this as completed Apr 26, 2020

sunshine69 reopened this Apr 28, 2020

sk4zuzu changed the title ~~Can not apply cluster - fatal: [master1]: FAILED! => {"msg": "Failed to get information on remote file (/shared/build/newcluster1/vault//../preflight_facts.yml): sudo: a password is required~~ Can not apply cluster, "sudo: a password is required". Apr 28, 2020

sk4zuzu changed the title ~~Can not apply cluster, "sudo: a password is required".~~ Can not apply cluster, "sudo: a password is required" Apr 28, 2020

sk4zuzu closed this as completed Apr 30, 2020
