[BUG] After upgrading to 3006.2 gitfs states 'remote ref does not exist' for ext_pillar #65002

Closed
4 of 9 tasks
jonny08152 opened this issue Aug 16, 2023 · 11 comments · Fixed by #65017
Labels
Bug broken, incorrect, or confusing behavior Confirmed Salt engineer has confirmed bug/feature - often including a MCVE must-fix Regression The issue is a bug that breaks functionality known to work in previous releases.

Comments

@jonny08152

jonny08152 commented Aug 16, 2023

Description
After upgrading our salt-masters from 3006.1 to 3006.2, the master is unable to check out our ext_pillar Git repository and logs the following error:

[ERROR   ] Failed to checkout master from git_pillar remote '__env__ https://gitlab.com/my/salt-pillar.git': remote ref does not exist

Downgrading the salt-master, salt-minion and salt packages back to 3006.1 immediately solves the issue.

Setup

Please be as specific as possible and give set-up details.

  • on-prem machine
  • VM (Virtualbox, KVM, etc. please specify)
  • VM running on a cloud service, please be explicit and add details
  • container (Kubernetes, Docker, containerd, etc. please specify)
  • or a combination, please be explicit
  • jails if it is FreeBSD
  • classic packaging
  • onedir packaging
  • used bootstrap to install

The masters run on a CentOS 9 Stream machine with the latest patches (see below).

Steps to Reproduce the behavior

This is the snippet from the master config for the ext_pillar that causes the issue:

ext_pillar:
  - git:
    - __env__ https://gitlab.com/my/salt-pillar.git:
      - root: .
      - user: gitlab-deploy-token
      - password: **********
      - fallback: master

Running the master in debug mode prints the following:

salt-master[691168]: [DEBUG   ] Current fetch URL for git_pillar remote '__env__ https://gitlab.com/my/salt-pillar.git': https://gitlab.com/my/salt-pillar.git (desired: https://gitlab.com/my/salt-pillar.git)
salt-master[691168]: [DEBUG   ] Current refspecs for git_pillar remote '__env__ https://gitlab.com/my/salt-pillar.git': ['+refs/heads/*:refs/remotes/origin/*', '+refs/tags/*:refs/tags/*'] (desired: ['+refs/heads/*:refs/remotes/origin/*', '+refs/tags/*:refs/tags/*'])
salt-master[691168]: [DEBUG   ] Current http.sslVerify for git_pillar remote '__env__ https://gitlab.com/my/salt-pillar.git': true (desired: true)
salt-master[691168]: [ERROR   ] Failed to checkout master from git_pillar remote '__env__ https://gitlab.com/my/salt-pillar.git': remote ref does not exist

The error message originates from this line of code:

"Failed to checkout %s from %s remote '%s': remote ref does not exist",

Debugging showed that no references are retrieved at this point, so the refs list is empty:

refs = self.repo.listall_references()
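
To illustrate why an empty reference list ends in this error, here is a minimal sketch. It is not Salt's actual code: `find_remote_ref` is a hypothetical helper, and the lookup order (a branch under `refs/remotes/origin/`, then a tag under `refs/tags/`) is an assumption based on the refspecs shown in the debug log above.

```python
from typing import Iterable, Optional


def find_remote_ref(refs: Iterable[str], name: str) -> Optional[str]:
    """Return the full ref matching ``name``, or None if it is absent.

    Hypothetical helper: ``refs`` stands in for the list returned by
    ``self.repo.listall_references()``.
    """
    known = set(refs)
    for candidate in (f"refs/remotes/origin/{name}", f"refs/tags/{name}"):
        if candidate in known:
            return candidate
    return None


# A healthy cache: the branch was fetched, so the lookup succeeds.
healthy = ["refs/remotes/origin/master", "refs/remotes/origin/dev"]
print(find_remote_ref(healthy, "master"))

# The failing case from this report: nothing was fetched, the list is empty,
# so every lookup fails and the caller can only report a missing remote ref.
print(find_remote_ref([], "master"))
```

With an empty list, no branch or tag name can ever match, regardless of what exists on the remote.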

We also reference other repositories in the ext_pillar block like this and they work fine.

  - git:
    - main https://gitlab.com/my-other/salt-pillar.git:
      - root: .
      - user: other-pillars
      - password: *********
      - env: onpremises

Expected behavior
We expect the master to check out the pillar repository without errors, as it did in version 3006.1.

Versions Report

salt --versions-report
Masters and minions have the same version and also run on similarly configured VMs (same OS).
Salt Version:
          Salt: 3006.2

Python Version:
        Python: 3.10.12 (main, Aug  3 2023, 21:47:10) [GCC 11.2.0]

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.2
       libgit2: 1.6.4
  looseversion: 1.0.2
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 22.0
     pycparser: 2.21
      pycrypto: Not Installed
  pycryptodome: 3.9.8
        pygit2: 1.12.2
  python-gnupg: 0.4.8
        PyYAML: 6.0.1
         PyZMQ: 23.2.0
        relenv: 0.13.3
         smmap: Not Installed
       timelib: 0.2.4
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: centos 9
        locale: utf-8
       machine: x86_64
       release: 5.14.0-331.el9.x86_64
        system: Linux
       version: CentOS Stream 9
@jonny08152 jonny08152 added Bug broken, incorrect, or confusing behavior needs-triage labels Aug 16, 2023
@cmcmarrow
Contributor

cmcmarrow commented Aug 16, 2023

@jonny08152

  • Ubuntu
  • 3006.2
  • pygit2
- git:
  - __env__ https://github.com/cmcmarrow/privrepo.git:
    - root: .
    - user: cmcmarrow
    - password: **********
    - fallback: master 

That said, you mention that only one of your repos stopped working, so we need to find the conditions that are causing the failure.

What OS are you on?
Are you using https://gitlab.com/my/salt-pillar.git multiple times in your master file?
Are you switching branches in https://gitlab.com/my/salt-pillar.git?
Are there any notable config differences between the remotes that work and the one that does not?

@cmcmarrow
Contributor

#64999

@cmcmarrow
Contributor

cmcmarrow commented Aug 16, 2023

@jonny08152
Could you also try deleting your cache (var/cache/salt)?
Also, if you could put together a master config that you are willing to post here and that reproduces this error, that would be extremely helpful.

@anilsil anilsil added this to the Sulfur v3006.3 milestone Aug 16, 2023
@jonny08152
Author

@cmcmarrow

What OS are you on?

As stated above, we are using CentOS 9 Stream with python3-pygit2-1.12.2-1.el9.x86_64.

Are you using https://gitlab.com/my/salt-pillar.git multiple times in your master file?

No, this is the only time this repository is referenced.

Are you switching branches in https://gitlab.com/my/salt-pillar.git?

The repository has master as the default branch and a few other branches. But the latest commit to the repository is months old so nothing changed on that side.

Are there any notable config differences between the remotes that work and the one that does not?

The only difference that stands out is the use of the __env__ placeholder to map the saltenv to existing branches and the fallback mechanism.
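
As a rough illustration of the mapping described here, the following sketch picks a branch for a given saltenv on an `__env__` remote. It is a simplification, not Salt's implementation: `resolve_env_branch` is a hypothetical helper, and any special handling of the `base` environment is deliberately left out.

```python
from typing import Iterable, Optional


def resolve_env_branch(
    saltenv: str,
    branches: Iterable[str],
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Pick the branch for ``saltenv`` on an ``__env__`` remote.

    Simplified rule: a branch named after the environment wins; otherwise
    the configured ``fallback`` is used if that branch exists; otherwise
    there is nothing to check out and None is returned.
    """
    available = set(branches)
    if saltenv in available:
        return saltenv
    if fallback is not None and fallback in available:
        return fallback
    return None


branches = ["master", "dev"]
print(resolve_env_branch("dev", branches, fallback="master"))
print(resolve_env_branch("env-1", branches, fallback="master"))

# With no fetched branches at all, even the fallback cannot be resolved.
print(resolve_env_branch("env-1", [], fallback="master"))
```

The last call mirrors the situation in this report: if no references are available locally, neither the environment branch nor the fallback can be found.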

Could you also try deleting your cache (var/cache/salt)?

This was the first thing I tried but with no success. The error always stayed the same.

Please note:

  • Under /var/cache/salt/master/git_pillar there are multiple directories containing the checkout of the https://gitlab.com/my/salt-pillar.git repository.
  • Running git show-ref in those directories shows that only a single one has all the references properly fetched. The others show no references at all.
  • I went ahead and ran git fetch -a in all of them.
  • Running git show-ref now shows the proper references for all of the checkouts.
  • The errors in the master log no longer show up after that.
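
The manual check above can be sketched as a small script. This is only an illustration under stated assumptions: `git_show_ref` and `checkouts_without_refs` are hypothetical helpers, and treating every immediate subdirectory of the cache root as one checkout is an assumption about the cache layout that may need adjusting.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def git_show_ref(checkout: Path) -> List[str]:
    """Return the lines printed by ``git show-ref`` for one checkout.

    ``git show-ref`` exits non-zero when it finds no refs, so the exit
    status is deliberately not checked; an empty list means "no refs".
    A directory that is not a Git repository also yields an empty list.
    """
    result = subprocess.run(
        ["git", "-C", str(checkout), "show-ref"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()


def checkouts_without_refs(
    cache_root: Path,
    list_refs: Callable[[Path], List[str]] = git_show_ref,
) -> List[Path]:
    """List the immediate subdirectories of ``cache_root`` that have no refs."""
    return sorted(
        path
        for path in cache_root.iterdir()
        if path.is_dir() and not list_refs(path)
    )
```

Calling `checkouts_without_refs(Path("/var/cache/salt/master/git_pillar"))` on the master would list the checkouts that need a fetch. The `list_refs` parameter exists so the directory scan can be exercised without invoking Git.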

Just for the sake of completeness, here is our master config as a whole; maybe there is something I omitted earlier:

Config
interface: 10.20.30.40

fileserver_backend:
  - roots
  - gitfs

jinja_env:
  trim_blocks: True
  lstrip_blocks: True
  keep_trailing_newline: False

gitfs_provider: pygit2
gitfs_global_lock: False
gitfs_update_interval: 45

keep_jobs: 1
job_cache: False
minion_data_cache: False

gitfs_remotes:
  - https://gitlab.com/my/salt-states.git:
    - name: global
    - root: global
    - user: gitlab-deploy-token
    - password: ********
    - disable_saltenv_mapping: True
    - saltenv:
      - env-1:
        - ref: master
      - env-2:
        - ref: master
      ...
      - env-n:
        - ref: master
  - https://gitlab.com/my/salt-states.git:
    - name: specific
    - root: specific
    - user: gitlab-deploy-token
    - password: ********
    - disable_saltenv_mapping: True
    - saltenv:
      - env-1:
        - ref: master
      - env-2:
        - ref: dev
  - https://gitlab.com/my-other/salt-states.git:
    - name: other
    - user: other-salt
    - password: *********
    - disable_saltenv_mapping: True
    - saltenv:
      - env-10:
        - ref: main
      - env-11:
        - ref: feature
      - env-12:
        - ref: dev
  ...

ext_pillar_first: True
ext_pillar:
  - git: # global: for all accounts and envs
    - master https://gitlab.com/my/salt-pillar-global.git:
      - all_saltenvs: master
      - root: .
      - user: gitlab-deploy-token
      - password: *********
      - env: __env__
  - git: # specific: only for specific envs.
    - __env__ https://gitlab.com/my/salt-pillar.git:
      - root: .
      - user: gitlab-deploy-token
      - password: **********
      - env: __env__
      - fallback: master
  - git:
    - main https://gitlab.com/my-other/salt-pillar.git:
      - root: .
      - user: other-pillars
      - password: ***********
      - env: env-10
  - git:
    - main https://gitlab.com/my-other/salt-pillar.git:
      - root: .
      - user: other-pillars
      - password: ***********
      - env: env-11
  - git:
    - main https://gitlab.com/my-other/salt-pillar.git:
      - root: .
      - user: other-pillars
      - password: ***********
      - env: env-12

pillar_roots:
  __env__:
    - /srv/salt/pillar/global

pillar_safe_render_error: True

@cmcmarrow cmcmarrow added the Confirmed Salt engineer has confirmed bug/feature - often including a MCVE label Aug 17, 2023
@cmcmarrow cmcmarrow mentioned this issue Aug 21, 2023
3 tasks
@OrangeDog OrangeDog added the Regression The issue is a bug that breaks functionality known to work in previous releases. label Aug 21, 2023
@cmcmarrow
Contributor

Hi @jonny08152, I believe I have a fix for you: #65017. I would appreciate it if you could give it a try and/or take a look.

@jonny08152
Author

Hi @cmcmarrow,
I ran the salt master (run.py master) from your repo including the fix: cmcmarrow@5a5adfe.
It seems the original error is fixed, as I no longer see it.

Unfortunately there is another error introduced by the fix.

The master does not come up when using the following config (which is a subset of the original config from above for testing purposes):

Config
interface: 10.20.30.40

fileserver_backend:
  - roots
  - gitfs

jinja_env:
  trim_blocks: True
  lstrip_blocks: True
  keep_trailing_newline: False

gitfs_provider: pygit2
gitfs_global_lock: False
gitfs_update_interval: 15

keep_jobs_seconds: 180
job_cache: False
minion_data_cache: False

gitfs_remotes:
  - https://gitlab.com/my/salt-states.git:
    - name: global
    - root: global
    - user: gitlab-deploy-token
    - password: ***********
    - disable_saltenv_mapping: True
    - saltenv:
      - env-1:
        - ref: master
  - https://gitlab.com/my/salt-states.git:
    - name: specific
    - root: specific
    - user: gitlab-deploy-token
    - password: ***********
    - disable_saltenv_mapping: True
    - saltenv:
      - env-1:
        - ref: master

ext_pillar_first: True
ext_pillar:
  - git: # global: for all accounts and envs
    - master https://gitlab.com/my/salt-pillar.global.git:
      - all_saltenvs: master
      - root: .
      - user: gitlab-deploy-token
      - password: ***********
      - env: __env__
  - git: # specific: only for specific envs.
    - __env__ https://gitlab.com/my/salt-pillar.git:
      - root: .
      - user: gitlab-deploy-token
      - password: ***********
      - env: __env__
      - fallback: master


pillar_roots:
  __env__:
    - /srv/salt/pillar/global

pillar_safe_render_error: True

The master process does not start but instead prints the following error message:

[WARNING ] Cache version mismatch clearing: '/var/cache/salt/master/gitfs'
[CRITICAL] The following gitfs remotes have conflicting cachedirs: https://gitlab.com/my/salt-states.git, https://gitlab.com/my/salt-states.git. Resolve this using a per-remote parameter called 'name'.
[WARNING ] Cache version mismatch clearing: '/var/cache/salt/master/git_pillar'
[CRITICAL] Failed to load gitfs
[CRITICAL] Master failed pre flight checks, exiting

As you can see in the config, both remotes already have their name parameter set to distinct values. The "Cache version mismatch clearing" warnings appear because /var/cache/salt did not exist before the master was started.

Removing or commenting out one of the salt-state remotes lets the master come up and it then also serves the minions without errors.
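
The check behind this message is presumably a uniqueness test on per-remote cache directory names. The following is only a sketch of that idea, not the code from Salt or from #65017: `cachedir_key` and `conflicting_remotes` are hypothetical helpers, and the rule "use `name` if it is set, otherwise a hash of the URL" is an assumption. Under that rule the two salt-states remotes above should not conflict, because their `name` values differ.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Remote = Tuple[str, Optional[str]]  # (url, name or None)


def cachedir_key(url: str, name: Optional[str]) -> str:
    """Derive a cache directory name (assumed rule, not Salt's own)."""
    if name:
        return name
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def conflicting_remotes(remotes: List[Remote]) -> Dict[str, List[str]]:
    """Map each cache key used by more than one remote to the URLs using it."""
    by_key: Dict[str, List[str]] = defaultdict(list)
    for url, name in remotes:
        by_key[cachedir_key(url, name)].append(url)
    return {key: urls for key, urls in by_key.items() if len(urls) > 1}


url = "https://gitlab.com/my/salt-states.git"

# Same URL, distinct names: no conflict is expected.
print(conflicting_remotes([(url, "global"), (url, "specific")]))

# Same URL, no names: both remotes map to the same key and conflict.
print(conflicting_remotes([(url, None), (url, None)]))
```

A check that ignored `name` and keyed only on the URL would report a conflict for the first case as well, which matches the behavior observed here.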

@cmcmarrow
Contributor

@jonny08152 thanks for testing

@cmcmarrow
Contributor

@jonny08152 I believe I have fixed what you were seeing: #65017

@jonny08152
Author

@cmcmarrow LGTM. I tested cmcmarrow@aba580f and no longer get any errors, and the gitfs checkout still works fine.

@cmcmarrow
Contributor

@jonny08152 thank you so much for double checking. I'm going to close due to it being fixed. Feel free to open another ticket if you find another gitfs issue.

@cmcmarrow cmcmarrow mentioned this issue Sep 6, 2023
3 tasks