Concurrent orchestrate runners get mixed up pillar data #30353
Comments
@tnypex, thanks for reporting.
I can confirm that this is also happening on 2015.8.7.
I just created #38868, which seems related. Seeing this behavior on 2016.11.1.
Seeing this frequently too. It is breaking our ability to use orchestration effectively.
This is still really hurting our orchestration. Is there any hope in sight? Any workarounds?
Okay, the only thing I can replicate on 2017.7 is that the pillar data sometimes comes back empty; it only happens intermittently.
Here is a Docker container for quickly reproducing it:
You will need to run ~/test.sh a few times (5-10) before you see empty pillar data. I found PR #39948 in the queue, which should fix the issue; I tested with that fix and did not see missing pillar data. Can anyone here verify that the fix resolves the issue for them as well?
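A small harness of the kind described could look like the sketch below. This is not part of the linked container; the script path is the ~/test.sh mentioned above, and the string grepped for is only an assumption about what empty pillar output looks like in the logs.

```sh
#!/bin/sh
# Hypothetical wrapper: repeat the reproduction script several times and
# flag runs whose output suggests empty pillar data.
for i in $(seq 1 10); do
    echo "=== run $i ==="
    ~/test.sh > "run_$i.log" 2>&1
    # "pillar: {}" is an assumed marker for an empty pillar in the output.
    if grep -q "pillar: {}" "run_$i.log"; then
        echo "run $i: pillar data came back empty"
    fi
done
```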
I'll take a look at this today. Thanks!
I finally got my Docker setup running this over and over, and I haven't been able to catch it erroring out yet. I will keep poking.
I haven't actually verified the PR, but... good news: I had a test that I ran on our 2016.11.1 server where it would lose or mix up pillar data in concurrent reactor+orchestrations almost 100% of the time (I used this test because I could run it on our live server less dangerously than the test reactor in the container). When I ran the same test in the Docker container, it failed about 3% of the time. So something is different between our master and the Docker container, and the version running in the container is definitely different, so I'm hoping our success rate shoots up when we can update to 2017.7 (soon). The 3% of failures are probably what #39948 is there to fix, and they are definitely "missing" pillar errors rather than "mixed up" pillar errors. That's all nice, but I'm not sure of the best way to pull #39948 into my local clone to test whether that fix actually closes off the rest of the errors. I'm moderately smart with GitHub, just not that smart.
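For reference, GitHub exposes every pull request's head commit as a fetchable ref, so one way to try #39948 locally is to fetch it straight into a clone; the local branch name below is arbitrary.

```sh
# Fetch PR #39948 from the saltstack/salt repository into a local branch.
git clone https://github.com/saltstack/salt.git
cd salt
git fetch origin pull/39948/head:pr-39948
git checkout pr-39948
```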
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
Hi,
salt['pillar.get'] returns mixed-up values when concurrent orchestrate runners are started by the reactor. This is the same issue as in #23373, even though that was supposed to be fixed by the RequestContext implementation (PR #27843).
SLS and logs below:
The reactor SLS is:
and the orchestrate SLS is:
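The reporter's actual SLS files are not reproduced here; a minimal, hypothetical reactor/orchestrate pair of the shape being described, with file names, state IDs, and the pillar key all invented for illustration, might look like this:

```yaml
# /srv/reactor/start_orch.sls (hypothetical) -- mapped to an event tag under
# "reactor:" in the master config. Starts one orchestrate runner per matching
# event and passes it pillar data taken from that event.
# Note: the reactor argument syntax changed in 2017.7; this is the older form.
start_orchestration:
  runner.state.orchestrate:
    - mods: orch.ping
    - pillar:
        triggering_minion: {{ data['id'] }}

# /srv/salt/orch/ping.sls (hypothetical) -- reads the pillar passed in by the
# reactor; this is the value that comes back mixed up or empty when several
# runners execute concurrently.
{% set target = salt['pillar.get']('triggering_minion', 'unknown') %}

ping_{{ target }}:
  salt.function:
    - name: test.ping
    - tgt: {{ target }}
```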
When running test.ping against multiple minions (salt '*' test.ping), the reactor SLS is rendered correctly, but the subsequent orchestrate runners get mixed/inconsistent pillar data. Full debug logs are pasted below: