Driver config appears to fail to parse, but it is inconsistent #6036
Comments
@ships That looks very odd. Mind if you provide … My hypothesis is that the worker node is missing …
@notnoop answering your questions in reverse order: my deployment topology is a bit awkward, since I am not using Vagrant. If it means anything to you, my configuration is public, but I can also try to answer any questions specifically if you think of them. To be clear, there are 2 "client" nodes of Consul/Nomad, and the web and worker nodes I describe are groups in my Nomad job, so the worker groups currently share Nomad client nodes with the web groups. I scaled web up to 2, in case that difference explained anything, but I can still see this huge difference in how the worker and web groups respond. Anyway, I have gotten the output requested on the two Nomad client nodes:
Hey there! Since this issue hasn't had any activity in a while, we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this. Thanks!
This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem 👍
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.9.3 (c5e8b66)
Operating system and Environment details
I am running Ubuntu 18.04; the cluster topology is all on a single host, virtualized with VirtualBox in bridged networking mode:
Issue
This issue presents similarly to these issues:
#5680
#5694
but with a feature that I can't find referenced anywhere else: my jobs do eventually start up, they just fail about 3 times before succeeding. My job asks for 2 instances of the allocation, but on average I only see about 1.2 running at any time.
I suspect that it is related to the use of one directive in particular, but the funny thing is, if you look at the job file, you see the same directive in the web group, which does not fail. Only the work group fails, and only some of the time. Perhaps this variable is only set some of the time? A sketch of the shape I mean is below.
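For illustration only, here is a minimal, hypothetical sketch of the job shape I am describing; the Consul key, environment variable, and image names are placeholders, not my actual job file:

```hcl
# Hypothetical sketch: both groups render the same directive, but only
# the "work" group's driver config intermittently fails to parse.
job "app" {
  datacenters = ["dc1"]

  group "web" {
    count = 2
    task "web" {
      driver = "docker"

      template {
        # Same interpolation as the work group; this one never fails.
        data        = "DATABASE_URL={{ key \"app/database_url\" }}"
        destination = "secrets/app.env"
        env         = true
      }

      config {
        image = "example/web:latest"
      }
    }
  }

  group "work" {
    count = 2
    task "work" {
      driver = "docker"

      template {
        # Identical directive; this is the group that fails some of the time.
        data        = "DATABASE_URL={{ key \"app/database_url\" }}"
        destination = "secrets/app.env"
        env         = true
      }

      config {
        image = "example/work:latest"
      }
    }
  }
}
```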
I also note that the failed allocations show "template with change_mode restart re-rendered", and the successful allocations do not, which I think is what takes my happy instance and kicks it into the spiral for about 10 minutes. The web group, which does not fail on account of the driver, also gets restarted due to a template change. Notably, the web group has only 1 desired instance at the moment.
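If I understand that log line correctly, it is the template stanza's change_mode that drives the restart; "restart" is the default, so every re-render bounces the task. A minimal, hypothetical stanza of that shape (the data and destination values are placeholders):

```hcl
template {
  # Placeholder contents; the point here is change_mode.
  data        = "GREETING={{ key \"app/greeting\" }}"
  destination = "local/app.env"
  env         = true

  # "restart" is the default: any re-render restarts the task.
  # "noop" or "signal" would avoid a full restart.
  change_mode = "restart"
}
```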
Reproduction steps
I deploy this job file.
Job file (if appropriate)
Nomad Client logs (if appropriate)