Scaling an app down while a build is running leads to unpredictable results #34

Open
Cryptophobia opened this issue Mar 20, 2018 · 7 comments

@Cryptophobia
Member

From @deis-admin on January 19, 2017 23:41

From @jeff-lee on November 5, 2015 22:47

I'm running into an issue in v1.12.0 where scaling down an app while a build is running can result in either:

a) The new containers getting shut down and the build hanging
b) Zero running containers

I started a new cluster and scaled the example-go app up to 3.

$ fleetctl list-units|grep jefftest
jefftest_v74.web.1.service  a5ea5dc1.../10.10.17.144    active      running
jefftest_v74.web.2.service  6b548706.../10.10.19.9      active      running
jefftest_v74.web.3.service  6b548706.../10.10.19.9      active      running

I then started a build (v75) and scaled the app down from 3 to 2 when the node started pulling the new containers down.

$ deis ps:scale web=2 -a jefftest
Scaling processes... but first, coffee!
done in 5s
=== jefftest Processes
--- web:
web.1 up (v74)
web.2 up (v74)

At this point, the v75 container gets stopped and the build (with HEALTHCHECK_URL set) hangs.

Thu Nov  5 22:06:45 UTC 2015
cda30e1fda8e        10.10.16.243:5000/jefftest:v75   "/runner/init start    1 seconds ago        Up Less than a second   0.0.0.0:32901->5000/tcp   jefftest_v75.web.1
2598c80e0985        10.10.16.243:5000/jefftest:v74   "/runner/init start    About a minute ago   Up About a minute       0.0.0.0:32900->5000/tcp   jefftest_v74.web.3
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago        Up 2 minutes            0.0.0.0:32899->5000/tcp   jefftest_v74.web.2
Thu Nov  5 22:06:46 UTC 2015
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago       Up 2 minutes        0.0.0.0:32899->5000/tcp   jefftest_v74.web.2
Thu Nov  5 22:06:47 UTC 2015
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago       Up 2 minutes        0.0.0.0:32899->5000/tcp   jefftest_v74.web.2
Thu Nov  5 22:06:48 UTC 2015
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago       Up 2 minutes        0.0.0.0:32899->5000/tcp   jefftest_v74.web.2
Thu Nov  5 22:06:49 UTC 2015
9d4614e6fb3f        10.10.16.243:5000/jefftest:v74   "/runner/init start    2 minutes ago       Up 2 minutes        0.0.0.0:32899->5000/tcp   jefftest_v74.web.2

I have also seen all of the containers get stopped when scaling from 3 to 2. Though I have only been able to reproduce this when HEALTHCHECK_URL is not set so far.

14:42:50 [ds12] - /Users/jefflee
$ deis ps:scale web=2 -a jefftest
Scaling processes... but first, coffee!
done in 6s
=== jefftest Processes

14:44:49 [ds12] - /Users/jefflee
$ deis info -a jefftest
=== jefftest Application
updated:  2015-11-05T22:44:49UTC
uuid:     20949ab0-ffbd-4442-b490-f7b01951976b
created:  2015-11-05T18:43:43UTC
url:      jefftest.ds12.therealreal.com
owner:    jefflee
id:       jefftest

=== jefftest Processes

=== jefftest Domains

Copied from original issue: deis/deis#4719

Copied from original issue: deis/controller#1224

@Cryptophobia
Member Author

From @deis-admin on January 19, 2017 23:41

From @carmstrong on November 5, 2015 23:37

> Though I have only been able to reproduce this when HEALTHCHECK_URL is not set so far.

Have you seen any issues with HEALTHCHECK_URL set? We strongly recommend using this as a best practice for app deploys, since by default we will consider all running containers to be live and healthy.
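
To make that distinction concrete, here is a minimal sketch of the rule described above. It is an illustration only, not Deis code: `is_healthy`, the `probe` callback, and the choice of HTTP 200 as the passing status are all assumptions.

```python
from typing import Callable, Optional


def is_healthy(running: bool,
               healthcheck_url: Optional[str],
               probe: Callable[[str], int]) -> bool:
    """Hypothetical liveness rule (assumed, not the Deis implementation).

    `probe` is a stand-in for an HTTP GET that returns a status code.
    """
    if not running:
        return False
    if healthcheck_url is None:
        # No health check configured: any running container counts as healthy.
        return True
    # Health check configured: only a 200 response counts (assumed threshold).
    return probe(healthcheck_url) == 200


# A container that is running but answering 503:
print(is_healthy(True, None, lambda url: 503))       # treated as healthy
print(is_healthy(True, "/health", lambda url: 503))  # treated as unhealthy
```

Without a health check, the first case would be routed traffic even though the app is not actually serving requests.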

@Cryptophobia
Member Author

From @deis-admin on January 19, 2017 23:41

From @jeff-lee on November 6, 2015 0:09

I haven't been able to reproduce the container=0 issue yet when HEALTHCHECK_URL is set, but the shutdown of the new containers and hang of the builder does still happen.

I have also seen 502 Bad Gateway and 404's when it's set if I scale down late enough in the deploy process.

@Cryptophobia
Member Author

From @deis-admin on January 19, 2017 23:41

From @carmstrong on November 6, 2015 0:11

> I have also seen 502 Bad Gateway and 404's when it's set if I scale down late enough in the deploy process.

In general, I don't know if we make any guarantees when scaling an app up/down while a deploy is already running - the controller is executing logic as to how many containers to scale up/down based on the current number it sees.
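
As a rough illustration of why the outcome depends on timing, the sketch below models a scale-down that chooses what to stop from whatever containers it currently observes. `plan_scale` and its rule of stopping the highest-sorted names first are hypothetical, not the controller's actual logic; the container names come from the report above.

```python
from typing import List


def plan_scale(observed: List[str], target: int) -> List[str]:
    """Return the containers to stop so that `target` remain.

    Hypothetical rule: stop the highest-sorted names first.
    """
    excess = len(observed) - target
    if excess <= 0:
        return []
    return sorted(observed, reverse=True)[:excess]


# Snapshot taken mid-deploy: three v74 containers plus the first v75 one.
mid_deploy = ["v74.web.1", "v74.web.2", "v74.web.3", "v75.web.1"]

# Scaling to 2 against this snapshot stops the new v75 container as well.
print(plan_scale(mid_deploy, 2))  # ['v75.web.1', 'v74.web.3']
```

Under these assumptions the scale-down removes the container the deploy is still waiting on, which would be consistent with the hang reported earlier. The same call made before or after the deploy would only ever touch containers of a single release.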

Is there a use case for this, @jeff-lee, or are you just doing resiliency testing?

@Cryptophobia
Member Author

From @deis-admin on January 19, 2017 23:41

From @jeff-lee on November 6, 2015 0:56

@carmstrong I was doing resiliency testing of the build process and this popped up.

Having said that, our CI is pushing builds to staging and qa throughout the day so I don't think it would be unusual for someone to try to scale an app without knowing that a build might be in progress.

It would be less of an issue in production since that's a more controlled process.

@Cryptophobia
Member Author

From @deis-admin on January 19, 2017 23:41

From @mboersma on November 11, 2015 15:54

> I don't think it would be unusual for someone to try to scale an app without knowing that a build might be in progress

Sounds like a common case that Deis should handle gracefully.

@Cryptophobia
Member Author

From @deis-admin on January 19, 2017 23:41

From @bacongobbler on January 22, 2016 0:50

I'm not sure if there's an easy way to resolve this reliably. There are a lot of concurrency issues related to Deis, and this is one of them. Perhaps at some point we could use something that acts as a single source of truth to tell us when the builder is performing a build, but I don't see an easy solution to this problem that we could tackle for the LTS release.
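
For what it's worth, the "single source of truth" idea could look roughly like the sketch below: a registry that records which operation is in progress for each app and refuses a second one. This is an in-process toy under stated assumptions, not a proposal for the actual implementation. `AppOperationRegistry` and its method names are made up for illustration, and a real cluster would need the record kept in a store shared by the builder and the controller.

```python
import threading
from contextlib import contextmanager


class AppOperationRegistry:
    """Hypothetical record of the operation currently running per app."""

    def __init__(self):
        self._guard = threading.Lock()
        self._busy = {}  # app id -> name of the operation in progress

    @contextmanager
    def operation(self, app, name):
        # Claim the app atomically, or refuse if something else holds it.
        with self._guard:
            current = self._busy.get(app)
            if current is not None:
                raise RuntimeError(
                    f"{app}: {current} in progress, refusing {name}")
            self._busy[app] = name
        try:
            yield
        finally:
            # Always release the claim, even if the operation failed.
            with self._guard:
                del self._busy[app]


registry = AppOperationRegistry()
with registry.operation("jefftest", "build"):
    try:
        with registry.operation("jefftest", "scale"):
            pass
    except RuntimeError as err:
        print(err)  # jefftest: build in progress, refusing scale
```

Rejecting the scale request is only one possible policy; queuing it until the build finishes would be another, at the cost of more state to manage.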

@Cryptophobia
Member Author

From @deis-admin on January 19, 2017 23:41

From @bacongobbler on January 22, 2016 0:52

see also deis/deis#4746

duanhongyi added a commit to duanhongyi/controller that referenced this issue Nov 26, 2021
test(controller): add command unittest