Global op concurrency limits across all runs (without needing Celery) #12470
Hi @gibsondan, this would indeed be nice to have as a feature. Do you have an idea of how you'd implement this? I thought of a couple of approaches and wanted to get your feedback. We implemented the first one, and it seems to work as expected. Approach 1: We made changes to this function by adding the incrementing functionality:
It would have been nice to pass the pool config as an op-level config; however, that would have been a more intrusive change, so for now we extended the executor configuration to allow setting which pool to use for which op. Pros:
Cons:
Approach 2: Pros:
Cons:
Thanks so much for the detailed thoughts @yeachan153 - it'd be interesting to see your implementation of #1 in a PR if that's an option (even if it's not fully baked yet). We've also been exploring solutions that address one of the cons you listed in both of your proposals (that there's still a session for each run, and each run is still running in its own isolated task for the duration of the run, even if it now might be spending much more time just waiting for an op to be available to run). Addressing that would probably require more significant changes to Dagster's execution model - instead of always having each run be in its own isolated process, we would need some more centralized process within our daemon figuring out which steps are eligible to be launched and then launching those steps.
Sure, it's definitely more of a functional/hacky PR rather than a mergeable PR - #13250. For some context, we're running things on Kubernetes using the multiprocess executor, so it might not generalise to other executors. We create a Pools table in an init container by calling this function (we haven't really spent time trying to integrate this properly within Dagster):
Then we increment/decrement when ops start/finish. I'm not too familiar with the source code, so let me know if I'm incrementing/decrementing in the wrong places.
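To make the increment/decrement idea concrete, here's a minimal sketch of a pool table with a guarded counter. This is illustrative only - it uses `sqlite3` as a stand-in for the instance database, and the table/column names (`pools`, `max_slots`, `active_slots`) are hypothetical, not the actual schema from the PR:

```python
# Hypothetical sketch of the "Pools" table approach. sqlite3 stands in
# for the instance database; schema names are illustrative.
import sqlite3


def create_pools_table(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pools (
               pool_name TEXT PRIMARY KEY,
               max_slots INTEGER NOT NULL,
               active_slots INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()


def try_acquire_slot(conn, pool_name):
    # The guarded UPDATE only succeeds while active_slots < max_slots,
    # so concurrent writers cannot oversubscribe the pool.
    cur = conn.execute(
        "UPDATE pools SET active_slots = active_slots + 1 "
        "WHERE pool_name = ? AND active_slots < max_slots",
        (pool_name,),
    )
    conn.commit()
    return cur.rowcount == 1


def release_slot(conn, pool_name):
    # Clamp at zero so a double-release cannot drive the counter negative.
    conn.execute(
        "UPDATE pools SET active_slots = MAX(active_slots - 1, 0) "
        "WHERE pool_name = ?",
        (pool_name,),
    )
    conn.commit()
```

The executor would call `try_acquire_slot` before launching a step (retrying/waiting on failure) and `release_slot` in a `finally` block when the step ends.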
I definitely like this aspect of Dagster. It makes things quite a bit more scalable and keeps the daemon simple, which is a definite positive IMO.
I was thinking of trying the dask distributed Lock/Semaphore for this myself, but inside the asset ops. Maybe something like that could work? Something like a simpler version of the dagster-dask executor that only uses a dask scheduler for centralized concurrency control, and otherwise delegates op execution to the multiprocess executor (or another). I'm not sure if this is actually viable; it might run into similar problems with failing to release locks/leases.
Hi all!
@gustavo-delfosim - yes (though still experimental).
Hi @sryza!

```python
from dagster import Definitions, job, op
import time

GLOBAL_CONCURRENCY_TAG = "dagster/concurrency_key"
T = 20


@op(tags={GLOBAL_CONCURRENCY_TAG: "foo"})
def foo_op():
    time.sleep(T)


@op(tags={GLOBAL_CONCURRENCY_TAG: "foo"})
def bar_op():
    time.sleep(T)


@job
def foo_job():
    foo_op()
    bar_op()


defs = Definitions(jobs=[foo_job])
```

I just changed the functions to sleep for T=20 seconds. I set the `foo` concurrency limit in the UI to 1 and ran the job through the UI, expecting it to take ~40 s, as each op should run sequentially due to the concurrency limit. However, the job runs in ~20 s. Is this a bug in this feature, or am I missing something in the setup?
@prha - mind taking a look at this one? |
Hi @gustavo-delfosim, what version of Dagster are you running, and what is your storage implementation?
We were using SQLite as the storage, and that's the reason why it did not work. After switching to Postgres, as indicated in the docs, it worked as expected. |
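For anyone else hitting this, switching storage to Postgres is done in `dagster.yaml`. A sketch, per the Dagster docs; the environment variable names are placeholders:

```yaml
storage:
  postgres:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432
```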
What's the use case?
Setting a global limit across all runs on the number of ops with a given tag that can run at a given time, for example to protect access to a shared resource. Right now you can apply run-level concurrency in Dagster, and op-level concurrency within a single run, but you can't apply a global limit for an op (or asset) across all runs.
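For contrast, the run-level limits that exist today are expressed through the `QueuedRunCoordinator` in `dagster.yaml`. A sketch, with an illustrative tag key/value; this limits concurrent *runs* carrying the tag, not individual ops across runs:

```yaml
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      - key: "database"
        value: "redshift"  # illustrative tag value
        limit: 2
```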
Ideas of implementation
No response
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.