
Cancel python model job when dbt exits #690

Closed

Conversation


@gaoshihang gaoshihang commented May 29, 2024

Resolves #684

Description

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@gaoshihang
Author

Hi @benc-db, could you please help review this PR? The code is not ready, but I want you to look at it first and see if the approach is fine. It relates to issue #684.

I think we can utilize the Databricks workspace: we create a job_run_ids directory, and each python model creates a file named after its run_id in that directory.

When dbt is canceled, we read this job_run_ids directory, cancel every run_id in it, and then delete the files.
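
The proposed design can be sketched roughly like this. Everything here is illustrative: the function names and the `cancel_run` callback stand in for the adapter's actual Databricks "cancel run" API call, and a local directory stands in for the workspace directory.

```python
# Hypothetical sketch of the proposal: each python model run writes a marker
# file named after its Databricks run_id; on dbt cancellation we list the
# directory, cancel every run, and delete the markers.
import os
from typing import Callable, List


def register_run(run_id_dir: str, run_id: str) -> None:
    # One empty marker file per in-flight python model job.
    os.makedirs(run_id_dir, exist_ok=True)
    open(os.path.join(run_id_dir, run_id), "w").close()


def cancel_all(run_id_dir: str, cancel_run: Callable[[str], None]) -> List[str]:
    # Cancel every registered run, then remove its marker file.
    cancelled = []
    for run_id in os.listdir(run_id_dir):
        cancel_run(run_id)
        os.remove(os.path.join(run_id_dir, run_id))
        cancelled.append(run_id)
    return cancelled
```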

@gaoshihang
Author

And I have tested it: the job can be canceled, and fail-fast works correctly.

for run_id in run_ids:
    self._cancel_run_id(run_id_dir, run_id)

return super().cancel_open()
Collaborator


Below you raise an exception on a non-200 response, but that will interrupt cancelling the other operations. Better to log a warning on non-200, I think.
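
The reviewer's warn-and-continue suggestion could look like this sketch. The `post_cancel` callable and status codes are illustrative, not the adapter's real API client.

```python
# Sketch: on a non-200 cancel response, log a warning and keep going, so one
# failed cancel does not abort cancellation of the remaining runs.
import logging
from typing import Callable, Iterable

logger = logging.getLogger("dbt.databricks")


def cancel_runs(run_ids: Iterable[str], post_cancel: Callable[[str], int]) -> int:
    cancelled = 0
    for run_id in run_ids:
        status = post_cancel(run_id)
        if status != 200:
            # Warn instead of raise: the remaining runs still get cancelled.
            logger.warning(f"Cancel of run {run_id} returned status {status}")
            continue
        cancelled += 1
    return cancelled
```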


    return super().cancel_open()

def _cancel_run_id(self, run_id_dir: str, run_id: str) -> None:
Collaborator


Since neither of these methods relies on anything in self, I would prefer them as static functions in python_submissions.py, so they are closer to the code that they are cleaning up.
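
A sketch of what that refactor might look like, with hypothetical function bodies: self-free helpers become module-level functions in python_submissions.py, next to the code that creates the runs.

```python
# Illustrative module-level helpers (names and bodies are assumptions, not the
# PR's actual code): reading and cleaning up run_id marker files.
import os
from typing import List


def read_run_ids(run_id_dir: str) -> List[str]:
    # Each file in the directory is named after one in-flight run.
    return os.listdir(run_id_dir) if os.path.isdir(run_id_dir) else []


def remove_run_id_file(run_id_dir: str, run_id: str) -> None:
    path = os.path.join(run_id_dir, run_id)
    if os.path.exists(path):
        os.remove(path)
```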

@benc-db
Collaborator

benc-db commented May 30, 2024

Hi @benc-db, could you please help review this PR? The code is not ready, but I want you to look at it first and see if the approach is fine. It relates to issue #684.

I think we can utilize the Databricks workspace: we create a job_run_ids directory, and each python model creates a file named after its run_id in that directory.

When dbt is canceled, we read this job_run_ids directory, cancel every run_id in it, and then delete the files.

If we can locate the target folder, I think I would prefer writing there, so that we don't rely on an API operation to store and retrieve. Also, I think we need to lock around writing the file to ensure clean operation when kicking off multiple python models concurrently.
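
The locking concern above can be sketched as follows. The function name and file layout are assumptions; the point is that concurrent python model threads serialize their writes.

```python
# Hedged sketch: when multiple python models start concurrently, each appends
# its run_id to a shared file under a lock, so writes don't interleave.
import threading

_run_id_lock = threading.Lock()


def record_run_id(path: str, run_id: str) -> None:
    # Serialize appends so concurrent model threads produce clean lines.
    with _run_id_lock:
        with open(path, "a") as f:
            f.write(run_id + "\n")
```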

@benc-db
Collaborator

benc-db commented May 30, 2024

@mikealfare we're trying to figure out how to cancel python jobs as part of cleanup, similar to what is done for SQL queries when the user ctrl-Cs. Is there a better way to communicate run_ids from the python job helper to the connection manager? We were wondering if maybe there was some global state that would help the python job helper figure out the target directory?

@benc-db
Collaborator

benc-db commented May 30, 2024

@jtcohen6 as well

@gaoshihang
Author

Hi @benc-db, could you please help review this PR? The code is not ready, but I want you to look at it first and see if the approach is fine. It relates to issue #684.
I think we can utilize the Databricks workspace: we create a job_run_ids directory, and each python model creates a file named after its run_id in that directory.
When dbt is canceled, we read this job_run_ids directory, cancel every run_id in it, and then delete the files.

If we can locate the target folder, I think I would prefer writing there, so that we don't rely on an API operation to store and retrieve. Also, I think we need to lock around writing the file to ensure clean operation when kicking off multiple python models concurrently.

Hi @benc-db, many thanks for your help. I didn't find a way to get the target path in python_submissions.py; I'll try to find one today.

@benc-db
Collaborator

benc-db commented May 30, 2024

Hi @benc-db, could you please help review this PR? The code is not ready, but I want you to look at it first and see if the approach is fine. It relates to issue #684.
I think we can utilize the Databricks workspace: we create a job_run_ids directory, and each python model creates a file named after its run_id in that directory.
When dbt is canceled, we read this job_run_ids directory, cancel every run_id in it, and then delete the files.

If we can locate the target folder, I think I would prefer writing there, so that we don't rely on an API operation to store and retrieve. Also, I think we need to lock around writing the file to ensure clean operation when kicking off multiple python models concurrently.

Hi @benc-db, many thanks for your help. I didn't find a way to get the target path in python_submissions.py; I'll try to find one today.

I'm reaching out to dbt Labs folks to see if there is a better way. In particular, @mikealfare has worked on the dbt-spark adapter, so if we figure it out, it might be good for that library too.

@gaoshihang
Author

gaoshihang commented May 30, 2024

Hi @benc-db, I'm thinking: can we use a class static variable to share run_ids?
[code screenshots omitted]

@benc-db
Collaborator

benc-db commented May 30, 2024

Hi @benc-db, I'm thinking: can we use a class static variable to share run_ids?

Good point. I'm generally so anti-global-state that I didn't even think of it :P. We still need to protect it from concurrency issues, but global state is going to be our best bet until we get support from dbt-core, and it's better to store in memory than in cloud files.

@gaoshihang
Author

Hi @benc-db, I'm thinking: can we use a class static variable to share run_ids?

Good point. I'm generally so anti-global-state that I didn't even think of it :P. We still need to protect it from concurrency issues, but global state is going to be our best bet until we get support from dbt-core, and it's better to store in memory than in cloud files.

Yes... I don't want to use it either... but it seems like, if we don't want to change dbt-core, it's the only way we can share state between the two classes. Let me write some code this way!
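
The agreed approach could be sketched like this. The class and attribute names follow the PR (`BaseDatabricksHelper.run_ids`), but the tracking methods and the lock are assumptions of mine, added per benc-db's concurrency note.

```python
# Sketch: a class-level set shared by all helper instances, guarded by a
# class-level lock so concurrent python model threads register and deregister
# run_ids safely.
import threading
from typing import Set


class BaseDatabricksHelper:
    run_ids: Set[str] = set()               # shared across all instances
    _lock: threading.Lock = threading.Lock()

    @classmethod
    def track_run_id(cls, run_id: str) -> None:
        with cls._lock:
            cls.run_ids.add(run_id)

    @classmethod
    def untrack_run_id(cls, run_id: str) -> None:
        with cls._lock:
            cls.run_ids.discard(run_id)
```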

@gaoshihang
Author

Hi @benc-db, I revised the code, using a global variable to store all the run ids and then canceling them in the connection manager. Please help review this, thank you very much!

@@ -475,6 +476,17 @@ class DatabricksConnectionManager(SparkConnectionManager):
    TYPE: str = "databricks"
    credentials_provider: Optional[TCredentialProvider] = None

    def cancel_open(self) -> List[str]:
        from dbt.adapters.databricks.python_submissions import BaseDatabricksHelper
Collaborator


Import at the top, please. We only import in place like this if the thing we're importing is too heavy to load at start-up.

Author


done!

from dbt.adapters.databricks.python_submissions import BaseDatabricksHelper

for run_id in BaseDatabricksHelper.run_ids:
    logger.info(f"cancel run id {run_id}")
Collaborator


I think this can be debug-level, and we should mention that it's a python model job.

Author


done!

@@ -15,9 +15,11 @@

class token_auth(CredentialsProvider):
    _token: str
    _host: str
Collaborator


Why store this on the token? It's already on the DatabricksCredentials.

Author


It seems like I can't get DatabricksCredentials in DatabricksConnectionManager.

I can use self.credentials_provider in DatabricksConnectionManager, but there is no host in credentials_provider, so I put a host in the token_auth class.

Could you give me a pointer on how to get DatabricksCredentials in DatabricksConnectionManager?

Collaborator


The BaseDatabricksHelper has a copy of DatabricksCredentials.

Collaborator


Hmm, but that's an instance... let me think.

Collaborator


I'm going to pull down a copy of this PR and see if I can figure it out.

Author


Yes... I think I can't get the instance in DatabricksConnectionManager...

Thank you very much!

Collaborator


Oh, did you already fix this?

Author


No, I couldn't find a way, so I still use the approach of putting the host in token_auth, so that the host can be retrieved from DatabricksConnectionManager.credentials_provider.

Collaborator

@benc-db benc-db left a comment


Mostly good, just some minor comments to clean up.

@mikealfare
Contributor

@mikealfare we're trying to figure out how to cancel python jobs as part of cleanup, similar to what is done for SQL queries when the user ctrl-Cs. Is there a better way to communicate run_ids from the python job helper to the connection manager? We were wondering if maybe there was some global state that would help the python job helper figure out the target directory?

It looks like we might be trying to do something similar here for dbt-bigquery. I haven't read through either PR in detail to know if this solves your problem, but I figured I'd link it here in the event it's helpful.

def cancel_open(self) -> List[str]:
    for run_id in BaseDatabricksHelper.run_ids:
        logger.debug(f"Cancel python model job: {run_id}")
        BaseDatabricksHelper.cancel_run_id(
            run_id,
            self.credentials_provider.as_dict()["token"],
            self.credentials_provider.as_dict()["host"],
        )
Collaborator


Ah, this is where you need to retrieve it, and here you don't have an instance... maybe we can use the singleton pattern?

Collaborator


Give me an hour to take a crack at refactoring this; I have an idea :)

Author


A singleton may be a way; let me try some code.
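
For illustration only (this is not the refactoring benc-db eventually landed), a minimal singleton holder could let the job helper register credentials at submission time and the connection manager read them back at cancellation time. The class name and fields are hypothetical.

```python
# Illustrative singleton sketch: one process-wide holder for host/token,
# constructed lazily with double-checked locking.
import threading
from typing import Optional


class CredentialsHolder:
    _instance: Optional["CredentialsHolder"] = None
    _lock = threading.Lock()
    host: Optional[str] = None
    token: Optional[str] = None

    def __new__(cls) -> "CredentialsHolder":
        # Double-checked locking: only the first caller constructs the instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance
```

Every call to `CredentialsHolder()` returns the same object, so state set by one component is visible to the other.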

Author


Yeah! Thank you very much!

@benc-db
Collaborator

benc-db commented Jun 3, 2024

This is annoying, but it is actually teaching me a lot about our python model support, lol. It's taking me longer than I expected because I'm trying to get it to work with all credential types and all execution formats (i.e. commands vs. notebooks).

@gaoshihang
Author

This is annoying, but it is actually teaching me a lot about our python model support, lol. It's taking me longer than I expected because I'm trying to get it to work with all credential types and all execution formats (i.e. commands vs. notebooks).

No rush! Thanks for your support. Please let me know when you're done, and I'll modify the code in this PR.

@gaoshihang
Author

Hi @benc-db, I'm reaching out to see if there is anything I can help with!

@benc-db
Collaborator

benc-db commented Jun 4, 2024

Hi @benc-db, I'm reaching out to see if there is anything I can help with!

I'll have a new PR up shortly... I got stalled because weather knocked out my internet yesterday. After I put up my PR, if you could download it and validate that it works for your scenario, that would be great.

@benc-db benc-db mentioned this pull request Jun 4, 2024
@benc-db
Collaborator

benc-db commented Jun 4, 2024

@gaoshihang closing in favor of #693. Please take a look and verify it works for your use case.

@benc-db benc-db closed this Jun 4, 2024
@gaoshihang
Author

Hi @benc-db, thank you very much! I will do that later and let you know!

Development

Successfully merging this pull request may close these issues.

Python model doesn't have cancel method
3 participants