[CT-1021] Avoid creating notebook as the default way of running python model #424

ChenyuLInx · 2022-08-09T22:29:38Z

Describe the feature

This came out of the discussion with @JCZuurmond.
Currently we create a notebook for each Python model. The notebook has to be created in a user's workspace, which is why an additional `user` configuration was added for running Python models.

Per our discussion, this is unexpected behavior for our users. @JCZuurmond suggested using the spark_python_task instead of notebook_task.

We should switch to spark_python_task as the default and support creating a notebook only when the user explicitly asks for it; in that case `user` will need to be specified in profiles.yml.

Switching to spark_python_task means replacing the notebook-creation step with uploading the file we need to run to DBFS. The Spark session will also need to be defined more explicitly.
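A minimal sketch of what the two steps above could look like as request bodies, assuming the Databricks DBFS `put` endpoint and the Jobs `runs/submit` endpoint with a `spark_python_task`. The function names, the `task_key`, and the session header are hypothetical and only for illustration; they are not taken from the dbt-spark code base. Nothing is sent over the network here.

```python
import base64

# Hypothetical header prepended to the compiled model: a plain Python file
# does not get a `spark` variable for free the way a notebook does.
SPARK_SESSION_HEADER = (
    "from pyspark.sql import SparkSession\n"
    "spark = SparkSession.builder.getOrCreate()\n"
)


def build_dbfs_put_payload(dbfs_path, compiled_code):
    """Build a body for the DBFS put endpoint (assumed: POST /api/2.0/dbfs/put)."""
    source = SPARK_SESSION_HEADER + compiled_code
    # Assumption: the DBFS API expects an absolute path without the "dbfs:" scheme.
    path = dbfs_path[len("dbfs:"):] if dbfs_path.startswith("dbfs:") else dbfs_path
    return {
        "path": path,
        "contents": base64.b64encode(source.encode("utf-8")).decode("ascii"),
        "overwrite": True,
    }


def build_run_submit_payload(run_name, cluster_id, dbfs_path):
    """Build a body for a one-time run (assumed: POST /api/2.1/jobs/runs/submit)."""
    return {
        "run_name": run_name,
        "tasks": [
            {
                "task_key": "python_model",  # hypothetical task key
                "existing_cluster_id": cluster_id,
                # spark_python_task replaces notebook_task, so no notebook
                # (and no workspace user) is needed.
                "spark_python_task": {"python_file": dbfs_path},
            }
        ],
    }
```

The point of the sketch is that neither body refers to a workspace user, which is what removes the need for the extra configuration.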

Describe alternatives you've considered

Keep the current approach. That means a user will be required for production runs.

Additional context

Making notebook creation configurable per model sounds related to the comment here

Who will this benefit?

All users of Python models

Are you interested in contributing this feature?

Yes

The outcome of this should also include whether we can configure a separate cluster for Python models, with a separate ticket created if not.

JCZuurmond · 2022-08-10T07:40:59Z

And I expect that the create-directory step is not needed, though I do not know for sure.

Where is the Python model going to be uploaded to? As mentioned in one of the comments, to me a convention like dbfs:/dbt/<project name>/<database name>/<model name>.py makes sense. By including the project, database and model name, we avoid collisions between models. And I think it is nice if the behavior is predictable (no random name generation) so that users can retrieve the files to debug them when necessary.
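For illustration, the proposed convention could be sketched as a small helper. The function name `model_dbfs_path` and the validation rule are assumptions made for this example, not existing dbt-spark code.

```python
def model_dbfs_path(project, database, model):
    """Return a deterministic DBFS location for a compiled Python model.

    Follows the convention dbfs:/dbt/<project name>/<database name>/<model name>.py
    so that the same model always maps to the same file.
    """
    parts = [project, database, model]
    for part in parts:
        # Hypothetical guard: reject empty names and names that would
        # introduce extra path segments.
        if not part or "/" in part:
            raise ValueError("invalid path component: {!r}".format(part))
    return "dbfs:/dbt/{}/{}/{}.py".format(*parts)
```

Because the path depends only on the three names, a user who knows the project, database and model can find the uploaded file without looking up a generated identifier.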

ChenyuLInx · 2022-08-29T23:35:38Z

@JCZuurmond, @ueshin used the command API, which is even better since we don't need to create anything; we just run the Python job. Let me know what you think.
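A rough sketch of why nothing has to be created up front with this approach, assuming this refers to the Databricks Command Execution API (version 1.2, with `contexts/create` and `commands/execute` endpoints). The endpoint names and field names reflect my understanding of that API and are not confirmed in this thread; the helper names are hypothetical, and no request is actually sent.

```python
def build_create_context_payload(cluster_id):
    """Body for creating an execution context (assumed: POST /api/1.2/contexts/create)."""
    return {"clusterId": cluster_id, "language": "python"}


def build_execute_command_payload(cluster_id, context_id, compiled_code):
    """Body for running code in a context (assumed: POST /api/1.2/commands/execute)."""
    return {
        "clusterId": cluster_id,
        "contextId": context_id,
        "language": "python",
        # The compiled model is sent inline, so there is no notebook to
        # create and no file to upload to DBFS.
        "command": compiled_code,
    }
```

Compared with the spark_python_task route, the model source travels in the request itself rather than through a file on DBFS.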

JCZuurmond · 2022-08-30T09:40:39Z

Looks nice! Could you share a link to the docs of the command API?

ChenyuLInx · 2022-08-30T19:37:37Z

@JCZuurmond the description of this issue in dbt-databricks has links to all the docs

ChenyuLInx added the enhancement (New feature or request) and python_models (issues related to python model) labels Aug 9, 2022

github-actions bot changed the title ~~Avoid creating notebook as the default way of running python model~~ [CT-1021] Avoid creating notebook as the default way of running python model Aug 9, 2022

ChenyuLInx mentioned this issue Aug 10, 2022

Feature/python model v1 #377

Merged

ChenyuLInx mentioned this issue Aug 29, 2022

refactor submission method and add command API as defualt #442

Merged

ChenyuLInx closed this as completed in #442 Aug 30, 2022
