[CT-1021] Avoid creating notebook as the default way of running python model #424

Closed
ChenyuLInx opened this issue Aug 9, 2022 · 4 comments · Fixed by #442
Labels
enhancement (New feature or request), python_models (issues related to python model)

Comments

@ChenyuLInx
Contributor

ChenyuLInx commented Aug 9, 2022

Describe the feature

This came out of the discussion with @JCZuurmond.
Currently we create a notebook for each Python model. We need to create that notebook in the user's workspace, which is why an additional `user` configuration was added for running Python models.

Per our discussion, this is unexpected behavior for our users. @JCZuurmond suggested using the spark_python_task instead of notebook_task.

We should switch to spark_python_task as the default and create a notebook only if the user explicitly asks for it. In that case the user will still need to be specified in profiles.yml.

To switch to spark_python_task, we need to replace the notebook-creation step with uploading the file we want to run to DBFS. The Spark session will also need to be defined more explicitly.
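To make the proposed flow concrete, here is a minimal sketch of the request bodies it would involve. It assumes the Databricks REST endpoints `POST /api/2.0/dbfs/put` and `POST /api/2.1/jobs/runs/submit`; the helper names, the `python_model` task key, and the default run name are hypothetical and not taken from the adapter code. No request is sent; the functions only build the payloads.

```python
import base64

# A script run through spark_python_task has no predefined `spark` variable
# the way a notebook does, so the uploaded file has to create the session itself.
SPARK_PREAMBLE = (
    "from pyspark.sql import SparkSession\n"
    "spark = SparkSession.builder.getOrCreate()\n"
)


def with_explicit_spark_session(compiled_code: str) -> str:
    """Prepend an explicit SparkSession definition to the model code."""
    return SPARK_PREAMBLE + compiled_code


def build_dbfs_put_payload(dbfs_path: str, source: str) -> dict:
    """Body for POST /api/2.0/dbfs/put (assumed endpoint).

    `dbfs_path` is an absolute DBFS path such as "/dbt/demo/model.py";
    the file contents are sent base64-encoded.
    """
    return {
        "path": dbfs_path,
        "contents": base64.b64encode(source.encode("utf-8")).decode("ascii"),
        "overwrite": True,
    }


def build_run_submit_payload(
    cluster_id: str, dbfs_path: str, run_name: str = "dbt-python-model"
) -> dict:
    """Body for POST /api/2.1/jobs/runs/submit (assumed endpoint).

    Uses spark_python_task pointing at the uploaded file instead of
    notebook_task, so no notebook is created in a user's workspace.
    """
    return {
        "run_name": run_name,  # hypothetical default name
        "tasks": [
            {
                "task_key": "python_model",  # hypothetical task key
                "existing_cluster_id": cluster_id,
                "spark_python_task": {"python_file": f"dbfs:{dbfs_path}"},
            }
        ],
    }
```

The upload replaces the notebook-creation step one for one, which is why this path would no longer need a `user` in profiles.yml.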

Describe alternatives you've considered

Keep the current approach. That means we will require a user for production runs.

Additional context

Making notebook creation configurable per model sounds related to the comment here

Who will this benefit?

All users of Python models

Are you interested in contributing this feature?

Yes

The outcome of this should also include whether we can configure a separate cluster for Python models; if not, we should create a separate ticket.

@ChenyuLInx ChenyuLInx added the enhancement and python_models labels Aug 9, 2022
@github-actions github-actions bot changed the title from "Avoid creating notebook as the default way of running python model" to "[CT-1021] Avoid creating notebook as the default way of running python model" Aug 9, 2022
@JCZuurmond
Collaborator

I also expect that the create-directory step is not needed, though I do not know for sure.

Where is the Python model going to be uploaded to? As mentioned in one of the comments, a convention like dbfs:/dbt/<project name>/<database name>/<model name>.py makes sense to me. By including the project, database and model name, we avoid collisions between models. I also think it is nice if the behavior is predictable (no random name generation), so that users can find the files and debug them when necessary.
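As an illustration of that convention, a small sketch of a deterministic path builder. The function name and the example project, database and model names are hypothetical.

```python
def model_upload_path(project: str, database: str, model: str) -> str:
    """Build dbfs:/dbt/<project name>/<database name>/<model name>.py.

    The same inputs always give the same path, so re-running a model
    overwrites its own file and users know where to look when debugging.
    """
    return f"dbfs:/dbt/{project}/{database}/{model}.py"


# Hypothetical names, for illustration only.
print(model_upload_path("jaffle_shop", "analytics", "orders"))
# dbfs:/dbt/jaffle_shop/analytics/orders.py
```

Two models collide only if they share project, database and model name, which would already be a conflict inside dbt itself.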

@ChenyuLInx
Contributor Author

@JCZuurmond, @ueshin used the command API, which is even better since we don't need to create anything; we just run the Python job. Let me know what you think.
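For readers unfamiliar with it, a rough sketch of what the command API flow looks like. This assumes the Databricks Command Execution API 1.2 (`POST /api/1.2/contexts/create`, `POST /api/1.2/commands/execute`, `GET /api/1.2/commands/status`); the field names and status values below are my understanding of that API and should be checked against the docs, and the helper names are hypothetical. It only builds request bodies and sends nothing.

```python
# Status values assumed to mean the command will not change state any more.
TERMINAL_STATUSES = ("Finished", "Cancelled", "Error")


def create_context_payload(cluster_id: str) -> dict:
    """Body for POST /api/1.2/contexts/create (assumed endpoint and fields)."""
    return {"clusterId": cluster_id, "language": "python"}


def execute_command_payload(cluster_id: str, context_id: str, code: str) -> dict:
    """Body for POST /api/1.2/commands/execute (assumed endpoint and fields).

    The model code is sent inline as `command`, so neither a notebook
    nor a file on DBFS has to be created first.
    """
    return {
        "clusterId": cluster_id,
        "contextId": context_id,
        "language": "python",
        "command": code,
    }


def is_terminal(status: str) -> bool:
    """True once polling GET /api/1.2/commands/status can stop."""
    return status in TERMINAL_STATUSES
```

The sequence would be: create a context on the cluster, execute the compiled model code in it, then poll the status until it is terminal.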

@JCZuurmond
Collaborator

Looks nice! Could you share a link to the docs of the command API?

@ChenyuLInx
Contributor Author

@JCZuurmond the description of this issue in dbt-databricks has links to all the docs
