Describe the feature
This came out of the discussion with @JCZuurmond.
Currently we create a notebook for each python model. We need to create that notebook in the user's workspace, which is why an additional `user` configuration is required to run python models.
Per our discussion, this is unexpected behavior for our users. @JCZuurmond suggested using the `spark_python_task` instead of the `notebook_task`.
We should switch to `spark_python_task` as the default, and add a way to create a notebook only when the user explicitly configures it; in that case, `user` will need to be specified in `profiles.yml`.
By switching to `spark_python_task`, we need to replace the step of creating the notebook with uploading the file to run to DBFS. The Spark session will also need to be defined more explicitly.
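To make the proposal concrete, here is a rough, purely illustrative sketch of what that flow could look like against the Databricks REST APIs (a DBFS put followed by a one-time `spark_python_task` run). The host, token, cluster id, function names, and DBFS path are assumptions for the example, not the dbt-spark implementation:

```python
# Hypothetical sketch of the spark_python_task flow; names and paths are placeholders.
import base64
import requests

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}      # placeholder


def upload_model_to_dbfs(compiled_code: str, dbfs_path: str) -> None:
    """Upload the compiled python model to DBFS (replaces the notebook-import step)."""
    resp = requests.post(
        f"{HOST}/api/2.0/dbfs/put",
        headers=HEADERS,
        json={
            "path": dbfs_path,
            "contents": base64.b64encode(compiled_code.encode()).decode(),
            "overwrite": True,
        },
    )
    resp.raise_for_status()


def submit_spark_python_task(dbfs_path: str, cluster_id: str) -> int:
    """Submit a one-time run that executes the uploaded file as a spark_python_task."""
    resp = requests.post(
        f"{HOST}/api/2.1/jobs/runs/submit",
        headers=HEADERS,
        json={
            "run_name": "dbt python model",
            "tasks": [
                {
                    "task_key": "dbt_python_model",
                    "existing_cluster_id": cluster_id,
                    "spark_python_task": {"python_file": dbfs_path},
                }
            ],
        },
    )
    resp.raise_for_status()
    return resp.json()["run_id"]
```

Because the file no longer runs inside a notebook, the compiled model would also need to create its own Spark session (for example `spark = SparkSession.builder.getOrCreate()`) instead of relying on the `spark` global that notebooks provide.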
Describe alternatives you've considered
Keep the current way. That means we will require a `user` for production runs.
Additional context
The way we plan to make notebook creation configurable per model sounds related to the comment here.
Who will this benefit?
All users of python models.
Are you interested in contributing this feature?
Yes
The outcome of this should also include whether we can configure a separate cluster for python models, and we should create a separate ticket if not.
github-actions bot changed the title to "[CT-1021] Avoid creating notebook as the default way of running python model" on Aug 9, 2022.
And I expect that the create-directory step is not needed, though I do not know for sure.
Where is the python model going to be uploaded to? As mentioned in one of the comments, a convention like dbfs:/dbt/<project name>/<database name>/<model name>.py makes sense to me. By including the project, database, and model name, we avoid collisions between multiple models. And I think it is nice if the behavior is predictable (no random name generation) so that users can get the files to debug them when necessary.
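Purely as an illustration of that convention (the function name is made up for this sketch), the path could be assembled deterministically like this:

```python
# Illustrative only: a deterministic DBFS path built from project, database, and model name.
def python_model_dbfs_path(project_name: str, database: str, model_name: str) -> str:
    return f"dbfs:/dbt/{project_name}/{database}/{model_name}.py"


# e.g. python_model_dbfs_path("jaffle_shop", "analytics", "customers")
# -> "dbfs:/dbt/jaffle_shop/analytics/customers.py"
```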
@JCZuurmond, @ueshin used the command API, which is even better since we don't need to create anything; we just run the python job. Let me know what you think.
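For reference, here is a minimal sketch of that idea using the Databricks Command Execution API (1.2). This is an illustration under assumptions, not @ueshin's actual code; the host and token values are placeholders:

```python
# Hypothetical sketch: run the compiled model through the Command Execution API (1.2),
# so nothing has to be created in the workspace or uploaded to DBFS first.
import requests

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}      # placeholder


def run_python_model(cluster_id: str, compiled_code: str) -> str:
    # Create an execution context on the cluster.
    ctx = requests.post(
        f"{HOST}/api/1.2/contexts/create",
        headers=HEADERS,
        json={"clusterId": cluster_id, "language": "python"},
    )
    ctx.raise_for_status()
    context_id = ctx.json()["id"]

    # Execute the compiled model code directly in that context.
    cmd = requests.post(
        f"{HOST}/api/1.2/commands/execute",
        headers=HEADERS,
        json={
            "clusterId": cluster_id,
            "contextId": context_id,
            "language": "python",
            "command": compiled_code,
        },
    )
    cmd.raise_for_status()
    # Returns the command id, which can be polled via /api/1.2/commands/status.
    return cmd.json()["id"]
```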