
[BUG] Databricks plugin does not update its status and stays in the running state. #4243

Closed
2 tasks done
givanovexpe opened this issue Oct 16, 2023 · 1 comment
Labels: bug, flytepropeller


@givanovexpe

Describe the bug

The Databricks plugin does not check for all of the terminal states that the Databricks Jobs API can return. As a result, if the API returns a terminal state that the plugin was not written to recognise, the plugin assumes the job is still running.

https://github.com/flyteorg/flyte/blob/master/flyteplugins/go/tasks/plugins/webapi/databricks/plugin.go#L227

Here we only check whether the returned state is "TERMINATED"; any other state is assumed to mean the job is still running. However, the Databricks API defines several terminal states: TERMINATED, SKIPPED, and INTERNAL_ERROR.

https://docs.databricks.com/en/workflows/jobs/jobs-2.0-api.html#runlifecyclestate
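A minimal sketch of the problem, assuming the plugin's check reduces to a single string comparison against "TERMINATED". The function `currentIsDone` is a hypothetical stand-in, not the plugin's actual code; the state list is the `RunLifeCycleState` set from the Jobs API 2.0 page linked above. It runs every documented state through that comparison:

```go
package main

import "fmt"

// lifeCycleStates lists the RunLifeCycleState values documented for the
// Databricks Jobs API 2.0.
var lifeCycleStates = []string{
	"PENDING", "RUNNING", "TERMINATING", "TERMINATED", "SKIPPED", "INTERNAL_ERROR",
}

// currentIsDone is a hypothetical stand-in for the plugin's existing check:
// only "TERMINATED" counts as finished, everything else as still running.
func currentIsDone(state string) bool {
	return state == "TERMINATED"
}

func main() {
	for _, s := range lifeCycleStates {
		phase := "RUNNING"
		if currentIsDone(s) {
			phase = "DONE"
		}
		fmt.Printf("%-15s -> %s\n", s, phase)
	}
}
```

SKIPPED and INTERNAL_ERROR come out as RUNNING, so a run that ends in either state is polled indefinitely.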

When I recently (3 days ago) ran a Databricks job using the plugin, I hit the INTERNAL_ERROR state. The cause was related to EC2 or AWS networking. Either way, the Flyte Databricks plugin did not detect that the job had ended and kept reporting it as RUNNING.

The Slack thread describing this is here: https://flyte-org.slack.com/archives/CP2HDHKE1/p1696954554555229

Expected behavior

At a minimum, the plugin should recognise all terminal states, for example:

```go
case http.StatusOK:
	if lifeCycleState == "TERMINATED" || lifeCycleState == "SKIPPED" || lifeCycleState == "INTERNAL_ERROR" {
		// Terminal state: stop polling and report the run's final phase.
	}
```
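The condition above can also be expressed as a set lookup, which keeps the list of terminal states in one place if the API adds more later. This is a minimal sketch; `terminalLifeCycleStates` and `isTerminal` are hypothetical names, not existing plugin helpers:

```go
package main

import "fmt"

// terminalLifeCycleStates holds the RunLifeCycleState values after which a
// Databricks run does not change state again (per the Jobs API 2.0 docs).
var terminalLifeCycleStates = map[string]struct{}{
	"TERMINATED":     {},
	"SKIPPED":        {},
	"INTERNAL_ERROR": {},
}

// isTerminal reports whether the given life cycle state ends the run.
func isTerminal(lifeCycleState string) bool {
	_, ok := terminalLifeCycleStates[lifeCycleState]
	return ok
}

func main() {
	for _, s := range []string{"RUNNING", "TERMINATING", "TERMINATED", "SKIPPED", "INTERNAL_ERROR"} {
		fmt.Println(s, isTerminal(s))
	}
}
```

With this check, the INTERNAL_ERROR run described above would be treated as finished instead of running.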

Ideally, the plugin should capture all states, not just the terminal ones, and show them in the flyteconsole, e.g. PENDING, RUNNING and TERMINATING. This might not be possible if Flyte has a fixed set of phases for its tasks. Either way, the plugin needs better handling of Databricks job statuses.
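One way to handle every state is to map each `life_cycle_state` (plus `result_state` once the run has terminated) onto a small set of coarse phases. The sketch below assumes a hypothetical `Phase` type and a hypothetical `toPhase` function; neither is Flyte's actual API. Treating SKIPPED as a failure is also an assumption. Unrecognised states return an error rather than defaulting to RUNNING, which is the fallback that caused this bug:

```go
package main

import "fmt"

// Phase is a hypothetical, simplified stand-in for the phases a Flyte
// plugin can report; it is not Flyte's actual type.
type Phase string

const (
	PhaseQueued    Phase = "QUEUED"
	PhaseRunning   Phase = "RUNNING"
	PhaseSucceeded Phase = "SUCCEEDED"
	PhaseFailed    Phase = "FAILED"
)

// toPhase maps a Databricks run's life_cycle_state and result_state to a
// Phase. Unknown life cycle states produce an error instead of RUNNING.
func toPhase(lifeCycleState, resultState string) (Phase, error) {
	switch lifeCycleState {
	case "PENDING":
		return PhaseQueued, nil
	case "RUNNING", "TERMINATING":
		return PhaseRunning, nil
	case "TERMINATED":
		// result_state is checked only after the run has terminated.
		if resultState == "SUCCESS" {
			return PhaseSucceeded, nil
		}
		return PhaseFailed, nil
	case "SKIPPED", "INTERNAL_ERROR":
		return PhaseFailed, nil
	default:
		// Surface unknown states instead of silently reporting RUNNING.
		return "", fmt.Errorf("unrecognised Databricks life cycle state %q", lifeCycleState)
	}
}

func main() {
	cases := [][2]string{
		{"PENDING", ""},
		{"TERMINATING", ""},
		{"TERMINATED", "SUCCESS"},
		{"TERMINATED", "FAILED"},
		{"INTERNAL_ERROR", ""},
		{"SOMETHING_NEW", ""},
	}
	for _, c := range cases {
		p, err := toPhase(c[0], c[1])
		fmt.Println(c[0], c[1], p, err)
	}
}
```

Keeping the raw Databricks state alongside the coarse phase (for example in a status message) would be one way to still show values like TERMINATING in the UI.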

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@givanovexpe added the bug and untriaged labels on Oct 16, 2023
@pingsutw added the flytepropeller label and removed the untriaged label on Oct 16, 2023
@pingsutw self-assigned this on Oct 16, 2023
@pingsutw mentioned this issue on Oct 16, 2023
@givanovexpe
Author

@pingsutw we can close this issue.
