Describe the bug
The Databricks plugin does not check for all of the terminal states that the Databricks Jobs API can return. As a result, if the API returns a state the plugin does not recognise, the plugin assumes the job is still running.
https://github.com/flyteorg/flyte/blob/master/flyteplugins/go/tasks/plugins/webapi/databricks/plugin.go#L227
Here we only check whether the returned state is "TERMINATED"; if it is not, we assume the job is still running. However, the Databricks API defines several terminal states: TERMINATED, SKIPPED and INTERNAL_ERROR.
https://docs.databricks.com/en/workflows/jobs/jobs-2.0-api.html#runlifecyclestate
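Paraphrasing the linked check (the exact surrounding code may differ slightly), the handling is roughly:

```go
case http.StatusOK:
	// Only "TERMINATED" is treated as terminal; any other lifecycle state,
	// including SKIPPED and INTERNAL_ERROR, falls through and the job keeps
	// being reported as running.
	if lifeCycleState == "TERMINATED" {
		// ... resolve the final phase ...
	}
```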
When I ran a Databricks job through the plugin recently (3 days ago) I hit the INTERNAL_ERROR state. The cause was related to EC2 or AWS networking; either way, the Flyte Databricks plugin did not detect that the job had ended and kept reporting it as RUNNING.
The Slack thread describing this is here: https://flyte-org.slack.com/archives/CP2HDHKE1/p1696954554555229
Expected behavior
As a minimum we should capture all terminal states, something like this:
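```go
case http.StatusOK:
	// Treat every state the Jobs API documents as terminal, not just TERMINATED.
	if lifeCycleState == "TERMINATED" || lifeCycleState == "SKIPPED" || lifeCycleState == "INTERNAL_ERROR" {
		// ... map the run's result state to a terminal Flyte phase ...
	}
```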
Ideally we should be able to capture all states, not just the terminal ones, and show them in flyteconsole, i.e. states like PENDING, RUNNING and TERMINATING. This might not be possible if Flyte has a predefined set of states for its jobs, but either way we need better handling of Databricks job statuses.
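As a sketch of what fuller handling could look like (the phaseForState helper and the string phases below are illustrative, not existing plugin code; the real plugin would map to Flyte's own phase type):

```go
package databricks

import "fmt"

// phaseForState is an illustrative helper: it classifies every RunLifeCycleState
// documented by the Jobs 2.0 API, so an unrecognised state surfaces as an error
// instead of being silently reported as running.
func phaseForState(lifeCycleState, resultState string) (string, error) {
	switch lifeCycleState {
	case "PENDING", "RUNNING", "TERMINATING":
		return "RUNNING", nil
	case "TERMINATED":
		if resultState == "SUCCESS" {
			return "SUCCEEDED", nil
		}
		return "FAILED", nil
	case "SKIPPED", "INTERNAL_ERROR":
		return "FAILED", nil
	default:
		return "", fmt.Errorf("unrecognised Databricks life cycle state %q", lifeCycleState)
	}
}
```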
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
Yes
Have you read the Code of Conduct?
Yes