
[BUG] AWS batch job failed and plugin report failed but flyte console shows task still running #2979

Open
2 tasks done
jw0515 opened this issue Oct 13, 2022 · 6 comments
Labels
bug, plugins, propeller, stale

Comments

@jw0515
Contributor

jw0515 commented Oct 13, 2022

Describe the bug

When an exception occurs, Flyte catches the error, the AWS Batch job goes into a SUCCEEDED state, and the Flyte AWS Batch plugin reports the caught error back. So when clicking the running task on the execution page, the task's "Map Execution" tab shows that the AWS Batch job failed. But on the execution page itself, the task's status is still "running" and never stops.

The only way to stop it is to abort the execution.
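To confirm which state the underlying AWS Batch job actually reached, independently of the Flyte console, the job can be queried directly. The following is a minimal sketch, assuming boto3 is installed, AWS credentials are configured, and the Batch job ID is known (e.g. copied from the task's "Map Execution" tab); `get_batch_job_status` is a hypothetical helper name, not part of Flyte.

```python
from typing import Tuple


def get_batch_job_status(job_id: str, client=None) -> Tuple[str, str]:
    """Return (status, statusReason) for an AWS Batch job (hypothetical helper).

    `client` defaults to a boto3 Batch client built from the ambient AWS
    configuration; any object with a compatible describe_jobs() works.
    """
    if client is None:
        import boto3  # imported lazily so the helper can be exercised without AWS

        client = boto3.client("batch")
    response = client.describe_jobs(jobs=[job_id])
    jobs = response.get("jobs", [])
    if not jobs:
        raise LookupError(f"AWS Batch job {job_id!r} not found")
    job = jobs[0]
    # status is one of SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING,
    # SUCCEEDED or FAILED; statusReason is not always present.
    return job["status"], job.get("statusReason", "")
```

For example, `status, reason = get_batch_job_status("<job-id>")`. If the task ran as an AWS Batch array job, individual child jobs can be queried the same way using IDs of the form `<job-id>:<index>`.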

Expected behavior

Once the exception happens, even though Flyte catches it, the AWS Batch job status should go to "Failed", the Flyte task should fail, and the execution should stop.
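Put differently, the expected behavior treats AWS Batch's terminal statuses as terminal task phases. The sketch below spells that out as a plain lookup table. The `Phase` enum, the grouping of non-terminal statuses, and the function names are hypothetical simplifications for illustration; this is not the Flyte AWS Batch plugin's actual implementation.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical, simplified task phases (not Flyte's real phase enum)."""

    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# AWS Batch job statuses mapped to the phase the report expects the task to show.
# Grouping STARTING under RUNNING is an assumption made for this sketch.
_BATCH_TO_PHASE = {
    "SUBMITTED": Phase.QUEUED,
    "PENDING": Phase.QUEUED,
    "RUNNABLE": Phase.QUEUED,
    "STARTING": Phase.RUNNING,
    "RUNNING": Phase.RUNNING,
    "SUCCEEDED": Phase.SUCCEEDED,
    "FAILED": Phase.FAILED,
}


def expected_phase(batch_status: str) -> Phase:
    """Map an AWS Batch job status string to the expected task phase."""
    try:
        return _BATCH_TO_PHASE[batch_status.upper()]
    except KeyError:
        raise ValueError(f"unknown AWS Batch status: {batch_status!r}") from None


def is_terminal(phase: Phase) -> bool:
    # Once a terminal phase is reached, the task (and the execution) should stop
    # instead of staying "running" indefinitely.
    return phase in (Phase.SUCCEEDED, Phase.FAILED)
```

In the bug as reported, the Batch job reaches a terminal status (SUCCEEDED or FAILED), yet the task never leaves the running state.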

Additional context to reproduce

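```python
from typing import List

from flytekit import Resources, task, workflow
from flytekitplugins.awsbatch import AWSBatchConfig

# InferenceInput, prepare_inference_inputs and inference come from the
# reporter's project and are not shown in the issue.

config = AWSBatchConfig(
    platformCapabilities="EC2",
)


@task(requests=Resources(mem="16Gi", cpu="8"), task_config=config)
def batch_inference(inference_inputs: List[InferenceInput]) -> int:
    # pool = multiprocessing.Pool()
    # pool.map(inference, inference_inputs)
    for inference_input in inference_inputs:
        # Per the report, an exception raised here leaves the task "running".
        inference(inference_input)
    return 0


# The task is declared before the workflow that calls it.
@workflow
def batch_inference_pipeline(model_path: str, scaler_path: str) -> int:
    inference_inputs = prepare_inference_inputs(model_path=model_path, scaler_path=scaler_path)
    batch_inference(inference_inputs=inference_inputs)
    return 0
```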

https://flyte-org.slack.com/archives/C01P3B761A6/p1664980783943249

Screenshots

[screenshots not preserved]

This screenshot shows that Flyte catches and logs the error but does not raise the exception in AWS Batch:
[screenshot not preserved]

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@jw0515 jw0515 added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Oct 13, 2022
@welcome

welcome bot commented Oct 13, 2022

Thank you for opening your first issue here! 🛠

@hamersaw hamersaw added propeller Issues related to flyte propeller plugins Plugins related labels (backend or frontend) and removed untriaged This issues has not yet been looked at by the Maintainers labels Oct 13, 2022
@hamersaw hamersaw added this to the 1.3.0 milestone Oct 13, 2022
@jw0515
Contributor Author

jw0515 commented Oct 14, 2022

OK, it seems that no matter whether the batch job state is SUCCEEDED or FAILED, the Flyte console always hangs. This time the exception was thrown instead of being caught inside the batch job, so the batch job went to the "FAILED" state, yet the Flyte console still shows it as running.
[screenshot not preserved]
@pingsutw

@pingsutw
Member

@jw0515 Thanks, I'm looking at this issue. I'll get back to you once I know how to address it.

@github-actions

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Oct 23, 2023
@github-actions

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot closed this as not planned Oct 31, 2023
@eapolinario eapolinario reopened this Nov 2, 2023
@github-actions github-actions bot removed the stale label Nov 4, 2023

github-actions bot commented Aug 3, 2024

Hello 👋, this issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will engage on it to decide if it is still applicable.
Thank you for your contribution and understanding! 🙏

@github-actions github-actions bot added the stale label Aug 3, 2024
Projects
None yet
Development

No branches or pull requests

5 participants