Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DS-3139] Add timeout to background progress #51

Merged

Conversation

MykytaPanoply
Copy link
Contributor

@MykytaPanoply MykytaPanoply commented Jan 25, 2023

Description

DS-3139 | link

  • Added timeout in case when the main function stuck, but the progress keeps the source alive

Related PRs

  • [List of related pull requests]

Tasks

  • [List of tasks that must be done before merging this pull request]

@MykytaPanoply MykytaPanoply self-assigned this Jan 25, 2023
@@ -188,4 +199,4 @@ def wrapper(*args, **kwargs):

return wrapper

return _background_progress
return _background_progress

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add newline at the end of the file

with ThreadPoolExecutor(max_workers=1) as executor:
func_future = executor.submit(func, *args, **kwargs)
func_future.add_done_callback(lambda future: finished.set())

while not func_future.done():
if timeout and (time() - started_at) > timeout:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the expected behavior of the source in case the timeout exceeded?
what if a query of MySQL indeeded runs more than the timeout?
will it make sense to have a default timeout (let's say to 24hr), and provide also a possibility to set the timeout value on the source itself, for example, one source will have a timeout of 30m while another 30d?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the expected behavior of the source in case the timeout exceeded?

In this case the source will end with an Exception of Max waiting time exceeded

what if a query of MySQL indeeded runs more than the timeout?

In MySQL when we run a query, we explicity ask server not to run for more than 24 hours. User, for whom I am doing this task, either have too old server or their server stalls, so the MAX_EXECUTION_TIME does not work. This feature will control max execution time on our server, rather then trust buggy user server

will it make sense to have a default timeout (let's say to 24hr), and provide also a possibility to set the timeout value on the source itself, for example, one source will have a timeout of 30m while another 30d?

I don't think so. At least in Shopify I remember that sources were running for 24h+, so timeout on overall collection for ALL DS will not be a good idea.
Setting the timeout per DS is cool, but needs to be discussed with Product. Do you want me to talk with Alon about it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but even in a source that runs more than 24hr (like this shoupify source) doesn't need a background progress the whole 24hr+ , it needs it only when waiting for a response, that may take more than 1 hour on databases, single API/GraphQL call (I'm not sure), but waiting for one single response for more than 24hr, sounds very odd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilia-panoply I argee with you here. Do you want this limit be present in all DS?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think it should be set here as part of the SDK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alonbrody Hi Alon, o you agree with this change?
Setting the max 24 hour limit on every function which uses background_progress decorator. I think it is set only to send_request functions in database-alike datasources.
In my case I need it for MySQL DS
Task in PR description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilia-panoply Alon agreed

panoply/datasource.py Show resolved Hide resolved
Copy link

@ilia-panoply ilia-panoply left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add default limit of 24hr to the background progress event

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants