Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DS-3139] Add timeout to background progress #51

Merged
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion panoply/constants.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
__version__ = "2.0.15"
__version__ = "2.1.0"
__package_name__ = "panoply-python-sdk"
15 changes: 13 additions & 2 deletions panoply/datasource.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,10 @@
import base64
import traceback
import concurrent.futures.thread
from concurrent.futures import ThreadPoolExecutor
from functools import wraps
from threading import Event
from time import time

import backoff
import requests
Expand Down Expand Up @@ -153,7 +155,7 @@ def wrapper(*args, **kwargs):
return _validate_token


def background_progress(message, waiting_interval=10 * 60):
def background_progress(message, waiting_interval=10 * 60, timeout=None):
""" A decorator is used to emit progress while long operation is executed.
For example, for database's data sources such operations might be
declaration of the cursor or counting number of rows.
Expand All @@ -167,6 +169,9 @@ def background_progress(message, waiting_interval=10 * 60):
waiting_interval : float
Time in seconds to wait between progress emitting.
Defaults to 10 minutes
timeout : float
Time in seconds for maximum progress emiting time.
Defaults to no limit
"""

def _background_progress(func):
Expand All @@ -175,11 +180,17 @@ def wrapper(*args, **kwargs):
self = args[0]
self.log('Creating background progress emitter')
ilia-panoply marked this conversation as resolved.
Show resolved Hide resolved
finished = Event()
started_at = time()
with ThreadPoolExecutor(max_workers=1) as executor:
func_future = executor.submit(func, *args, **kwargs)
func_future.add_done_callback(lambda future: finished.set())

while not func_future.done():
if timeout and (time() - started_at) > timeout:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the expected behavior of the source in case the timeout exceeded?
what if a query of MySQL indeeded runs more than the timeout?
will it make sense to have a default timeout (let's say to 24hr), and provide also a possibility to set the timeout value on the source itself, for example, one source will have a timeout of 30m while another 30d?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the expected behavior of the source in case the timeout exceeded?

In this case the source will end with an Exception of Max waiting time exceeded

what if a query of MySQL indeeded runs more than the timeout?

In MySQL when we run a query, we explicity ask server not to run for more than 24 hours. User, for whom I am doing this task, either have too old server or their server stalls, so the MAX_EXECUTION_TIME does not work. This feature will control max execution time on our server, rather then trust buggy user server

will it make sense to have a default timeout (let's say to 24hr), and provide also a possibility to set the timeout value on the source itself, for example, one source will have a timeout of 30m while another 30d?

I don't think so. At least in Shopify I remember that sources were running for 24h+, so timeout on overall collection for ALL DS will not be a good idea.
Setting the timeout per DS is cool, but needs to be discussed with Product. Do you want me to talk with Alon about it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but even in a source that runs more than 24hr (like this shoupify source) doesn't need a background progress the whole 24hr+ , it needs it only when waiting for a response, that may take more than 1 hour on databases, single API/GraphQL call (I'm not sure), but waiting for one single response for more than 24hr, sounds very odd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilia-panoply I argee with you here. Do you want this limit be present in all DS?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think it should be set here as part of the SDK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alonbrody Hi Alon, o you agree with this change?
Setting the max 24 hour limit on every function which uses background_progress decorator. I think it is set only to send_request functions in database-alike datasources.
In my case I need it for MySQL DS
Task in PR description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilia-panoply Alon agreed

self.log("Max waiting time exceeded")
executor._threads.clear()
concurrent.futures.thread._threads_queues.clear()
raise Exception("Max waiting time exceeded")
self.log(message)
self.progress(None, None, message)
finished.wait(timeout=waiting_interval)
Expand All @@ -188,4 +199,4 @@ def wrapper(*args, **kwargs):

return wrapper

return _background_progress
return _background_progress

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add newline at the end of the file