
readinto function hangs while in progress #24847

Closed
affanv14 opened this issue Jun 14, 2022 · 10 comments
Assignees
Labels
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • needs-author-feedback: Workflow: More information is needed from author to address the issue.
  • no-recent-activity: There has been no recent activity on this issue.
  • question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that.
  • Storage: Storage Service (Queues, Blobs, Files)

Comments

@affanv14

  • Package Name: azure-storage-blob
  • Package Version: 12.12.0
  • Operating System: Amazon AL2
  • Python Version: 3.7

Describe the bug
When calling the piece of code below, the download hangs indefinitely while in progress:
with open(local_file, "wb") as f:
    download_stream = blob_client.download_blob()
    download_stream.readinto(f)
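
For context, a self-contained version of this reproduction might look like the sketch below. The connection string, container, and blob names are placeholders, and BlobClient.from_connection_string is only one of several ways to construct the client:

from azure.storage.blob import BlobClient

# Placeholder connection details; substitute your own.
blob_client = BlobClient.from_connection_string(
    conn_str="<connection-string>",
    container_name="<container-name>",
    blob_name="<blob-name>",  # the hang was observed on a blob larger than 1 GB
)

local_file = "downloaded.bin"
with open(local_file, "wb") as f:
    download_stream = blob_client.download_blob()
    download_stream.readinto(f)  # reported to hang indefinitely here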

To Reproduce
Steps to reproduce the behavior:

  1. Run the above piece of code on a blob larger than 1 GB

Expected behavior
The download should complete and the data should be stored in the file

Additional context
Output below; the process hangs just after the last statement for 12+ hours.


23:20:26.705 __main__.log_progress INFO 1086324736 bytes have been completed out of 1134174520
23:20:27.13 __main__.log_progress INFO 1090519040 bytes have been completed out of 1134174520
23:20:27.78 __main__.log_progress INFO 1094713344 bytes have been completed out of 1134174520
23:20:27.135 __main__.log_progress INFO 1098907648 bytes have been completed out of 1134174520
23:20:27.188 __main__.log_progress INFO 1103101952 bytes have been completed out of 1134174520
23:20:27.273 __main__.log_progress INFO 1107296256 bytes have been completed out of 1134174520
23:20:27.328 __main__.log_progress INFO 1111490560 bytes have been completed out of 1134174520
23:20:27.345 __main__.log_progress INFO 1113203000 bytes have been completed out of 1134174520
23:23:39.847 __main__.log_progress INFO 1117397304 bytes have been completed out of 1134174520
@ghost ghost added the needs-triage, customer-reported, and question labels Jun 14, 2022
@azure-sdk
Collaborator

Label prediction was below confidence level 0.6 for Model:ServiceLabels: 'Azure.Identity:0.39250913,Storage:0.2970738,Service Bus:0.07714585'

@kashifkhan
Member

@affanv14 Thanks for your feedback, we'll investigate asap.

@kashifkhan kashifkhan added the Storage label Jun 14, 2022
@ghost ghost removed the needs-triage label Jun 14, 2022
@jalauzon-msft
Member

Hi @affanv14 Mohammed, thanks for the report. Can you share a bit about how you are tracking the progress of the download? Also, are you setting the max_concurrency keyword in your call to download_blob? There is a known issue with the previously recommended way to track progress (if you are using the SDK to do so) during a concurrent download, where the progress would never reach 100%.

@affanv14
Author

Hi @jalauzon-msft, we originally had it running without any keywords and it was hanging. We tried a bunch of things, including max_concurrency. The last thing we tried was:

with open(local_file, "wb") as f:
        download_stream = blob_client.download_blob(
            progress_hook=log_progress,
            timeout=MAX_DOWNLOAD_WAIT_TIME,
            max_concurrency=NUM_DOWNLOAD_THREADS,
            read_timeout=MAX_DOWNLOAD_WAIT_TIME,
        )
        LOG.info("Download stream received from blob client")
        download_stream.readinto(f)

@jalauzon-msft
Member

Hi again @affanv14 Mohammed, thanks for sharing the sample. Can you also share how you are tracking the progress shown in the output in your original post? That progress-tracking output is not something I recognize, so it must be custom on your end.

I'm trying to determine if the download itself is hanging or possibly the progress reporting is just never reporting that it's done. Thanks!

@affanv14
Author

def log_progress(completed: int, total: int) -> None:
    LOG.info("{} bytes have been completed out of {}".format(completed, total))

Here is what we used to log progress.

@jalauzon-msft
Member

Thanks @affanv14 Mohammed, apologies for the back and forth. I did not see in your previous response that you were using the progress_hook keyword! That is a newly introduced feature.

So, what I want to understand is whether there is an issue with the progress_hook feature, since it is new, or with the download itself. Do you know if the call to readinto is actually hanging, or if the progress hook simply never reports that it reached the end of the blob? You could confirm by adding another LOG statement after your call to readinto to see if that call actually finishes, as in the sketch below.
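
For example (reusing the names from the snippet Mohammed shared above; the only addition is the log line after readinto):

with open(local_file, "wb") as f:
    download_stream = blob_client.download_blob(
        progress_hook=log_progress,
        timeout=MAX_DOWNLOAD_WAIT_TIME,
        max_concurrency=NUM_DOWNLOAD_THREADS,
        read_timeout=MAX_DOWNLOAD_WAIT_TIME,
    )
    LOG.info("Download stream received from blob client")
    download_stream.readinto(f)
    # If this line never appears in the logs, readinto itself is hanging;
    # if it does appear, only the progress reporting is at fault.
    LOG.info("readinto returned")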

@affanv14
Author

There are statements after this (including a log statement right after) which are not reached. I am certain that it hangs on this call. Furthermore, we previously passed no params in the download_blob call and the issue still occurred.

@vincenttran-msft vincenttran-msft self-assigned this Jun 23, 2022
@jalauzon-msft
Member

Hi @affanv14 Mohammed, sorry for the long delay and thanks for confirming that readinto is indeed hanging.

The time the client will wait for a response from the server while reading data is configured via the read_timeout keyword, and our default value is 80000 seconds (about 22 hours), which is probably too high and which we are thinking of updating. I see you tried setting read_timeout yourself to MAX_DOWNLOAD_WAIT_TIME. What value did you use here? Did you wait for that length of time to see if some error eventually came back? A shorter read_timeout, as in the sketch below, would at least surface an error sooner.
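
As a rough illustration (the 300 seconds here is a hypothetical value, not a recommendation), a much shorter read_timeout should make a stalled read fail with an error instead of waiting out the roughly 22-hour default:

with open(local_file, "wb") as f:
    download_stream = blob_client.download_blob(
        read_timeout=300,  # hypothetical: give up after 5 minutes with no data from the server
    )
    download_stream.readinto(f)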

I think a good next step would be to enable debug logging so we can see the requests/responses happening and determine whether it is indeed hanging while waiting for a response or if something else is happening. Can you try enabling debug logging? Some documentation on how to do that is here, but the code snippet below, taken from that documentation, should be all you need (the article covers other ways to configure logging if needed):

import sys
import logging
from azure.storage.blob import BlobClient

# Set the logging level for the azure.storage.blob library
logger = logging.getLogger('azure.storage.blob')
logger.setLevel(logging.DEBUG)

# Direct logging output to stdout. Without adding a handler,
# no logging output is visible.
handler = logging.StreamHandler(stream=sys.stdout)
logger.addHandler(handler)

blob_client = BlobClient(..., logging_enable=True)

Thanks for your help and patience.

@xiangyan99 xiangyan99 added the needs-author-feedback label Jul 15, 2022
@ghost ghost added the no-recent-activity label Jul 22, 2022
@ghost

ghost commented Jul 22, 2022

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

@ghost ghost closed this as completed Aug 6, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Apr 11, 2023