storage: read -> connection reset by peer #108

Closed
karalabe opened this issue Feb 5, 2015 · 19 comments
Labels
api: storage Issues related to the Cloud Storage API. status: investigating The issue is under investigation, which is determined to be non-trivial.

Comments

@karalabe
Contributor

karalabe commented Feb 5, 2015

'smee again :D :P

I'm probably missing something, but I've hit this snag and thought I'd ask if it's something occasional, or if it's by design, and I missed the docs for it.

I was slowly streaming a download from GCS to GCE, processing some data and pushing stuff back up to GCS. After about 10-15 minutes, my reader crashed with read tcp 64.233.182.132:443: connection reset by peer. I haven't really found any reason why GCS would disrupt a running download (apart from maybe deeming it too slow). Is there some time/rate/whatev limit that GCS imposes, or should I look for the culprit somewhere else?

Thanks

@karalabe
Contributor Author

The problem still persists, and with non-slow downloads too. I'm using a simple reader to stream download a file from GCS into GCE at about 2-3 MB/s, and GCS still manages to interrupt the download after some time (last time it was a bit more than 2 hours in; yes, it's a huge file :P).

@gmlewis
Contributor

gmlewis commented Feb 20, 2015

I have a theory that the oauth token is expiring during the transfer, but I could be totally wrong.
We are working on a ResumableMedia upload in the google-api-go-client library that will simply retry from where it left off in case something failed during transfer. I'm hoping that this should solve the problem, but I don't have a date for you as to when this will be ready because there are continuing design discussions.

@karalabe
Contributor Author

Hey Glenn, actually I opened that discussion too :P

Note however, that this issue is not about upload, but rather download. There shouldn't be any auth issues there, since after the initial read grant, GCS is never queried for further permissions.

This issue could probably be solved by using a resumable download, but I'm reluctant to suggest that since I would expect GCS -> GCE to work flawlessly and not get dropped randomly (maybe occasionally is understandable, but definitely not regularly).
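
For readers who land here with the same symptom, the resumable download mentioned above can be sketched roughly as follows. This is only an illustration, not something the library does for you: `openFunc`, `resumingReader`, `flakyReader` and `demo` are hypothetical names, and the flaky source merely simulates a reset every 4 bytes. With cloud.google.com/go/storage, the opener could plausibly wrap `ObjectHandle.NewRangeReader(ctx, offset, -1)`, where a negative length reads to the end of the object.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// openFunc opens the object starting at the given byte offset.
// Hypothetical abstraction for this sketch.
type openFunc func(offset int64) (io.ReadCloser, error)

// resumingReader re-opens the source at the last delivered offset
// whenever a read fails with something other than io.EOF.
type resumingReader struct {
	open    openFunc
	cur     io.ReadCloser
	offset  int64 // bytes delivered to the caller so far
	retries int   // remaining re-open attempts
}

func (r *resumingReader) Read(p []byte) (int, error) {
	for {
		if r.cur == nil {
			rc, err := r.open(r.offset)
			if err != nil {
				return 0, err
			}
			r.cur = rc
		}
		n, err := r.cur.Read(p)
		r.offset += int64(n)
		if err == nil || err == io.EOF {
			return n, err
		}
		// Broken stream: drop it so the next pass re-opens at r.offset.
		r.cur.Close()
		r.cur = nil
		if r.retries == 0 {
			return n, err
		}
		r.retries--
		if n > 0 {
			return n, nil
		}
	}
}

// flakyReader simulates a connection that resets after limit bytes.
type flakyReader struct {
	src   *strings.Reader
	limit int
}

func (f *flakyReader) Read(p []byte) (int, error) {
	if f.limit == 0 {
		return 0, errors.New("connection reset by peer")
	}
	if len(p) > f.limit {
		p = p[:f.limit]
	}
	n, err := f.src.Read(p)
	f.limit -= n
	return n, err
}

func (f *flakyReader) Close() error { return nil }

// demo reads data through a source that resets every 4 bytes.
func demo(data string, retries int) (string, error) {
	open := func(off int64) (io.ReadCloser, error) {
		return &flakyReader{src: strings.NewReader(data[off:]), limit: 4}, nil
	}
	out, err := io.ReadAll(&resumingReader{open: open, retries: retries})
	return string(out), err
}

func main() {
	out, err := demo("hello, resumable world", 10)
	fmt.Println(out, err)
}
```

The retry budget is deliberately small and there is no backoff; a real reader would likely want both, plus a check that the object generation has not changed between re-opens.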

@okdave okdave added the api: storage Issues related to the Cloud Storage API. label May 29, 2015
@okdave okdave modified the milestone: storage in beta May 29, 2015
@jba jba added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Jun 23, 2016
@c0b

c0b commented Feb 24, 2017

ping @gmlewis do you have an update on ResumableMedia, after almost 2 years?

My app is constantly uploading relatively large files (hundreds of MBs to GBs). With the current cloud.google.com/go/storage library I use obj.NewWriter and io.Copy into it, but it fails occasionally and the err doesn't tell me where it failed, so do I have to retry the upload from the beginning?

ResumableMedia seems available in the raw api library https://godoc.org/google.golang.org/api/storage/v1
https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload Would you guys have an update in the near future?

@gmlewis
Contributor

gmlewis commented Feb 24, 2017

@c0b - I no longer work on the cloud team that supports this and don't have an answer for you. Maybe @broady or @jba have more information.

@jba
Contributor

jba commented Apr 5, 2017

@c0b We do use resumable media for upload. Can you give us more information on your problem? Preferably in a different issue—this one is about reads, not writes.

@c0b

c0b commented Apr 6, 2017

#596 to @jba

@jba jba self-assigned this Jun 7, 2017
@jba
Contributor

jba commented Jun 9, 2017

@karalabe Are you still experiencing download problems? We haven't heard any other reports of this problem. I'm going to close, but re-open if it's still an issue for you and we'll look into it.

@jba jba closed this as completed Jun 9, 2017
@karalabe
Contributor Author

karalabe commented Jun 9, 2017

Hah, I opened this request sooo long ago :) My old project got abandoned a few years back so no more data points for you. Feel free to close, someone can repost if they hit it.

@ernsheong
Contributor

ernsheong commented Aug 16, 2017

I'm hitting this. I'm not sure what I did wrong. Any help would be appreciated.

2017/08/16 15:06:51 http2: server: error reading preface from client [::1]:64953: read tcp [::1]:9090->[::1]:64953: read: connection reset by peer

This was v0.7.0. I'll try upgrading first.

UPDATE: Sorry, self-signed SSL error 😏

@brianmhunt

@jba I've experienced this intermittent failure with Python Cloud Functions. I've attached a traceback.

traceback.txt

I'll leave this here as a data point, but if the issue persists I'll add more detail.

@brianmhunt

As this is happening regularly for me in Python, I've reported it here: https://issuetracker.google.com/issues/113672049 . (Will also post to the Google cloud python repo)

@cmey

cmey commented Sep 28, 2018

I get the same error as OP @karalabe and @brianmhunt.
Connection reset by peer

It happens during:

  • storage.bucket(bucket).get_blob(path)
  • and bigquery_client.insert_rows(table, rows_to_insert).

This is running Google Cloud Functions with Python 3.7 and google-cloud-storage==1.11.0.

Not all the time, about 10% failure rate. Function deployed to us-east1 (I also tried us-central1, about the same).

@anuvab

anuvab commented Oct 11, 2018

I still get this issue regularly. Failure rates are low - something like 2-3 percent, but they are regular.

@ncruces

ncruces commented Nov 29, 2018

I still get this with Go on Google Cloud Functions (both cloud-functions-go and Google Cloud Functions for Go). Both on uploads and downloads.

It seems to be a property of the environment? Instances, when not used, are left in a frozen state. The server drops the connection but the client doesn't realize it because it's frozen. Node.js package authors have avoided Keep-Alive because of this, but that has its own issues (like poorer performance, and more connections and DNS queries, both of which have quotas). See this.
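
For comparison, the Go analogue of avoiding Keep-Alive might look like the sketch below. It is an assumption-laden illustration rather than a recommendation: `newNoKeepAliveClient` and `countConns` are hypothetical helpers, and the local `httptest` server only stands in for the real endpoint so the connection count can be observed. It carries the same costs described above (a new connection per request).

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// newNoKeepAliveClient returns a client whose transport does not reuse
// connections, so a connection that went stale while the instance was
// frozen is never picked up again.
func newNoKeepAliveClient() *http.Client {
	return &http.Client{Transport: &http.Transport{DisableKeepAlives: true}}
}

// countConns issues n sequential GETs against a local test server and
// reports how many TCP connections the server accepted.
func countConns(c *http.Client, n int) (int64, error) {
	var conns int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			io.WriteString(w, "ok")
		}))
	srv.Config.ConnState = func(_ net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&conns, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	for i := 0; i < n; i++ {
		resp, err := c.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		io.Copy(io.Discard, resp.Body) // drain so the exchange completes
		resp.Body.Close()
	}
	return atomic.LoadInt64(&conns), nil
}

func main() {
	noKA, err := countConns(newNoKeepAliveClient(), 3)
	fmt.Println("keep-alive disabled:", noKA, err)

	// A default transport usually reuses connections, so this count is
	// typically lower; the exact number is not guaranteed.
	def, err := countConns(&http.Client{Transport: &http.Transport{}}, 3)
	fmt.Println("default transport:", def, err)
}
```

Whether a custom `http.Client` like this can be handed to a given Google client library depends on that library's options, which this sketch does not cover.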

My particular function downloads a file, converts it to something else, then uploads it.

The error pattern I get is: connection reset by peer on download (connection to the storage.googleapis.com domain), I retry the function, download succeeds, then the same issue on upload (connection to the www.googleapis.com domain).

Fix for me is: check the error for this particular error (strings.Contains 🙄), if so, close and recreate the client.
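
A rough sketch of that workaround, using only the standard library. The helper names (`isConnReset`, `withFreshClient`, `demo`) and the sample errors are hypothetical; `op` and `recreate` stand in for the real storage call and for closing and recreating the client. `errors.Is` against `syscall.ECONNRESET` is tried first, with the string match kept as a fallback for errors that were flattened to text along the way.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"strings"
	"syscall"
)

// Sample errors for illustration: one with the full wrapping chain the
// net package produces, one flattened to a plain string.
var (
	errWrapped error = &net.OpError{
		Op: "read", Net: "tcp",
		Err: os.NewSyscallError("read", syscall.ECONNRESET),
	}
	errFlat = errors.New("read tcp 64.233.182.132:443: connection reset by peer")
)

// isConnReset reports whether err looks like a connection reset.
func isConnReset(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, syscall.ECONNRESET) {
		return true
	}
	// Fallback when the original error value was lost.
	return strings.Contains(err.Error(), "connection reset by peer")
}

// withFreshClient runs op; if it fails with a connection reset, it calls
// recreate (close and rebuild the client) and runs op exactly once more.
func withFreshClient(op func() error, recreate func() error) error {
	err := op()
	if !isConnReset(err) {
		return err
	}
	if rerr := recreate(); rerr != nil {
		return rerr
	}
	return op()
}

// demo simulates an operation that fails until the client is recreated.
func demo() (calls, recreated int, err error) {
	op := func() error {
		calls++
		if recreated == 0 {
			return errWrapped
		}
		return nil
	}
	recreate := func() error { recreated++; return nil }
	err = withFreshClient(op, recreate)
	return calls, recreated, err
}

func main() {
	fmt.Println(isConnReset(errWrapped), isConnReset(errFlat))
	fmt.Println(demo())
}
```

Retrying only once keeps the sketch from masking a persistent failure; retrying a non-idempotent operation is a separate decision this does not address.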

My question is, why isn't this particular error retried?

@broady
Contributor

broady commented Nov 29, 2018

Re-opening.

Yes, @ncruces, I think you're on the right track with your diagnosis. GCF instances are indeed put to sleep as part of the autoscaling/warm-boot mechanism.

/cc @jadekler @enocom

@broady broady reopened this Nov 29, 2018
@broady broady unassigned jba Nov 29, 2018
@JustinBeckwith JustinBeckwith added 🚨 This issue needs some love. triage me I really want to be triaged. labels Nov 29, 2018
@broady broady removed this from the storage in beta milestone Nov 29, 2018
@jeanbza jeanbza added status: investigating The issue is under investigation, which is determined to be non-trivial. and removed 🚨 This issue needs some love. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Nov 29, 2018
@jeanbza
Contributor

jeanbza commented Dec 5, 2018

@ncruces We're doing some investigating but would love more details on what's going on. Would it be possible to open a new issue, in which you describe what you're doing and what you're seeing? Specifically which operations you're running and so on.

Also, could you email me at [email protected] your app-id so that we can go inspect logs?

@ncruces

ncruces commented Dec 12, 2018

I'm sorry for not responding sooner. I'm sending you an email, and opened #1253.

@jeanbza
Contributor

jeanbza commented Dec 12, 2018

Thanks @ncruces ! Closing this issue and continuing the GCF-specific questions at #1253.
