[Spanner] GRPC randomly throws ServiceException("Socket closed") #1208

Closed
taka-oyama opened this issue Aug 8, 2018 · 23 comments
Assignees
Labels
api: spanner Issues related to the Spanner API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@taka-oyama
Contributor

taka-oyama commented Aug 8, 2018

We occasionally see this error in production.

    Google\Cloud\Core\Exception\ServiceException: { "message": "Socket closed", "code": 14, "status": "UNAVAILABLE", "details": [] }
    at Google\Cloud\Core\GrpcRequestWrapper->handleStream (/project/vendor/google/cloud-core/src/GrpcRequestWrapper.php:254)
    at Generator->valid ([internal function])
    at Google\Cloud\Spanner\Result->rows (/project/vendor/google/cloud-spanner/src/Result.php:177)
    at iterator_to_array ([internal function])

This error occurs randomly in various places where we use Spanner.

How can I resolve this?

@taka-oyama
Contributor Author

taka-oyama commented Aug 8, 2018

I followed the stacktrace and found the following comment for code: 14 in Google\Rpc\Code.php

    /**
     * The service is currently unavailable.  This is most likely a
     * transient condition, which can be corrected by retrying with
     * a backoff.
     * See the guidelines above for deciding between `FAILED_PRECONDITION`,
     * `ABORTED`, and `UNAVAILABLE`.
     * HTTP Mapping: 503 Service Unavailable
     *
     * Generated from protobuf enum <code>UNAVAILABLE = 14;</code>
     */
    const UNAVAILABLE = 14;

Should we be retrying this?
Shouldn't the library be retrying automatically?
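
For readers landing here, the "retrying with a backoff" advice in that comment could be sketched at the application level roughly as follows. This is a minimal sketch, not part of google-cloud-php: `retryOnUnavailable`, its parameters, and the `GRPC_UNAVAILABLE` constant are hypothetical names, and the sketch assumes the thrown exception exposes the gRPC code through `getCode()`, as the error output above suggests for `ServiceException`.

```php
<?php
// Hypothetical helper, not part of google-cloud-php: retry a callable when it
// fails with gRPC code 14 (UNAVAILABLE), sleeping with exponential backoff.
const GRPC_UNAVAILABLE = 14;

function retryOnUnavailable(callable $operation, int $maxAttempts = 5, int $baseDelayMs = 100)
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (\Exception $e) {
            // Rethrow anything that is not UNAVAILABLE, or once attempts run out.
            if ((int) $e->getCode() !== GRPC_UNAVAILABLE || $attempt >= $maxAttempts) {
                throw $e;
            }
            // The delay doubles on each attempt: base, 2x base, 4x base, ...
            $delayMs = $baseDelayMs * (2 ** ($attempt - 1));
            // Add up to 50% random jitter so many workers do not retry in lockstep.
            $delayMs += random_int(0, intdiv($delayMs, 2));
            usleep($delayMs * 1000);
        }
    }
}
```

A call site would wrap the whole read, that is, the closure passed in would both execute the query and collect the rows (for example with `iterator_to_array`), so that a retry starts the query over. Re-running the whole operation like this is only appropriate for idempotent reads.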

@taka-oyama taka-oyama changed the title [Spanner] Google\Cloud\Core\Exception\ServiceException("Socket closed") [Spanner] GRPC randomly throws ServiceException("Socket closed") Aug 8, 2018
@axot

axot commented Aug 8, 2018

When using Spanner, should we tune these parameters in GKE?

net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_keepalive_time

@jdpedrie jdpedrie added type: question Request for information or clarification. Not an issue. api: spanner Issues related to the Spanner API. labels Aug 8, 2018
@jdpedrie
Contributor

jdpedrie commented Aug 8, 2018

Can you determine whether the issue arises from various service calls (like commit, perhaps), or is it isolated to calls from Result::rows()?

@taka-oyama
Contributor Author

Unfortunately, we can only confirm this error on Result::rows since that's all that is being called from our service at the moment.

@jdpedrie
Contributor

jdpedrie commented Aug 8, 2018

No problem! I'll investigate and get back to you once I know more.

@taka-oyama
Contributor Author

taka-oyama commented Aug 8, 2018

Our team was looking through the code and realized that the error occurs on
https://github.com/GoogleCloudPlatform/google-cloud-php/blob/master/Spanner/src/Result.php#L177

But it seems the try/catch right below that line already handles this case.
So would this work out if $generator->valid() was also handled with try/catch?

@axot

axot commented Aug 8, 2018

Our production environment is not fully open yet, and at this time we are using many Spanner nodes. Is this related?
We ask because the error did not happen this frequently in our stress environment.

[image: pasted_image_2018_08_08_23_39]

@dwsupplee
Contributor

We are looking into whether wrapping the valid() call with a try/catch block will help.

/cc @snehashah16 (our Spanner contact) for some assistance around your question regarding GKE parameters and the differences seen in the stress environment.

@danoscarmike
Contributor

@dwsupplee any progress investigating this one?

@dwsupplee
Contributor

@danoscarmike We've been having trouble replicating the issue locally, but we have a change set that moves the valid() calls we could resume on into the try/catch block. We are validating that it doesn't have any negative side effects and will push it up shortly.

@dwsupplee
Contributor

@taka-oyama @axot

Would it be possible to check out this branch to see if it helps mitigate the issue before we release?

@taka-oyama
Contributor Author

@dwsupplee Sure, we'll try it out today and let you know how it goes.

@taka-oyama
Contributor Author

taka-oyama commented Aug 16, 2018

@dwsupplee
I was able to reproduce this error by setting up 50 php-fpm/nginx servers and feeding them relatively low traffic (about 1 request per second, where each request sends one SELECT query to Spanner).
The error occurred 20 minutes in.

I tried out the latest branch you provided and ran it for about 2 hours.
It seems to be running fine without errors.

Hopefully this gets merged soon?

@dwsupplee dwsupplee removed the type: question Request for information or clarification. Not an issue. label Aug 16, 2018
@dwsupplee dwsupplee self-assigned this Aug 16, 2018
@dwsupplee dwsupplee added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Aug 16, 2018
@dwsupplee
Contributor

Awesome, thanks for testing this out @taka-oyama. We will get this merged today and a release following shortly!

@dwsupplee
Contributor

Closing this out, as we just released v0.74 which includes #1220. If you run into any other issues please feel free to re-open.

@castaneai

@dwsupplee
Hi, I found the same error on the first call of $generator->valid();

    Google\Cloud\Core\Exception\ServiceException: { "message": "Socket closed", "code": 14, "status": "UNAVAILABLE", "details": [] }
    at Google\Cloud\Core\GrpcRequestWrapper->handleStream (/project/vendor/google/cloud-core/src/GrpcRequestWrapper.php:263)
    at Generator->valid ([internal function])
    at Google\Cloud\Spanner\Result->rows (/project/vendor/google/cloud-spanner/src/Result.php:181)
    at iterator_to_array ([internal function])

    $valid = $generator->valid();

Why not include this part in a try block?

@dwsupplee
Contributor

@castaneai, from what I can remember our original guidance from the Spanner team was to only retry when a resume token is available. With that said, it does seem reasonable to retry on the first valid call since we haven't received any rows yet. I'm certainly open to adding this in.
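
The policy described in that comment (retry when a resume token is available, or when no rows have been received yet) could be sketched as below. This is an illustrative sketch under stated assumptions, not the library's `Result::rows()` implementation: `streamRows`, the `$openStream` callable, and the `['token' => ..., 'rows' => ...]` chunk shape are all hypothetical. The point it shows is that `valid()` and `next()` sit inside the `try`, so a failure on the very first `valid()` call is also retried.

```php
<?php
// Hypothetical sketch, not the library's Result::rows() implementation.
// $openStream is an assumed callable: given a resume token (or null), it
// returns an Iterator yielding ['token' => ?string, 'rows' => array] chunks.
function streamRows(callable $openStream, int $maxRetries = 3): \Generator
{
    $unavailable = 14;     // gRPC UNAVAILABLE
    $resumeToken = null;   // last token seen; null until the server sends one
    $canRetry = true;      // false once rows were yielded without a token
    $retries = 0;
    $needsAdvance = false; // whether next() must be called before valid()
    $stream = $openStream($resumeToken);

    while (true) {
        try {
            // Both calls can throw for a generator-backed stream, including
            // on the first valid(), so both live inside the try block.
            if ($needsAdvance) {
                $stream->next();
            }
            if (!$stream->valid()) {
                return;
            }
            $chunk = $stream->current();
        } catch (\Exception $e) {
            if ((int) $e->getCode() !== $unavailable || !$canRetry || $retries >= $maxRetries) {
                throw $e;
            }
            // Reopen from the last token, or from the start if none was seen.
            $retries++;
            $stream = $openStream($resumeToken);
            $needsAdvance = false;
            continue;
        }

        if ($chunk['token'] !== null) {
            // Rows in this chunk are covered by its token, so resuming is safe.
            $resumeToken = $chunk['token'];
            $canRetry = true;
        } else {
            // Rows without a token cannot be skipped on resume; stop retrying.
            $canRetry = false;
        }
        foreach ($chunk['rows'] as $row) {
            yield $row;
        }
        $needsAdvance = true;
    }
}
```

In this sketch a retry before any token has arrived simply reopens the stream from the beginning, which is safe only because no rows have been handed to the caller at that point.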

@castaneai

This error still occurs from time to time.
I would be glad if you could fix it.

@bacheson

bacheson commented Jan 23, 2019

I'm seeing this exact issue still on v0.91

@lukasgit

@dwsupplee @jdpedrie this issue still randomly persists. We're part of the Google Cloud Startup program and launching this year... a fix would be greatly appreciated so we can move to production.

Log output:

PHP Fatal error:  Uncaught Google\Cloud\Core\Exception\ServiceException: {
    "message": "Socket closed",
    "code": 14,
    "status": "UNAVAILABLE",
    "details": []
} in /[...]/Google/composer-google-cloud/vendor/google/cloud/Core/src/GrpcRequestWrapper.php:257
Stack trace:
#0 /[...]/Google/composer-google-cloud/vendor/google/cloud/Core/src/GrpcRequestWrapper.php(194): Google\Cloud\Core\GrpcRequestWrapper->convertToGoogleException(Object(Google\ApiCore\ApiException))
#1 [internal function]: Google\Cloud\Core\GrpcRequestWrapper->handleStream(Object(Google\ApiCore\ServerStream))
#2 /[...]/Google/composer-google-cloud/vendor/google/cloud/Firestore/src/SnapshotTrait.php(122): Generator->current()
#3 /[...]/Google/composer-google-clo in /[...]/Google/composer-google-cloud/vendor/google/cloud/Core/src/GrpcRequestWrapper.php on line 257

@dwsupplee
Contributor

@lukasgit would you mind opening up a fresh issue for us? Sometimes comments on closed issues can get lost.

@lukasgit

@dwsupplee #2427

@dwsupplee
Contributor

Thanks!

8 participants