feat: implement resource based routing feature #2535

AVaksman · 2020-01-01T05:51:16Z

Implement resource based routing.

Try to obtain an instance-specific endpointUri.
If successful, create a SpannerClient using the instance-specific endpointUri as apiEndpoint.
Use an instance-designated SpannerClient (from the pool) for all data calls of that instance.

jdpedrie · 2020-01-01T14:41:47Z

Hi @AVaksman. Can you explain what the benefit and use-case of this is? Was it requested by the spanner team or something you believe would be useful?

Happy new year!

AVaksman · 2020-01-01T14:51:19Z

Requested by spanner team.

A new set of instance-config specific endpoints provide an isolation mechanism and a
performance boost. By hitting the endpoint for instance-specific requests, traffic will be shared
only with other users with instances in that instance-config. Similarly, since the endpoint
resolves to frontends colocated with the user's instance, and there is a request fanout at the
frontend level, latency will improve.

AVaksman · 2020-01-01T14:53:34Z

This PR also uses the FieldMask feature exposed in PR #2530.

AVaksman · 2020-01-01T14:54:08Z

And a Happy New Year!

jdpedrie · 2020-01-07T15:45:25Z

Thanks @AVaksman, I've taken a few minutes to understand what we're doing here now. :)

I have one rather large concern which we should discuss before fully reviewing. The implementation you have here will add 1 RPC call to each instantiation of the library. Since PHP doesn't have long-running processes and every request requires instantiating the library, that adds up to a lot of additional RPCs. I'd venture to guess that it would more than offset the gains from using a more optimized endpoint.

We need to find a way to cache this across requests to prevent having to make API calls every time the client is created.

edit: Also, I see the endpoint_uris in the public protos, but not on cloud.google.com yet. Is this feature fully available to all users of Spanner?

AVaksman · 2020-01-08T04:43:59Z

In the proposed implementation, the spannerClient (data client) is not instantiated when the client is created ($client = new SpannerClient($clientConfig);). Instantiation is postponed (similar to the logic for the databaseAdmin and instanceAdmin clients) until the user makes a data network call.
The one extra RPC call is made just before the spannerClient is instantiated (when the user is expecting a data RPC call).
The extra RPC call is performed only once per instance; after that, the spannerClient is cached and used for all subsequent data network calls.

jdpedrie · 2020-01-08T16:02:58Z

Right, but in high-traffic applications, if you have 1000 visitors, that means 1000 instantiations of the client and 1000 additional RPCs that this change would add. I find it unlikely that the benefits of this change outweigh the added cost.

What we need is to cache the endpoint_uris for a period of time and share that across all instances of the client. We do this already in the Spanner session pool, or in the Auth token cache.

Here's what I'm thinking:

Accept an additional configuration option on SpannerClient called Psr\Cache\CacheItemInterface $endpointCache. Advise users that providing a persistent cache is strongly recommended in all cases where GOOGLE_CLOUD_ENABLE_RESOURCE_BASED_ROUTING is on. When that cache is provided, store the endpoint URIs in it and only make the RPC call if the cache item does not exist or has expired.
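
To illustrate the caching idea above, here is a minimal sketch, assuming a JSON file stands in for the persistent cache (a PSR-6 cache would take its place in practice). The function name resolveEndpoint, the $fetch callable, and the cache file layout are all hypothetical and not part of the library.

```php
<?php
// Hypothetical sketch: none of these names exist in google-cloud-php.

/**
 * Returns the endpoint URI for an instance, consulting a file-backed cache first.
 *
 * @param string   $instanceName Instance identifier, used as the cache key.
 * @param callable $fetch        Performs the lookup RPC and returns the URI string.
 * @param string   $cacheFile    JSON file that survives across PHP requests.
 * @param int      $ttl          Seconds a cached entry stays valid.
 */
function resolveEndpoint(string $instanceName, callable $fetch, string $cacheFile, int $ttl = 3600): string
{
    // Load whatever earlier requests have stored, tolerating a missing or corrupt file.
    $entries = [];
    if (is_file($cacheFile)) {
        $decoded = json_decode((string) file_get_contents($cacheFile), true);
        if (is_array($decoded)) {
            $entries = $decoded;
        }
    }

    // Cache hit: the entry exists and has not expired, so no RPC is made.
    $entry = $entries[$instanceName] ?? null;
    if (is_array($entry) && isset($entry['uri'], $entry['expires']) && $entry['expires'] > time()) {
        return $entry['uri'];
    }

    // Cache miss or expired entry: make the call once and store the result.
    $uri = $fetch($instanceName);
    $entries[$instanceName] = ['uri' => $uri, 'expires' => time() + $ttl];
    file_put_contents($cacheFile, json_encode($entries), LOCK_EX);

    return $uri;
}

// Usage: two lookups, but the stand-in for the RPC runs only once.
$calls = 0;
$fetch = function (string $instance) use (&$calls): string {
    $calls++; // stands in for the extra RPC
    return 'instance-endpoint.example.com';
};
$cacheFile = sys_get_temp_dir() . '/spanner-endpoints-' . uniqid() . '.json';
resolveEndpoint('my-instance', $fetch, $cacheFile);
resolveEndpoint('my-instance', $fetch, $cacheFile);
echo $calls, PHP_EOL; // prints 1
```

In a real deployment the cache path (or cache pool) would be fixed rather than unique per run, so that separate HTTP requests share the stored endpoint.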

@dwsupplee do you have any thoughts here?

dwsupplee · 2020-01-09T23:38:38Z

This seems like a reasonable thing to implement. I agree with @jdpedrie the network overhead here might end up negating some of the usefulness of the feature, however.

Using a cache seems reasonable, or I could also see providing an environment variable which hosts a list of endpoint URIs that a user could determine and provide themselves. Whatever we land on, it is definitely desirable to avoid always needing to make this network request. Do these endpoints change regularly, and is that what necessitates the network request? Or can we reliably pre-determine them?

skuruppu · 2020-01-16T00:42:30Z

Thanks @jdpedrie and @dwsupplee for your inputs here.

> This seems like a reasonable thing to implement. I agree with @jdpedrie the network overhead here might end up negating some of the usefulness of the feature, however.

Let me try to understand the problem first. So you're saying that a customer using the client would instantiate a new client for every request and then close it at the end of the request, rather than keeping it around, therefore for every request, we would need to make two RPC calls. I'm a bit confused by this since I imagined a customer application would create a single client connection at the start of the application and then reuse it for serving multiple requests assuming that all the requests are made to the same database. I believe that's why we initialize a session pool with multiple sessions so you can serve concurrent requests. Let me know if I misunderstood something.

> Using a cache seems reasonable, or I could also see providing an environment variable which hosts a list of endpoint URIs that a user could determine and provide themselves. Whatever we land on, it is definitely desirable to avoid always needing to make this network request. Do these endpoints change regularly, and is that what necessitates the network request? Or can we reliably pre-determine them?

In any case, I'm not opposed to using a cache. That was actually the original design but some of the client lib owners for other languages said that a cache is unnecessary given that a client connection is only made at the start of an application so it's just one request to figure out the endpoint and then the client is reused for the lifetime of the application. But maybe the PHP implementation is different to those of other languages. If a cache makes more sense for PHP, then I agree that we should implement it.

I don't think the actual endpoints will be changing frequently. But the design from the backend team is for the client to rely on what the backend provides as the endpoint rather than statically declaring the endpoints. So I don't think the idea of an environment variable would work here. We also don't want users to have to decide on which endpoint is appropriate because the backend is in the best position to decide that based on the location of their data. If users decide they want to use a particular endpoint, they can do so by just specifying it in the options when they init the client so no env variable is necessary in that case.

jdpedrie · 2020-01-16T01:00:56Z

> So you're saying that a customer using the client would instantiate a new client for every request and then close it at the end of the request, rather than keeping it around, therefore for every request, we would need to make two RPC calls.

Sorry, let me define terms here. In this case, I'm speaking of an HTTP request to a PHP application. Since PHP does not keep state across multiple HTTP requests, each time a user visits the website, we will construct a new PHP instance, along with all the objects (such as the Spanner client) it needs. That means that each user request will add one additional RPC to the total required for the program to communicate with Cloud Spanner.

PHP is different from other languages like Java or Go in that programs written in those languages stay alive and listen for incoming HTTP requests. PHP shuts down after each request, or at the very least, does not preserve state. So we couldn't keep the endpoints in memory across multiple requests as one could with a language with a different execution model.

skuruppu · 2020-01-17T00:14:22Z

> > So you're saying that a customer using the client would instantiate a new client for every request and then close it at the end of the request, rather than keeping it around, therefore for every request, we would need to make two RPC calls.

> Sorry, let me define terms here. In this case, I'm speaking of an HTTP request to a PHP application. Since PHP does not keep state across multiple HTTP requests, each time a user visits the website, we will construct a new PHP instance, along with all the objects (such as the Spanner client) it needs. That means that each user request will add one additional RPC to the total required for the program to communicate with Cloud Spanner.

> PHP is different from other languages like Java or Go in that programs written in those languages stay alive and listen for incoming HTTP requests. PHP shuts down after each request, or at the very least, does not preserve state. So we couldn't keep the endpoints in memory across multiple requests as one could with a language with a different execution model.

Thanks @jdpedrie, I understand the issue much better. In hindsight, I should've realized that.

I talked to @AVaksman and decided on the following design:

Users have to opt into this feature, unlike in the other languages. We'll provide an optional param on the client class where they can specify a cache if they want to use it.
We'll keep the env var so that a user who opts in and runs into trouble can turn the feature off quickly.
If the user sets the env var but doesn't provide a cache, we'll print a warning and won't enable the feature.
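
The opt-in rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name resourceBasedRoutingEnabled and the warning text are hypothetical, and the real client may wire this up differently.

```php
<?php
// Hypothetical helper; not part of google-cloud-php.

/**
 * Decides whether resource-based routing should be enabled.
 *
 * @param string|false $envValue      Raw result of getenv() for the opt-in variable.
 * @param bool         $cacheProvided Whether the user passed an endpoint cache.
 */
function resourceBasedRoutingEnabled($envValue, bool $cacheProvided): bool
{
    // getenv() returns false when the variable is unset, so check the type first.
    $optedIn = is_string($envValue) && strtolower($envValue) === 'true';
    if (!$optedIn) {
        return false;
    }

    // Opted in without a cache: warn and leave the feature off.
    if (!$cacheProvided) {
        trigger_error(
            'Resource-based routing was requested but no endpoint cache was provided; the feature stays disabled.',
            E_USER_WARNING
        );
        return false;
    }

    return true;
}
```

A caller would pass in the result of getenv() for the opt-in variable (the review on this PR suggests the name GOOGLE_CLOUD_SPANNER_ENABLE_RESOURCE_BASED_ROUTING) together with a flag saying whether a cache was configured.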

I hope that makes sense. Let me know if you have any questions. @AVaksman will work on these changes over the next week.

rhiro · 2020-02-07T22:44:44Z

Spanner/src/SpannerClient.php

@@ -141,7 +141,9 @@ public function __construct(array $config = [])
            'projectIdRequired' => true
        ];

-        $this->connection = new Grpc($this->configureAuthentication($config));
+        $config['enableCaching'] = 'true' == strtolower(getenv('GOOGLE_CLOUD_ENABLE_RESOURCE_BASED_ROUTING'));


should be GOOGLE_CLOUD_SPANNER_ENABLE_RESOURCE_BASED_ROUTING

rhiro · 2020-02-07T22:45:30Z

Spanner/tests/Unit/SpannerClientTest.php

@@ -65,6 +66,17 @@ public function setUp()
        ]);
    }

+    public function testResourceCachingEnvVar()
+    {
+        $this->assertTrue(putenv("GOOGLE_CLOUD_ENABLE_RESOURCE_BASED_ROUTING=true"));


should be GOOGLE_CLOUD_SPANNER_ENABLE_RESOURCE_BASED_ROUTING

jdpedrie · 2020-02-21T16:31:32Z

What's the status of this PR in relation to the recent deprecation of the field in question?

AVaksman · 2020-02-21T16:48:03Z

> What's the status of this PR in relation to the recent deprecation of the field in question?

@skuruppu

jdpedrie · 2020-02-27T15:09:07Z

@skuruppu given that #2713 was merged, deprecating the feature implemented in this pull request, I believe that this pull request is no longer necessary?

skuruppu · 2020-02-28T02:57:51Z

> @skuruppu given that #2713 was merged, deprecating the feature implemented in this pull request, I believe that this pull request is no longer necessary?

I mostly agree but the backend team did ask us to keep the PRs open for now in case they change their minds. If it's ok, I'd like to keep it open for a couple of weeks and then I'll check back with them on whether it's fine to close it.

But if you think having the PR is creating too much noise, we can close it and @AVaksman can push a new branch if we need to reopen.

jdpedrie · 2020-03-02T18:21:12Z

Sure, we can leave it open for the time being!

skuruppu · 2020-04-23T09:22:29Z

We have decided to implement this functionality on the server side so we no longer need to add this support on the client side.

feat: implement resource based routing feature

bfe7a95

AVaksman requested a review from jdpedrie as a code owner January 1, 2020 05:51

googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jan 1, 2020

jdpedrie added the api: spanner Issues related to the Spanner API. label Jan 1, 2020

skuruppu requested review from rhiro and mbril January 13, 2020 03:12

rhiro reviewed Feb 7, 2020

rhiro self-requested a review February 7, 2020 22:45

jdpedrie added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Mar 2, 2020

skuruppu closed this Apr 23, 2020
