GameServerAllocationPolicy with empty AllocationEndpoints errors on allocate #2011

highlyunavailable · 2021-02-27T01:05:05Z

What happened:

Multi cluster allocation does not work with a local cluster policy defined.

Calling the allocator service with a GameServerAllocationPolicy that has a empty (nil, not present in yaml) set of allocationEndpoints causes the allocator service to fail with {"error":"Marshal called with nil","code":2,"message":"Marshal called with nil"}.

I also cannot set AllocationEndpoints to an empty array, as the CRD validation forces that to be a single item, and even if I edit the CRD to allow it, the same error occurs.

Additionally, the documentation for the REST example is incorrect: It says use '{"namespace":"'${NAMESPACE}'", "multi_cluster_settings":{"enabled":"true"}}' as the data, but the real JSON is '{"namespace":"'${NAMESPACE}'", "multiClusterSetting":{"enabled":"true"}}' which does give the error as above, as does the Protobuf client in examples/allocator-client.

What you expected to happen:

An empty (empty array) or missing (nil) allocationEndpoints should allocate to the local cluster.

How to reproduce it (as minimally and precisely as possible):

Follow the documentation for multi-cluster allocation and make sure the allocationEndpoints value is missing in the GameServerAllocationPolicy. Also, when calling the endpoint, make sure to use the protobuf client or use the correct multiClusterSetting data above - both will repro the error.

Anything else we need to know?:

I've taken a crack at fixing this but I can't figure out where it's attempting to marshal the nil (not really sure how to debug the path through the code - can I just go run this somehow?) and thus can't. Happy to fix this if someone points me in the right direction.

Logs from the allocator service:

agones-allocator-557d479c94-5jh27 agones-allocator {"message":"allocation request received.","request":{"namespace":"default","multiClusterSetting":{"enabled":true}},"severity":"info","source":"main","time":"2021-02-27T01:14:54.688724Z"}
agones-allocator-557d479c94-5jh27 agones-allocator {"gsa":{"metadata":{"namespace":"default","creationTimestamp":null},"spec":{"multiClusterSetting":{"enabled":true,"policySelector":{}},"required":{},"scheduling":"Packed","metadata":{}},"status":
agones-allocator-557d479c94-5jh27 agones-allocator {"error":"Could not find a Ready GameServer","gsa":{"metadata":{"namespace":"agones-system","creationTimestamp":null},"spec":{"multiClusterSetting":{"enabled":true,"policySelector":{}},"required":{},"scheduling":"Packed","metadata":{}},"status":{"state":"","gameServerName":""}},"gsaKey":"agones-system/","message":"failed to allocate. Retrying... ","severity":"warning","source":"*gameserverallocations.Allocator","time":"2021-02-27T01:14:55.1724364Z"}
agones-allocator-557d479c94-5jh27 agones-allocator {"error":null,"message":"allocation response is being sent","response":null,"severity":"info","source":"main","time":"2021-02-27T01:14:55.1725248Z"}
agones-allocator-557d479c94-5jh27 agones-allocator ERROR: 2021/02/27 01:14:55 grpc: server failed to encode response:  rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil

The "Could not find a Ready GameServer" error is spurious:

NAME                        STATE   ADDRESS        PORT   NODE             AGE
sd-gameserver-4h755-p9z2z   Ready   192.168.65.3   7134   docker-desktop   14m

Environment:

Agones version: 1.12.0
Kubernetes version (use kubectl version): v1.19.3
Cloud provider or hardware configuration: Docker desktop
Install method (yaml/helm): helm
Troubleshooting guide log(s):
Others:

The text was updated successfully, but these errors were encountered:

highlyunavailable · 2021-02-27T03:54:19Z

Okay, this appears to be a PEBKAC on my part caused by somewhat unclear documentation around this feature and a need to have extra error checks.

Here was the broken GSAP:

apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: docker-desktop
  namespace: default
spec:
  connectionInfo:
    clusterName: docker-desktop
    namespace: agones-system
    secretName: local-allocator
    serverCa: [redacted]
  priority: 1
  weight: 100

Fixed by changing the connectionInfo namespace to "default". I figured it out when I noticed the code did this:

if gsa.Namespace != connectionInfo.Namespace {
	gsaCopy = gsa.DeepCopy()
	gsaCopy.Namespace = connectionInfo.Namespace
}

The connectionInfo.Namespace there must be a namespace you have configured to accept game server pods.

That said: I still get the panic: rpc error: code = Internal desc = grpc: error while marshaling: proto: Marshal called with nil if there are no game servers ready (all are allocated) so there's an underlying error here.

highlyunavailable · 2021-02-27T04:13:54Z

The clue is actually right here:

agones-allocator-6d9b48f48b-cp6r6 agones-allocator {"error":null,"message":"allocation response is being sent","response":null,"severity":"info","source":"main","time":"2021-02-27T04:12:46.4578428Z"}

It's most likely trying to marshal the response (which is nil! because it's actually an error!) to a response which obviously isn't gonna fly.

Fixes googleforgames#2011 This resolves an edge case where if all connections in a multicluster allocation failed, the allocation would return (nil, nil) and cause protobuf to fail to serialize the response. Instead, the last result (even if it was unsuccessful!) is returned. An example of this is if all clusters being fully allocated.

Fixes #2011 This resolves an edge case where if all connections in a multicluster allocation failed, the allocation would return (nil, nil) and cause protobuf to fail to serialize the response. Instead, the last result (even if it was unsuccessful!) is returned. An example of this is if all clusters being fully allocated.

highlyunavailable added the kind/bug These are bugs. label Feb 27, 2021

highlyunavailable mentioned this issue Feb 27, 2021

Return last result even if all multicluster allocations fail #2012

Merged

pooneh-m closed this as completed in #2012 Mar 3, 2021

markmandel added this to the 1.13.0 milestone Mar 10, 2021

markmandel added the area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc label Mar 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GameServerAllocationPolicy with empty AllocationEndpoints errors on allocate #2011

GameServerAllocationPolicy with empty AllocationEndpoints errors on allocate #2011

highlyunavailable commented Feb 27, 2021 •

edited

Loading

highlyunavailable commented Feb 27, 2021 •

edited

Loading

highlyunavailable commented Feb 27, 2021

GameServerAllocationPolicy with empty AllocationEndpoints errors on allocate #2011

GameServerAllocationPolicy with empty AllocationEndpoints errors on allocate #2011

Comments

highlyunavailable commented Feb 27, 2021 • edited Loading

highlyunavailable commented Feb 27, 2021 • edited Loading

highlyunavailable commented Feb 27, 2021

highlyunavailable commented Feb 27, 2021 •

edited

Loading

highlyunavailable commented Feb 27, 2021 •

edited

Loading