
[v1.27] Intermittent failures: Response body:`Permission 'servicemanagement.services.report' denied on service '<redacted_3rd_party_service>' #510

Closed
ernsheong opened this issue Jan 2, 2019 · 14 comments


@ernsheong

ernsheong commented Jan 2, 2019

Affects ESP release v1.27

403 errors occur intermittently, so the request count in the dashboard is inaccurate.

```
INFO:Fetching an access token from the metadata service
INFO:Fetching the service config ID from the rollouts service
INFO:Fetching the service configuration from the service management service
INFO:Attribute zone: us-central1-f
INFO:Attribute project_id: <PROJECT_ID>
INFO:Attribute kube_env: KUBE_ENV
nginx: [warn] Using trusted CA certificates file: /etc/nginx/trusted-ca-certificates.crt
10.128.0.4 - - [02/Jan/2019:21:37:54 +0000] "GET /api/v1/articles?entities=ETH HTTP/1.1" 200 12 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
2019/01/02 21:37:56[error]10#10: Failed to call https://servicecontrol.googleapis.com/v1/services/<SERVICE_NAME>.endpoints.<PROJECT_ID>.cloud.goog:report, Error: FORBIDDEN: server response status code: 403, Response body:`Permission 'servicemanagement.services.report' denied on service '<redacted_3rd_party_service>'.
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:182] Failed in Report call: Service control request failed with HTTP response code 403
10.128.0.4 - - [02/Jan/2019:21:38:53 +0000] "GET /api/v1/articles?entities=ETH HTTP/1.1" 200 12 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
10.128.0.4 - - [02/Jan/2019:21:38:54 +0000] "GET /api/v1/articles?entities=ETH HTTP/1.1" 200 12 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
2019/01/02 21:38:54[error]10#10: Failed to call https://servicecontrol.googleapis.com/v1/services/<SERVICE_NAME>.endpoints.<PROJECT_ID>.cloud.goog:report, Error: FORBIDDEN: server response status code: 403, Response body:`Permission 'servicemanagement.services.report' denied on service '<redacted_3rd_party_service>'.
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:182] Failed in Report call: Service control request failed with HTTP response code 403
10.128.0.4 - - [02/Jan/2019:21:38:55 +0000] "GET /api/v1/articles?entities=ETH HTTP/1.1" 200 12 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
2019/01/02 21:38:55[error]10#10: Failed to call https://servicecontrol.googleapis.com/v1/services/<SERVICE_NAME>.endpoints.<PROJECT_ID>.cloud.goog:report, Error: FORBIDDEN: server response status code: 403, Response body:`Permission 'servicemanagement.services.report' denied on service '<redacted_3rd_party_service>'.
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:182] Failed in Report call: Service control request failed with HTTP response code 403
10.128.0.4 - - [02/Jan/2019:21:38:56 +0000] "GET /api/v1/articles?entities=ETH HTTP/1.1" 200 12 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
2019/01/02 21:38:56[error]10#10: Failed to call https://servicecontrol.googleapis.com/v1/services/<SERVICE_NAME>.endpoints.<PROJECT_ID>.cloud.goog:report, Error: FORBIDDEN: server response status code: 403, Response body:`Permission 'servicemanagement.services.report' denied on service '<redacted_3rd_party_service>'.
[libprotobuf ERROR external/servicecontrol_client_git/src/service_control_client_impl.cc:182] Failed in Report call: Service control request failed with HTTP response code 403
```

Reverting to 1.26 seems to have fixed the problem.

@qiwzhang
Contributor

qiwzhang commented Jan 2, 2019

What is your deployment platform, e.g. GKE, GCE? Which start_esp flags are you using when you deploy ESP?

@ernsheong
Author

It was on GKE. I am not aware of any start_esp flags.

@ernsheong
Author

ernsheong commented Jan 3, 2019

Oh, I suppose you meant something like this:

```yaml
- name: esp
  image: gcr.io/endpoints-release/endpoints-runtime:1
  args: [
    "--http_port=8081",
    "--backend=127.0.0.1:8080",
    "--service=SERVICE_NAME",
    "--rollout_strategy=managed"
  ]
```

@qiwzhang
Contributor

qiwzhang commented Jan 3, 2019

Thanks. In your case, ESP uses the GKE metadata server to get an access token and uses that token to talk to Google Service Control. The access token expires every hour, but ESP also re-fetches it every hour.

How often do you see "report" calls rejected with 403?

In your service config, are you using "allow_unregistered_calls"? If not, the "check" call to Service Control should be rejected too, and ESP would then reject your requests. Do you see your requests being rejected?
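The token lifecycle described above (fetch from the metadata server, re-fetch before the hourly expiry) can be sketched as follows. This is a minimal illustration, not ESP's own implementation: it assumes the standard metadata-server token endpoint and its `access_token`/`expires_in` JSON fields, while the `CachedToken` class, its injectable `fetch`/`clock` parameters and the 60-second refresh margin are hypothetical choices made for the example.

```python
import json
import time
import urllib.request
from typing import Callable, Optional, Tuple

# Metadata-server endpoint for the default service account's access token.
TOKEN_URL = ("http://metadata.google.internal/computeMetadata/v1/"
             "instance/service-accounts/default/token")


def fetch_from_metadata() -> Tuple[str, int]:
    """Return (access_token, expires_in_seconds) from the metadata server."""
    req = urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        body = json.load(resp)
    return body["access_token"], int(body["expires_in"])


class CachedToken:
    """Hypothetical helper: cache a token and refresh it `margin` seconds early."""

    def __init__(self,
                 fetch: Callable[[], Tuple[str, int]] = fetch_from_metadata,
                 clock: Callable[[], float] = time.monotonic,
                 margin: float = 60.0) -> None:
        self._fetch = fetch
        self._clock = clock
        self._margin = margin
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Re-fetch when there is no token yet or it is about to expire.
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, expires_in = self._fetch()
            self._expires_at = now + expires_in
        return self._token
```

Refreshing a little before `expires_in` elapses, rather than exactly at expiry, avoids sending a request with a token that expires while the request is in flight.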

@ernsheong
Author

I switched back to 1.27, and the issue seems to have disappeared.

I am not using allow_unregistered_calls anywhere in my config. I will keep monitoring.

Seemingly related: GoogleCloudPlatform/java-docs-samples#873

@ernsheong
Author

For context, this is what I ran into immediately before this issue, when permissions on the service started behaving erratically: https://issuetracker.google.com/issues/122241615

@qiwzhang
Contributor

qiwzhang commented Jan 3, 2019

So the problem is not related to the ESP release version, and it is NOT happening now. I will close this issue; if it happens again, please re-open it. Thanks!

qiwzhang closed this as completed on Jan 3, 2019
@kurdybacha

kurdybacha commented Apr 18, 2020

It seems the issue is back with the 1.49.0 release. I am getting this:

```
 HTTP/1.1" 403 287
    {
     "code": 7,
     "message": "\b\u0007\u0012_Permission 'servicemanagement.services.check' denied on service '\u003credacted_3rd_party_service\u003e'.",
     "details": [
      {
       "@type": "type.googleapis.com/google.rpc.DebugInfo",
       "stackEntries": [],
       "detail": "service_control"
      }
     ]
    }
```

There are no issues with the 1.47.0 release.
This is a Cloud Run deployment (for Cloud Functions).

@nareddyt
Contributor

Yes, #779 better surfaces permission errors. It seems your Cloud Run deployment does not have permission to access Service Control. This may cause some metrics to be missing, which is why we now enforce it (in both ESP and ESPv2).

Two things to check to ensure all permissions are in place:

@kurdybacha

Thanks for the reply, @nareddyt. Your first link is broken; could you re-post it, please?

Cloud Run is deployed with the default compute service account, as described in https://cloud.google.com/endpoints/docs/openapi/get-started-cloud-run

@nareddyt
Contributor

Ah, it's a link to this section: https://cloud.google.com/endpoints/docs/openapi/get-started-cloud-run#checking_required_services

```shell
gcloud services enable servicemanagement.googleapis.com
gcloud services enable servicecontrol.googleapis.com
gcloud services enable endpoints.googleapis.com
gcloud services enable $ENDPOINTS_SERVICE_NAME
```

OK, the default compute service account has the Project Editor role by default, so that should be fine. The issue is therefore most likely the first one (the required services above).
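To find which of the required services above are not yet enabled on a project, a small check like the following may help. It is a sketch under assumptions: it shells out to `gcloud services list --enabled` with `--format=value(config.name)` so that gcloud prints one service name per line (verify the format expression against your gcloud version), and the helper names are hypothetical. A service listed as enabled may still be propagating across the control plane, so this check is not a readiness guarantee.

```python
import subprocess
from typing import Iterable, List, Set


def parse_service_names(output: str) -> Set[str]:
    """One service name per line; ignore blank lines and surrounding spaces."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def enabled_services(project: str) -> Set[str]:
    """List enabled services via gcloud (assumes gcloud is installed and authenticated)."""
    out = subprocess.run(
        ["gcloud", "services", "list", "--enabled",
         "--project", project, "--format=value(config.name)"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_service_names(out)


def missing_services(required: Iterable[str], enabled: Set[str]) -> List[str]:
    """Required services absent from the enabled set, in input order."""
    return [s for s in required if s not in enabled]
```

For example, `missing_services(["servicemanagement.googleapis.com", "servicecontrol.googleapis.com", "endpoints.googleapis.com"], enabled_services("my-project"))`, where `my-project` is a placeholder project ID.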

@kurdybacha

Thank you, @nareddyt.

The following services are always enabled on the project:

```shell
gcloud services enable servicemanagement.googleapis.com
gcloud services enable servicecontrol.googleapis.com
gcloud services enable endpoints.googleapis.com
```

The only difference I can see is that the behavior of Google Cloud Endpoints after deployment and `gcloud services enable $ENDPOINTS_SERVICE_NAME` has changed.

Use case: a Google Cloud Endpoints service on Cloud Run is created during the CI pipeline with a different ENDPOINTS_SERVICE_NAME on every build (build-ID prefix). It is enabled right after deployment with `gcloud services enable $ENDPOINTS_SERVICE_NAME`, followed by some integration tests that now fail with a 403 error on random endpoint paths. If I retry on 403, the Endpoints service eventually catches up: the 403 disappears (after ~200 ms) and the tests pass. I had a similar issue before, but that was a 401 error; now, with the 1.49.0 release, it is a 401 followed by a 403.

How can I know when the service is operational? What is the recommended way to handle this case? Is retrying on 401 and 403 enough?

@nareddyt
Contributor

Like you said, gcloud services enable $ENDPOINTS_SERVICE_NAME is most likely the root cause.

Enabling a service can take a few minutes to propagate across the GCP control plane. So it seems your CI system deploys ESP and starts running tests before the propagation has finished. Note that in production (where you don't create a new Endpoints Service every time), you shouldn't run into this issue.

Before release 1.49.0, we did not check whether the service was fully enabled, which could cause missing metrics in the first few minutes of traffic. So in 1.49.0 we essentially only allow traffic (with API keys) once the service is fully enabled. That is why you started noticing these errors now.

One workaround is to simply wait 5 minutes after enabling the service. This is how we handle it in CI for ESPv2 (our CI setup is similar to the one you described).

You can also keep retrying based on the status code, but note that #785 changes these to 500 errors for consistency. This change will occur in the next release of ESP, so it might be easier to just use the time-based workaround.
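Both workarounds can be combined in a test harness: retry while ESP returns a "not ready" status, but give up after a fixed time budget. The sketch below assumes a hypothetical `send` callable that issues one request to the freshly deployed service and returns its HTTP status code. It treats 401 and 403 (the 1.49.0 behavior described in this thread) and 500 (the status #785 is expected to use) as retryable, backing off exponentially under a 300-second deadline that mirrors the 5-minute wait above. All names and defaults are illustrative.

```python
import time
from typing import Callable, Iterable

# "Service not ready yet" statuses: 401/403 per ESP 1.49.0 in this thread,
# 500 because #785 is expected to switch these errors to 500.
RETRYABLE = frozenset({401, 403, 500})


def wait_until_ready(send: Callable[[], int],
                     timeout: float = 300.0,
                     initial_delay: float = 0.2,
                     max_delay: float = 10.0,
                     retryable: Iterable[int] = RETRYABLE,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> int:
    """Call `send` until it returns a non-retryable status or `timeout` elapses.

    Returns the last status code observed.
    """
    retryable = frozenset(retryable)
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = send()
        if status not in retryable or clock() >= deadline:
            return status
        # Exponential backoff, never sleeping past the deadline.
        sleep(min(delay, max(0.0, deadline - clock())))
        delay = min(delay * 2, max_delay)
```

Because 500 can also signal a genuine backend failure, this kind of loop is best confined to the warm-up phase right after deployment, not wrapped around every test request.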

@nareddyt
Contributor

nareddyt commented Apr 20, 2020

FYI, I did not realize that the current implementation returns a 401 followed by a 403. Either way, the codes ESP uses here are incorrect, since these errors are caused by control-plane permissions, not by the client request. As mentioned above, in the next release ESP should respond with 500 in this situation.
