Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub Actions results not appearing on the dashboard #4020

Closed
gsnedders opened this issue Sep 25, 2024 · 20 comments
Closed

GitHub Actions results not appearing on the dashboard #4020

gsnedders opened this issue Sep 25, 2024 · 20 comments

Comments

@gsnedders
Copy link
Member

From web-platform-tests/wpt#48223 (comment):

So:

This seems surprising that it succeeded to upload, and yet the upload doesn't appear as pending, invalid, or show up as complete.

We seem to have logging going into Google Cloud Logging, but I don't have any access to that.

cc @past @DanielRyanSmith @jcscottiii @KyleJu

@KyleJu
Copy link
Collaborator

KyleJu commented Sep 26, 2024

Checking the log, I can confirm that /api/checks/github-actions/ was called with run_id=11039052681 and returned 200. Also I see the logging

Adding %!s(*string=0xc0002990a0) for web-platform-tests/wpt run 11039052681 to upload...

from this line, but it didn't log properly:

log.Infof("Adding %s for %s/%s run %v to upload...", artifact.Name, owner, repo, runID)

It should be *artifact.Name instead.

If I have to guess, it could be an issue with the uploader. Is there a link that I can inspect the wpt results from the artifacts?

@gsnedders
Copy link
Member Author

@KyleJu https://github.com/web-platform-tests/wpt/actions/runs/11039052681 has downloads links for all the artifacts.

@gsnedders
Copy link
Member Author

I guess I'm more interested in any logging from the request we issue for https://wpt.fyi/api/results/create? Because that's presumably working too and giving 200 back.

if err != nil {
logger.Warningf("Failed to write data in api/results/create handler: %s", err.Error())
}
is one case where we could silently fail?

But it seems like it should presumably be managing to schedule the task, it's just for whatever reason it never completes.

@gsnedders
Copy link
Member Author

I guess also logging from /api/results/process?

Because something somewhere is clearly failing.

@KyleJu
Copy link
Collaborator

KyleJu commented Sep 26, 2024

Confirmed that /api/results/create has no error log around the time this run was created.

Looking at /api/checks/github-actions/ again, I noticed successful logging for 11039052681:
Screenshot 2024-09-26 11 51 44 AM

but 10 minutes before that, it also logged failures:

Screenshot 2024-09-26 11 42 42 AM

But it seems like it should presumably be managing to schedule the task, it's just for whatever reason it never completes.

I would agree with this. A job could be scheduled and returned 200, but the task might be failing in the Task Queue. There seems to be lots failures around the same timestamps:

Screenshot 2024-09-26 11 56 07 AM

@gsnedders
Copy link
Member Author

but 10 minutes before that, it also logged failures:

Can you find what run ID this was? I wonder if that's the point at which I tried to submit all the Safari runs, but I can't find any that gave 400 all the way back to the initial request, which I would've expected?

@gsnedders
Copy link
Member Author

Is there any way to find any logging from /api/results/process (i.e., from the results-processor) when the task runs? Because it's kinda surprising they're failing in the results processor.

@KyleJu
Copy link
Collaborator

KyleJu commented Sep 26, 2024

but 10 minutes before that, it also logged failures:

Can you find what run ID this was? I wonder if that's the point at which I tried to submit all the Safari runs, but I can't find any that gave 400 all the way back to the initial request, which I would've expected?

I cannot find any logging for run_id and the payload is empty as well, but the user agent of that request is curl. Maybe that rings a bell?
Screenshot 2024-09-26 4 12 32 PM

After digging through logs in results-precossor, I found these around the same timestamp (note that the severity level isn't labelled properly in processor):
Screenshot 2024-09-26 4 18 19 PM

e.g.

ERROR:processor:Failed to fetch (403): https://api.github.com/repos/web-platform-tests/wpt/actions/artifacts/1978878338/zip

in the url (https://github.com/web-platform-tests/wpt/actions/runs/11039052681) you showed me above, it should be

https://github.com/web-platform-tests/wpt/actions/runs/11039052681/artifacts/1978878338

Would that be the issue?

@gsnedders
Copy link
Member Author

It'll be the difference between what it gives as the URL via the API versus via the web interface; I guess we're getting 403 (Forbidden) because we're not passing a GitHub API token.

We'd have to notice that we're fetching api.github.com from the results-processor and then fetch the GitHub token from the datastore.

That said, the even more significant failing here is that they aren't showing up as invalid runs.

@gsnedders
Copy link
Member Author

From https://cloud.google.com/appengine/docs/standard/services/taskqueue/push/retrying-tasks?tab=python:

If a handler fails to execute a task (and therefore returns any HTTP status code outside of the range 200–299), App Engine retries the task until it succeeds.

So are we just retrying these indefinitely?

Because I don't think we're catching the error anywhere, so it should just bubble up to the Flask endpoint and then Flask will turn it into a 500 Internal Server Error.

@gsnedders
Copy link
Member Author

Reading the code, I think we're not actually retrying here:

p.download(results, screenshots, archives)
if len(p.results) == 0:
_log.error("No results successfully downloaded")
p.update_status(run_id, 'EMPTY', None, callback_url)
return ''

We should just be setting the status as EMPTY, and manually hitting https://wpt.fyi/api/status/empty confirms this.

@gsnedders
Copy link
Member Author

#4025 should fix this, but given this is about API tokens it's hard to verify this is actually fixed aside from waiting for the next deployment.

@gsnedders
Copy link
Member Author

OK, even after #4025, https://staging.wpt.fyi/status is showing all the runs stuck as pending. Can someone take a look at the logs for the staging instance?

@gsnedders
Copy link
Member Author

Ah, after #4025 we're getting:

Traceback (most recent call last):
  File "/env/lib/python3.9/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/env/lib/python3.9/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/env/lib/python3.9/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/env/lib/python3.9/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/app/main.py", line 69, in decorated_func
    return func(*args, **kwargs)
  File "/app/main.py", line 52, in decorated_func
    return func(*args, **kwargs)
  File "/app/main.py", line 114, in task_handler
    resp = processor.process_report(task_id, flask.request.form)
  File "/app/processor.py", line 357, in process_report
    p.download(results, screenshots, archives)
  File "/app/processor.py", line 211, in download
    self._download_archive(archive_url)
  File "/app/processor.py", line 185, in _download_archive
    artifact = self._download_http(archive_url)
  File "/app/processor.py", line 142, in _download_http
    'Authorization': 'Bearer ' + self._github_token,
  File "/app/processor.py", line 125, in _github_token
    return self._secret('github-wpt-fyi-bot-token')
  File "/app/processor.py", line 121, in _secret
    return self.datastore.get(key)['secret']
KeyError: 'secret'

…which is surprising given that's what wpt.fyi itself is reading. So somehow we don't have the secret accessible?

@KyleJu
Copy link
Collaborator

KyleJu commented Oct 15, 2024

Ah, after #4025 we're getting:

Traceback (most recent call last):
  File "/env/lib/python3.9/site-packages/flask/app.py", line 1473, in wsgi_app
    response = self.full_dispatch_request()
  File "/env/lib/python3.9/site-packages/flask/app.py", line 882, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/env/lib/python3.9/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/env/lib/python3.9/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/app/main.py", line 69, in decorated_func
    return func(*args, **kwargs)
  File "/app/main.py", line 52, in decorated_func
    return func(*args, **kwargs)
  File "/app/main.py", line 114, in task_handler
    resp = processor.process_report(task_id, flask.request.form)
  File "/app/processor.py", line 357, in process_report
    p.download(results, screenshots, archives)
  File "/app/processor.py", line 211, in download
    self._download_archive(archive_url)
  File "/app/processor.py", line 185, in _download_archive
    artifact = self._download_http(archive_url)
  File "/app/processor.py", line 142, in _download_http
    'Authorization': 'Bearer ' + self._github_token,
  File "/app/processor.py", line 125, in _github_token
    return self._secret('github-wpt-fyi-bot-token')
  File "/app/processor.py", line 121, in _secret
    return self.datastore.get(key)['secret']
KeyError: 'secret'

…which is surprising given that's what wpt.fyi itself is reading. So somehow we don't have the secret accessible?

#4057. I think this must be the cause

@gsnedders
Copy link
Member Author

We seem to be downloading a bunch of artifacts, with no errors (because otherwise we'd get an ERROR logged from requests), and then still hitting "ERROR:processor:No results successfully downloaded": downloaded-logs-20241015-141730.json.

@past
Copy link
Member

past commented Oct 15, 2024

The capitalization fix is now deployed to production.

@gsnedders
Copy link
Member Author

https://staging.wpt.fyi/results/?run_id=6174988006588416 successfully made it up, after #4061 was fixed (though still only on staging).

That said, it does seem to be not labelled with master, which is somewhat problematic.

@past
Copy link
Member

past commented Oct 17, 2024

https://staging.wpt.fyi/results/?run_id=6174988006588416 successfully made it up, after #4061 was fixed (though still only on staging).

#4065 should have gotten the fix on production as well.

@gsnedders
Copy link
Member Author

Let's just treat #4063 as a separate follow-up and resolve this. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants