Upload URLs map does not work (--uploadUrlMap) #250

Closed
alekseykulikov opened this issue Mar 20, 2020 · 10 comments · Fixed by #404 · May be fixed by XirdigH/lighthouse-ci#2 or Qdigital/lighthouse-ci#3
Labels: bug (Something isn't working), P1

Comments

@alekseykulikov

While working on LHCI Action, I tried to add support for results comparison for reports stored in Temporary Public Storage. But it seems that getUrlMap always returns {"success":false}.

LHCI_BUILD_CONTEXT__GITHUB_REPO_SLUG=treosh/lighthouse-ci-action lhci upload \
  --target=temporary-public-storage --uploadUrlMap=true

Also, CI Viewer does not support HTML as a source of LH reports, and I am not sure how this code works in the case of Temporary Storage:
https://github.com/GoogleChrome/lighthouse-ci/blob/master/packages/cli/src/upload/upload.js#L321

Another improvement would be to add support for branches besides master: https://github.com/GoogleChrome/lighthouse-ci/blob/master/packages/cli/src/upload/upload.js#L303

Ideal scenario: in addition to the Lighthouse report links in links.json, add compare URLs against the latest build of the current branch and of master.
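
For illustration, links.json today maps each audited URL to the URL of its uploaded report; the idea would be to also include comparison links. A possible shape (the property names and viewer query parameters below are hypothetical assumptions, not an existing format):

// Sketch only: links.json extended with comparison links.
// The "compareToBranch"/"compareToMaster" keys and the viewer query
// parameters are illustrative, not the current format.
{
  "https://example.com/": {
    "report": "https://storage.googleapis.com/.../example_com.html",
    "compareToBranch": "https://googlechrome.github.io/lighthouse-ci/viewer/?baseReport=...&compareReport=...",
    "compareToMaster": "https://googlechrome.github.io/lighthouse-ci/viewer/?baseReport=...&compareReport=..."
  }
}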

@patrickhulce
Collaborator

Thanks @alekseykulikov! Tried to respond to each of your points

> But it seems that getUrlMap always returns {"success":false}.

Were you trying this recently? I've recently received lots of suspension emails from GCP, and many of our storage objects have been deleted. I'm working through it with their support, so I suspect that might be the issue here.

> Also, CI Viewer does not support HTML as a source of LH reports

What led you to believe this? Yes it does.

> I am not sure how this code works in the case of Temporary Storage:

CI viewer requests those files on load.

async function loadInitialReports(setBaseReport, setCompareReport, setIsLoading, addToast) {
  // On localhost, restore the last viewed reports from localStorage.
  if (window.location.hostname === 'localhost') {
    const lastBaseReport = localStorage.getItem('lastBaseReport');
    const lastCompareReport = localStorage.getItem('lastCompareReport');
    if (lastBaseReport) setBaseReport(JSON.parse(lastBaseReport));
    if (lastCompareReport) setCompareReport(JSON.parse(lastCompareReport));
  }

  // Load the base/compare reports referenced by the page's URL parameters, if any.
  const promises = [
    INITIAL_BASE_URL && loadReportFromURL(INITIAL_BASE_URL, setBaseReport),
    INITIAL_COMPARE_URL && loadReportFromURL(INITIAL_COMPARE_URL, setCompareReport),
  ].filter(/** @return {p is Promise<void>} */ p => !!p);
  if (!promises.length) return;

  setIsLoading(true);
  // Surface any individual load failure as a toast without failing the whole load.
  await Promise.all(
    promises.map(p =>
      p.catch(err => {
        console.error(err); // eslint-disable-line no-console
        addToast({message: `Failed loading report from URL: ${err.message}`, level: 'error'});
      })
    )
  );
  setIsLoading(false);
}
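
The loadReportFromURL helper isn't shown above. A minimal sketch of what such a helper could look like (an assumption for illustration, not the viewer's actual implementation):

// Hypothetical sketch: fetch a report (JSON, or an HTML report with embedded JSON)
// and hand the parsed Lighthouse result to the provided setter.
async function loadReportFromURL(url, setReport) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
  let body = await response.text();
  // HTML reports embed the result as window.__LIGHTHOUSE_JSON__; extract it if present.
  const match = body.match(/window\.__LIGHTHOUSE_JSON__ = (.*?});<\/script>/);
  if (match) body = match[1];
  setReport(JSON.parse(body));
}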

> Another improvement would be to add support for branches besides master:

Yep, see #82.

> in addition to the Lighthouse report links in links.json, add compare URLs against the latest build of the current branch and of master.

I'm not sure what you mean by this. Could you elaborate?

@alekseykulikov
Author

Hey @patrickhulce, thank you for the reply.
Yes, I tried it a week ago. I retried it now, and it worked, which is quite awesome!

> What led you to believe this?

I tried it and saw a parsing error. Now it seems there are CORS issues (example).

> Could you elaborate?

I see how it works now. It stores a comparison URL directly in the links.json file.
It works if we compare one commit from master to another commit in master (as it works by default now).
But it's also useful to compare the current branch (even after multiple commits) to master, and then compare master after the merge. I think with an explicit branch option, it should be possible.

alekseykulikov added a commit to treosh/lighthouse-ci-action that referenced this issue Mar 20, 2020
@patrickhulce
Collaborator

> Yes, I tried it a week ago. I retried it now, and it worked, which is quite awesome!

Oh, funny it's still not working for me and I'm still in discussions with GCP support 😆 Perhaps they've resolved some of the issues already 🎉

> I tried it and saw a parsing error. Now it seems there are CORS issues (example).

The CORS issues I'm seeing are intermittent, and probably related to the weirdness we are seeing with storage buckets atm. If you have a reproducible report with a parsing issue I'd love to use it to improve the parsing. It's very rudimentary atm :)

// If the fetched report is an HTML report, pull the embedded JSON out of the inline script.
if (s.includes('<script>window.__LIGHTHOUSE_JSON__ = ')) {
  const match = s.match(/window\.__LIGHTHOUSE_JSON__ = (.*?});<\/script>/);
  if (match) s = match[1];
}
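
For illustration, here is roughly how that extraction behaves on an HTML report (the sample markup below is made up and far smaller than a real report):

// Illustrative only: a tiny stand-in for an HTML Lighthouse report.
let s = '<html><body><script>window.__LIGHTHOUSE_JSON__ = {"finalUrl":"https://example.com/","categories":{}};</script></body></html>';

if (s.includes('<script>window.__LIGHTHOUSE_JSON__ = ')) {
  const match = s.match(/window\.__LIGHTHOUSE_JSON__ = (.*?});<\/script>/);
  if (match) s = match[1];
}

console.log(JSON.parse(s).finalUrl); // "https://example.com/"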

> It works if we compare one commit from master to another commit in master (as it works by default now).

Right now (when everything is actually working on GCP 😄) you should be able to compare any build to the latest master commit, and the link to do that should be generated and put into links.json.

> But it's also useful to compare the current branch (even after multiple commits) to master,

That should be what's generated right now for PR builds: a URL that compares the current hash's report to the latest master commit's report. Is this not what you're seeing? (Note: if any of the reports or storage objects are missing, the linking won't work; this is a tough time to try to debug this functionality.)

> and then compare master after the merge.

I'm not sure how this is different from "It works if we compare one commit from master to another commit in master"; mind elaborating a little more?

> I think with an explicit branch option, it should be possible.

I must be misunderstanding what type of branch option you want then. It's also possible that the functionality you want is already here and it's just not working right now due to the GCP struggles.

@patrickhulce
Collaborator

OK @alekseykulikov I think everything has been resolved on the GCP end at this point based on their communications and what I'm seeing. If you're still seeing any of this behavior, then definitely let me know so I can either reopen with them or fix whatever is wrong in LHCI :)

I think the branch discussion is still worth having to see if there's functionality missing, but bear in mind that the free storage option isn't going to be as comprehensive as the server. Reports are only stored for 7 days, so investment in more historical comparisons isn't going to be the highest priority.

@alekseykulikov
Author

Hey @patrickhulce, now the CORS issue is gone, and the viewer opens a report 👍
Regarding the branches, now I see how it works, and it makes great sense.

> Reports are only stored for 7 days, so investment in more historical comparisons isn't going to be the highest priority.

I agree, comparing the latest branch commit with the latest master is sufficient. For the rest, LHCI Server provides a better way to manage history.

Thank you for the clear communication!

@chosak
Contributor

chosak commented Aug 5, 2020

I'm seeing this problem fairly reliably. LHCI successfully uploads individual reports but then fails to upload the links.json file to the appropriate endpoint (https://us-central1-lighthouse-infrastructure.cloudfunctions.net/saveUrlMap).

The relevant code is here:

// Upload the URL map; note the response status is never checked here.
await fetch(SAVE_URL_MAP_URL, {
  method: 'POST',
  body: JSON.stringify(payload),
  headers: {'content-type': 'application/json'},
});

This POST seems to always return a 400 with {"success":false}, both when called as part of LHCI and when I curl it manually from the command line. I know this must have worked at some point, because random URL reports seem to have been stored and thus I get comparison reports for them -- but most of the time the stored URL map just doesn't get updated.

It'd be nice if this POST had some error handling, like here, to inform the user that something went wrong. That code will catch and log any errors thrown while executing the POST, but nothing is reported for a successfully executed POST that returns an error status.
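
As a sketch of the kind of handling meant here (illustrative only, not the actual fix that landed; it reuses SAVE_URL_MAP_URL and payload from the snippet above):

// Sketch: check the saveUrlMap response instead of ignoring it.
const response = await fetch(SAVE_URL_MAP_URL, {
  method: 'POST',
  body: JSON.stringify(payload),
  headers: {'content-type': 'application/json'},
});

if (!response.ok) {
  // Surface the failure (e.g. the 400 {"success":false} case) without failing the upload step.
  process.stderr.write(`WARNING: failed to upload URL map (status ${response.status})\n`);
}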

Is the code running on the other end of these endpoints open source?

(As an aside, it seems like there's a potential security/reliability issue here -- couldn't anyone conceivably overwrite someone's reports, since there are no authentication checks when writing to public storage? This is kind of implied here if you interpret "available to anyone on the internet" as "writeable by anyone on the internet". On the other hand, the free storage is a very nice feature of Lighthouse!)

patrickhulce reopened this Aug 5, 2020
patrickhulce added the bug (Something isn't working) and P1 labels and removed the needs-more-info label Aug 5, 2020
@patrickhulce
Collaborator

Thanks @chosak!

You do indeed receive a 400 from the API when trying to update a map that already exists. This was intentional, for the reasons you hint at later. I agree we should at least log a message when this happens even if we don't fail anything, and I'll let this issue track that bug.

> Is the code running on the other end of these endpoints open source?

Sadly, no. Because it's completely unauthenticated, we rely on a few basic obscurity and rate-limiting factors to prevent abuse, and keeping them private deters anyone from reverse engineering what restrictions are in place. Nothing that anyone couldn't defeat if they were hellbent on breaking it, of course, but they'd just be shutting it down for everyone, so 🤞 that continues not to happen :)

> As an aside, it seems like there's a potential security/reliability issue here -- couldn't anyone conceivably overwrite someone's reports, since there are no authentication checks when writing to public storage?

It's public write but not public overwrite. It's the same rough principle as LHCI server security: anyone can create data, but deleting data requires authorization (which this API doesn't even have). When it comes to the URL maps (which are just a mapping of test URL -> latest URL of the uploaded report for that URL), we were on the fence about whether to allow edits, since there's not really any data being lost. By default it rejects updates and lets the previous entry stand (it only gets refreshed when the data expires), but there are ways to bypass the map and overwrite it that we could make available if abuse is a secondary concern.

One could of course argue that an attacker could preemptively write to the map before you and thus prevent you from updating it with the correct information, so I'm sympathetic to enabling overwrites here in this low stakes area.

> it seems like there's a potential security/reliability issue here

There are definitely lots of security and reliability concerns with the public storage option. Your data is totally public, anyone in the world can add data as if it came from your project, the attempt at a diff is very loose and based on outdated reports that might not share an ancestor, diff won't work at all for projects that have infrequent main branch commits, etc.

The attitude for public storage is definitely "best effort, no promises", and if a project is concerned about any of these, we'd definitely encourage setting up a private LHCI server instead. I'm open to any and all extra disclaimers that help convey that beyond what we already have :)

@chosak
Contributor

chosak commented Aug 5, 2020

@patrickhulce thank you for the detailed reply.

> the attempt at a diff is very loose and based on outdated reports that might not share an ancestor, diff won't work at all for projects that have infrequent main branch commits

I ran into this when trying to set up automated nightly runs of Lighthouse against a website with numerous pages to audit. During initial development, I was first running Lighthouse on only one page, which created the initial map. Subsequently I updated our tests to run against many more pages, and it was very unexpected that the URL map and reference reports weren't getting updated -- only that first URL would show up as a comparison view, while the others would always show up as being a first run.

> I agree we should at least log a message when this happens even if we don't fail anything, and I'll let this issue track that bug.

It'd be nice as part of this to also log the action of trying to upload, as happens for the GitHub status check. This would make it easier to debug and confirm that LHCI is trying to update the map, since whether this happens can depend on the local environment which might be hard to know after the fact.

If all data expires after 7 days, is the expected behavior that after a week the URL map itself will expire, giving the opportunity to redefine a new one? I believe I've been running regular nightly jobs for longer than that and still not seeing updates.

If it's helpful, for example, this Action run generates a links.json file with a whole bunch of URLs (available in the artifact download for that job). And I believe that GITHUB_REPOSITORY should be set properly by GHA, so it should be used as the slug when uploading the URL map. But if I check the map here, it refers to a broken (maybe expired?) URL, preventing any of my pages from generating comparison reports. Is it possible that the map expires but still prevents updates?

@patrickhulce
Collaborator

patrickhulce commented Aug 5, 2020

> Subsequently I updated our tests to run against many more pages, and it was very unexpected that the URL map and reference reports weren't getting updated

I can understand that. You've convinced me we can allow open updates to the mapping :) I suppose anyone can do this today on OSS projects anyway by opening a PR that updates the map haha

> It'd be nice as part of this to also log the action of trying to upload, as happens for the GitHub status check. This would make it easier to debug and confirm that LHCI is trying to update the map, since whether this happens can depend on the local environment which might be hard to know after the fact.

SG

> If all data expires after 7 days, is the expected behavior that after a week the URL map itself will expire, giving the opportunity to redefine a new one? I believe I've been running regular nightly jobs for longer than that and still not seeing updates.

> Is it possible that the map expires but still prevents updates?

Ah, yeah, that's certainly possible if the DB records got out of sync with the cloud storage files. Thanks for the concrete example; I will look into it 👍

Summary of action items:

  • Add more logging to upload URL map flow (that it's being attempted, error status codes)
  • Allow updates to the URL map
  • Document the weaker guarantees around report availability timeframes

@chosak
Contributor

chosak commented Aug 5, 2020

Thanks @patrickhulce! I've opened #402 to at least address the logging part.
