☂️ OYB Next Steps #6442

Closed · 4 of 6 tasks
patrickhulce opened this issue Nov 1, 2018 · 8 comments

@patrickhulce
Collaborator

patrickhulce commented Nov 1, 2018

❔ ❔ ❔ Curious what OYB is? ❔ ❔ ❔

Summary
I did a quick repurposing of DZL to analyze where our time is going in aggregate. These are the top 30 average timings (in ms) across ~400 runs today:

timing-gather-loadblank                 1103.33
timing-gather-afterpass-accessibility    827.09
timing-gather-afterpass-optimizedimages  825.39
timing-gather-afterpass-fontsize         797.62
timing-init-config                       779.76
timing-gather-gettrace                   697.89
timing-audit-speed-index                 578.59
timing-gather-getbenchmarkindex          510.86
timing-audit-screenshot-thumbnails       500.83
timing-config-requireaudits              472.83
timing-gather-afterpass-starturl         248.76
timing-audit-unminified-javascript       243.60
timing-audit-metrics                     241.27
timing-gather-loadpage-offlinepass       234.72
timing-gather-loadpage-redirectpass      230.89
timing-config-requiregatherers           214.00
timing-gather-afterpass-scripts          170.01
timing-gather-afterpass-cssusage         165.33
timing-gather-afterpass-metarobots       146.49
timing-gather-afterpass-offline          122.49
timing-gather-afterpass-crawlablelinks   116.67
timing-audit-estimated-input-latency     116.23
timing-audit-first-contentful-paint      112.46
timing-audit-load-fast-enough-for-pwa    101.72
timing-audit-is-on-https                  90.27
timing-audit-unminified-css               86.65
timing-audit-offscreen-images             81.61
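
(For reference, the aggregation is roughly something like the sketch below; it's not the actual DZL code, and it assumes a directory of saved LHR JSON files where each report exposes `timing.entries` with a `name` and a `duration` in ms.)

```js
// Rough sketch: average Lighthouse timing entries across saved LHR JSON files.
const fs = require('fs');
const path = require('path');

const dir = process.argv[2] || './lhrs';
const totals = new Map(); // timing name -> {sum, count}

for (const file of fs.readdirSync(dir).filter(f => f.endsWith('.json'))) {
  const lhr = JSON.parse(fs.readFileSync(path.join(dir, file), 'utf8'));
  for (const entry of (lhr.timing && lhr.timing.entries) || []) {
    const agg = totals.get(entry.name) || {sum: 0, count: 0};
    agg.sum += entry.duration;
    agg.count += 1;
    totals.set(entry.name, agg);
  }
}

[...totals]
  .map(([name, {sum, count}]) => [name, sum / count])
  .sort((a, b) => b[1] - a[1])
  .slice(0, 30)
  .forEach(([name, avg]) => console.log(name.padEnd(42), avg.toFixed(2)));
```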

Looking closer at the top 10...

  1. Load blank. core: don't load blank page twice in first pass #6369 helped, but it's still a bigger impact to our runtime on average than any of our audits 😱. We might want to experiment with shorter blank times until we can revive data URIs (core(gather-runner): load a blank data URI, rather than about:blank #4310); see the sketch after this list.
  2. axe. Not much we can do here immediately; I've already filed a couple of issues on some of the worst offenders. Might indicate it's worth investing a bit of resources to help improve their perf story?
  3. OptimizedImages. We've already done the biggest thing here by moving it to the Chrome side. The only obvious work I see here would reduce our functionality or move more into estimation/heuristics.
  4. FontSize. @hoten is already on the case!!
  5. Init config. This is surprising. I'm on it!
  6. Get trace. Already aware of it and on it: Reduce the size of the trace #3596
  7. Speed Index. @brendankenny was on it with wasm and I have some more ideas if that isn't good enough.
  8. Get Benchmark Index. We should move to running this only in core(throttling): add option to slowdown to a device class #6162 settings.
  9. Screenshot Thumbnails. Not a ton we can do here; maybe look into wasm or native browser resizing?
  10. Require audits. Also a little surprising. I'm not sure there's any obvious low-hanging fruit; it's probably just the evaluation/parsing of every script we have.
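
A rough sketch of the data URI idea from item 1 (illustrative only, driving Puppeteer directly rather than our gather-runner):

```js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // A tiny data URI gives the renderer a "real" blank document without any
  // network work, which is what would let us experiment with shorter waits.
  await page.goto('data:text/html;charset=utf-8,<!doctype html><title>blank</title>');
  await new Promise(resolve => setTimeout(resolve, 300)); // hypothetical shorter blank settle time
  // ...start tracing and run the real pass here...
  await browser.close();
})();
```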

Summary of Near-term AIs:

patrickhulce self-assigned this Nov 1, 2018
@paulirish
Member

great analysis. AIs look good!

OptimizedImages

Can we parallelize the protocol work?
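
e.g. something roughly like this (sketch only; `getCompressedSizes` is a hypothetical stand-in for whatever protocol call does the per-image work):

```js
// Issue the per-image protocol requests concurrently instead of awaiting each
// one in turn, so total time approaches the slowest round trip rather than the sum.
async function collectImageStats(images, getCompressedSizes) {
  return Promise.all(images.map(async image => {
    const {jpegSize, webpSize} = await getCompressedSizes(image); // hypothetical protocol helper
    return {url: image.url, jpegSize, webpSize};
  }));
}
```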

Init config. This is surprising.

yeah, very.

Get trace.

Yeah we're collecting this as fast as possible, so reducing categories/trace-size is the only headroom.

Screenshot Thumbnails

Brendan had some ideas on constructing the JPEG ArrayBuffers in JS, using some boring nearest neighbor logic.
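
Roughly this kind of thing (a simplified sketch over raw RGBA pixels, not Lighthouse's actual code):

```js
// Simplified nearest-neighbor downscale over raw RGBA pixel data.
function nearestNeighborResize(src, srcWidth, srcHeight, dstWidth, dstHeight) {
  const dst = new Uint8ClampedArray(dstWidth * dstHeight * 4);
  for (let y = 0; y < dstHeight; y++) {
    const srcY = Math.floor(y * srcHeight / dstHeight);
    for (let x = 0; x < dstWidth; x++) {
      const srcX = Math.floor(x * srcWidth / dstWidth);
      const srcIndex = (srcY * srcWidth + srcX) * 4;
      const dstIndex = (y * dstWidth + x) * 4;
      // Copy the RGBA channels of the nearest source pixel.
      dst[dstIndex] = src[srcIndex];
      dst[dstIndex + 1] = src[srcIndex + 1];
      dst[dstIndex + 2] = src[srcIndex + 2];
      dst[dstIndex + 3] = src[srcIndex + 3];
    }
  }
  return dst;
}
```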

@brendankenny
Member

this is really great!

Init config. This is surprising.

I believe this includes requireAudits and requireGatherers, so that could be where most of the time is going. I remember at some point I was looking at optimizing some of our json re-parsing and merging shenanigans and it was all quite fast regardless, so I gave up.

Screenshot Thumbnails

Brendan had some ideas on constructing the JPEG ArrayBuffers in JS, using some boring nearest neighbor logic.

I think you're remembering what @patrickhulce wrote and actually landed :) We can look at how v8 handles the code, but at first glance it doesn't look like it's going to get much faster.

Maybe the time is actually from jpeg decoding? screenshot-thumbnails competes with speed-index for who gets the blame for the speedline computed artifact based on which one goes first. It's also possible that they end up requesting different frames and duplicate some work (since speedline lazily decodes and neither screenshot-thumbnails nor speed-index decodes all frames).
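
i.e. whichever one runs first pays for computing (and caching) the shared artifact, something like this simplified picture:

```js
// Simplified stand-in for computed-artifact caching: whichever audit requests
// speedline first triggers (and gets billed for) the computation; later
// requests reuse the cached promise.
const speedlineCache = new WeakMap(); // trace -> Promise<speedline result>

function requestSpeedline(trace, computeSpeedline) {
  if (!speedlineCache.has(trace)) {
    speedlineCache.set(trace, computeSpeedline(trace));
  }
  return speedlineCache.get(trace);
}
```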

@brendankenny
Member

Might indicate to us it's worth investing a bit of resources to help improve their perf story?

@paulirish and I looked at one or two major axe performance sinks recently (the table one? and something else) and there's definitely opportunity. axe is complicated, though, with a lot of idiosyncrasies and layered recursive calls, so it's going to take some time investment if we want to get involved.

@paulirish
Member

I think you're remembering what @patrickhulce wrote and actually landed :)

Ha! Apparently so! :)

aXe

I found one recent win that's pretty easy: dequelabs/axe-core#1172 (discussion has led to a different solution, but it's also straightforward).

After this, though, it's probably either the table stuff or color contrast. Both rather spooky. 🎃

@patrickhulce
Collaborator Author

I believe this includes requireAudits and requireGatherers, so that could be where most of the time is going

Ha, yeah, I realized this as soon as I started digging in; there's only <100ms of headroom after the require parts. It's also quite possible this is very different in LR, since locally it means hitting the filesystem to require everything, while LR is executing our browserified module functions.

Maybe the time is actually from jpeg decoding?

Yeah that's probably true too, maybe your wasm JPEG will get us some wins here for free?

@connorjclark
Collaborator

RE: Load blank - there's a WIP branch and details about the blocker here: #3707 (comment)

@brendankenny
Member

It's also quite possible this is very different in LR since locally it means hitting the filesystem to require in everything while LR is executing our browserified module functions.

I was also thinking we could replace the time we spend loading all locale files with a single dynamic import, but good point: it will only make a difference for node/cli (maaaaybe we'd get a lazy parsing bonus, but that's hard to hit).
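
i.e. something like this sketch instead of requiring every locale file up front (path and file shape assumed):

```js
// Load only the locale this run needs; Node caches the parsed module so
// repeated calls are free. As noted, this only helps node/CLI.
function getLocaleMessages(locale) {
  return require(`./locales/${locale}.json`);
}
```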

@patrickhulce
Collaborator Author

Revisiting this ~20 months later, I think we've addressed the key AIs here already and much has changed since that analysis. If run time becomes an important issue again we can do another round of investigations and takeaways.
