Create whitelist to filter 3rd party user timing out of resulting HAR #257

Closed
beenanner opened this issue Jan 13, 2017 · 6 comments

Comments

@beenanner
Member

Noticed today that Google ads now create user timings that get picked up in the results. The problem is that they set a unique ID on each mark, so new metric names keep appearing; combined with sitespeed.io and Graphite, this slowly fills up the entire disk. :'( My thought is that we should implement a whitelist where a user can specify a regex so that only matching timings are added. For a first pass we could just accept an array of userTimings names and expand to a regex if use cases arise. The other option: since this only really affects Graphite, we could push this change downstream into the sitespeed.io code base, since that is the only place where filtering is really necessary.
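A minimal sketch of the whitelist idea described above, assuming user timings arrive as an array of `{ name, startTime }` objects. The function names (`escapeRegExp`, `toWhitelistRegExp`, `filterUserTimings`) are hypothetical and are not Browsertime's actual API; the sketch only illustrates accepting either an array of names or a regex.

```javascript
// Hypothetical helpers: not the actual Browsertime implementation.

// Escape regex metacharacters so whitelist entries match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one anchored RegExp from an array of allowed names.
// A RegExp is accepted too; stateful flags (g, y) are stripped so that
// repeated test() calls do not depend on lastIndex.
function toWhitelistRegExp(whitelist) {
  if (whitelist instanceof RegExp) {
    return new RegExp(whitelist.source, whitelist.flags.replace(/[gy]/g, ''));
  }
  const alternatives = whitelist.map(escapeRegExp).join('|');
  return new RegExp('^(?:' + alternatives + ')$');
}

// Keep only the entries whose name matches the whitelist.
// `entries` is assumed to be an array of { name, startTime } objects.
function filterUserTimings(entries, whitelist) {
  if (!whitelist) {
    return entries; // no whitelist configured: keep everything
  }
  const pattern = toWhitelistRegExp(whitelist);
  return entries.filter(entry => pattern.test(entry.name));
}
```

With a whitelist such as `['heroImage']`, marks with generated names like `goog_123abc` would never reach the results, regardless of how many unique IDs a third party produces.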

@soulgalore
Member

Keep this as a reminder of what it looks like:
[Image: usertimings]

@soulgalore
Member

Fixed this for now in sitespeed.io: sitespeedio/sitespeed.io@12a4eb3

@jpvincent

Had the problem too, thanks for the fix.
I was about to prefix all my performance.mark calls anyway, because I'm starting to see third parties using it. Since it's a global channel, prefixing looks like a good option. So on the sitespeed.io side, if we could whitelist with a RegExp or simply a prefix, that would be fantastic.
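The prefixing approach could look roughly like this. It is only a sketch: `MARK_PREFIX`, `prefixedMark`, and `ownMarks` are made-up names, and the prefix value is an arbitrary example. The only real API used is `performance.mark(name)` from the User Timing specification.

```javascript
// Hypothetical prefix; pick something unique to your own site.
const MARK_PREFIX = 'mysite_';

// Wrap performance.mark so every first-party mark shares the prefix.
// `perf` defaults to the global performance object (browsers, recent Node.js);
// passing it in explicitly makes the helper easy to test.
function prefixedMark(name, perf = globalThis.performance) {
  const fullName = MARK_PREFIX + name;
  perf.mark(fullName);
  return fullName;
}

// Select only first-party marks from a list of { name } entries.
function ownMarks(entries) {
  return entries.filter(entry => entry.name.startsWith(MARK_PREFIX));
}
```

A prefix keeps the whitelist on the collecting side down to a single rule (for example, a pattern anchored on `mysite_`), instead of a list that has to be updated for every new mark.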

@soulgalore
Member

@jpvincent I only fixed it for the User Timings we fetch in Browsertime. I've seen that the WPT API still keeps them, so that needs to be fixed too. Also @beenanner, a long-term fix would be perfect; however, I think the muting of goog_* will stay forever unless they change it.

@jpvincent

You're right, I'm still polluted.
A whitelist with a prefix or a RegExp at the sitespeed.io level would be perfect for my use case.

@soulgalore
Member

I can make a quick fix for WPT to just skip those timings, maybe tonight, since they are of no use to anyone except Google. Then we can take on the better solution later on. Best case would be if someone at Google could fix the origin of the problem :)

soulgalore added a commit to sitespeedio/sitespeed.io that referenced this issue Jan 17, 2017
This is probably the strangest commit I've done so far:
Google started using User Timing for their ads, overloading
all use cases where User Timings are automatically picked up
(thank you Google for being a responsible 3rd party provider).

WebPageTest started to mute all User Timings called goog_* in
the frontend but the API still sends them so sitespeed.io users
using WPT still got them. This commit removes the user timings
so they aren't sent to Graphite.

See sitespeedio/browsertime#257
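For illustration, dropping the goog_* timings before metrics are sent on could be sketched as below. The function name `dropGoogleAdTimings` and the `{ marks, measures }` input shape are assumptions made for this example; the actual change is in the referenced sitespeed.io commit.

```javascript
// Hypothetical sketch, not the code from the referenced commit.
// Assumed input shape:
//   { marks: [{ name, startTime }], measures: [{ name, duration }] }
function dropGoogleAdTimings(userTimings) {
  // Keep every entry whose name does not start with the goog_ prefix.
  const keep = entry => !entry.name.startsWith('goog_');
  return {
    marks: (userTimings.marks || []).filter(keep),
    measures: (userTimings.measures || []).filter(keep)
  };
}
```

A blacklist like this only handles one known offender, which is why the thread treats it as a stopgap until a whitelist is available.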
beenanner added a commit that referenced this issue Jan 25, 2017
beenanner added a commit that referenced this issue Jan 27, 2017
* #257 - implement whitelist option for userTimings

* #257 - check for whitelist option before generating regex

* #257 - move whitelist code to filter.js

* #257 - add tests

* #258 - updating cli info based on feedback

* #258 - update changelog to mention new option