This repository has been archived by the owner on Dec 1, 2022. It is now read-only.

More canvas fingerprinting #45

Closed
otksm opened this issue Dec 7, 2021 · 11 comments

Comments

@otksm

otksm commented Dec 7, 2021

URL(s) where the issue occurs (mandatory)

See below.

Describe the issue (mandatory)

Canvas Fingerprinting. I have decided to file these as a single issue because it's essentially the same script and I don't want to spam the issue tracker.

Note that one or two sites use \/[-_0-9a-zA-Z]{4,}\/ instead of \/[-_0-9a-zA-Z]{6,}\/ because I have seen the first path segment go as short as 4 characters there. I did not lower it for the other sites in case it causes false positives (not that I have seen any).

! https://www.ana.co.jp/
/^https:\/\/www\.ana\.co\.jp\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=ana.co.jp

! https://www.asos.com/
/^https:\/\/www\.asos\.com\/[-_0-9a-zA-Z]{4,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=asos.com

! https://www.comptoirdescotonniers.co.jp/
/^https:\/\/www\.comptoirdescotonniers\.co\.jp\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=comptoirdescotonniers.co.jp

! https://www.flypeach.com/
/^https:\/\/www\.flypeach\.com\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=flypeach.com

! https://www.helmutlang.com/
/^https:\/\/www\.helmutlang\.com\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=helmutlang.com

! https://www2.hm.com/ja_jp/index.html
/^https:\/\/www2\.hm\.com\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=hm.com

! https://www.jal.co.jp/jp/ja/
/^https:\/\/www\.jal\.co\.jp\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=jal.co.jp

! https://www.jetstar.com/jp/ja/home
/^https:\/\/www\.jetstar\.com\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=jetstar.com

! https://www.theory.co.jp/
/^https:\/\/www\.theory\.co\.jp\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=theory.co.jp

! https://www.zara.com/jp/ (via https://www.zara.com/)
/^https:\/\/www\.zara\.(com|cn)\/[-_0-9a-zA-Z]{4,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=zara.com|zara.cn

! https://zozo.jp/
/^https:\/\/zozo\.jp\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=zozo.jp


! https://github.com/Yuki2718/adblock/issues/44#issuecomment-987118029

! https://www.aa.com/homePage.do?locale=en_US
/^https:\/\/www\.aa\.com\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=aa.com

! https://www.americanairlines.jp/
/^https:\/\/www\.americanairlines\.jp\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=americanairlines.jp

! https://www.costco.com/
/^https:\/\/www\.costco\.com\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=costco.com

! https://www.ibm.com/
/^https:\/\/www\.ibm\.com\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=ibm.com

! https://www.santanderbank.com/
/^https:\/\/www\.santanderbank\.com\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=santanderbank.com

! https://www.shopdisney.com/
! https://www.shopdisney.co.uk/
/^https:\/\/www\.shopdisney\.(com|co\.uk)\/[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{30,}$/$script,1p,domain=shopdisney.com|shopdisney.co.uk
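To illustrate how the regex part of these filters behaves, here is a minimal Python sketch using the ana.co.jp pattern. The two URLs are invented for illustration and were not observed on the site; the `\/` escapes are dropped because Python does not need them, and the filter options ($script,1p,domain=) are not modelled.

```python
import re

# Regex part of the ana.co.jp filter above, without the enclosing slashes.
ANA_PATTERN = re.compile(
    r"^https://www\.ana\.co\.jp/[-_0-9a-zA-Z]{6,}/[-/_0-9a-zA-Z]{30,}$"
)


def matches_filter(url: str) -> bool:
    """Return True if the URL matches the regex part of the filter."""
    return ANA_PATTERN.search(url) is not None


# Hypothetical URLs, made up for this example.
randomized = "https://www.ana.co.jp/" + "aB3dE9x/" + "Q" * 40
ordinary = "https://www.ana.co.jp/common/js/main.js"

print(matches_filter(randomized))  # long extensionless path
print(matches_filter(ordinary))    # contains ".", which the class excludes
```

The second URL is rejected because the character class has no ".", so any script path ending in a file extension falls outside the pattern.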

Versions (mandatory)

  • Browser and version: Firefox Desktop
  • uBlock Origin version: 1.39.2

Filters (mandatory)

Yuki's uBlock Japanese filters

Notes

I can confirm that the randomized path always ends with a capital B or C (at least for now). If we are not going to use one filter for each site then it might be worth trying something like /[-_0-9a-zA-Z]{6,}\/[-\/_0-9a-zA-Z]{29,}[BC]$/$script,1p,match-case,domain= to minimize the possibility of false positives.
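A rough Python sketch of what the stricter [BC] suffix buys, assuming the path really does always end in a capital B or C. Python's `re` is case-sensitive unless told otherwise, which stands in for match-case here; the example.com URLs are hypothetical.

```python
import re

# Regex part of the suggested filter, without the enclosing slashes.
SUFFIX_PATTERN = re.compile(r"/[-_0-9a-zA-Z]{6,}/[-/_0-9a-zA-Z]{29,}[BC]$")


def ends_like_script(url: str) -> bool:
    """Return True if the URL ends with the randomized-path shape plus B or C."""
    return SUFFIX_PATTERN.search(url) is not None


# Hypothetical base URL, made up for this example.
base = "https://www.example.com/aB3dE9x/" + "q" * 35

print(ends_like_script(base + "B"))  # capital B suffix
print(ends_like_script(base + "b"))  # lowercase b is rejected when case matters
```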

@Yuki2718
Owner

Yuki2718 commented Dec 7, 2021

Thanks for reporting. I'll check tomorrow as these are a lot and will take time.

@Yuki2718
Owner

Yuki2718 commented Dec 8, 2021

That is a lot of sites, and all of them are fairly sensitive. Given what @JobcenterTycoon said in another issue, which seems to be true, I'll put them in Paranoid. This is what I call "fingerprinting for the sake of security", which is very common nowadays, and blocking it occasionally causes serious issues, e.g. easylist/easylist#6075 (comment). I'll keep gu-global.com and uniqlo.com in the main list though, as these are already in uBlock Privacy and AGTPF, which have a much larger user base, so hopefully any trouble will be reported.

@otksm
Author

otksm commented Aug 28, 2022

@Yuki2718 @JobcenterTycoon Just a small update.

I think this is more Akamai BM than Ipqualityscore (unless Akamai also acquired them) if we check out https://www.yoox.com/ in the Wayback Machine. The path for this fingerprinting script was clearly garbled as of January 2022, but if we go back to 2019-ish the path was (public|resources|static|assets) as described in easylist/easylist#6075 (comment). And if we go further back to 2018 then the path was just plain /_bm/bd-1-30. Their script content also looked largely identical to me.

This is where things get interesting: the script is actually identical across most Akamai sites (but not on yoox.com), and is relatively stable. From what I have seen it only changed once, during April 2022 (unlike yoox.com, where it changes at least once a week), so I have dropped the regex and started filtering with ETags instead.

*$script,strict1p,header=etag:"a7a61709860c0c57ec0c92584ae4f1bc214dfc71043ea43843572e55d14841f6"
*$script,strict1p,header=ETag:"a7a61709860c0c57ec0c92584ae4f1bc214dfc71043ea43843572e55d14841f6"

Obviously we do have to update it every time the script changes, but the upside is that there is no false positive.
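The idea can be sketched in Python as a lookup against a set of known ETag values. This is only an illustration of the approach, not how uBO implements header=; `should_block` and `KNOWN_ETAGS` are hypothetical names, and the header dictionaries in the usage lines are made up.

```python
from typing import Mapping

# ETag value quoted in the filters above (without the surrounding quotes).
KNOWN_ETAGS = {
    "a7a61709860c0c57ec0c92584ae4f1bc214dfc71043ea43843572e55d14841f6",
}


def should_block(headers: Mapping[str, str]) -> bool:
    """Return True if the response carries one of the known ETag values."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    etag = normalized.get("etag", "")
    # The header value is a quoted string; compare the content only.
    return etag.strip('"') in KNOWN_ETAGS


known = {"ETag": '"a7a61709860c0c57ec0c92584ae4f1bc214dfc71043ea43843572e55d14841f6"'}
other = {"etag": '"0123456789abcdef"'}  # hypothetical, unrelated value

print(should_block(known))
print(should_block(other))
```

Normalizing the header name in code covers what the pair of etag/ETag filters does by hand.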

For yoox.com I used /\/[-_0-9a-zA-Z]{4,}\/[-\/_0-9a-zA-Z]{25,}$/$script,strict1p,domain=yoox.com and the only false positive I have seen is another script that follows the (public|resources|static|assets) pattern. The last 35 characters of its randomized path are unique for each site and remain static over time. The rest of the path changes every 24 hours but its length always stays the same for each site. I think this might be another tracking script, and it's also present on most Akamai sites listed on https://www.ynap.com/pages/about-us/what-we-do/monobrand/. If we wanted to pinpoint the canvas fingerprinting script only, then I have found the following to work quite well so far:

/\/[-_0-9a-zA-Z]{4,}\/[-\/_0-9a-zA-Z]{25,}$/$script,strict1p,header=cache-control:max-age=21600,domain=yoox.com|net-a-porter.com|mrporter.com|maison-alaia.com|armani.com|armaniexchange.com|chloe.com|dunhill.com|store.ferrari.com|isabelmarant.com|karl.com|missoni.com|redvalentino.com|stoneisland.com|therow.com

However valentino.com needs its own solution.

/\/[-_0-9a-zA-Z]{4,}\/[-\/_0-9a-zA-Z]{25,}$/$script,strict1p,header=strict-transport-security:max-age=31536000,domain=valentino.com

Obviously the $header= bit is redundant if we start targeting them per-site; I am just leaving them there as PoC.
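As a PoC of the same logic outside uBO, here is a hedged Python sketch that combines the path regex with the cache-control check. The exact string comparison on the header value is a simplification and may differ from how uBO matches header=; `is_candidate` is a hypothetical helper and the URLs are invented.

```python
import re
from typing import Mapping

# Regex part of the yoox.com filter, without the enclosing slashes.
PATH_PATTERN = re.compile(r"/[-_0-9a-zA-Z]{4,}/[-/_0-9a-zA-Z]{25,}$")


def is_candidate(url: str, headers: Mapping[str, str]) -> bool:
    """Return True if both the path shape and the cache-control value match."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    path_ok = PATH_PATTERN.search(url) is not None
    header_ok = normalized.get("cache-control") == "max-age=21600"
    return path_ok and header_ok


# Hypothetical URLs and headers, made up for this example.
script_url = "https://www.yoox.com/Ab3d/" + "x" * 30
six_hours = {"Cache-Control": "max-age=21600"}

print(is_candidate(script_url, six_hours))
print(is_candidate(script_url, {"Cache-Control": "no-cache"}))
print(is_candidate("https://www.yoox.com/static/app.js", six_hours))
```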

==============================================

The sensor_data sent has been completely scrambled for yoox.com and others since at least May 2022, and Akamai seems to have been rolling this out in stages to other sites recently (e.g. www.delta.com, www.fedex.com, www.singaporeair.com). Most Japanese sites served via Akamai are unaffected for now. I was hoping to sleep on this longer but I think this is a good time to observe the difference between scrambled and unscrambled sensor_data before a 100% rollout.

The updated script seems to be stable for now:

*$script,strict1p,header=etag:"b1fbafbb19e6a354988fb6e7e8072e707b4f52bef9b86c4d172f28d5c8a14c62"
*$script,strict1p,header=ETag:"b1fbafbb19e6a354988fb6e7e8072e707b4f52bef9b86c4d172f28d5c8a14c62"

So yeah maybe this is a case for enabling $header= support in uBO by default? Alternatively we can try /\/[-_0-9a-zA-Z]{4,}\/[-\/_0-9a-zA-Z]{25,}$/$script,strict1p in uBO legacy (where there is no $header= support) and see if anyone reports any breakage.

Other related issues include uBlockOrigin/uAssets#10012 and AdguardTeam/AdguardFilters#104312 .

Yuki2718 reopened this Aug 28, 2022
@Yuki2718
Owner

header=etag will be safer, so maybe we should reconsider adding it to uBlock filters - Privacy. @JobcenterTycoon what do you think?

@Yuki2718
Owner

Unfortunately, /\/[-_0-9a-zA-Z]{4,}\/[-\/_0-9a-zA-Z]{25,}$/$script,strict1p can not be made tokenizable.

Yuki2718 added a commit that referenced this issue Aug 28, 2022
@JobcenterTycoon

I prefer the safer way.

@Yuki2718
Owner

$header is currently not enabled by default. I thought we had an internal discussion about Akamai BM, and we would probably have to discuss it further before the option can be enabled. Also I'm not sure whether removing the etag is enough, and @otksm the etag value I now see on these sites is 8409041ac476a7510c102db57c136d81cff01ee43c3f229561f4cfa05eb78940 - it may not be very stable?

@otksm
Author

otksm commented Aug 29, 2022

To be fair I only spotted the rollout? a few days ago so I didn't really observe it for long. It could be changing daily/weekly like the one on yoox.com, which is why I included the cache-control example. The other etag still works on sites like ana.co.jp and yodobashi.com though.

If untokenizable regex is not very workable then maybe we can consider assigning a new symbol for the end of hostname/ start of path? Would that help with the anchoring?

@Yuki2718
Owner

The regex works; being untokenizable means the number of such filters proportionally affects performance, and there is no further mitigation beyond narrowing it with options as much as possible ($script,strict1p). And there is still the concern of false positives.
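To make the performance point concrete, here is a toy Python model of token-based filter lookup. It is a sketch under assumptions, not uBO's actual engine: `ToyIndex` is a hypothetical class, and the token "ana" assigned to the per-site filter is assumed for illustration. Filters with a token are only evaluated when that token appears in the request URL, while filters without one are evaluated for every request.

```python
import re
from collections import defaultdict


class ToyIndex:
    """Toy filter index: token buckets plus a fallback for untokenized filters."""

    def __init__(self) -> None:
        self.by_token = defaultdict(list)
        self.untokenized = []

    def add(self, pattern: str, token=None) -> None:
        compiled = re.compile(pattern)
        if token is None:
            self.untokenized.append(compiled)  # checked on every request
        else:
            self.by_token[token].append(compiled)

    def candidates(self, url: str) -> list:
        """Return the filters that would have to be evaluated for this URL."""
        found = list(self.untokenized)
        for token in set(re.findall(r"[0-9a-z]+", url.lower())):
            found.extend(self.by_token.get(token, []))
        return found


index = ToyIndex()
# Per-site filter; the token "ana" is an assumption made for this example.
index.add(r"^https://www\.ana\.co\.jp/[-_0-9a-zA-Z]{6,}/[-/_0-9a-zA-Z]{30,}$", token="ana")
# Generic filter with no literal part to use as a token.
index.add(r"/[-_0-9a-zA-Z]{4,}/[-/_0-9a-zA-Z]{25,}$")

print(len(index.candidates("https://example.org/index.html")))
print(len(index.candidates("https://www.ana.co.jp/")))
```

In this model every additional untokenized filter adds one regex evaluation to every request, which is the proportional cost described above.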

@otksm
Author

otksm commented Sep 3, 2022

On closer inspection

8409041ac476a7510c102db57c136d81cff01ee43c3f229561f4cfa05eb78940

is actually the current hash for the script on yoox.com so yeah I think we can assume that it will be changing approximately weekly whenever sensor_data is scrambled.

@Yuki2718
Owner

Yuki2718 commented Sep 5, 2022

The etag value I now see on www.net-a-porter.com/en-jp/ is 2000504ef61113a85c086866f4d548af5a3cac1c69647b9a9abc7222b6e000b7. It is likely rotating, and maintaining such a filter would be a burden for volunteers, not to mention that header filtering is currently turned off by default. The remaining possibility is the regex rule, but I'm not willing to add it as a generic one: gorhill expressed concern about risky generic rules in a past case, and he won't be happy to add a generic untokenizable regex rule. I'll close this for now, but if anyone finds something interesting, please comment and share your findings.
