-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong redirection on Amazon #30
Comments
I can't reproduce your example, but I see what you mean and your proposal doesn't sound like a bad idea at all. I'll have to do a bit of thinking first though. Given your example, matching on everything until the first slash won't work. Extracting the domain can be tough due to top level domains (https://en.wikipedia.org/wiki/Second-level_domain) which I wouldn't want to hardcode... |
I found https://publicsuffix.org/ yesterday, @crssi also referenced in #29 but I am not really a fan of introducing a sync functionality or doing it manually every month. It also is a very long list and checking that for all requests with a redirect could slow down the whole process. And the whole point for me is speed up (I don't care much about the privacy aspect here). So I thought I could implement a simple heuristic: do not skip if the domain matches or ends with the same 10? (could be configurable) characters. It's not perfect, but could cover 90% of all the cases? |
OK... 2 problems
|
I made a quick test and searching through the names takes something between 3 and 15 ms. A lot of variation there. Not sure whether that's acceptable, but anyway I don't think the list is right for the purpose of this ticket. What I'd need is a list of top level domain plus second level domains so that splitting by dots I can identify amazon.com or amazon.co.uk as the right domain. So I basically want |
I hadn't realized it would be so difficult to figure out if something is in the same domain! But of course it makes sense now that I think about it. I think you'd need more than top level domain plus second level domains. It looks like in Japan three-level public domains are very common, or things like blogspot.co.uk, or as many as five-level ones with amazonaws. I don't see that there's any robust option other than using the public suffix list. But there must be some optimized search code already available that you could use for it? For example, for domains ending in .com, there are relatively few entries to check. If you can't use the list, then I'd say probably just forget about it, because it won't work the way people expect.
Not sure what you mean by that... |
@sblask: |
That's not what I was talking about. I just meant that I can't just split on dots to figure out the right domain. Given this example: I would need to extract There is a library that seems to do the right thing: https://github.com/oncletom/tld.js but it's rather complex and there is no simple way to include it... |
I am just thinking out loud here and now (brainstorming). Let's start with First part, that is part to 1st
for Let's go with a longer example, which does not happen very often...
I am not sure if I have explained well, but at least I tried. :) Cheers |
There is no support for SQLite in webextensions, but I think I found a way to do this using sets - lookup should be instantaneous. I finally understood the format of the suffix list. I think this should work: create 3 sets:
And then similarly to what you suggest split the domain and lookup bit by bit - just in the opposite direction. It goes like this:
|
I have suggested the backward solution because I have a feeling that it would need less steps and would be more optimal (CPU regarding) and faster. Cheers |
It's the same thing, but I find it easier to wrap my head around it, especially with exceptions and wildcards. Might need more steps, but using sets that shouldn't be a problem. Now I just need to find the time to implement it... |
You are a champ. :) Thank you |
There's also this: It looks very simple to include as a submodule, and to use it's just one "psl.get" call for the "from" and "to" URLs, gives you the base domain names to compare. From Smart Referer's main.js: try {
let toURI = new URL(request.url);
let fromURI = new URL(referer.value);
// Check if this request can be dismissed early by either being a perfect
// source host / target host match OR by being whitelisted somewhere
if(fromURI.host === toURI.host || policy.allows(fromURI.host, toURI.host)) {
return;
}
// Attempt lax matching only if we, in fact, have hostnames and not IP addresses
let isIPAddress = !isHostnameIPAddress(fromURI.host) && !isHostnameIPAddress(toURI.host);
if(isIPAddress && !options.strict) {
// Parse the domain names and look for each of their base domains
let fromHostBase = psl.get(fromURI.host);
let toHostBase = psl.get(toURI.host);
// Allow if the both base domain names match
if(fromHostBase && toHostBase && fromHostBase === toHostBase) {
return;
}
}
} catch(e) {
console.exception(e);
} Another optimization, besides the "perfect match" as above, is a simple check if the top-level and 2nd level, based on splitting by dots, are not the same. In either of those cases, which is going to be the vast majority, you don't need to do the full check. You could even do a fast pre-check for the most common two-level public domains, like .co.uk, before calling the full check. |
I had a look at the psl code. For every domain it goes through all entries of the public suffix list. I would have to check two domains, so if my performance check is right, that would be up to 30 ms and that's really rather long. Now that I figured out the algorithm, it shouldn't be too hard to implement it myself. I like the idea of the pre-check. I'll do some benchmarking and if it's still rather slow, I can add that too. I would not want to judge what the most common domains are though ;-) |
See @gorhill's implentation for publicsuffix.org. https://github.com/gorhill/publicsuffixlist.js It is apparently much faster. |
Just finished my own implementation of getting the domain using the public suffix list: b17aca7 I made a new benchmark. Not quite comparable to my previous one because that was done in the browser console and this is done on command line. Also not comparable to the one from gorhill, especially because he didn't list the actual hostnames so I couldn't use the same. I also don't handle punycode - I don't know whether the webextension API uses punycode or unicode and I didn't have good examples to test with. Anyway a comparison of searching through all rules and using my set implementation looks like this:
The difference is quite significant (as expected), but even the search doesn't look quite as bad as my previous result. Next step: integrating this into the skipping logic and adding an option. |
Ok, it's done! But I could not upload a new version yet because the add-on page died on me. You can test with Firefox by cloning this repo and running npm install and npm run:stable in the directory you cloned into. For Chrome, just clone into a directory and then use "Load unpacked extension" from the extension settings. |
I am eager to try, but will be back in two weekes.
In the mean time your AMO might get a life back. :)
Thank you
|
New version is out, if you find any issues, please open a new issue. |
Seems to work well so far, thanks! |
You are a true wizard. Thank you. :) But will know more when I return from vacation in about 2 weeks and will let you know. :) |
The "same public domain" check doesn't appear to be working properly. The login form on www.chase.com redirect to chaseonline.chase.com and then to secure05b.chase.com. All are obviously chase.com, but the add-on prevents logging in and pops up a message about skipping the chaseonline.chase.com redirect. |
@petteyg I can't reproduce your problem. Do I need an account? When I click on signin, I get stuff like |
There's my problem. I was misreading the checkbox and thought it should be on. |
Nah, not your problem obviously. I made the same mistake. @sblask I will write you more about to make more clearly when I am back in a week or so. |
I also had the same experiences, and chalked it up to me being an idiot :-) But now I don't feel so bad... I'll open a new issue about it. |
OK... reading it again, it make sense. Cheers |
Hi, thanks for this addon!
I had a problem with Amazon, being unable to go to sellers' storefronts. Eventually I solved it by putting sellercentral.amazon.com on the blacklist of Skip Redirect. I guess you don't do fixes for specific websites, but I wonder then if it would be helpful to have an option to not skip redirects if they go to the same domain? Mostly I don't like click-tracking for privacy reasons, but within the same domain it doesn't matter so much, and it would reduce site breakages like this.
Go to for example this seller's page:
https://www.amazon.com/sp?seller=AEHA1AAWZIC3L
At the top of that page, click "Apevia storefront", which has the link:
https://www.amazon.com/shops/AEHA1AAWZIC3L?ref_=v_sp_storefront
That seems to go to here:
https://sellercentral.amazon.com/gp/sc-redirect/seller-page.html/ref=v_sp_storefront[blahblah-mysterious-information]&servername=www.amazon.com&type=shops
But Skip Links then just goes to the root page of www.amazon.com, instead of to the storefront.
The text was updated successfully, but these errors were encountered: