How to block a domain pattern? #689

Closed
s783smith opened this issue Sep 6, 2015 · 5 comments

@s783smith

Is there a way to block a domain pattern? Or at least a domain name longer than N characters?

lKCvYeuhyHKKWQQR7Vvq.com
34AouRhahGd2ssxh93zN6O.com
PF8MdbTAwMCnAWA8nnLvs.com
@gorhill
Owner

gorhill commented Sep 6, 2015

Only a regex can take care of this. For efficiency, the regex should be as specific as possible, since it will be tested against every single URL.

This one takes care of your examples + any subdomain of these + any protocol:

/^[a-z-]+:\/\/(?:[0-9a-z.-])*[0-9A-Za-z]{20,}\.com/

If subdomains are not an issue:

/^[a-z-]+:\/\/[0-9A-Za-z]{20,}\.com/

If protocol can only be http/https with no subdomains:

/^https?:\/\/[0-9A-Za-z]{20,}\.com/

This would be even better if you narrow the filter further with filter options such as third-party or a request type (image, script, etc.). These can prevent the regex from being tested at all when the party and/or request type does not match first. Example:

/^https?:\/\/[0-9A-Za-z]{20,}\.com/$third-party,script
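
To see how the three variants differ, here is a minimal sketch that runs them as plain JavaScript regular expressions against a few URLs. The sample URLs (the `cdn.` subdomain, the `ftp` scheme, `example.com`) are made up for illustration, and the `$third-party,script` options are left out because they are filter options, not part of the regex.

```javascript
// The three regexes from the comment above, as JavaScript literals.
const withSubdomains = /^[a-z-]+:\/\/(?:[0-9a-z.-])*[0-9A-Za-z]{20,}\.com/;
const noSubdomains = /^[a-z-]+:\/\/[0-9A-Za-z]{20,}\.com/;
const httpOnly = /^https?:\/\/[0-9A-Za-z]{20,}\.com/;

// Hypothetical URLs built from the domains in the question.
const urls = [
  "http://lKCvYeuhyHKKWQQR7Vvq.com/", // plain domain over http
  "https://cdn.34AouRhahGd2ssxh93zN6O.com/a.js", // with a subdomain
  "ftp://PF8MdbTAwMCnAWA8nnLvs.com/", // protocol other than http/https
  "https://example.com/", // short, ordinary domain
];

// Record which variant matches which URL.
const results = urls.map((url) => ({
  url,
  withSubdomains: withSubdomains.test(url),
  noSubdomains: noSubdomains.test(url),
  httpOnly: httpOnly.test(url),
}));

for (const row of results) {
  console.log(row.url, row.withSubdomains, row.noSubdomains, row.httpOnly);
}
```

With these samples, only the first variant matches the URL that has a subdomain, and the http/https variant is the only one that rejects the `ftp` URL. None of them matches the short domain.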

@RoxKilly

RoxKilly commented Sep 7, 2015

@gorhill 2 quick ones:

  1. Once you've refined the regex, where do you place it? My Rules? My Filters?
  2. Why do you not include the end-of-string special character at the end of the regex? Instead of \.com/ why not \.com$/ ? Without that, wouldn't "reallyLongButGibberishDomain**.com**munity.edu" also match?

@vzjrz

vzjrz commented Sep 7, 2015

  1. My Filters
  2. That would match only exactly lKCvYeuhyHKKWQQR7Vvq.com and not lKCvYeuhyHKKWQQR7Vvq.com/something, but you're right that the original also matches reallyLongButGibberishDomain.community.edu. A better rule would probably be /^https?:\/\/[0-9A-Za-z]{20,}\.com\//, which requires a slash right after .com. Keep in mind that this regex blocks all long domains, so even legitimate long domains will be blocked.
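
The endings discussed here can be compared directly. This is a minimal sketch using plain JavaScript regexes and made-up URLs. The last pattern, ending in `(?:[\/:]|$)`, is a hypothetical variant that does not come from the thread; it is included only to show one way to accept a bare domain, a port, or a path while still rejecting the lookalike.

```javascript
const openEnded = /^https?:\/\/[0-9A-Za-z]{20,}\.com/; // original, no terminator
const endOfString = /^https?:\/\/[0-9A-Za-z]{20,}\.com$/; // with $
const trailingSlash = /^https?:\/\/[0-9A-Za-z]{20,}\.com\//; // slash after .com
// Hypothetical variant (not from the thread): ".com" must be followed
// by "/", by ":" (a port), or by the end of the string.
const boundary = /^https?:\/\/[0-9A-Za-z]{20,}\.com(?:[\/:]|$)/;

// Made-up sample URLs.
const samples = {
  bare: "http://lKCvYeuhyHKKWQQR7Vvq.com",
  withPath: "http://lKCvYeuhyHKKWQQR7Vvq.com/something",
  withPort: "http://lKCvYeuhyHKKWQQR7Vvq.com:8080/x",
  lookalike: "http://reallyLongButGibberishDomain.community.edu/",
};

// Test one regex against every sample and return { sampleName: boolean }.
function check(regex) {
  const out = {};
  for (const [name, url] of Object.entries(samples)) {
    out[name] = regex.test(url);
  }
  return out;
}

const report = {
  openEnded: check(openEnded),
  endOfString: check(endOfString),
  trailingSlash: check(trailingSlash),
  boundary: check(boundary),
};

console.log(report);
```

In this sketch the open-ended pattern is the only one that matches the `.community.edu` lookalike, the `$` version matches only the bare domain, and the trailing-slash version misses the URL with an explicit port.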

@bulrush15

The regex goes in My Filters and it looks like this.

/[A-Za-z0-9-]{15,}\.com/

This blocks domains which are at least 15 characters long and end in ".com", but it blocks ALL those domains, not just randomly generated nonsense domains. So it will block "privateinternetaccess.com".
I think it will be very difficult to block the long domains you want and keep the "good" ones.
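
The false-positive problem is easy to reproduce. This is a minimal sketch, assuming the pattern is written with the digit range as `0-9` and the dot escaped, and using plain JavaScript matching rather than the extension itself. `stackoverflow.com` is just an arbitrary shorter domain chosen for contrast.

```javascript
// Length-only pattern: 15 or more letters, digits, or hyphens before ".com".
const longDotCom = /[A-Za-z0-9-]{15,}\.com/;

const domains = [
  "lKCvYeuhyHKKWQQR7Vvq.com", // random-looking, from the question
  "privateinternetaccess.com", // legitimate, 21 characters before .com
  "stackoverflow.com", // legitimate, 13 characters before .com
];

// Build a URL for each domain and record whether the pattern matches it.
const hits = domains.map((domain) => ({
  domain,
  blocked: longDotCom.test("https://" + domain + "/"),
}));

for (const { domain, blocked } of hits) {
  console.log(domain, blocked ? "would be blocked" : "not matched");
}
```

Both the random-looking domain and `privateinternetaccess.com` match, which is the trade-off described above: length alone cannot tell the two apart.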

@RoxKilly

How about:
/^https?:\/\/(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[\w]{18,}\.com/$third-party

This will block the URL if it meets all of the following:

  1. The part of the URL after the protocol contains at least one capital letter,
  2. That part contains at least one lowercase letter,
  3. That part contains at least one digit,
  4. The domain name (the part before .com) contains only alphanumeric characters or the underscore ("_"),
  5. The domain name is at least 18 characters long,
  6. It's a .com domain,
  7. The protocol is http or https, and
  8. The request is for a third-party resource (meaning it goes to a domain other than that of the page you are visiting)

If you're concerned only about scripts, then add ,script to the end of the filter. This filter is specific enough that false positives are very unlikely IMO.
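
A minimal sketch of how the lookaheads behave, run as a plain JavaScript regex with the `$third-party` option left out (it is a filter option, not part of the regex). The URLs, including the `/Page1` path, are made up. Each `(?=.*…)` scans everything after the protocol, so a capital letter or digit in the path also satisfies it.

```javascript
const mixed = /^https?:\/\/(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[\w]{18,}\.com/;

// Made-up sample URLs.
const cases = {
  randomLooking: "http://lKCvYeuhyHKKWQQR7Vvq.com/",
  legitLowercase: "https://privateinternetaccess.com/",
  // Same legitimate domain, but the path supplies a capital letter and a digit.
  legitWithPath: "https://privateinternetaccess.com/Page1",
};

// Map each case name to whether the regex matches its URL.
const outcome = Object.fromEntries(
  Object.entries(cases).map(([name, url]) => [name, mixed.test(url)])
);

console.log(outcome);
```

Here the all-lowercase legitimate URL is left alone, but the same domain matches once the path contains a capital letter and a digit. Note also that hostnames are normally lowercased during URL parsing, so if the filter sees the normalized URL, the capital-letter condition could only be met by the path or query; that is an assumption worth checking against real requests.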

gorhill closed this as completed Sep 14, 2015