My New Filterlist #3551

Open
thedoggybrad opened this issue Apr 21, 2023 · 10 comments
Labels
directory-data changes to basic FilterLists data

Comments

@thedoggybrad

thedoggybrad commented Apr 21, 2023

@iam-py-test
Contributor

iam-py-test commented Apr 21, 2023

I have a few questions (not affiliated with this project, just curious).
Where do you get the entries for this list? The README mentions Phishing Domain Database and The Big List of Hacked Malware Web Sites.
Also, there are a lot of duplicated entries:
image
Thanks!

@collinbarrett collinbarrett added the directory-data changes to basic FilterLists data label Apr 21, 2023
@collinbarrett
Owner

@iam-py-test, what tool did you use to find those duplicates? Just curious.

@gwarser
Contributor

gwarser commented Apr 21, 2023

Visible in uBO

image

@iam-py-test
Contributor

@iam-py-test, what tool did you use to find those duplicates? Just curious.

I used https://abpvn.com/ruleChecker/redundantRuleChecker.html (DandelionSprout recommends it in the adfilt README, that's how I found it), but @gwarser's method works too (though the checker also shows which specific rules are redundant).
I am working on a PR to remove some of the redundant rules, but there are too many to do by hand, and my Python script keeps wanting to change the line endings from CRLF to LF, which makes the diff show that I changed every single line.
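
The CRLF-to-LF conversion mentioned above usually comes from Python's default newline translation when a file is opened in text mode. Here is a minimal sketch of one way around it, assuming the list is UTF-8 and the redundant rules are already known; the function name drop_rules and its arguments are hypothetical and not taken from the actual script:

```python
def drop_rules(path, redundant):
    """Rewrite `path` without the rules in `redundant`, keeping line endings.

    Hypothetical helper: `redundant` is a set of rule strings without
    their line terminators, e.g. {"||sub.example.com^"}.
    Returns the number of lines removed.
    """
    # newline="" turns off newline translation when reading, so each
    # line keeps its original terminator ("\r\n" stays "\r\n").
    with open(path, "r", encoding="utf-8", newline="") as f:
        lines = f.readlines()

    # Compare without the terminator, but keep the original line text.
    kept = [line for line in lines if line.rstrip("\r\n") not in redundant]

    # newline="" on write also disables translation, so nothing is
    # rewritten to the platform default.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.writelines(kept)

    return len(lines) - len(kept)
```

With both opens using newline="", only the removed lines should show up in the diff.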

@jarelllama
Contributor

I recently had to deal with this issue on my own blocklist. Here is a snippet of code in Bash to find redundant entries:

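while IFS= read -r entry; do
    # Strip the leading "||" and escape the dots so grep matches them literally
    domain="${entry#||}"
    grep -- "\.${domain//./\\.}\$" adblock.txt >> redundant_entries.txt
done < adblock.txt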

# The output has a high chance of having duplicates
sort -u redundant_entries.txt -o redundant_entries.txt

This assumes your list only has entries in the form of ||example.com^. The code loops through each entry and converts it into a pattern to be matched by grep. grep then looks for other entries that are subdomains (of any level) of the current entry. Because grep rescans the whole file once per entry, the process is slow: about 45 seconds for my 2300-rule ABP list.

I'm going to feed the redundant entries file into my list building script so it ignores the entries in the file.
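
For comparison, the same subdomain check can be sketched in Python with a set lookup instead of one grep pass per entry. This is only an illustration under the same assumption (every rule looks like ||example.com^); the function name find_redundant is hypothetical:

```python
def find_redundant(rules):
    """Return the rules whose domain is a subdomain of another listed domain.

    Assumes every rule has the form ||example.com^ and has already been
    stripped of its line ending; anything else is ignored.
    """
    domains = {
        rule[2:-1]
        for rule in rules
        if rule.startswith("||") and rule.endswith("^")
    }

    redundant = set()
    for domain in domains:
        labels = domain.split(".")
        # Walk up the parents: a.b.example.com -> b.example.com -> example.com -> com
        for i in range(1, len(labels)):
            if ".".join(labels[i:]) in domains:
                redundant.add(f"||{domain}^")
                break

    return sorted(redundant)
```

Each entry is checked only against its own parent domains, so the work grows with the number of rules rather than with its square. Lines read from a CRLF file would need their line endings stripped before being passed in.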

@thedoggybrad
Author

I will try to fix those duplicates. I had not checked for them. Let me fix it.

@thedoggybrad
Author

I have a few questions (not affiliated with this project, just curious). Where do you get the entries for this list? The README mentions Phishing Domain Database and The Big List of Hacked Malware Web Sites Also, there are a lot of duplicated entries: image Thanks!

Yes, that is right. I just compiled them from those sources.

@iam-py-test
Contributor

iam-py-test commented May 21, 2023

Also, one small comment on the README. IMO "uBlock" is garbage and shouldn't be recommended as an option to use this list with; it was unmaintained for years, then recently removed its code from GitHub and started pushing updates again. The developer(s) have done shady stuff in the past (tracking users, stealing code), and the extension doesn't even have a functional options page, so it's not even possible to install any non-default lists in it:
image
It's also blocked as malicious by several blocklists, including uBO's default Badware risks list.

@thedoggybrad
Author

thedoggybrad commented May 22, 2023

@iam-py-test Thanks for that. I'm removing it ASAP from the READMEs of all my filterlists.
(Update: Successfully removed from the READMEs of all my filterlists.)

By the way, the duplicated filters have been fixed.

@thedoggybrad
Author

thedoggybrad commented May 22, 2023

@iam-py-test
Thanks for making me aware of what is happening with uBlock now. It used to look almost the same as uBlock Origin.
What I know is that uBlock is the original one, but due to conflicts between the two repository owners, the original owner created uBlock Origin. I had previously read recommendations in uBlock Origin's own filterlist repository (in its issues) suggesting not to use uBlock. I was surprised to learn that the GitHub code for uBlock has now been removed, and I immediately checked for myself. I am not actually a fan of uBlock either.

By the way, I use uBlock Origin in my web browsers, so I am definitely not testing my filterlists on other ad blockers.


5 participants