Version 2.0 Goals #70
Comments
Could #69 be a new feature for 2.0? Compatibility-wise, the new field would/should not break anything (that I'm aware of).
Hi, I found out about your project via "Explore repositories" on the github.com homepage feed.
I just found this: https://mark0.net/soft-trid-e.html Not sure how well known it is, but it contains "over 17k file types". The file signatures do not have an explicit data license attached, but at the very least it might be useful to compare against. Maybe related:
TrID is one of the oldest filetype sites/software out there. That site has looked near enough the same for decades. Their database is pretty solid and very extensive. But they cannot generate a confidence or process more complicated searches. For example, .SBK Creative Soundfont is only handled as an extension, whereas we can look at the file in two places to generate a match.
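To illustrate the "looking at the file in two places" idea, here is a minimal sketch. It is not puremagic's actual data format or scoring; the `Signature`, `Match`, and `identify` names are hypothetical, and the RIFF/WAVE/AVI magic bytes stand in for the .SBK case. A signature that matches at two offsets outranks one that matches only the leading bytes.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional, Tuple


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature: every (offset, magic) part must match."""
    name: str
    extension: str
    parts: Tuple[Tuple[int, bytes], ...]


class Match(NamedTuple):
    signature: Signature
    score: int  # number of bytes matched; a toy stand-in for confidence


# Illustrative table only; not taken from puremagic's JSON data.
SIGNATURES = (
    Signature("RIFF container (generic)", ".riff", ((0, b"RIFF"),)),
    Signature("WAVE audio", ".wav", ((0, b"RIFF"), (8, b"WAVE"))),
    Signature("AVI video", ".avi", ((0, b"RIFF"), (8, b"AVI "))),
)


def identify(data: bytes) -> Optional[Match]:
    """Return the signature matching the most bytes, or None."""
    best: Optional[Match] = None
    for sig in SIGNATURES:
        if all(data[off:off + len(magic)] == magic for off, magic in sig.parts):
            score = sum(len(magic) for _, magic in sig.parts)
            if best is None or score > best.score:
                best = Match(sig, score)
    return best


if __name__ == "__main__":
    header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVEfmt "
    print(identify(header))
```

An extension-only lookup cannot distinguish these cases, because the first four bytes are identical for every RIFF-based file.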
Starting work on adding more advanced scanners. It is rough right now, but it has detection for unusual PDFs (#94) and better ZIP-type format detection (#102) (MS Office, Open Office, JAR, APK, etc.): https://github.com/cdgriffith/puremagic/tree/deep-scan/puremagic/scanners Before release I want to add scanners for:
Still need to do:
I won't be able to work on more myself for at least two weeks, hence this in-progress documentation. The biggest help would be a testing framework for scanners, if anyone wants to contribute to a part of this!
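For anyone wondering what ZIP-type detection can look like, here is a rough sketch using only the standard library. It is not the code in the deep-scan branch; `guess_zip_subtype` and `build_zip` are hypothetical names, and the rules are just the commonly known layout markers (OOXML's `[Content_Types].xml`, OpenDocument's `mimetype` member, `AndroidManifest.xml`, `META-INF/MANIFEST.MF`).

```python
import io
import zipfile

# OpenDocument files store their MIME type in a member named "mimetype".
ODF_MIMETYPES = {
    "application/vnd.oasis.opendocument.text": ".odt",
    "application/vnd.oasis.opendocument.spreadsheet": ".ods",
    "application/vnd.oasis.opendocument.presentation": ".odp",
}


def guess_zip_subtype(data: bytes) -> str:
    """Guess an extension for ZIP-based data; '' if it is not a ZIP."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ""
    with archive:
        names = set(archive.namelist())
        if "mimetype" in names:
            mimetype = archive.read("mimetype").decode("ascii", "replace").strip()
            if mimetype in ODF_MIMETYPES:
                return ODF_MIMETYPES[mimetype]
        if "[Content_Types].xml" in names:
            # MS Office (OOXML) keeps each document kind in its own folder.
            for prefix, ext in (("word/", ".docx"), ("xl/", ".xlsx"), ("ppt/", ".pptx")):
                if any(name.startswith(prefix) for name in names):
                    return ext
        # APKs usually also contain a JAR manifest, so check APK first.
        if "AndroidManifest.xml" in names:
            return ".apk"
        if "META-INF/MANIFEST.MF" in names:
            return ".jar"
    return ".zip"


def build_zip(members: dict) -> bytes:
    """Helper for experiments: build an in-memory ZIP from {name: bytes}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = build_zip({"[Content_Types].xml": b"<Types/>", "word/document.xml": b"<w/>"})
    print(guess_zip_subtype(sample))
```

In-memory archives like the ones `build_zip` produces could also serve as fixtures for a scanner test framework, since they avoid committing binary sample files.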
Just had a quick skim through the code and this is awesome stuff. The zip method is way better coded than I can manage, but I can see it works as I sort of thought it would in my head. If I want to help fill in some of the .zip formats, what's the best way? I'm guessing I need to fork the dev branch? Looking at the two examples I can see the rough ideas of how to improve some of the more complex formats I've mentioned in my PRs. For example, we could heavily reduce the size of the .json by moving all the .mp3-related entries I added into a dedicated scanner. That in itself would likely be smaller than the .json entries, as we would not need to repeat everything so heavily.
Now that puremagic is picking up some outside traction, and is used in places like MongoDB, I want to lay out clear future plans. Please keep comments on this page limited to overall goals; any specific conversation about a goal should be its own issue, and will be updated here.