Multi part reverse lookup #59

Merged
merged 7 commits into from
Apr 18, 2024
Changes from 4 commits
14 changes: 13 additions & 1 deletion puremagic/magic_data.json
@@ -89,6 +89,15 @@
"52494658": [
["4647444d", 8, ".dcr", "", "Adobe Shockwave"],
["4d563933", 8, ".dir", "", "Macromedia Director file format"]
],
"4352454d" : [
["444f4e4500000000", -8, ".ctm", "", "CreamTracker module"]
],
"3c747261636b206e616d653d22" : [
["3c2f747261636b3e0a", -9, ".pt2", "", "PicaTune 2 module"]
],
"3c6d6c74" : [
["3c2f6d6c743e0a", -7, ".mlt", "", "Shotcut project"]
]
},
"footers": [
@@ -1330,6 +1339,9 @@
["6674797068656973", 4, ".heic", "image/heic", "HEIC Image format (HEIS scalable)"],
["667479706865696d", 4, ".heic", "image/heic", "HEIC Image format (HEIM multiview)"],
["667479706865766d", 4, ".heic", "image/heic", "HEIC Animated Image format (HEIM multiview)"],
["6674797068657673", 4, ".heic", "image/heic", "HEIC Animated Image format (HEIS scalable)"]
["6674797068657673", 4, ".heic", "image/heic", "HEIC Animated Image format (HEIS scalable)"],
["4352454D", 44, ".ctm", "", "CreamTracker module"],
["3c747261636b206e616d653d22", 0, ".pt2", "", "PicaTune 2 module"],
["3c6d6c74", 38, ".mlt", "", "Shotcut project"]
]
}
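
For readers who do not read hex, the three new multi-part rows can be decoded with a short sketch like the one below. The dictionary layout here is a simplified, hypothetical stand-in for the JSON rows (it is not how puremagic loads its data); only the hex strings, offsets, and extensions are copied from the hunk above.

```python
# Hex signatures, offsets, and extensions copied from the magic_data.json hunk.
# The dict-of-tuples shape is illustrative only, not puremagic's loader format.
multi_part = {
    "4352454d": ("444f4e4500000000", -8, ".ctm"),
    "3c747261636b206e616d653d22": ("3c2f747261636b3e0a", -9, ".pt2"),
    "3c6d6c74": ("3c2f6d6c743e0a", -7, ".mlt"),
}

decoded = {}
for head_hex, (tail_hex, offset, ext) in multi_part.items():
    head = bytes.fromhex(head_hex)
    tail = bytes.fromhex(tail_hex)
    # In these three rows the negative offset is exactly minus the tail
    # length, so the tail is expected at the very end of the file.
    assert offset == -len(tail)
    decoded[ext] = (head, tail)
    print(ext, head, tail, offset)
```

Decoded this way, the header/footer pairs are `CREM` and `DONE` followed by four NUL bytes, `<track name="` and `</track>` plus a newline, and `<mlt` and `</mlt>` plus a newline.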
41 changes: 28 additions & 13 deletions puremagic/main.py
@@ -151,20 +151,35 @@ def _identify_all(header: bytes, footer: bytes, ext=None) -> List[PureMagicWithC
     for matched in matches:
         if matched.byte_match in multi_part_header_dict:
             for magic_row in multi_part_header_dict[matched.byte_match]:
-                start = magic_row.offset
-                end = magic_row.offset + len(magic_row.byte_match)
-                if end > len(header):
-                    continue
-                if header[start:end] == magic_row.byte_match:
-                    new_matches.add(
-                        PureMagic(
-                            byte_match=header[matched.offset : end],
-                            offset=magic_row.offset,
-                            extension=magic_row.extension,
-                            mime_type=magic_row.mime_type,
-                            name=magic_row.name,
+                if "-" in str(magic_row.offset):
+                    start = magic_row.offset
+                    end = magic_row.offset + len(magic_row.byte_match)
+                    match_area = footer[start:end] if end != 0 else footer[start:]
+                    if match_area == magic_row.byte_match:
+                        new_matches.add(
+                            PureMagic(
+                                byte_match=matched.byte_match + magic_row.byte_match,
+                                offset=magic_row.offset,
+                                extension=magic_row.extension,
+                                mime_type=magic_row.mime_type,
+                                name=magic_row.name,
+                            )
+                        )
+                else:
+                    start = magic_row.offset
+                    end = magic_row.offset + len(magic_row.byte_match)
+                    if end > len(header):
Owner

Suggested change
-                if "-" in str(magic_row.offset):
-                    start = magic_row.offset
-                    end = magic_row.offset + len(magic_row.byte_match)
-                    match_area = footer[start:end] if end != 0 else footer[start:]
-                    if match_area == magic_row.byte_match:
-                        new_matches.add(
-                            PureMagic(
-                                byte_match=matched.byte_match + magic_row.byte_match,
-                                offset=magic_row.offset,
-                                extension=magic_row.extension,
-                                mime_type=magic_row.mime_type,
-                                name=magic_row.name,
-                            )
-                        )
-                else:
-                    start = magic_row.offset
-                    end = magic_row.offset + len(magic_row.byte_match)
-                    if end > len(header):
+                start = magic_row.offset
+                end = magic_row.offset + len(magic_row.byte_match)
+                if magic_row.offset < 0:
+                    match_area = footer[start:end] if end != 0 else footer[start:]
+                    if match_area == magic_row.byte_match:
+                        new_matches.add(
+                            PureMagic(
+                                byte_match=matched.byte_match + magic_row.byte_match,
+                                offset=magic_row.offset,
+                                extension=magic_row.extension,
+                                mime_type=magic_row.mime_type,
+                                name=magic_row.name,
+                            )
+                        )
+                else:
+                    if end > len(header):

I haven't verified the logic or tested this myself; I just wanted to offer a bit of Python-specific cleanup: move start and end outside the if statements, since they are identical in both branches, and check magic_row.offset < 0 instead of testing against a string. (If that's a problem for some reason, let me know.)

Contributor Author

@NebularNerd NebularNerd Mar 12, 2024

@cdgriffith, tested with no issues, and I have merged it into the PR. Thanks for the suggestions; my coding skills are OK, but I know there's always room for improvement. 🙂

I used str() because I had brain fog and skipped past the correct < 0 check. Both achieve the same goal; mine was just the long way round. 🤣
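
As a quick illustration of the point in this thread, the two sign checks can be compared side by side. This is a minimal sketch with hypothetical helper names, assuming the offsets are always plain integers, as they are in the JSON rows shown earlier; the sample offsets are taken from this diff.

```python
# Offsets that appear in the magic_data.json hunks of this PR.
offsets = [-9, -8, -7, 0, 8, 38, 44]


def is_footer_offset_str(offset: int) -> bool:
    """String-based check, as in the first version of the PR."""
    return "-" in str(offset)


def is_footer_offset_int(offset: int) -> bool:
    """Direct numeric check, as in the suggested change."""
    return offset < 0


# For integer offsets both predicates give the same answer.
agree = all(
    is_footer_offset_str(o) == is_footer_offset_int(o) for o in offsets
)
print(agree)
```

The numeric comparison avoids building a string on every row and states the intent directly.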

+                        continue
+                    if header[start:end] == magic_row.byte_match:
+                        new_matches.add(
+                            PureMagic(
+                                byte_match=header[matched.offset : end],
+                                offset=magic_row.offset,
+                                extension=magic_row.extension,
+                                mime_type=magic_row.mime_type,
+                                name=magic_row.name,
+                            )
                         )
-                    )
 
     matches.extend(list(new_matches))
     return _confidence(matches, ext)
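
The secondary check in the hunk above can be tried in isolation with a sketch like the following. `secondary_match` is a hypothetical helper written for illustration (it is not a puremagic function); it mirrors the slicing logic of the diff, including the `end != 0` special case, which is needed because a slice such as `footer[-8:0]` is always empty in Python.

```python
def secondary_match(header: bytes, footer: bytes, byte_match: bytes, offset: int) -> bool:
    """Hypothetical stand-alone version of the second-stage check."""
    start = offset
    end = offset + len(byte_match)
    if offset < 0:
        # Negative offsets index from the end of the footer. When the
        # signature runs to the last byte, end is 0 and footer[start:0]
        # would be empty, so slice to the end instead.
        match_area = footer[start:end] if end != 0 else footer[start:]
        return match_area == byte_match
    if end > len(header):
        # Header too short to contain the signature at this offset.
        return False
    return header[start:end] == byte_match


# Made-up footer ending with the CreamTracker trailer from the JSON rows.
ctm_footer = b"pattern data" + bytes.fromhex("444f4e4500000000")
print(secondary_match(b"", ctm_footer, bytes.fromhex("444f4e4500000000"), -8))

# Made-up header with a signature at a positive offset (the .dcr row).
dcr_header = b"RIFF\x00\x00\x00\x00" + bytes.fromhex("4647444d")
print(secondary_match(dcr_header, b"", bytes.fromhex("4647444d"), 8))
```

The sample byte strings are invented for the demonstration; only the hex signatures and offsets come from the data file in this PR.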