2024-06-25 sndhdr update and HD/CD/DVD Image files #87
Should close #85
SNDHDR Parity update (and HD/CD/DVD Image files)
.aif/.aiff/.aiffc/.8svx:
These are IFF-based files; all start with 0x464f524d / FORM.
AIFF files are a hodgepodge of formats and specs thrown under the same label. Different compression styles, or similar compression styles with the wrong FourCC, can render a file unplayable in certain software.
I've updated and tidied the database to recognise the additional AIFF or AIFC header at byte 8. With possible enhancements under V2 we could perform further matches to detail the compression used and possibly even bitrates etc.
.aif: I've removed the trailing 00 to allow it access to the multi-part section for better confidence.
.aiff: @cdgriffith had already added AIFF and 8SVX at byte 8 in multi-part; corrected the MIME for AIFF.
["41494646", 8, ".aif", "audio/x-aiff", "AIFF/Amiga/Mac audio"],
["38535658", 8, ".aif", "audio/x-aiff", "AIFF/Amiga/Mac audio"] and
["41494643", 8, ".aiffc", "audio/x-aifc", "AIFC audio"]:
these are covered by other matches in the 0x464f524d / FORM match format.
.au:
The existing fingerprint should match all files.
No changes. We could extract more info, but looking at how sndhdr does it, I'll leave that for a V2 upgrade.
.hcom:
Almost no information exists on the format; what there is amounts to the same data as linked below, in differing formats. From what I can see it's an old Apple Mac format, possibly used in apps and games.
The sndhdr test looks for two headers: one in the Mac header, the other in the Mac data fork. For the time being I have added them as two separate tests. This will give a low-ish confidence score; however, in the absence of test files there is little more I can do.
If anyone ever reads this and has some sample files, I'll take a look to improve this match.
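For reference, the sndhdr check boils down to the two comparisons sketched below, with offsets as I read them in CPython's sndhdr module. The function name looks_like_hcom is hypothetical, and this is not how the .json entries are evaluated; it only shows what a combined test would look like.

```python
def looks_like_hcom(data: bytes) -> bool:
    """Sketch of the sndhdr-style HCOM check: both markers must be present."""
    # b"FSSD" is the file type stored in the Mac header at offset 65;
    # b"HCOM" opens the data fork at offset 128.
    return data[65:69] == b"FSSD" and data[128:132] == b"HCOM"
```

Requiring both markers together is what would lift the confidence above the two separate single-header tests.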
.sndt:
After a lot of digging I found this format seems to belong to a very old Win 3.1-era program called SoundTool/SNDTOOL. I managed to source a copy buried in a shareware .iso at archive.org. Downloading it and comparing an included sample file to the ones below seems to indicate this is the source of these files.
.voc/.wav:
No changes required, existing fingerprint will match any VOC/WAV file.
V2 Improvements could look to decode audio data for sample rate etc...
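As a rough idea of what that V2 decoding could look like for WAV, here is a minimal sketch that walks the RIFF chunk list and reads the sample rate from the fmt chunk. The function name wav_sample_rate is hypothetical and nothing like it exists in this PR; it assumes the whole header is already in memory.

```python
import struct


def wav_sample_rate(data: bytes):
    """Return the sample rate from a RIFF/WAVE header, or None if not found."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        return None
    pos = 12
    # Each chunk is a 4-byte ID, a 4-byte little-endian size, then the payload.
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            if pos + 16 > len(data):
                return None  # truncated fmt chunk
            # fmt payload: format tag (2), channels (2), sample rate (4), ...
            (rate,) = struct.unpack_from("<I", data, pos + 12)
            return rate
        pos += 8 + size + (size & 1)  # payloads are padded to an even length
    return None
```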
.sb/.ub/.ulaw:
Cannot add. As far as I can guess the intentions of the sndhdr authors, .sb and .ub are intended to be signed or unsigned byte streams. This means they are simply a stream of bytes holding audio data; it is knowledge of the correct bitrate etc. that decodes them back to audio.
.ulaw is essentially a codec used in various audio containers such as AIFF and AU, which again means there is no specific ulaw file format.
In these cases, there is not a lot we can do to detect these files. It would basically require creating an audio decoder similar to sndhdr or Audacity, VLC etc. to fully process and try to understand these files. This could be possible with V2, but it would take on a life of its own.
.sndr:
I have no idea on this one. I've added the header match from sndhdr, but again, without test files or knowledge of the program they came from we can't do any better than that.
Again, if anyone reading this has any test files from the program that made it, I'll take a look and improve.
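To show why this match is weak, here is a sketch of the check as I read it in CPython's sndhdr module: two zero bytes followed by a little-endian 16-bit value that has to look like a plausible sample rate. The function name looks_like_sndr is hypothetical, and the range check is something a plain header entry in the .json cannot express.

```python
import struct


def looks_like_sndr(data: bytes) -> bool:
    """Sketch of the sndhdr-style sndr check."""
    if len(data) < 4 or data[:2] != b"\x00\x00":
        return False
    # Bytes 2-3 hold the sample rate as a little-endian unsigned short.
    (rate,) = struct.unpack_from("<H", data, 2)
    return 4000 <= rate <= 25000
```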
Other formats:
Honestly this PR is a bit of a so-so one, so let's add some extra stuff to make it more exciting.
.vhdx:
The updated version of the older .vhd format used by Microsoft Hyper-V and Virtual PC. Nice simple header of 0x7668647866696c65 / vhdxfile.
.qcow/.qcow2/.qed:
QEMU's hard drive image formats. Simple headers with version numbers:
0x514649fb00000001 / QFI\xfb\x00\x00\x00\x01 for QCOW images
0x514649fb00000002 / QFI\xfb\x00\x00\x00\x02 for QCOW2
0x514649fb00000003 / QFI\xfb\x00\x00\x00\x03 for QCOW3 (still .qcow2 extension)
0x514544 / QED for QEMU Enhanced Disk images
.luks
Linux Unified Key Setup is another HD image format. There are two versions, LUKS1 and LUKS2:
0x4c554b53babe0001 / LUKS\xba\xbe\x00\x01 for LUKS1
0x4c554b53babe0002 / LUKS\xba\xbe\x00\x02 for LUKS2
It's an interesting format that has an embedded .json; future V2 functionality could interrogate the files to display encryption type and other data.
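The two signatures differ only in the trailing big-endian 16-bit version field, so a single check could cover both. A minimal sketch, with the hypothetical helper name luks_version (not part of this PR):

```python
import struct

LUKS_MAGIC = b"LUKS\xba\xbe"  # first six bytes of both signatures above


def luks_version(data: bytes):
    """Return the LUKS version number from the header, or None if not LUKS."""
    if len(data) < 8 or not data.startswith(LUKS_MAGIC):
        return None
    # The magic is followed by a big-endian unsigned 16-bit version.
    (version,) = struct.unpack_from(">H", data, 6)
    return version
```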
.vdi
Sun/Oracle HD image for use with VirtualBox. Nice long headers to match against. There seems to be no official document on the format, but a good breakdown is available, linked below.
0x3c3c3c2053756e2078564d205669727475616c426f78204469736b20496d616765203e3e3e / <<< Sun xVM VirtualBox Disk Image >>> for older Sun images
0x3c3c3c204f7261636c6520564d205669727475616c426f78204469736b20496d616765203e3e3e / <<< Oracle VM VirtualBox Disk Image >>> for newer Oracle images
As far as I can see there is only one version (1.1), with the same image signature starting at byte 64 for both flavours. I've included it as a multi-part for completeness.
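A sketch of how the text header and the signature at byte 64 combine into one check. The signature value 0xBEDA107F is an assumption taken from my reading of the VirtualBox sources rather than from anything in this PR, and looks_like_vdi is a hypothetical name.

```python
import struct

VDI_TEXT_HEADERS = (
    b"<<< Sun xVM VirtualBox Disk Image >>>",
    b"<<< Oracle VM VirtualBox Disk Image >>>",
)
# Assumption: image signature as defined in the VirtualBox sources,
# stored as a little-endian 32-bit value at byte 64.
VDI_IMAGE_SIGNATURE = 0xBEDA107F


def looks_like_vdi(data: bytes) -> bool:
    """Match either text header at byte 0 plus the signature at byte 64."""
    if len(data) < 68 or not data.startswith(VDI_TEXT_HEADERS):
        return False
    (signature,) = struct.unpack_from("<I", data, 64)
    return signature == VDI_IMAGE_SIGNATURE
```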
.vmdk
There are already entries in the .json for VMware .vmdk files. I have tidied and adjusted some to better match real-world files:
["4b444d", 0, ".vmdk", "application/octet-stream", "VMware 4 Virtual Split Disk file"]
Removed, as the correct match is below it in the .json.
["23204469736b2044657363726970746f", 0, ".vmdk", "application/octet-stream", "VMware 4 Virtual Split Disk file"]
Corrected to include a better match using the full term "# Disk DescriptorFile", and changed the name to "VMware Image Descriptor File".
["23204469736b2044", 0, ".vmdk", "application/octet-stream", "VMware Virtual Disk description"]
Removed, as the above fix is a better match.
434f5744 / COWD and 4b444d56 / KDMV labels: these files do different jobs; they are not for different versions of VMware.
.dmg
The venerable archive format of Mac OS machines. The existing entry would only ever work for the file it came from. The correct way to identify a .dmg is to use a footer match at -512 for koly.
["7801730d626260", 0, ".dmg", "application/octet-stream", "MacOS X image file"] and ["", 0, ".dmg", "application/octet-stream", "MacOS X image file"] removed, new entry in footer added.
OK, even more formats
I'll note here the CD/DVD images are a real pain in the backside, lots of overlapping headers and proprietary info. This is a good start for later V2 fun.
MagicISO Image Format .uif
A seemingly much-hated proprietary format for storing images of CDs/DVDs. I can't find any test files or documentation; however, there is UIF2ISO, which converts the files to regular ISO. Digging in the source seems to show a header at byte 0 of 0x73696262 / sibb, with another match at byte 8 of 0x72686c62 / rhlb if it's encrypted.
If I ever come across a real file to test against I'll confirm this, but the code has been around a long time so it's pretty safe to assume it's correct.
PowerISO Direct Access Archive .daa
Another proprietary format for storing images of CDs/DVDs; much like .uif, it's also pretty unpopular. The author of UIF2ISO also created a tool to deal with them called DAA2ISO.
Simple header of 0x444141 / DAA at byte 0.
gBurner Image .gbi
Another proprietary format for storing images of CDs/DVDs. It appears to be quite similar to .daa, as DAA2ISO handles both.
Simple header of 0x474249 / GBI at byte 0.
Apple HyperCard Stack .hc
While I was looking for data on another .hc extension, HyperCard stacks popped up, so we'll add them in while we're here. HyperCard stacks were almost a precursor to web pages, able to store text and images in a clickable, searchable database. Header of 0x5354414b / STAK at byte 4.
VeraCrypt File Container .hc
An encrypted image container. We can only add this as an extension, as the VERA header at byte 64 and all data following it are encrypted using the 64-byte salt.
Nero Disc images .nrg
Nero was once one of the most popular CD/DVD burning tools, and .nrg was its own custom image format. These use footer matches for the two versions: 0x4e45524f / NERO at -8 and 0x4e455235 / NER5 at -12, for v1 and v2 images respectively.
Compressed ISO images .isz
Created by EZB Systems for use in their various products, this is an open specification for producing ZLIB-compressed versions of ISO images. Header is 0x49735a21 / IsZ! at byte 0.
DiscJuggler images .cdi
Padus DiscJuggler was a professional mastering solution for CD and DVD. Because its .cdi image format is highly flexible, it was adopted as the de facto format for archiving Dreamcast games. There appear to be a few versions. Adding as an extension only: looking at the source for cdi2nero, it's a complex format that would need a partial port of that app to understand, and looking at libMirage confirms this.
CloneCD Control File .ccd, Image .img and Subchannel Info .sub
CloneCD is another powerful CD/DVD image tool. The .ccd contains various metadata relating to the .img file. Official specs on the format seem to be non-existent; I've inferred the matches from samples from a range of sources. Much like .cdi above, some form of decoding may be the way to go in the future; looking at libMirage confirms this idea.
0x5b436c6f6e6543445d / [CloneCD] as its first line. There are versions on the next line, but as it's a text file, spacing/tabs could cause match issues. A regex solution would be best for extracting that info.
0xffffffffffffffffffffffff, then a few bytes after which may be some sort of versioning.
BlindWrite images .b5t / .b6t and BlindRead images .bwt
BlindWrite and its predecessor BlindRead are another set of CD/DVD imaging tools. Much like CloneCD, they can produce various files to preserve important information about the source disk. Most of these will be extension-only for the time being, as I lack sample files and cannot find much about the format.
0x425754352053545245414d205349474e / BWT5 STREAM SIGN
0xffffffffffffffffffffffff (based on source code to b5i2iso)
0x425754352053545245414d205349474e / BWT5 STREAM SIGN
WinOnCD images .c2d
While browsing the libMirage source for other formats, this one was in the list. This was an early entry into the CD mastering market; it changed hands a couple of times, from Roxio to Adaptec. Two headers:
0x4164617074656320436551756164726174205669727475616c43442046696c65 / Adaptec CeQuadrat VirtualCD File
0x526f78696f20496d6167652046696c6520466f726d617420332e30 / Roxio Image File Format 3.0
Adaptec Easy CD/DVD Creator image file .cif
Another CD/DVD creator software package purchased by Adaptec from Corel; header info from libMirage. This uses a RIFF header, then at byte 8 0x696d6167 / imag.
There are earlier versions of the format that used .cl2, .cl3 and .cl4, but there is no info on these formats beyond that. Will add as extension only until sample files are found.
Alcohol 120% image file .mds and GameJack image file .xmd
Another powerful CD/DVD image creator; like BlindWrite and CloneCD it can make near-perfect copies of most discs.
There's not much info on GameJack; it's either a licensed or questionable clone of Alcohol.
0x4d454449412044455343524950544f5201 / MEDIA DESCRIPTOR\x01
Daemon Tools image file .mdx
Pretty much one of the most popular virtual drive tools; it's been around for a very long time.
0x4d454449412044455343524950544f5202 / MEDIA DESCRIPTOR\x02, which is nearly identical to Alcohol's except for the last byte.
Apple Toast File .toast
Toast is an early CD burning software package for Macs; it's changed hands many times over the years.
Early toast files have a header of 45520200 / ER\x02\x00. Later toast files are simply .iso with a different name.
Links: