dano is a wrapper for ffmpeg that checksums the internal file streams of ffmpeg-compatible media files, and stores them in a format which can be used to verify such checksums later. This is handy because, should you choose to change metadata tags or file names, the media checksums should remain the same.
- Non-media path filtering (which can be disabled)
- Highly concurrent hashing (select # of threads)
- Several useful modes: WRITE, TEST, PRINT, DUMP, or DUPLICATES
- Select from multiple checksum algorithms (default: murmur3, MD5, adler32, CRC32, SHA160, SHA256, SHA512)
- Option to decode the stream before executing the hash function
- Write to xattrs or to hash file (and always read back and operate on both)
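The decode option in the list above is what makes checksums comparable across codecs: hashing packets ties the result to one particular encoding, while hashing decoded samples ties it to the audio itself. A minimal sketch of the decoded variant using plain ffmpeg follows. The helper name decoded_audio_md5 is hypothetical, the pcm_s16le default assumes 16-bit source audio (24-bit material would need pcm_s24le), and this is not necessarily how dano invokes ffmpeg internally.

```shell
# Print the MD5 of the first audio stream's decoded PCM samples.
# Assumption: 16-bit audio by default; pass pcm_s24le as $2 for 24-bit sources.
decoded_audio_md5() {
    file="$1"
    codec="${2:-pcm_s16le}"
    # -map 0:a:0 : first audio stream only
    # -c:a CODEC : decode, then emit raw PCM in the given sample format
    # -f md5     : print "MD5=<hex>" for the resulting data instead of writing a file
    ffmpeg -v error -nostdin -i "$file" -map 0:a:0 -c:a "$codec" -f md5 - |
        sed 's/^MD5=//'
}
```

For a 16-bit FLAC file, the printed digest should match what metaflac --show-md5sum reports, since FLAC stores an MD5 of the unencoded audio in its stream header.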
To me, first-class checksums are one thing that sets the FLAC music format apart. FLAC supports the writing and checking of the streams held within its container. When I ask whether the FLAC audio stream has the same checksum as the stream I originally wrote to disk, the flac command tells me whether the checksum matches:
➜ flac -t 'Link Wray - Rumble! The Best of Link Wray - 01-01 - 02 - The Swag.flac'
Link Wray - Rumble! The Best of Link Wray - 01-01 - 02 - The Swag.flac: ok
The question is -- why don't we have this functionality for video and other media streams? The answer is, of course, we do (because ffmpeg is incredible!); we just never use it. dano aims to make what ffmpeg provides easier to use.
For lossless files, this means we can confirm that a lossless file decodes to the exact bitstream we encoded, but, for all files, this means our checksums are stable against metadata changes, file name changes, and/or moving a bitstream, or many bitstreams, from one media container into another.
So, for example, when I ask whether a media stream has the same checksum as when I originally wrote it to disk, dano tells me whether the checksum matches:
➜ dano -w 'Sample.mkv'
murmur3=2f23cebfe8969a8e11cd3919ce9c9067 : "Sample.mkv"
➜ dano -t 'Sample.mkv'
"Sample.mkv": OK
# Now change our file's name and our checksum still verifies,
# because the checksum is stored in an xattr
➜ mv 'Sample.mkv' 'test1.mkv'
➜ dano -t 'test1.mkv'
"test1.mkv": OK
# Now change our file's metadata and *write a new file in a
# new container* and our checksum is the *same*
➜ ffmpeg -i 'test1.mkv' -metadata author="Kimono" 'test2.mp4'
➜ dano -w 'test2.mp4'
murmur3=2f23cebfe8969a8e11cd3919ce9c9067 : "test2.mp4"
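What the transcript above relies on can be approximated by hand with ffmpeg's hash muxer. The sketch below is an illustration only: the function name stream_hash is hypothetical, and the exact ffmpeg arguments dano passes may differ. It shows why the file name and tags do not enter the checksum: only the stream packets are fed to the hash.

```shell
# Hash the packets of every stream in a media file without re-encoding.
# Assumption: this mirrors the idea behind dano, not its exact ffmpeg arguments.
stream_hash() {
    file="$1"
    algo="${2:-murmur3}"    # any algorithm name ffmpeg's hash muxer accepts
    # -map 0  : select all streams from the first input
    # -c copy : pass packets through untouched (no decode, no re-encode)
    # -f hash : print one "algo=hexdigest" line instead of writing a media file
    ffmpeg -v error -nostdin -i "$file" -map 0 -c copy -f hash -hash "$algo" -
}
```

Running stream_hash 'Sample.mkv' should print a single algo=hexdigest line, which can then be compared before and after a rename or a tag edit.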
Of course you can use dano with your FLAC files. dano will even import your FLAC file's checksums directly:
# Import dano checksum from FLAC and write to an xattr
➜ dano --import-flac 'Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac'
MD5=fed8052012fb6d0523ef3980a0f6f7bd : "Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac"
Writing dano hash for: "Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac"
No old file data to overwrite.
# Verify checksum is the same as the checksum embedded in the FLAC container
➜ metaflac --show-md5sum 'Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac'
fed8052012fb6d0523ef3980a0f6f7bd
# Verify the decoded FLAC audio stream is the same as the xattr checksum
➜ dano -t 'Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac'
"Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac": OK
I get it! For serious collectors, if you can't verify your checksums later when you change the album art, what use is a checksum?
dano allows you to store a stable checksum, and verify it later, just like FLAC:
# To test this, we will create an ALAC copy of a FLAC file
➜ ffmpeg -i 'Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac' -acodec alac 'Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.m4a'
# Write dano checksum to an xattr
➜ dano -w --only=audio --decode --hash-algo=md5 'Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.m4a'
MD5=fed8052012fb6d0523ef3980a0f6f7bd : "Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.m4a"
Writing dano hash for: "Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.m4a"
No old file data to overwrite.
# Verify checksum is the same as the decoded FLAC audio stream
➜ metaflac --show-md5sum "Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac"
fed8052012fb6d0523ef3980a0f6f7bd
# Verify the decoded ALAC audio stream is the same as the xattr checksum
➜ dano -t "Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.m4a"
"Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.m4a": OK
If you've ever used fdupes, you know how cool it is. fdupes can recursively search a file tree to discover duplicate files. The only issue is -- what if some of your media files have the same internal bitstreams, but distinct file checksums? Perhaps such bitstreams are contained within different containers, and/or have different file metadata/tags attached?
dano makes it easy to find such duplicate media, based upon their internal bitstreams:
# To test, create a copy
➜ cp 'Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac' 'Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance-copy1.flac'
# Copy will not contain a hash, so we will create one
➜ dano -w -x ./*
murmur3=ff95fc73a64ace424964f30af3ed932 : "./Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance-copy1.flac"
No new file paths to write.
Overwriting dano hash for: "./Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance-copy1.flac"
# Now, find duplicates
➜ find . -type f | dano --dupes
murmur3=ff95fc73a64ace424964f30af3ed932 : "./Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance-copy1.flac"
murmur3=ff95fc73a64ace424964f30af3ed932 : "./Pavement - Wowee Zowee_ Sordid Sentinels Edition - 02-02 - 50 - We Dance.flac"
WARN: Duplicates found.
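The duplicate check itself is simple once every file has a hash line. As an illustration only (dano does this internally; the helper name only_dupes and the awk approach are assumptions, not dano's code), the following reads lines in the algo=hash : "path" format shown above on standard input and prints, in input order, those whose hash occurs more than once.

```shell
# Keep only the lines whose "algo=hash" field appears more than once.
only_dupes() {
    awk -F' : ' '
        # First pass over input: remember each line and count its hash field.
        { count[$1]++; line[NR] = $0; key[NR] = $1 }
        # After all input is read: emit lines whose hash was seen at least twice.
        END {
            for (i = 1; i <= NR; i++)
                if (count[key[i]] > 1) print line[i]
        }
    '
}
```

Because the whole input is buffered before anything is printed, this sketch suits a music folder better than an archive with millions of files.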
Inspired by hashdeep, md5tree, flac, and, of course, ffmpeg.
For Debian-based distributions (like Ubuntu), I maintain a personal package archive, or PPA. See the linked repository for instructions on how to use it.
For Debian-based and Redhat-based Linux distributions (like Ubuntu or Fedora), check the tagged releases for native packages for your distribution.
You may also create and install your own native package from the latest sources, like so:
➜ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
➜ cargo install cargo-deb
➜ git clone https://github.com/kimono-koans/dano.git
➜ cd ./dano/; cargo deb
# to install on a Debian/Ubuntu-based system
➜ dpkg -i ./target/debian/dano_*.deb
# or convert to RPM
➜ alien -r ./target/debian/dano_*.deb
# and install on a Redhat-based system
➜ rpm -i --replacefiles ./dano*.rpm
For now, dano depends on ffmpeg, and on metaflac if you want to import FLAC files.
You may install rustup and build dano like so:
➜ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
➜ cargo install dano
Note: In addition to what your package manager or OS may provide (for instance, apt install rustc cargo), security-minded users may be interested to know that there are alternative methods for installing the rustc compiler and cargo besides the method described above, which allow you to verify rustup before installing it.
Caveat: dano is only tested on macOS and Linux, and will probably only compile and run on Unix-y Rust-supported platforms, but a Windows version is likely to compile with only minor changes. My further thoughts on a Windows version can be found in this linked issue.