Handle Disk Image Formats (CHD, CSO, RVZ, etc.) #937

Closed
maxexcloo opened this issue Feb 14, 2024 · 25 comments · Fixed by #1233
Labels
enhancement New feature or request up-for-grabs Issues to consider for external contribution

Comments

@maxexcloo
Contributor

Is your feature request related to a problem?

Currently I can't handle any of my disk based systems using igir as they're in different formats - it would be nice to be able to handle conversion and checking.

Formats that can be converted losslessly include:

  • CHD
  • CSO
  • RVZ

Describe the solution you'd like

When running with input directories that contain disk images, offer either a temporary conversion for checking followed by a move and rename of the original file, or a conversion followed by a move of the converted files.

Compressing the file afterwards wouldn't help with the checksum or compression so it could be good to store a hash alongside for speed?

An example of a tool that handles this (but has other problems that don't make it suitable for my use) is https://github.com/alucryd/oxyromon

Additional context

More than happy to help with this one, it could be an optional feature as the dependencies are rather annoying to deal with (see this Dockerfile for example: https://github.com/maxexcloo/docker-oxyromon/blob/main/Dockerfile).

@maxexcloo maxexcloo added the enhancement New feature or request label Feb 14, 2024
@maxexcloo
Contributor Author

maxexcloo commented Feb 16, 2024

Another example of a way to check CHD files: https://github.com/j68k/verifydump

@emmercm
Owner

emmercm commented Feb 20, 2024

I had seen oxyROMon before, but I didn't realize how many compressed formats it could handle. I tend to benchmark against RomVault since it is one of the most full-featured tools, and I think it only has CHD support (which is better than most since it doesn't need chdman).

It's highly likely that no one will ever build Node.js libraries to read these file types (and I probably can't take on the work). OS compatibility of the various tools (maxcso, chdman, and the like) will likely be tricky. Maybe a strategy such as https://github.com/develar/7zip-bin would be good, where someone is dedicated to maintaining precompiled versions of the tools.


Separately, something that has held me back from CHD support is knowing how many disk write cycles it will need - I don't think the CHD format stores the checksum of all individual files, so they must be extracted before being calculated. We'll want some checksum caching in igir before doing this.
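To illustrate the cost being described: once a disc image has been extracted to a temporary file, all of the checksums can at least be computed in a single read pass. This is a minimal sketch in Python (used for brevity; igir itself is Node.js), and `checksum_file` is a hypothetical helper, not part of igir:

```python
import hashlib
import zlib
from pathlib import Path


def checksum_file(path: Path, chunk_size: int = 1024 * 1024) -> dict[str, str]:
    """Read the file once and compute CRC32, MD5, and SHA-1 together."""
    crc = 0
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with path.open("rb") as handle:
        # Stream in chunks so large extracted images never sit fully in memory.
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
            md5.update(chunk)
            sha1.update(chunk)
    return {
        "crc32": f"{crc:08x}",
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
    }
```

The extraction itself still writes the full image to disk, which is why caching the result matters.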

@emmercm emmercm added the up-for-grabs Issues to consider for external contribution label Feb 20, 2024
@maxexcloo
Contributor Author

maxexcloo commented Feb 20, 2024

Would be more than happy to help compile and maintain the tools - have just done this on macOS anyway to handle everything.

For caching, even a sidecar file stored in the library next to each file of a supported format, with the hashes inside, could work? Something like ‘Game Name (USA).cso.hash’ - that way a rescan is fast (like a ZIP) but there could always be a flag to force a recalculation if required.
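A rough sketch of the sidecar idea, in Python for brevity. The function names, the JSON layout, the size/mtime staleness check, and the `compute` callback (standing in for whatever expensive decompress-and-hash step a format needs) are all assumptions for illustration, not igir behaviour:

```python
import json
from pathlib import Path
from typing import Callable


def sidecar_path(image: Path) -> Path:
    # "Game Name (USA).cso" -> "Game Name (USA).cso.hash"
    return image.with_name(image.name + ".hash")


def cached_hash(image: Path, compute: Callable[[Path], str], force: bool = False) -> str:
    """Return the hash from the sidecar if it is still valid, else recompute it."""
    stat = image.stat()
    sidecar = sidecar_path(image)
    if not force and sidecar.exists():
        try:
            record = json.loads(sidecar.read_text())
            # Assumption: an unchanged size and mtime mean the image is unchanged.
            if record["size"] == stat.st_size and record["mtime_ns"] == stat.st_mtime_ns:
                return record["hash"]
        except (ValueError, KeyError, TypeError):
            pass  # Corrupt or unexpected sidecar: fall through and recompute.
    value = compute(image)
    sidecar.write_text(
        json.dumps({"size": stat.st_size, "mtime_ns": stat.st_mtime_ns, "hash": value})
    )
    return value
```

Passing `force=True` plays the role of the "force a recalculation" flag.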

I absolutely love IGIR and this is the only feature it’s really missing for me to clean up a whole heap of other tools (currently have my library split between No-Intro dumps and Redump dumps…)

@emmercm
Owner

emmercm commented Feb 20, 2024

I'm thinking some kind of DB for the cache would work well, either something custom or something like SQLite.

I think the strategy would be to treat all of these compressed formats like archives, and then all archives could get the caching treatment. That would help with formats like 7zip that need temp files to calculate MD5 and SHA1.
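As a sketch of what an SQLite-backed cache for archive entries might look like (Python's built-in `sqlite3` is used for brevity; the `ChecksumCache` class, table name, and columns are hypothetical and not igir's schema). An entry is only trusted while the containing archive's size and modification time are unchanged:

```python
import sqlite3
from typing import Optional


class ChecksumCache:
    """Hypothetical checksum cache keyed by archive path and entry name."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS checksums (
                archive_path     TEXT NOT NULL,
                entry_name       TEXT NOT NULL,
                archive_size     INTEGER NOT NULL,
                archive_mtime_ns INTEGER NOT NULL,
                crc32 TEXT,
                md5   TEXT,
                sha1  TEXT,
                PRIMARY KEY (archive_path, entry_name)
            )
            """
        )

    def get(self, archive_path: str, entry_name: str,
            size: int, mtime_ns: int) -> Optional[dict]:
        # A changed size or mtime counts as a cache miss.
        row = self.conn.execute(
            "SELECT crc32, md5, sha1 FROM checksums "
            "WHERE archive_path = ? AND entry_name = ? "
            "AND archive_size = ? AND archive_mtime_ns = ?",
            (archive_path, entry_name, size, mtime_ns),
        ).fetchone()
        if row is None:
            return None
        return dict(zip(("crc32", "md5", "sha1"), row))

    def put(self, archive_path: str, entry_name: str, size: int, mtime_ns: int,
            crc32: str, md5: str, sha1: str) -> None:
        with self.conn:  # Commits on success, rolls back on error.
            self.conn.execute(
                "INSERT OR REPLACE INTO checksums VALUES (?, ?, ?, ?, ?, ?, ?)",
                (archive_path, entry_name, size, mtime_ns, crc32, md5, sha1),
            )
```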

I think some kind of option to change the temp directory would be helpful as well, so power users could use a memfs mount to prevent disk wear.


Thank you for the compliments! I also have No-Intro and Redump separated for those same reasons. I've been focused on #818 most recently, and my speed of development has slowed down a bit.

@maxexcloo
Contributor Author

maxexcloo commented Feb 20, 2024

Something to note as well is that with disc-based formats I have definitely had file hash collisions. I think a lot of games split into tracks tend to have similarly sized empty or filler tracks, so there would need to be some more advanced logic to match and complete a game there.
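One possible shape for that logic, sketched in Python: treat a game as matched only when every one of its tracks is present, and let an identical filler track satisfy more than one game. The `dat` and `found` structures are made up for illustration and are not igir's data model:

```python
def complete_games(
    dat: dict[str, dict[str, str]],
    found: dict[str, str],
) -> dict[str, dict[str, str]]:
    """Map each fully matched game to {track name: source file path}.

    dat:   game name -> {track name: hash}
    found: hash -> path of a scanned file with that hash
    """
    matched = {}
    for game, tracks in dat.items():
        # A shared filler track alone is not enough: every track must be present.
        if all(track_hash in found for track_hash in tracks.values()):
            matched[game] = {name: found[h] for name, h in tracks.items()}
    return matched
```

Because a shared hash can appear in several matched games, the source file would need to be copied rather than moved in those cases.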

In the meantime I’ll put together a repo and set up actions to build and release binaries on GitHub releases - generally some of these have been hard to find so it could help?

EDIT: I didn’t realise that you can run IGIR without DAT files!

@emmercm
Owner

emmercm commented Feb 20, 2024

I've definitely seen duplicate disc tracks before, I think on Dreamcast. And they're truly duplicate tracks, not just CRC32 collisions. I think people over-index on how small the keyspace is for CRC32. But #818 should help.
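For a rough sense of scale, here is a back-of-the-envelope sketch that assumes CRC32 values behave like uniformly random 32-bit numbers (an idealisation). It separates two different questions: the chance that one unrelated file happens to match any of `n` DAT entries, and the birthday-style chance that any two of `n` files share a CRC32. Both function names are hypothetical:

```python
import math

CRC32_KEYSPACE = 2**32


def false_match_probability(n: int) -> float:
    """Chance that one random file's CRC32 equals at least one of n entries."""
    # 1 - (1 - 1/2^32)^n, written to stay accurate for tiny probabilities.
    return -math.expm1(n * math.log1p(-1 / CRC32_KEYSPACE))


def any_collision_probability(n: int) -> float:
    """Birthday approximation: chance that any two of n files share a CRC32."""
    return -math.expm1(-n * (n - 1) / (2 * CRC32_KEYSPACE))
```

The first number stays small even for large DATs, while the second grows quickly with collection size. Matching on file size as well as CRC32 shrinks both further.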


Pre-compiled binaries would very much help! Maybe starting with chdman first. igir is Node.js so publishing it in a way that https://github.com/develar/7zip-bin does would be helpful. https://github.com/onikienko/7zip-min is the 7zip library that igir uses, and it's just a wrapper around 7zip-bin.

@maxexcloo
Contributor Author

I did try to work on adding CHD support but stalled - publishing binaries (with the associated guide on how to compile) I can do ☀️

@maxexcloo
Contributor Author

maxexcloo commented Feb 21, 2024

> I've definitely seen duplicate disc tracks before, I think on Dreamcast. And they're truly duplicate tracks, not just CRC32 collisions. I think people over-index on how small the keyspace is for CRC32. But #818 should help.
>
> Pre-compiled binaries would very much help! Maybe starting with chdman first. igir is Node.js so publishing it in a way that https://github.com/develar/7zip-bin does would be helpful. https://github.com/onikienko/7zip-min is the 7zip library that igir uses, and it's just a wrapper around 7zip-bin.

Just following up on this - is it possible to use native binaries before using precompiled? Are there any platforms you’d like to see in particular? I’ve found Linux and Windows are fairly easy to do with GitHub actions - Mac is also possible but a tad more tricky, will continue to work on this today though!

For example oxyromon doesn’t actually install the binaries, it just expects them on the path and if they don’t exist it just doesn’t support those features (I think it’s a compile time flag but that’s more a factor of it being written in rust I think…)
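A sketch of that "use it if it is on the PATH" approach, in Python for brevity. The extension-to-tool mapping is an assumption for illustration (in particular, `dolphin-tool` as the executable name should be checked against the local install), and neither function exists in igir or oxyromon:

```python
import shutil
from typing import Callable, Optional

# Hypothetical mapping from file extension to the external tool that handles it.
FORMAT_TOOLS = {
    ".chd": "chdman",
    ".cso": "maxcso",
    ".rvz": "dolphin-tool",
}


def detect_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, Optional[str]]:
    """Resolve each tool on the PATH; missing tools map to None."""
    return {ext: which(tool) for ext, tool in FORMAT_TOOLS.items()}


def supported_extensions(tools: dict[str, Optional[str]]) -> set[str]:
    """Formats that can be handled with the tools actually installed."""
    return {ext for ext, path in tools.items() if path is not None}
```

Formats whose tool is missing are simply left out, so the feature degrades gracefully instead of failing at startup.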

@maxexcloo
Contributor Author

chdman and maxcso are fairly easy to compile and package I think - these would provide CHD and CSO (for PSP) support.

dolphin-tools (for RVZ, used in a lot of GameCube and Wii games) is a massive PITA, it takes my new MacBook Pro 20 mins to compile as it requires a ton of build tools and I haven’t worked out how to skip the rest of Dolphin 🐬

@maxexcloo
Contributor Author

maxexcloo commented Feb 22, 2024

Giving it a go here with ctrtool and maxcso to start, will have a look at chdman tomorrow :)

https://github.com/maxexcloo/binaries

Have been making sure it's all reproducible and able to be run via GitHub Actions for safety.

@unexpectedpanda

Another potential path: the development is slow as it really is trying to be everything, but keep an eye on NKit 2 as an all-in-one image conversion tool. It's even extended the CHD format to ensure round-trip conversions for Dreamcast (and no doubt more things will evolve there over time).

The plan is eventually to release the source, but who knows when that'll be.

Binaries are released in the Discord server.

@emmercm
Owner

emmercm commented Mar 3, 2024

> is it possible to use native binaries before using precompiled? ... For example oxyromon doesn’t actually install the binaries, it just expects them on the path

It's definitely possible and might be a good stepping stone. I likely won't be able to dedicate time to it for a while, though.

> dolphin-tools (for RVZ, used in a lot of GameCube and Wii games) is a massive PITA

Yeah, that sounds about right 🙂 but unfortunate.

@emmercm
Owner

emmercm commented Mar 3, 2024

I'm working on automated chdman builds, but it's quickly burning up my GitHub build minutes.

@maxexcloo
Contributor Author

> I'm working on automated chdman builds, but it's quickly burning up my GitHub build minutes.

Ah I nearly had this done, you can check my repo actions for an 80% done version if you’d like: public repos have unlimited minutes btw! I’ll have a look into this soon, just on holidays at the moment!

@maxexcloo
Contributor Author

https://github.com/maxexcloo/binaries/blob/main/.github/workflows/chdman.yml

@emmercm
Owner

emmercm commented Mar 4, 2024

> public repos have unlimited minutes btw!

Ah, that's my problem, I was using a private repo to hide my trial and error away.

I'm going to end up using https://github.com/emmercm/chdman-js for both the binaries and the Node.js bindings. I finished my CI for the chdman build yesterday, but I'm having a curious problem where my built 0263 isn't creating CHDs correctly (it can't parse the CHDs that it created).

Something that's going to be a pain is parsing the single output .bin into multiple track .bins. https://github.com/putnam/binmerge can do this and the code doesn't look too bad, but it's going to be a whole project on its own.
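To give a feel for the splitting step, here is a deliberately simplified Python sketch that turns a single-file cue sheet into per-track byte ranges. It assumes every track uses 2352-byte sectors (true for AUDIO, MODE1/2352 and MODE2/2352, but not for modes such as MODE1/2048) and that each track starts at its first INDEX entry. It is not how binmerge or igir does it, and the function names are hypothetical:

```python
SECTOR_BYTES = 2352       # Assumption: raw 2352-byte sectors for every track.
FRAMES_PER_SECOND = 75    # Cue sheet times are MM:SS:FF with 75 frames per second.


def msf_to_frames(msf: str) -> int:
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames


def track_byte_ranges(cue_text: str, bin_size: int) -> list[tuple[int, int, int]]:
    """Return (track number, start byte, end byte) for each track in the cue."""
    starts: list[tuple[int, int]] = []
    current = None
    for line in cue_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "TRACK":
            current = int(tokens[1])
        elif len(tokens) >= 3 and tokens[0] == "INDEX" and current is not None:
            # Keep only the first INDEX of each track (00 if there is a pregap).
            if not starts or starts[-1][0] != current:
                starts.append((current, msf_to_frames(tokens[2]) * SECTOR_BYTES))
    # Each track ends where the next one starts; the last ends at end of file.
    ends = [start for _, start in starts[1:]] + [bin_size]
    return [(number, start, end) for (number, start), end in zip(starts, ends)]
```

Each range can then be copied out of the single .bin into its own track file.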

@maxexcloo
Contributor Author

There have been some changes here I think: mamedev/mame#12087

@emmercm
Owner

emmercm commented Mar 9, 2024

Cross-building chdman has been an absolute nightmare... https://github.com/emmercm/mame-build/actions.

@maxexcloo
Contributor Author

Sorry I’ve been absent on this - have been overseas but I will take a look when I’m back if you’d like a second set of eyes!

@emmercm
Owner

emmercm commented Mar 10, 2024

Zero apology is needed, enjoy your holiday! I may give up on non-x64 for now and return to it later.

@maxexcloo
Contributor Author

Even just x64 would be sweet - the extra architectures can always be added later right :)

@emmercm
Owner

emmercm commented Mar 10, 2024

I was able to figure it out last night, we're in business: https://www.npmjs.com/package/chdman

@emmercm
Owner

emmercm commented Mar 21, 2024

@maxexcloo if you were looking for a new pet project, I wonder if chdman can be compiled into WebAssembly based on this section: https://docs.mamedev.org/initialsetup/compilingmame.html#emscripten-javascript-and-html. It sounds like MAME can be, but it's less clear about the tools. That would completely obviate the different architecture needs.

@emmercm emmercm self-assigned this Jul 17, 2024
@Djabal

Djabal commented Jul 30, 2024

I discovered the MAME Redump project recently: it aims to convert the Redump sets into formats that use less disk space (CHD and RVZ): https://github.com/MetalSlug/MAMERedump
I've been using it since last week to check my CHD and RVZ files.
For CHD files it uses the SHA1 stored in the header (the combined raw+meta SHA1 of the uncompressed data), as shown by chdman info -i rom.chd, so it's pretty quick. The SHA1 is identical between two different CHDs if the raw files are identical.
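As an illustration of why the header route is quick, here is a Python sketch that pulls the SHA1 fields out of the first 124 bytes of a CHD file without running chdman. The field offsets reflect my understanding of the version 5 header layout in MAME's CHD code and should be checked against the MAME source before relying on them; `read_chd_v5_sha1` is a hypothetical helper:

```python
import struct

CHD_V5_HEADER_BYTES = 124  # Assumed length of a version 5 header.


def read_chd_v5_sha1(header: bytes) -> dict[str, str]:
    """Extract the SHA1 fields from the start of a CHD v5 file."""
    if len(header) < CHD_V5_HEADER_BYTES or header[:8] != b"MComprHD":
        raise ValueError("not a CHD header")
    # Big-endian header length and format version follow the 8-byte tag.
    length, version = struct.unpack_from(">II", header, 8)
    if version != 5 or length != CHD_V5_HEADER_BYTES:
        raise ValueError(f"unsupported CHD header (version {version})")
    return {
        "raw_sha1": header[64:84].hex(),       # SHA1 of the raw data only
        "sha1": header[84:104].hex(),          # combined raw+meta SHA1
        "parent_sha1": header[104:124].hex(),  # all zero when there is no parent
    }
```

In practice the input would be the first 124 bytes read from the .chd file, so no extraction or temp files are needed.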

  • CHD DATs don't work with igir out of the box (I tried!)
  • igir has a funny bug with CHD DATs!

igir report --input roms/Sega\ -\ Dreamcast/ --dat MAMERedump-0.266\ (2024-07)/MAME\ Redump/Sega\ -\ Dreamcast\ (2084).dat
[...]
✓ Scanning for DATs ·········· | 1 DAT found
✓ Scanning for ROMs ·········· | 33 files found
✓ Sega - Dreamcast ··········· | 2 084/2 084 games, 1 754/1 754 retail releases found
✓ Generating report ·········· | /media/morgan/679E-DEED/retro_games/report_roms.csv

For RVZ files it uses the SHA1 of the .rvz file itself... I don't see how a format like RVZ can guarantee the same SHA1 for two RVZ files built from identical raw files, since it offers compression options.

  • RVZ DATs (GameCube, Wii) work with igir out of the box!


🔒 Inactive issue lock

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Comment generated by the GitHub Lock Issues workflow.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 30, 2024
4 participants