Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove the cstruct dependency #137

Merged
merged 19 commits into from
Feb 2, 2024
Merged

remove the cstruct dependency #137

merged 19 commits into from
Feb 2, 2024

Conversation

hannesm
Copy link
Member

@hannesm hannesm commented Jan 10, 2024

I started to remove Cstruct.t from the interface.

/cc @reynir and @kit-ty-kate

also cc @dinosaure since I need some help with the tar_gz code -- when I read decompress gz.mli, I think the only input/output is bigstring (i.e. char Bigarray.Array1.t) -- is this right? This means we'll need to come from bytes/string to there!?

I tried to use string whereever possible (to denote "this is immutable"), but figured that our read definition passes in a buffer that is read into --> it needs to be bytes. And for some (to me unclear) reason the write function in OCaml (Stdlib and Lwt) uses bytes. So, I'm no longer sure whether using string is worth it (certainly all the unsafe_to/of_string operations should be avoided).

Any initial thoughts? Also, feel free to push further commits on this branch :)

@dinosaure
Copy link
Member

For decompress, yes the choice was initial made to use bigstring. Again, it raises the question if it's better to use a string or a bigstring regarding to performances and it's a difficult question.

About bytes and string, I'm in favor to use string for write (even if Unix.write uses bytes) as I did for miou to avoid some surprises (about shared buffers).

lib/tar.ml Outdated Show resolved Hide resolved
lib/tar.ml Outdated Show resolved Hide resolved
unix/tar_lwt_unix.ml Outdated Show resolved Hide resolved
unix/tar_unix.ml Outdated Show resolved Hide resolved
Then we don't need to do unsafe conversion to bytes.
@kit-ty-kate
Copy link
Contributor

@dinosaure couldn’t we add functions taking bytes/string instead of bigstring in decompress? I feel like in most cases that do not involve using data from another C binding or other kind of buffer trickery, users of the library have to do the conversion from either string or bytes anyway so the performance gained in the C code is offset by the constant conversion in the OCaml code. Furthermore every user of decompress need to do said conversion function or use another dependency (e.g. cstruct) which adds unwanted overhead

@dinosaure
Copy link
Member

Unfortunately, this is a major break in the API that cannot be taken lightly. I don't want to sound too conservative but, with @hannesm and @reynir, we've already floated the idea of re-implementing camlzip in a more minimal way for the purposes of OPAM - especially in view of the decompress dependencies.

@kit-ty-kate
Copy link
Contributor

Unfortunately, this is a major break in the API that cannot be taken lightly.

I don’t think it is. I’m not asking to replace the function, mairly to add the functions to make it more user-friendly for people who will (and most will in my opinion) use string / bytes anyway, and would have used a custom conversion function since none are provided.

lib/tar.ml Show resolved Hide resolved
lib/tar.ml Outdated Show resolved Hide resolved
…n account this detail into our unmarshal_string implementation
from the last position - fix the global_Extended_headers_test
Copy link
Member

@dinosaure dinosaure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bytes‧make len '\000' is better (and probably correct) than Bytes‧create.

@dinosaure
Copy link
Member

The support of Tar_gz is done and I fixed few issues which appeared in this transition. However, I still have some issues when I try to run tests with block device. Unfortunately, I don't have enough knowledge (yet?) to fix them (@reynir?). At least, it seems that the cram test (which tests our otar tool and Tar_gz) works perfectly now.

kit-ty-kate added a commit to kit-ty-kate/ocaml-tar-playground that referenced this pull request Jan 16, 2024
@kit-ty-kate
Copy link
Contributor

I've tested this PR using my PoC extractor and it looks like it's working perfectly. Performances seem to be the same with this example (if anything ~1% faster but it's in the error margin) and the code is 25% smaller

Here is the diff for anyone curious: kit-ty-kate/ocaml-tar-playground@da2f186

bin/otar.ml Outdated Show resolved Hide resolved
lib/tar_gz.ml Outdated Show resolved Hide resolved
lib/tar_gz.ml Outdated Show resolved Hide resolved
lib/tar_gz.mli Outdated Show resolved Hide resolved
kit-ty-kate added a commit to kit-ty-kate/ocaml-tar-playground that referenced this pull request Jan 16, 2024
bin/otar.ml Outdated Show resolved Hide resolved
@kit-ty-kate
Copy link
Contributor

I opened a PR against your branch here finishing up the work in hannesm#1

All the testsuite now work, there was a problem in Tar_mirage due to an issue with offsets but I managed to track it down after half a day of debugging and fix it.

@kit-ty-kate
Copy link
Contributor

Just a tiny technicality question: do you think we can expect this PR to be available as part of an ocaml-tar 3.0.0 release or more like 2.7.0? (I would personally think 3.0.0 is more appropriate given the changes in the API)

I'm only asking because i'm currently on ocaml/opam#5648 which relies on this PR for the release of opam 2.2.0

@hannesm
Copy link
Member Author

hannesm commented Feb 1, 2024

3.0 it is. I'll soon (this or next week) again review the changes here, optionally push fixes, and merge.

I expect tar 3.0 to land before mid February (the outstanding item is a nice API for foliding over an archive).

@hannesm hannesm changed the title WIP: remove the cstruct dependency remove the cstruct dependency Feb 1, 2024
@hannesm
Copy link
Member Author

hannesm commented Feb 1, 2024

Thanks for your commits @reynir @dinosaure @kit-ty-kate. I reviewed the entire PR, reverted some tar-mirage changes (now there's only a really small difference in these modules).

Since the test suite is also passing, I'd be keen to merge this PR.

@hannesm
Copy link
Member Author

hannesm commented Feb 1, 2024

My plan is to squash & merge -- I merged the main branch here to have OCaml-CI run with older compilers.

@hannesm
Copy link
Member Author

hannesm commented Feb 1, 2024

initial performance measurement (anyone into scientific benchmarking, please have a go yourself if you like):

downloading index.tar.gz from opam.ocaml.org, and executing otar.exe list index.tar.gz > /dev/null

main branch: 1.016u 0.031s 0:01.04 100.0% 915+767k 0+0io 27pf+0w
this branch: 0.811u 0.039s 0:00.85 98.8% 889+735k 0+0io 26pf+0w

EDIT: running otar multiple times smoothes the results: main branch uses ~0.83s and 920k memory; this branch ~0.75s and 885k memory.

@hannesm
Copy link
Member Author

hannesm commented Feb 1, 2024

since we use String.get_int32_ne, this requires OCaml 4.13+. IMHO that is fine, but we could recover with some "external" definition compatibility with earlier OCaml releases. I don't think it is worth the effort, though. Any other opinion?

@kit-ty-kate
Copy link
Contributor

It could be done without externals using:

(* TODO: use String.get_int32_ne when ocaml-tar requires OCaml >= 4.13 *)
Bytes.get_int32_ne (Bytes.unsafe_of_string str)

It's your choice to use this or not, but it is to note as this is going to be used in opam we have to add our own patch internally if this stays as-is and it makes parts of the opam packages not installable with OCaml < 4.13

@kit-ty-kate
Copy link
Contributor

it makes parts of the opam packages not installable with OCaml < 4.13

in particular it would make opam-publish not installable for people using a system switch on older systems, which actually seems like an issue to me

@hannesm
Copy link
Member Author

hannesm commented Feb 1, 2024

@kit-ty-kate here we go, compatible with 4.08+.

@hannesm hannesm merged commit 9fdff04 into mirage:main Feb 2, 2024
1 of 2 checks passed
@hannesm hannesm deleted the no-cstruct branch February 2, 2024 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants