Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Zip64 support to Zip::Writer #7103

Open
julik opened this issue Nov 22, 2018 · 16 comments · May be fixed by #11396
Open

Adding Zip64 support to Zip::Writer #7103

julik opened this issue Nov 22, 2018 · 16 comments · May be fixed by #11396

Comments

@julik
Copy link

julik commented Nov 22, 2018

We have ported our zip_tricks library to Crystal at least in as far as writing ZIPs is concerned. What we needed was Zip64 support and the ability to manually "simulate" writes for keeping offsets. The low-level implementation differs slightly from Crystal's ZIP module, but the high-level API is very, very similar (a streaming Writer object that handles any IO within a block). The low-level difference is that there is a single Writer object that handles low-level byte packing and writing, and that is the object that handles the tricky logic with switching Zip64 on and off, setting the correct masked permission bits on things and so forth. The shard is in https://github.com/WeTransfer/cr_zip_tricks

So, in light of #5215 etc would it be something that belongs in the Crystal stdlib? I could potentially port the writing to the standard library, but I would prefer to then remove the "self writing" feature in Entry so that the serializer we already have can be reused and stay in one module. The Entry object then gets a much smaller API surface as well. @asterite would love to know your thoughts on this

@asterite
Copy link
Member

Nice!

If the performance is similar or faster and it provides more features, I can't see why we wouldn't want to replace the existing implementation with that one.

@RX14
Copy link
Contributor

RX14 commented Nov 22, 2018

I'd suggest moving the current zip library out of the stdlib to a repo on the crystal-lang org and adding new features and performing refactors there. I'd like to keep the API exactly compatible if we can.

Moving logic out of Entry seems like a good idea.

@julik
Copy link
Author

julik commented Nov 22, 2018

So what would be the preferred course of action? Should I make a repo with only the ZIP library from stdlib and change/add/remove things there, and then it becomes a zip shard that replaces the stdlib one?

@Sija
Copy link
Contributor

Sija commented Nov 22, 2018

As long as ZIP module is in stdlib I'd suggest contributing there and extracting it to separate shard once final decision for stdlib "diet" has been made. No need to fragment the ecosystem before anything is decided (if at all).

@j8r
Copy link
Contributor

j8r commented Nov 22, 2018

Great idea @RX14 . @julik could have more rights on the repo, therefore freeing time for core maintainers. To sum-up:

  • @julik , you can prepare you shard like it will be directly integrated to the stdlib (in the src directory) – docs, deleting version.cr, renaming...
  • then the library could be moved to the crystal-lang organisation, with you as a maintainer
  • finally, integrate the lib to src/zip, somehow

I strongly think this type of splits worth it, PR are stacking and depends only on core members for merging.

People mastering specific fields like crypto, system, http etc got slowed down. Not really motivating to actively contribute if our work become staled

@Sija
Copy link
Contributor

Sija commented Nov 23, 2018

@j8r It seems too early for "summing-up" since it's up to the core team members to decide and this is still unresolved. Personally I'm against fragmentation stdlib like that - ZIP is sth quite widely used which doesn't changes very often and I see no good reason to keep it out of stdlib.

@j8r
Copy link
Contributor

j8r commented Nov 23, 2018

@Sija It will still be in the stdlib, this isn't exclusive. Look at submodules in https://github.com/rust-lang/rust/tree/master/src and https://github.com/rust-lang/rust/blob/master/Cargo.lock . This approach is clearly possible, the only remaining thing is a go from core members.

Personally, I don't think Crystal has the ressource, i.e core members, to support properly the whole stdlib only by themselves. Especially since there are people who are willing to help them.

@j8r
Copy link
Contributor

j8r commented Nov 23, 2018

I may be wrong, they probably can support the whole language without delegating work to additional maintainers... For sure, that's up to them to see.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Nov 23, 2018

@julik the best course would be to create a zip shard that merges both implementations —shards take precedence over stdlib, so require "zip" will load lib/zip/src/zip.cr instead of crystal/src/zip.cr.

Anybody will be able to just use it, without waiting for core team reviews, an eventual merge, wait again for the next crystal release, and so on.

We can decide later whether we merge it back, distribute the shard with the stdlib (like ruby does for some gems), or drop it from stdlib and keep it as a shard only.

@RX14
Copy link
Contributor

RX14 commented Nov 23, 2018

@ysbaddaden that sounds error prone. What's the benefit of that for @julik over using the shard he already wrote then? If all the changes to stdlib zip are made in a shard its less likely the changes will be merged since there's no core team member reviewing the changes.

Anyway, http depends on zlib so we might as well keep zip in the stdlib. Let's take PRs.

@j8r
Copy link
Contributor

j8r commented Nov 23, 2018

Do we really want that any change in the stdlib require a review of a core member? Wouldn't be better to have teams responsible to different part of the language, instead of one for the whole? The core member will be able to focus on the compiler itself.

That's already more or less the case without knowing it: PCRE, OpenSSL, bdwgc are done externally, why not HTTP, Zip or Zlib then?

@asterite
Copy link
Member

Do we really want that any change in the stdlib require a review of a core member?

@j8r Yes, Crystal is the language and the standard library. The code in the standard library has to have a base level of quality, readablity and test coverage. If the ones that contribute to the standard library disappear and the code is a mess, nobody will be able to maintain it.

That's why I'm more and more inclined to split out things outside of the standard library, because it frees the core developers from this task. Also, not all core developers know everything (for example I implemented the current zip implementation and I had to learn it first, but it would have been faster if someone else knew zip beforehand and implemented it as a shard, as needed).

@j8r
Copy link
Contributor

j8r commented Nov 23, 2018

I would rather mean, having shards developing in parallel, and then asking to integrate X version in the stdlib. A bit like what it's done with LLVM when supporting a new version, or OpenSSL.

Core member, maintainer or an other name, there are likely people out there capable of doing a great job in the stdlib without knowing Crystal's compiler internals.

At the end, it's all about trust: do we trust people X to do a quality Y shard for the stdlib?

@RX14
Copy link
Contributor

RX14 commented Nov 23, 2018

@j8r Not all the current core developers are compiler developers (although they do need a robust understanding of how the compiler is architectured and it's externally visible warts). Removing things for the core team to do does not magically mean the compiler will get developed faster.

Recruitment to the core team has always been about aligning design and review sensibilities. Since the stdlib cannot be changed after 1.0 it means the stdlib needs a lot higher standard of design and review than external shards.

Removing stuff from the stdlib is fine, but it's a balance between making crystal useful for basic tasks without requiring shards and the reduced workload. I argue that removing zip is probably not worth it on balance because the zlib bindings are the complex part and they need to stay in the stdlib for HTTP. If we're keeping those, having a zip library is a nice convenience and not too difficult to maintain: we have to get the zip library right once and the spec is unlikely to ever change so there will be very little maintenance from then on.

@julik
Copy link
Author

julik commented Nov 25, 2018

Ok. I am going to figure out how to test the stdlib parts in isolation (still very new to the language) and then gradually open patches to add Zip64 support to the parts that exist. I might have to rework the writing of the entries as stated previously but that is IMO a good compromise - and won't change the API surface. The only concern I have is that the API allows passing extra: manually which is an arbitrary bytes slice. That works but it then needs to be combined with other extra fields that need to be added - like the precise timestamp extra and the Zip64 extra. I tis instrumental that the Zip64 extra comes first. So I will probably allow postfixing the extra fields we generate with the manually added extra if it is provided.

I will keep cr_zip_tricks for the time being though since it was purpose-made and we need it to do stuff.

@RX14
Copy link
Contributor

RX14 commented Nov 25, 2018

@julik you'll want to check out the crystal repo, use make deps then bin/crystal spec spec/std/zip/zip_spec.cr to run the zip tests.

I'm fine with small breaking changes to make stuff like extra nicer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants