Proposal: Rewrite the entire package. Design doc discussion #90
I'm thinking we should do the following to focus our efforts in the redesign and channel the community's help toward it. Thoughts?
@johnarok Yes, that all sounds good -- let's draw attention to it and triage old issues that will be handled by the redesign. As soon as I get a chance to get https://relicabackup.com off the ground, I want to turn my attention (at least on weekends) to this, since we use this package extensively when doing releases.
I spent some hours on a detailed API design today that addresses all my concerns, with lots of room for growth as well. Everything here is a very rough draft, subject to change and input, and I'm probably overlooking many implementation/technical difficulties that could cause this design draft to change. I started by poring over the open and closed issues and PRs and assembled a wishlist of functionalities:
I believe all these requirements are met by the following type definitions:

```go
type Archiver interface {
	Archive(sources []string, destination string) error
}

type Unarchiver interface {
	Unarchive(source, destination string) error
}

type Writer interface {
	Create(out io.Writer) error
	Write(f File) error
	Close() error
}

type Reader interface {
	Open(in io.Reader) error
	Read() (File, error)
	Close() error
}

type Extractor interface {
	Extract(targets []string, source, destination string) error
}

type Walker interface {
	Walk(archive string, walkFn WalkFunc) error
}

type WalkFunc func(f File, err error) error

type File struct {
	// The exact fields we choose could be whatever;
	// these are inspired by os.FileInfo.
	Name    string
	Size    int64
	Mode    os.FileMode
	ModTime time.Time
	IsDir   bool

	// The original header info; depends on the
	// type of archive -- could be nil, too.
	Header interface{}

	// Allow the file contents to be read as an io.Reader.
	io.Reader
}

type FileCompressor interface {
	CompressFile(source, destination string) error
}

type FileDecompressor interface {
	DecompressFile(source, destination string) error
}

type Compressor interface {
	Compress(in io.Reader, out io.Writer) error
}

type Decompressor interface {
	Decompress(in io.Reader, out io.Writer) error
}
```

Then a format-specific type such as:

```go
type Zip struct {
	CompressionLevel     int
	OverwriteExisting    bool
	SelectiveCompression bool
	FollowSymlinks       bool
	ExtractToFolder      bool

	w io.Writer // explained later
	r io.Reader // explained later
}
```

could implement the methods that this archive format is reasonably capable of. Not all formats need to implement all interfaces; that's the beauty of this design. And for the formats that support it, this design also has full support for dealing with streams efficiently. Reader and Writer allow you to treat the archive kind of like a stream, though it's a little awkward:

```go
z := &archive.Zip{
	CompressionLevel:     10,
	SelectiveCompression: true,
}
err := z.Create(w)
if err != nil {
	return err
}
defer z.Close()
for _, f := range openFiles {
	err := z.Write(archive.File{
		Name: f.Name(),
		// ...
		Reader: f,
	})
	if err != nil {
		return err
	}
}
```

Other than that, I'm quite happy with this design so far -- assuming it works -- and will get started implementing it soon. Feedback welcome!
I've begun work.
UPDATE: Done in pull request #99
Having used this package for a few years now, I've encountered a number of issues that lead me to want to redesign it entirely: burn it down and start over, copying only the fundamental parts of the code, and not worrying about backwards compatibility.
Some specific issues I've experienced:
Too much magic. Recently I spent a day debugging a problem where archiver couldn't reliably extract a .zip file. Sometimes it would work, sometimes it wouldn't. I eventually discovered that it's because archiver determines which extractor to use based on the extension and the file header while iterating through a map of formats (which is not ordered). If the Zip format came first, it matched by extension but failed to extract; if the TarGz format came first, it matched by file header (because the file was actually a tar.gz), and extraction succeeded.
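The nondeterminism here is inherent to Go: map iteration order is deliberately randomized, so any format-matching loop over a map can pick a different winner on each run. A minimal illustration (the format names and `matchFormat` helper are placeholders, not the package's actual code):

```go
package main

import "fmt"

// matchFormat mimics a lookup that iterates over a map of
// format matchers. Because Go randomizes map iteration order,
// the first "match" found can differ between runs.
func matchFormat(formats map[string]bool) string {
	for name, matches := range formats {
		if matches {
			return name // first matching format wins -- order is random
		}
	}
	return ""
}

func main() {
	// Both formats claim to match, as with a .tar.gz misnamed .zip:
	formats := map[string]bool{"zip": true, "targz": true}
	fmt.Println(matchFormat(formats)) // prints "zip" or "targz", unpredictably
}
```

This is exactly why matching must be deterministic: either ordered, or resolved by a single authoritative signal like the file header.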
Weak API. Apparently I was able to accidentally create a .tar.gz file with the Zip archiver, because the name I built for the file was not tied to the archiver format I was using. I can do

```go
archiver.TarGz.Make("file.zip")
```

without errors, which is bad. Here's the code that led to my bug in the first place (notice the missing `.` in `"zip"`): ^ Bad package design.
Not enough customizability. Namely: compression level; whether to include a top-level named folder vs. just its contents (similar to how rsync behaves based on the presence of a trailing slash); and whether to overwrite existing files when outputting.
Lack of native streaming capabilities. From a library perspective, I should be able to stream in a zip file and spit out individual files, or stream in individual files (or a list of filenames?) and spit out a single zip file.
There is no true cross-platform native solution to zip-slip (yet). I had to disable the "security feature" that prevented me from extracting a perfectly safe archive. Even "SecureJoin" solutions don't cut it (read the linked thread, and its linked threads). For now, these "mitigations" only get in the way.
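For reference, a purely name-based zip-slip check is only a few lines; the limitation the linked threads discuss is that path semantics differ across platforms and filesystems, so a sketch like this (the `isSuspicious` helper is invented for illustration) is necessarily incomplete:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isSuspicious reports whether an archive entry name would
// escape the extraction root via ".." traversal or an absolute
// path. A sketch only -- it ignores symlinks, Windows drive
// letters, and other platform quirks, which is exactly why
// no name check alone is a complete zip-slip defense.
func isSuspicious(name string) bool {
	if filepath.IsAbs(name) {
		return true
	}
	clean := filepath.Clean(name)
	return clean == ".." ||
		strings.HasPrefix(clean, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(isSuspicious("../../etc/passwd")) // true
	fmt.Println(isSuspicious("docs/readme.txt"))  // false
}
```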
Not enough power to inspect archives or cherry-pick files. It would be helpful to be able to work with archives' contents without performing an extraction, such as getting listings, or filtering which files are extracted, etc.
General solutions:
When possible (almost always), match only by file header and ignore the file extension. If the file contents are not (yet) available, then use the extension, but only after a warning or explicit opt-in. Or (maybe "Also"), require that the file extension, where present, matches the format when creating an archive.
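Matching by header means sniffing magic bytes. The zip and gzip signatures below are the real ones, but the `sniffFormat` helper and its shape are an illustration, not the proposed API:

```go
package main

import (
	"bytes"
	"fmt"
)

// sniffFormat guesses the archive format from leading magic
// bytes, ignoring the file extension entirely. Only two
// signatures are shown; a real matcher would cover all formats
// and read the header from the stream.
func sniffFormat(head []byte) string {
	switch {
	case bytes.HasPrefix(head, []byte{'P', 'K', 0x03, 0x04}):
		return "zip" // zip local file header signature
	case bytes.HasPrefix(head, []byte{0x1f, 0x8b}):
		return "gzip" // gzip magic number
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(sniffFormat([]byte{0x1f, 0x8b, 0x08})) // gzip
	fmt.Println(sniffFormat([]byte("PK\x03\x04rest"))) // zip
}
```

Because the header is a single authoritative signal, this dispatch is deterministic, unlike the map iteration described above.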
Be verbose in the error messages; if doing any magic, report it or make the magic explicitly opt-in either with a configuration parameter or a special high-level API that is documented as being magical, which wraps the underlying, concrete, explicit functions.
Couple the file extension to the archiver. For example, don't allow the Zip archiver to make a .tar.gz file. The buggy code above could have been avoided with something more like this:

```go
archiver.Make(outputFile, files...)
```

which uses the extension of `outputFile` to force a format that matches.

Expand the API so that an archiver is created for a specific format before being used, rather than having hard-coded globals like archiver.Zip like we do now. This will allow more customization, too. Imagine:

```go
zipArch := archiver.Zip{CompressionLevel: 10}
```

or something similar.

Be explicit about our threat model, which is being adjusted, to state that the files are expected to be trusted, i.e. don't download files you don't trust. Maybe it is possible to inspect a file before extracting it to know whether it could be malicious (e.g. look for zip-slip patterns in file names), but I am not sure about that yet.
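The extension-coupled dispatch behind a `Make(outputFile, ...)` call could be as simple as this sketch (the `byExtension` name, the format strings, and the error text are all assumptions, not the final API):

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// byExtension picks a format from the output filename, so a
// request like Make("file.zip", ...) can never silently produce
// a tar.gz -- the mismatch described above becomes impossible.
func byExtension(outputFile string) (string, error) {
	name := strings.ToLower(outputFile)
	switch {
	case strings.HasSuffix(name, ".tar.gz"), strings.HasSuffix(name, ".tgz"):
		return "targz", nil
	case filepath.Ext(name) == ".zip":
		return "zip", nil
	default:
		return "", errors.New("unrecognized archive extension: " + outputFile)
	}
}

func main() {
	f, err := byExtension("release.tar.gz")
	fmt.Println(f, err) // targz <nil>
	_, err = byExtension("release.exe")
	fmt.Println(err != nil) // true
}
```

Note the multi-part suffix check runs before `filepath.Ext`, since `Ext` would report only `.gz` for a `.tar.gz` name.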
Moar interfaces. We have one, Archiver, but we might need more, to accommodate an expanded design with more features. Small interfaces are the best.
Rename the package to archive. (Decided to keep it the same.)

This issue is to track the discussion about the new design; work will hopefully begin soon, as I can find the time.