diff --git a/README.md b/README.md index f2d94f00..7a63fe16 100644 --- a/README.md +++ b/README.md @@ -1,28 +1,58 @@ archiver [![archiver GoDoc](https://img.shields.io/badge/reference-godoc-blue.svg?style=flat-square)](https://godoc.org/github.com/mholt/archiver) [![Linux Build Status](https://img.shields.io/travis/mholt/archiver.svg?style=flat-square&label=linux+build)](https://travis-ci.org/mholt/archiver) [![Windows Build Status](https://img.shields.io/appveyor/ci/mholt/archiver.svg?style=flat-square&label=windows+build)](https://ci.appveyor.com/project/mholt/archiver) ======== -Package archiver makes it trivially easy to make and extract common archive formats such as .zip, and .tar.gz. Simply name the input and output file(s). +Introducing **Archiver 3.0** - a cross-platform, multi-format archive utility and Go library. A powerful and flexible library meets an elegant CLI in this generic replacement for several platform-specific, format-specific archive utilities. + +## Features + +Package archiver makes it trivially easy to make and extract common archive formats such as zip and tarball (and its compressed variants). Simply name the input and output file(s). The `arc` command runs the same on all platforms and has no external dependencies (not even libc). It is powered by the Go standard library and several third-party, pure-Go libraries. Files are put into the root of the archive; directories are recursively added, preserving structure. -The `archiver` command runs the same cross-platform and has no external dependencies (not even libc); powered by the Go standard library, [dsnet/compress](https://github.com/dsnet/compress), [nwaples/rardecode](https://github.com/nwaples/rardecode), and [ulikunitz/xz](https://github.com/ulikunitz/xz). Enjoy! 
+- Make whole archives from a list of files +- Open whole archives to a folder +- Extract specific files/folders from archives +- Stream files in and out of archives without needing actual files on disk +- Traverse archive contents without loading them +- Compress files +- Decompress files +- Streaming compression and decompression +- Several archive and compression formats supported + +### Format-dependent features -Supported formats/extensions: +- Optionally create a top-level folder to avoid littering a directory or archive root with files +- Toggle overwrite existing files +- Adjust compression level +- Zip: store (not compress) already-compressed files +- Make all necessary directories +- Open password-protected RAR archives +- Optionally continue with other files after an error + +### Supported archive formats - .zip - .tar -- .tar.gz & .tgz -- .tar.bz2 & .tbz2 -- .tar.xz & .txz -- .tar.lz4 & .tlz4 -- .tar.sz & .tsz +- .tar.gz or .tgz +- .tar.bz2 or .tbz2 +- .tar.xz or .txz +- .tar.lz4 or .tlz4 +- .tar.sz or .tsz - .rar (open only) +### Supported compression formats + +- bzip2 +- gzip +- lz4 +- snappy +- xz + ## Install ```bash -go get github.com/mholt/archiver/cmd/archiver +go get -u github.com/mholt/archiver/cmd/archiver ``` Or download binaries from the [releases](https://github.com/mholt/archiver/releases) page. @@ -30,70 +60,94 @@ Or download binaries from the [releases](https://github.com/mholt/archiver/relea ## Command Use -Make a new archive: +### Make new archive ```bash -$ archiver make [archive name] [input files...] +# Syntax: arc archive [archive name] [input files...] +$ arc archive test.tar.gz file1.txt images/file2.jpg folder/subfolder ``` (At least one input file is required.) -To extract an archive: +### Extract entire archive ```bash -$ archiver open [archive name] [destination] +# Syntax: arc unarchive [archive name] [destination] +$ arc unarchive test.tar.gz ``` (The destination path is optional; default is current directory.) 
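The archive commands above take a file name such as `test.tar.gz`, and every supported archive format in the list is tied to one or two extensions. As a rough illustration only, such a lookup could be sketched in Go as follows. The function `formatByExtension`, the format labels, and the case-insensitive matching are assumptions made for this example; they are not taken from the archiver source.

```go
package main

import (
	"fmt"
	"strings"
)

// archiveExts pairs each extension from the README's list of supported
// archive formats with a label. The labels are invented for this sketch.
var archiveExts = []struct{ ext, format string }{
	{".zip", "zip"},
	{".tar", "tar"},
	{".tar.gz", "tar+gzip"}, {".tgz", "tar+gzip"},
	{".tar.bz2", "tar+bzip2"}, {".tbz2", "tar+bzip2"},
	{".tar.xz", "tar+xz"}, {".txz", "tar+xz"},
	{".tar.lz4", "tar+lz4"}, {".tlz4", "tar+lz4"},
	{".tar.sz", "tar+snappy"}, {".tsz", "tar+snappy"},
	{".rar", "rar (open only)"},
}

// formatByExtension is a hypothetical helper: it returns the label of the
// first entry whose extension ends the given file name, ignoring case.
func formatByExtension(name string) (string, error) {
	lower := strings.ToLower(name)
	for _, e := range archiveExts {
		if strings.HasSuffix(lower, e.ext) {
			return e.format, nil
		}
	}
	return "", fmt.Errorf("%s: unsupported archive extension", name)
}

func main() {
	for _, name := range []string{"test.tar.gz", "photos.zip", "notes.txt"} {
		format, err := formatByExtension(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name, "->", format)
	}
}
```

None of the listed extensions is a suffix of another, so the order of the table does not affect the result.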
-The archive name must end with a supported file extension—this is how it knows what kind of archive to make. Run `archiver -h` for more help. +The archive name must end with a supported file extension—this is how it knows what kind of archive to make. Run `arc help` for more help. +### List archive contents -## Library Use - -```go -import "github.com/mholt/archiver" +```bash +# Syntax: arc ls [archive name] +$ arc ls caddy_dist.tar.gz +drwxr-xr-x matt staff 0 2018-09-19 15:47:18 -0600 MDT dist/ +-rw-r--r-- matt staff 6148 2017-08-07 18:34:22 -0600 MDT dist/.DS_Store +-rw-r--r-- matt staff 22481 2018-09-19 15:47:18 -0600 MDT dist/CHANGES.txt +-rw-r--r-- matt staff 17189 2018-09-19 15:47:18 -0600 MDT dist/EULA.txt +-rw-r--r-- matt staff 25261 2016-03-07 16:32:00 -0700 MST dist/LICENSES.txt +-rw-r--r-- matt staff 1017 2018-09-19 15:47:18 -0600 MDT dist/README.txt +-rw-r--r-- matt staff 288 2016-03-21 11:52:38 -0600 MDT dist/gitcookie.sh.enc +... ``` -Create a .zip file: +### Extract a specific file or folder from an archive -```go -err := archiver.Zip.Make("output.zip", []string{"file.txt", "folder"}) +```bash +# Syntax: arc extract [archive name] [path in archive] [destination on disk] +$ arc extract test.tar.gz foo/hello.txt extracted/hello.txt ``` -Extract a .zip file: +### Compress a single file -```go -err := archiver.Zip.Open("input.zip", "output_folder") +```bash +# Syntax: arc compress [input file] [output file] +$ arc compress test.txt compressed_test.txt.gz +$ arc compress test.txt gz ``` -Working with other file formats is exactly the same, but with [their own Archiver implementations](https://godoc.org/github.com/mholt/archiver#Archiver). +For convenience, if the output file is simply a compression format (without leading dot), the output file name will be the same as the input name but with the format extension appended, and the input file will be deleted if successful. 
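The naming rule described above can be sketched in Go. This is a minimal illustration, not the `arc` command's code: the helper `compressOutputName` and the `knownFormats` set are hypothetical, and the format strings are assumed from the extensions of the compressed tar variants listed in the README.

```go
package main

import (
	"fmt"
	"strings"
)

// knownFormats is an assumed set of bare format names (no leading dot),
// derived from the README's .tar.bz2, .tar.gz, .tar.lz4, .tar.sz and
// .tar.xz extensions.
var knownFormats = map[string]bool{
	"bz2": true, "gz": true, "lz4": true, "sz": true, "xz": true,
}

// compressOutputName is a hypothetical helper. If output is just a format
// name, the result is the input name with that extension appended, and
// deleteInput reports that the input should be removed after success.
// Otherwise output is used unchanged and the input is kept.
func compressOutputName(input, output string) (name string, deleteInput bool) {
	format := strings.ToLower(output)
	if knownFormats[format] {
		return input + "." + format, true
	}
	return output, false
}

func main() {
	for _, out := range []string{"compressed_test.txt.gz", "gz"} {
		name, del := compressOutputName("test.txt", out)
		fmt.Printf("arc compress test.txt %s: writes %s, deletes input: %t\n", out, name, del)
	}
}
```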
+### Decompress a single file +```bash +# Syntax: arc decompress [input file] [output file] +$ arc decompress test.txt.gz original_test.txt +$ arc decompress test.txt.gz +``` -## FAQ +For convenience, if the output file is not specified, it will have the same name as the input, but with the compression extension stripped from the end, and the input file will be deleted if successful. -#### Can I list a file in one folder to go into a different folder in the archive? +### Flags -No. This works just like your OS would make an archive in the file explorer: organize your input files to mirror the structure you want in the archive. +Flags are specified before the subcommand. Use `arc help` or `arc -h` to get usage help and a description of flags with their default values. +## Library Use + +```go +import "github.com/mholt/archiver" +``` -#### Can it add files to an existing archive? +The archiver package allows you to easily create and open archives, walk their contents, extract specific files, compress and decompress files, and even stream archives in and out using pure io.Reader and io.Writer interfaces, without ever needing to touch the disk. See [package godoc documentation](https://godoc.org/github.com/mholt/archiver) to learn how to do this -- it's really slick! -Nope. This is a simple tool; it just makes new archives or extracts existing ones. +**Security note: This package does NOT attempt to mitigate zip-slip attacks.** It is [extremely difficult](https://github.com/rubyzip/rubyzip/pull/376) [to do properly](https://github.com/mholt/archiver/pull/65#issuecomment-395988244) and [seemingly impossible to mitigate effectively across platforms](https://github.com/golang/go/issues/20126). [Attempted fixes have broken processing of legitimate files in production](https://github.com/mholt/archiver/pull/70#issuecomment-423267320), rendering the program unusable. 
Our recommendation instead is to inspect the contents of an untrusted archive before extracting it (this package provides `Walkers`) and decide if you want to proceed with extraction. ## Project Values This project has a few principle-based goals that guide its development: -- **Do one thing really well.** That is creating and opening archive files. It is not meant to be a replacement for specific archive format tools like tar, zip, etc. that have lots of features and customizability. (Some customizability is OK, but not to the extent that it becomes complicated or error-prone.) +- **Do our thing really well.** Our thing is creating, opening, inspecting, compressing, and streaming archive files. It is not meant to be a replacement for specific archive format tools like tar, zip, etc. that have lots of features and customizability. (Some customizability is OK, but not to the extent that it becomes overly complicated or error-prone.) - **Have good tests.** Changes should be covered by tests. - **Limit dependencies.** Keep the package lightweight. -- **Pure Go.** This means no cgo or other external/system dependencies. This package should be able to stand on its own and cross-compile easily to any platform. +- **Pure Go.** This means no cgo or other external/system dependencies. This package should be able to stand on its own and cross-compile easily to any platform -- and that includes its library dependencies. - **Idiomatic Go.** Keep interfaces small, variable names semantic, vet shows no errors, the linter is generally quiet, etc. diff --git a/archiver.go b/archiver.go index a0d9b361..68c53d2a 100644 --- a/archiver.go +++ b/archiver.go @@ -3,55 +3,138 @@ package archiver import ( "fmt" "io" - "log" "os" + "path" "path/filepath" "runtime" "strings" ) -// Archiver represent a archive format +// Archiver is a type that can create an archive file +// from a list of source file names. 
type Archiver interface { - // Match checks supported files - Match(filename string) bool - // Make makes an archive file on disk. - Make(destination string, sources []string) error - // Open extracts an archive file on disk. - Open(source, destination string) error - // Write writes an archive to a Writer. - Write(output io.Writer, sources []string) error - // Read reads an archive from a Reader. - Read(input io.Reader, destination string) error -} - -// SupportedFormats contains all supported archive formats -var SupportedFormats = map[string]Archiver{} - -// RegisterFormat adds a supported archive format -func RegisterFormat(name string, format Archiver) { - if _, ok := SupportedFormats[name]; ok { - log.Printf("Format %s already exists, skip!\n", name) - return + Archive(sources []string, destination string) error +} + +// Unarchiver is a type that can extract archive files +// into a folder. +type Unarchiver interface { + Unarchive(source, destination string) error +} + +// Writer can write discrete byte streams of files to +// an output stream. +type Writer interface { + Create(out io.Writer) error + Write(f File) error + Close() error +} + +// Reader can read discrete byte streams of files from +// an input stream. +type Reader interface { + Open(in io.Reader, size int64) error + Read() (File, error) + Close() error +} + +// Extractor can extract a specific file from a source +// archive to a specific destination folder on disk. +type Extractor interface { + Extract(source, target, destination string) error +} + +// File provides methods for accessing information about +// or contents of a file within an archive. +type File struct { + os.FileInfo + + // The original header info; depends on + // type of archive -- could be nil, too. 
+ Header interface{} + + // Allow the file contents to be read (and closed) + io.ReadCloser +} + +// FileInfo is an os.FileInfo but optionally with +// a custom name, useful if dealing with files that +// are not actual files on disk, or which have a +// different name in an archive than on disk. +type FileInfo struct { + os.FileInfo + CustomName string +} + +// Name returns fi.CustomName if not empty; +// otherwise it returns fi.FileInfo.Name(). +func (fi FileInfo) Name() string { + if fi.CustomName != "" { + return fi.CustomName } - SupportedFormats[name] = format + return fi.FileInfo.Name() } -// MatchingFormat returns the first archive format that matches -// the given file, or nil if there is no match -func MatchingFormat(fpath string) Archiver { - for _, fmt := range SupportedFormats { - if fmt.Match(fpath) { - return fmt - } +// ReadFakeCloser is an io.Reader that has +// a no-op close method to satisfy the +// io.ReadCloser interface. +type ReadFakeCloser struct { + io.Reader +} + +// Close implements io.Closer. +func (rfc ReadFakeCloser) Close() error { return nil } + +// Walker can walk an archive file and return information +// about each item in the archive. +type Walker interface { + Walk(archive string, walkFn WalkFunc) error +} + +// WalkFunc is called at each item visited by Walk. +// If an error is returned, the walk may continue +// if the Walker is configured to continue on error. +// The sole exception is the error value ErrStopWalk, +// which stops the walk without an actual error. +type WalkFunc func(f File) error + +// ErrStopWalk signals Walk to break without error. +var ErrStopWalk = fmt.Errorf("walk stopped") + +// Compressor compresses to out what it reads from in. +// It also ensures a compatible or matching file extension. +type Compressor interface { + Compress(in io.Reader, out io.Writer) error + CheckExt(filename string) error +} + +// Decompressor decompresses to out what it reads from in. 
+type Decompressor interface { + Decompress(in io.Reader, out io.Writer) error +} + +// Matcher is a type that can return whether the given +// file appears to match the implementation's format. +// Implementations should return the file's read position +// to where it was when the method was called. +type Matcher interface { + Match(*os.File) (bool, error) +} + +func fileExists(name string) bool { + _, err := os.Stat(name) + return !os.IsNotExist(err) +} + +func mkdir(dirPath string) error { + err := os.MkdirAll(dirPath, 0755) + if err != nil { + return fmt.Errorf("%s: making directory: %v", dirPath, err) } return nil } func writeNewFile(fpath string, in io.Reader, fm os.FileMode) error { - if fileExists(fpath) { - return fmt.Errorf("%s: skipping because there exists a file with the same name", fpath) - } - err := os.MkdirAll(filepath.Dir(fpath), 0755) if err != nil { return fmt.Errorf("%s: making directory for file: %v", fpath, err) @@ -103,31 +186,49 @@ func writeNewHardLink(fpath string, target string) error { return nil } -func mkdir(dirPath string) error { - err := os.MkdirAll(dirPath, 0755) +// within returns true if sub is within or equal to parent. +func within(parent, sub string) bool { + rel, err := filepath.Rel(parent, sub) if err != nil { - return fmt.Errorf("%s: making directory: %v", dirPath, err) + return false } - return nil + return !strings.Contains(rel, "..") } -func sanitizeExtractPath(filePath string, destination string) error { - // to avoid zip slip (writing outside of the destination), we resolve - // the target path, and make sure it's nested in the intended - // destination, or bail otherwise. - destpath := filepath.Join(destination, filePath) - if !strings.HasPrefix(destpath, filepath.Clean(destination)) { - return fmt.Errorf("%s: illegal file path", filePath) +// multipleTopLevels returns true if the paths do not +// share a common top-level folder. 
+func multipleTopLevels(paths []string) bool { + if len(paths) < 2 { + return false } - return nil + var lastTop string + for _, p := range paths { + p = strings.TrimPrefix(strings.Replace(p, `\`, "/", -1), "/") + for { + next := path.Dir(p) + if next == "." { + break + } + p = next + } + if lastTop == "" { + lastTop = p + } + if p != lastTop { + return true + } + } + return false } -// fileExists returns true only if we can successfuly get the file attributes or if the reason -// for failure is the absence of the file. -func fileExists(fpath string) bool { - _, err := os.Stat(fpath) - if err == nil { - return true +// folderNameFromFileName returns a name for a folder +// that is suitable based on the filename, which will +// be stripped of its extensions. +func folderNameFromFileName(filename string) string { + base := filepath.Base(filename) + firstDot := strings.Index(base, ".") + if firstDot > -1 { + return base[:firstDot] } - return !os.IsNotExist(err) + return base } diff --git a/archiver_test.go b/archiver_test.go index 0b09947d..dddb0bf9 100644 --- a/archiver_test.go +++ b/archiver_test.go @@ -2,257 +2,213 @@ package archiver import ( "bytes" + "fmt" + "io" "io/ioutil" "os" "path/filepath" - "strings" "testing" ) -func TestArchiver(t *testing.T) { - for name, ar := range SupportedFormats { - name, ar := name, ar - t.Run(name, func(t *testing.T) { - t.Parallel() - // skip RAR for now - if _, ok := ar.(rarFormat); ok { - t.Skip("not supported") - } - - _, gzOk := ar.(gzFormat) - _, bzip2Ok := ar.(bzip2Format) - if gzOk || bzip2Ok { - testSingleWriteRead(t, name, ar) - testSingleMakeOpen(t, name, ar) - } else { - testWriteRead(t, name, ar) - testMakeOpen(t, name, ar) - testMakeOpenWithDestinationEndingInSlash(t, name, ar) - testMakeOpenNotOverwriteAtDestination(t, name, ar) - } - }) +func TestWithin(t *testing.T) { + for i, tc := range []struct { + path1, path2 string + expect bool + }{ + { + path1: "/foo", + path2: "/foo/bar", + expect: true, + }, + { + 
path1: "/foo", + path2: "/foobar/asdf", + expect: false, + }, + { + path1: "/foobar/", + path2: "/foobar/asdf", + expect: true, + }, + { + path1: "/foobar/asdf", + path2: "/foobar", + expect: false, + }, + { + path1: "/foobar/asdf", + path2: "/foobar/", + expect: false, + }, + { + path1: "/", + path2: "/asdf", + expect: true, + }, + { + path1: "/asdf", + path2: "/asdf", + expect: true, + }, + { + path1: "/", + path2: "/", + expect: true, + }, + { + path1: "/foo/bar/daa", + path2: "/foo", + expect: false, + }, + { + path1: "/foo/", + path2: "/foo/bar/daa", + expect: true, + }, + } { + actual := within(tc.path1, tc.path2) + if actual != tc.expect { + t.Errorf("Test %d: [%s %s] Expected %t but got %t", i, tc.path1, tc.path2, tc.expect, actual) + } } } -// testSingleWriteRead performs a symmetric test by using ar.Write to generate -// an archive from the test corpus, then using ar.Read to extract the archive -// and comparing the contents to ensure they are equal. -func testSingleWriteRead(t *testing.T, name string, ar Archiver) { - buf := new(bytes.Buffer) - tmp, err := ioutil.TempDir("", "archiver") - if err != nil { - t.Fatalf("[%s] %v", name, err) - } - defer os.RemoveAll(tmp) - - origPath := "testdata/quote1.txt" - newPath := filepath.Join(tmp, "quote1.txt") - - // Test creating archive - err = ar.Write(buf, []string{origPath}) - if err != nil { - t.Fatalf("[%s] writing archive: didn't expect an error, but got: %v", name, err) - } - - // Test extracting archive - err = ar.Read(buf, newPath) - if err != nil { - t.Fatalf("[%s] reading archive: didn't expect an error, but got: %v", name, err) +func TestMultipleTopLevels(t *testing.T) { + for i, tc := range []struct { + set []string + expect bool + }{ + { + set: []string{}, + expect: false, + }, + { + set: []string{"/foo"}, + expect: false, + }, + { + set: []string{"/foo", "/foo/bar"}, + expect: false, + }, + { + set: []string{"/foo", "/bar"}, + expect: true, + }, + { + set: []string{"/foo", "/foobar"}, + expect: 
true, + }, + { + set: []string{"foo", "foo/bar"}, + expect: false, + }, + { + set: []string{"foo", "/foo/bar"}, + expect: false, + }, + { + set: []string{"../foo", "foo/bar"}, + expect: true, + }, + { + set: []string{`C:\foo\bar`, `C:\foo\bar\zee`}, + expect: false, + }, + { + set: []string{`C:\`, `C:\foo\bar`}, + expect: false, + }, + { + set: []string{`D:\foo`, `E:\foo`}, + expect: true, + }, + { + set: []string{`D:\foo`, `D:\foo\bar`, `C:\foo`}, + expect: true, + }, + { + set: []string{"/foo", "/", "/bar"}, + expect: true, + }, + } { + actual := multipleTopLevels(tc.set) + if actual != tc.expect { + t.Errorf("Test %d: %v: Expected %t but got %t", i, tc.set, tc.expect, actual) + } } - - // Check that what was extracted is what was compressed - checkSameContent(t, name, origPath, newPath) } -// testWriteRead performs a symmetric test by using ar.Write to generate an archive -// from the test corpus, then using ar.Read to extract the archive and comparing -// the contents to ensure they are equal. 
-func testWriteRead(t *testing.T, name string, ar Archiver) { - buf := new(bytes.Buffer) - tmp, err := ioutil.TempDir("", "archiver") - if err != nil { - t.Fatalf("[%s] %v", name, err) - } - defer os.RemoveAll(tmp) - - // Test creating archive - err = ar.Write(buf, []string{"testdata"}) - if err != nil { - t.Fatalf("[%s] writing archive: didn't expect an error, but got: %v", name, err) - } - - // Test extracting archive - err = ar.Read(buf, tmp) - if err != nil { - t.Fatalf("[%s] reading archive: didn't expect an error, but got: %v", name, err) +func TestArchiveUnarchive(t *testing.T) { + for _, af := range archiveFormats { + au, ok := af.(archiverUnarchiver) + if !ok { + t.Errorf("%s (%T): not an Archiver and Unarchiver", af, af) + continue + } + testArchiveUnarchive(t, au) } - - // Check that what was extracted is what was compressed - symmetricTest(t, name, tmp) } -// testSingleMakeOpen performs a symmetric test by using ar.Make to make an archive -// from the test corpus, then using ar.Open to open the archive and comparing -// the contents to ensure they are equal. 
-func testSingleMakeOpen(t *testing.T, name string, ar Archiver) { - tmp, err := ioutil.TempDir("", "archiver") - if err != nil { - t.Fatalf("[%s] %v", name, err) - } - defer os.RemoveAll(tmp) - - origPath := "testdata/quote1.txt" - newPath := filepath.Join(tmp, "quote1.txt") - - // Test creating archive - outfile := filepath.Join(tmp, "test-"+name) - err = ar.Make(outfile, []string{origPath}) - if err != nil { - t.Fatalf("[%s] making archive: didn't expect an error, but got: %v", name, err) - } - - if !ar.Match(outfile) { - t.Fatalf("[%s] identifying format should be 'true', but got 'false'", name) - } - - // Test extracting archive - err = ar.Open(outfile, newPath) - if err != nil { - t.Fatalf("[%s] extracting archive [%s -> %s]: didn't expect an error, but got: %v", name, outfile, newPath, err) - } - - // Check that what was extracted is what was compressed - checkSameContent(t, name, origPath, newPath) -} +func testArchiveUnarchive(t *testing.T, au archiverUnarchiver) { + auStr := fmt.Sprintf("%s", au) -// testMakeOpen performs a symmetric test by using ar.Make to make an archive -// from the test corpus, then using ar.Open to open the archive and comparing -// the contents to ensure they are equal. 
-func testMakeOpen(t *testing.T, name string, ar Archiver) { - tmp, err := ioutil.TempDir("", "archiver") + tmp, err := ioutil.TempDir("", "archiver_test") if err != nil { - t.Fatalf("[%s] %v", name, err) + t.Fatalf("[%s] %v", auStr, err) } defer os.RemoveAll(tmp) // Test creating archive - outfile := filepath.Join(tmp, "test-"+name) - err = ar.Make(outfile, []string{"testdata"}) + outfile := filepath.Join(tmp, "archiver_test."+auStr) + err = au.Archive([]string{"testdata"}, outfile) if err != nil { - t.Fatalf("[%s] making archive: didn't expect an error, but got: %v", name, err) + t.Fatalf("[%s] making archive: didn't expect an error, but got: %v", auStr, err) } - if !ar.Match(outfile) { - t.Fatalf("[%s] identifying format should be 'true', but got 'false'", name) - } + // Test format matching (TODO: Make this its own test, out of band with the archive/unarchive tests) + //testMatching(t, au, outfile) // TODO: Disabled until we can finish implementing this for compressed tar formats // Test extracting archive - dest := filepath.Join(tmp, "extraction_test") + dest := filepath.Join(tmp, "extraction_test_"+auStr) os.Mkdir(dest, 0755) - err = ar.Open(outfile, dest) + err = au.Unarchive(outfile, dest) if err != nil { - t.Fatalf("[%s] extracting archive [%s -> %s]: didn't expect an error, but got: %v", name, outfile, dest, err) + t.Fatalf("[%s] extracting archive [%s -> %s]: didn't expect an error, but got: %v", auStr, outfile, dest, err) } // Check that what was extracted is what was compressed - symmetricTest(t, name, dest) + symmetricTest(t, auStr, dest) } -// testMakeOpenWithDestinationEndingInSlash is similar to testMakeOpen except that -// it tests the case where destination path has a terminating forward slash especially -// on Windows os. 
-func testMakeOpenWithDestinationEndingInSlash(t *testing.T, name string, ar Archiver) { - tmp, err := ioutil.TempDir("", "archiver") - if err != nil { - t.Fatalf("[%s] %v", name, err) +// testMatching tests that au can match the format of archiveFile. +func testMatching(t *testing.T, au archiverUnarchiver, archiveFile string) { + m, ok := au.(Matcher) + if !ok { + t.Logf("[NOTICE] %T (%s) is not a Matcher", au, au) + return } - defer os.RemoveAll(tmp) - // Test creating archive - outfile := filepath.Join(tmp, "test-"+name) - err = ar.Make(outfile, []string{"testdata"}) + file, err := os.Open(archiveFile) if err != nil { - t.Fatalf("[%s] making archive: didn't expect an error, but got: %v", name, err) + t.Fatalf("[%s] opening file for matching: %v", au, err) } + defer file.Close() - if !ar.Match(outfile) { - t.Fatalf("[%s] identifying format should be 'true', but got 'false'", name) - } + tmpBuf := make([]byte, 2048) + io.ReadFull(file, tmpBuf) - // Test extracting archive with destination that has a slash at the end - dest := filepath.Join(tmp, "extraction_test") - os.Mkdir(dest, 0755) - err = ar.Open(outfile, dest+"/") + matched, err := m.Match(file) if err != nil { - t.Fatalf("[%s] extracting archive [%s -> %s]: didn't expect an error, but got: %v", name, outfile, dest, err) + t.Fatalf("%s (%T): testing matching: got error, expected none: %v", m, m, err) } - - // Check that what was extracted is what was compressed - symmetricTest(t, name, dest) -} - -// testMakeOpenNotOverwriteAtDestination performs a test to ensure we do not overwrite existing files -// when extracting at a destination that has existing files named as those in the archive. 
-func testMakeOpenNotOverwriteAtDestination(t *testing.T, name string, ar Archiver) { - tmp, err := ioutil.TempDir("", "archiver") - if err != nil { - t.Fatalf("[%s] %v", name, err) - } - defer os.RemoveAll(tmp) - - mockOverWriteDirectory := filepath.Join(tmp, "testdatamock") - mockOverWriteFile := filepath.Join(tmp, "testdatamock", "shouldnotoverwrite.txt") - - // Prepare a mock directory with files for testing - if err := os.Mkdir(mockOverWriteDirectory, 0774); err != nil { - t.Fatalf("[%s] preping mock directory: didn't expect an error, but got: %v", mockOverWriteDirectory, err) - } - if err := ioutil.WriteFile(filepath.Join(mockOverWriteDirectory, "fileatsourceonly.txt"), []byte("File-From-Source"), 0644); err != nil { - t.Fatalf("[%s] prep file in source: didn't expect an error, but got: %v", name, err) - } - defer os.RemoveAll(mockOverWriteDirectory) - if err := ioutil.WriteFile(mockOverWriteFile, []byte("File-From-Source"), 0644); err != nil { - t.Fatalf("[%s] prep file in source: didn't expect an error, but got: %v", name, err) - } - - // Test creating archive - outfile := filepath.Join(tmp, "test-"+name) - err = ar.Make(outfile, []string{mockOverWriteDirectory}) - if err != nil { - t.Fatalf("[%s] making archive: didn't expect an error, but got: %v", name, err) - } - - if !ar.Match(outfile) { - t.Fatalf("[%s] identifying format should be 'true', but got 'false'", name) - } - - // Introduce a change to the mock file, to track if it would be overwritten by unarchiving. 
- if err := ioutil.WriteFile(mockOverWriteFile, []byte("File-At-Destination"), 0644); err != nil { - t.Fatalf("[%s] change file in destination: didn't expect an error, but got: %v", name, err) - } - - // Test extracting archive with destination same as original folder - dest := tmp - if err := ar.Open(outfile, dest); err != nil { - if !strings.Contains(err.Error(), "skipping because there exists a file with the same name") { - t.Fatalf("[%s] extracting archive [%s -> %s]: Unexpected error got: %v", name, outfile, dest, err) - } - } - - // Validate if the mock file was changed by the un-archiving process - content, err := ioutil.ReadFile(mockOverWriteFile) - if err != nil { - t.Fatalf("[%s] extracting archive [%s -> %s]: Unable to read file : %v", name, outfile, dest, err) - } - if string(content) != "File-At-Destination" { - t.Fatalf("[%s] extracting archive [%s -> %s]: Unexpected Overwrite of File at Destination %s, %s got %s", name, outfile, dest, mockOverWriteFile, "File-At-Destination", string(content)) - + if !matched { + t.Fatalf("%s (%T): format should have matched, but didn't", m, m) } } // symmetricTest compares the contents of a destination directory to the contents // of the test corpus and tests that they are equal. 
-func symmetricTest(t *testing.T, name, dest string) { +func symmetricTest(t *testing.T, formatName, dest string) { var expectedFileCount int filepath.Walk("testdata", func(fpath string, info os.FileInfo, err error) error { expectedFileCount++ @@ -270,14 +226,14 @@ func symmetricTest(t *testing.T, name, dest string) { origPath, err := filepath.Rel(dest, fpath) if err != nil { - t.Fatalf("[%s] %s: Error inducing original file path: %v", name, fpath, err) + t.Fatalf("[%s] %s: Error inducing original file path: %v", formatName, fpath, err) } if info.IsDir() { // stat dir instead of read file _, err = os.Stat(origPath) if err != nil { - t.Fatalf("[%s] %s: Couldn't stat original directory (%s): %v", name, + t.Fatalf("[%s] %s: Couldn't stat original directory (%s): %v", formatName, fpath, origPath, err) } return nil @@ -285,99 +241,50 @@ func symmetricTest(t *testing.T, name, dest string) { expectedFileInfo, err := os.Stat(origPath) if err != nil { - t.Fatalf("[%s] %s: Error obtaining original file info: %v", name, fpath, err) + t.Fatalf("[%s] %s: Error obtaining original file info: %v", formatName, fpath, err) } + expected, err := ioutil.ReadFile(origPath) + if err != nil { + t.Fatalf("[%s] %s: Couldn't open original file (%s) from disk: %v", formatName, + fpath, origPath, err) + } + actualFileInfo, err := os.Stat(fpath) if err != nil { - t.Fatalf("[%s] %s: Error obtaining actual file info: %v", name, fpath, err) + t.Fatalf("[%s] %s: Error obtaining actual file info: %v", formatName, fpath, err) + } + actual, err := ioutil.ReadFile(fpath) + if err != nil { + t.Fatalf("[%s] %s: Couldn't open new file from disk: %v", formatName, fpath, err) } + if actualFileInfo.Mode() != expectedFileInfo.Mode() { - t.Fatalf("[%s] %s: File mode differed between on disk and compressed", name, + t.Fatalf("[%s] %s: File mode differed between on disk and compressed", formatName, expectedFileInfo.Mode().String()+" : "+actualFileInfo.Mode().String()) } - - checkSameContent(t, name, origPath, 
fpath) + if !bytes.Equal(expected, actual) { + t.Fatalf("[%s] %s: File contents differed between on disk and compressed", formatName, origPath) + } return nil }) if got, want := actualFileCount, expectedFileCount; got != want { - t.Fatalf("[%s] Expected %d resulting files, got %d", name, want, got) - } -} - -func checkSameContent(t *testing.T, name, origPath, fpath string) { - expected, err := ioutil.ReadFile(origPath) - if err != nil { - t.Fatalf("[%s] %s: Couldn't open original file (%s) from disk: %v", name, - fpath, origPath, err) - } - actual, err := ioutil.ReadFile(fpath) - if err != nil { - t.Fatalf("[%s] %s: Couldn't open new file from disk: %v", name, fpath, err) - } - if !bytes.Equal(expected, actual) { - t.Fatalf("[%s] %s: File contents differed between on disk and compressed", name, origPath) + t.Fatalf("[%s] Expected %d resulting files, got %d", formatName, want, got) } } -func BenchmarkMake(b *testing.B) { - tmp, err := ioutil.TempDir("", "archiver") - if err != nil { - b.Fatal(err) - } - defer os.RemoveAll(tmp) - - for name, ar := range SupportedFormats { - name, ar := name, ar - b.Run(name, func(b *testing.B) { - // skip RAR for now - if _, ok := ar.(rarFormat); ok { - b.Skip("not supported") - } - outfile := filepath.Join(tmp, "benchMake-"+name) - for i := 0; i < b.N; i++ { - err = ar.Make(outfile, []string{"testdata"}) - if err != nil { - b.Fatalf("making archive: didn't expect an error, but got: %v", err) - } - } - }) - } +var archiveFormats = []interface{}{ + DefaultZip, + DefaultTar, + DefaultTarBz2, + DefaultTarGz, + DefaultTarLz4, + DefaultTarSz, + DefaultTarXz, } -func BenchmarkOpen(b *testing.B) { - tmp, err := ioutil.TempDir("", "archiver") - if err != nil { - b.Fatal(err) - } - defer os.RemoveAll(tmp) - - for name, ar := range SupportedFormats { - name, ar := name, ar - b.Run(name, func(b *testing.B) { - // skip RAR for now - if _, ok := ar.(rarFormat); ok { - b.Skip("not supported") - } - // prepare a archive - outfile := 
filepath.Join(tmp, "benchMake-"+name) - err = ar.Make(outfile, []string{"testdata"}) - if err != nil { - b.Fatalf("open archive: didn't expect an error, but got: %v", err) - } - // prepare extraction destination - dest := filepath.Join(tmp, "extraction_test") - os.Mkdir(dest, 0755) - - // let's go - b.ResetTimer() - for i := 0; i < b.N; i++ { - err = ar.Open(outfile, dest) - if err != nil { - b.Fatalf("open archive: didn't expect an error, but got: %v", err) - } - } - }) - } +type archiverUnarchiver interface { + Archiver + Unarchiver } diff --git a/bz2.go b/bz2.go index 8a0d630a..5a914bc3 100644 --- a/bz2.go +++ b/bz2.go @@ -3,120 +3,52 @@ package archiver import ( "fmt" "io" - "os" - "strings" + "path/filepath" "github.com/dsnet/compress/bzip2" ) -// Bzip2 is for Bzip2 format -var Bzip2 bzip2Format - -func init() { - RegisterFormat("Bzip2", Bzip2) +// Bz2 facilitates bzip2 compression. +type Bz2 struct { + CompressionLevel int } -type bzip2Format struct{} - -func (bzip2Format) Match(filename string) bool { - return (strings.HasSuffix(strings.ToLower(filename), ".bz2") && - !strings.HasSuffix(strings.ToLower(filename), ".tar.bz2") && - !strings.HasSuffix(strings.ToLower(filename), ".tbz2")) || - (!isTarBz2(filename) && - isBz2(filename)) -} - -// isBz2 checks if the file is a valid bzip2. -func isBz2(bzip2Path string) bool { - f, err := os.Open(bzip2Path) +// Compress reads in, compresses it, and writes it to out. +func (bz *Bz2) Compress(in io.Reader, out io.Writer) error { + w, err := bzip2.NewWriter(out, &bzip2.WriterConfig{ + Level: bz.CompressionLevel, + }) if err != nil { - return false - } - defer f.Close() - - bzip2r, err := bzip2.NewReader(f, nil) - if err != nil { - return false - } - - buf := make([]byte, 16) - if _, err = io.ReadFull(bzip2r, buf); err != nil { - return false + return err } - - return true -} - -// Write outputs to a Writer the bzip2'd contents of the first file listed in -// filePaths. 
-func (bzip2Format) Write(output io.Writer, filePaths []string) error { - return writeBzip2(filePaths, output, "") + defer w.Close() + _, err = io.Copy(w, in) + return err } -// Make creates a file at bzip2Path containing the bzip2'd contents of the first file -// listed in filePaths. -func (bzip2Format) Make(bzip2Path string, filePaths []string) error { - out, err := os.Create(bzip2Path) +// Decompress reads in, decompresses it, and writes it to out. +func (bz *Bz2) Decompress(in io.Reader, out io.Writer) error { + r, err := bzip2.NewReader(in, nil) if err != nil { - return fmt.Errorf("error creating %s: %v", bzip2Path, err) + return err } - defer out.Close() - - return writeBzip2(filePaths, out, bzip2Path) + defer r.Close() + _, err = io.Copy(out, r) + return err } -func writeBzip2(filePaths []string, output io.Writer, dest string) error { - if len(filePaths) != 1 { - return fmt.Errorf("only one file supported for bzip2") - } - firstFile := filePaths[0] - - fileInfo, err := os.Stat(firstFile) - if err != nil { - return fmt.Errorf("%s: stat: %v", firstFile, err) - } - - if fileInfo.IsDir() { - return fmt.Errorf("%s is a directory", firstFile) - } - - in, err := os.Open(firstFile) - if err != nil { - return fmt.Errorf("error reading %s: %v", firstFile, err) - } - defer in.Close() - - bzip2w, err := bzip2.NewWriter(output, nil) - if err != nil { - return fmt.Errorf("error compressing bzip2: %v", err) - } - defer bzip2w.Close() - - if _, err = io.Copy(bzip2w, in); err != nil { - return fmt.Errorf("error writing bzip2: %v", err) +// CheckExt ensures the file extension matches the format. +func (bz *Bz2) CheckExt(filename string) error { + if filepath.Ext(filename) != ".bz2" { + return fmt.Errorf("filename must have a .bz2 extension") } return nil } -// Read a bzip2'd file from a Reader and decompresses the contents into -// destination. 
-func (bzip2Format) Read(input io.Reader, destination string) error { - bzip2r, err := bzip2.NewReader(input, nil) - if err != nil { - return fmt.Errorf("error decompressing: %v", err) - } - defer bzip2r.Close() - - return writeNewFile(destination, bzip2r, 0644) -} - -// Open decompresses bzip2'd source into destination. -func (bzip2Format) Open(source, destination string) error { - f, err := os.Open(source) - if err != nil { - return fmt.Errorf("%s: failed to open archive: %v", source, err) - } - defer f.Close() +func (bz *Bz2) String() string { return "bz2" } - return Bzip2.Read(f, destination) -} +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Compressor(new(Bz2)) + _ = Decompressor(new(Bz2)) +) diff --git a/cmd/arc/main.go b/cmd/arc/main.go new file mode 100644 index 00000000..e8e32bc6 --- /dev/null +++ b/cmd/arc/main.go @@ -0,0 +1,401 @@ +package main + +import ( + "archive/tar" + "archive/zip" + "bytes" + "compress/flate" + "flag" + "fmt" + "os" + "path/filepath" + "strings" + + "github.com/mholt/archiver" + "github.com/nwaples/rardecode" +) + +var ( + compressionLevel int + overwriteExisting bool + mkdirAll bool + selectiveCompression bool + implicitTopLevelFolder bool + continueOnError bool +) + +func init() { + flag.IntVar(&compressionLevel, "level", flate.DefaultCompression, "Compression level") + flag.BoolVar(&overwriteExisting, "overwrite", false, "Overwrite existing files") + flag.BoolVar(&mkdirAll, "mkdirs", false, "Make all necessary directories") + flag.BoolVar(&selectiveCompression, "smart", true, "Only compress files which are not already compressed (zip only)") + flag.BoolVar(&implicitTopLevelFolder, "folder-safe", true, "If an archive does not have a single top-level folder, create one implicitly") + flag.BoolVar(&continueOnError, "allow-errors", true, "Log errors and continue processing") +} + +func main() { + if len(os.Args) >= 2 && + (os.Args[1] == "-h" || os.Args[1] == "--help" || os.Args[1] == "help") { + 
fmt.Println(usageString()) + os.Exit(0) + } + if len(os.Args) < 3 { + fatal(usageString()) + } + flag.Parse() + + subcommand := flag.Arg(0) + + // get the format we're working with + iface, err := getFormat(subcommand) + if err != nil { + fatal(err) + } + + // run the desired command + switch subcommand { + case "archive": + a, ok := iface.(archiver.Archiver) + if !ok { + fatalf("the archive command does not support the %s format", iface) + } + err = a.Archive(flag.Args()[2:], flag.Arg(1)) + + case "unarchive": + a, ok := iface.(archiver.Unarchiver) + if !ok { + fatalf("the unarchive command does not support the %s format", iface) + } + err = a.Unarchive(flag.Arg(1), flag.Arg(2)) + + case "extract": + e, ok := iface.(archiver.Extractor) + if !ok { + fatalf("the extract command does not support the %s format", iface) + } + err = e.Extract(flag.Arg(1), flag.Arg(2), flag.Arg(3)) + + case "ls": + w, ok := iface.(archiver.Walker) + if !ok { + fatalf("the ls command does not support the %s format", iface) + } + + var count int + err = w.Walk(flag.Arg(1), func(f archiver.File) error { + count++ + switch h := f.Header.(type) { + case *zip.FileHeader: + fmt.Printf("%s\t%d\t%d\t%s\t%s\n", + f.Mode(), + h.Method, + f.Size(), + f.ModTime(), + h.Name, + ) + case *tar.Header: + fmt.Printf("%s\t%s\t%s\t%d\t%s\t%s\n", + f.Mode(), + h.Uname, + h.Gname, + f.Size(), + f.ModTime(), + h.Name, + ) + + case *rardecode.FileHeader: + fmt.Printf("%s\t%d\t%d\t%s\t%s\n", + f.Mode(), + int(h.HostOS), + f.Size(), + f.ModTime(), + h.Name, + ) + + default: + fmt.Printf("%s\t%d\t%s\t?/%s\n", + f.Mode(), + f.Size(), + f.ModTime(), + f.Name(), // we don't know full path from this + ) + } + return nil + }) + + fmt.Printf("total %d", count) + + case "compress": + c, ok := iface.(archiver.Compressor) + if !ok { + fatalf("the compress command does not support the %s format", iface) + } + fc := archiver.FileCompressor{Compressor: c} + + in := flag.Arg(1) + out := flag.Arg(2) + + var deleteWhenDone bool + 
if cs, ok := c.(fmt.Stringer); ok && out == cs.String() { + out = in + "." + out + deleteWhenDone = true + } + + err = fc.CompressFile(in, out) + if err == nil && deleteWhenDone { + err = os.Remove(in) + } + + case "decompress": + c, ok := iface.(archiver.Decompressor) + if !ok { + fatalf("the decompress command does not support the %s format", iface) + } + fc := archiver.FileCompressor{Decompressor: c} + + in := flag.Arg(1) + out := flag.Arg(2) + + var deleteWhenDone bool + if cs, ok := c.(fmt.Stringer); ok && out == "" { + out = strings.TrimSuffix(in, "."+cs.String()) + deleteWhenDone = true + } + + err = fc.DecompressFile(in, out) + if err == nil && deleteWhenDone { + err = os.Remove(in) + } + + default: + fatalf("unrecognized command: %s", flag.Arg(0)) + } + if err != nil { + fatal(err) + } +} + +func getFormat(subcommand string) (interface{}, error) { + formatPos := 1 + if subcommand == "compress" { + formatPos = 2 + } + + // figure out which file format we're working with + var ext string + archiveName := flag.Arg(formatPos) + for _, format := range supportedFormats { + // match by extension, or, in the case of 'compress', + // check the format without the leading dot; it allows + // a shortcut to specify a format while replacing + // the original file on disk + if strings.HasSuffix(archiveName, format) || + (subcommand == "compress" && + archiveName == strings.TrimPrefix(format, ".")) { + ext = format + break + } + } + + // configure an archiver + var iface interface{} + mytar := &archiver.Tar{ + OverwriteExisting: overwriteExisting, + MkdirAll: mkdirAll, + ImplicitTopLevelFolder: implicitTopLevelFolder, + ContinueOnError: continueOnError, + } + + switch ext { + case ".rar": + iface = &archiver.Rar{ + OverwriteExisting: overwriteExisting, + MkdirAll: mkdirAll, + ImplicitTopLevelFolder: implicitTopLevelFolder, + ContinueOnError: continueOnError, + Password: os.Getenv("ARCHIVE_PASSWORD"), + } + + case ".tar": + iface = mytar + + case ".tbz2": + fallthrough + 
case ".tar.bz2": + iface = &archiver.TarBz2{ + Tar: mytar, + } + + case ".tgz": + fallthrough + case ".tar.gz": + iface = &archiver.TarGz{ + Tar: mytar, + CompressionLevel: compressionLevel, + } + + case ".tlz4": + fallthrough + case ".tar.lz4": + iface = &archiver.TarLz4{ + Tar: mytar, + CompressionLevel: compressionLevel, + } + + case ".tsz": + fallthrough + case ".tar.sz": + iface = &archiver.TarSz{ + Tar: mytar, + } + + case ".txz": + fallthrough + case ".tar.xz": + iface = &archiver.TarXz{ + Tar: mytar, + } + + case ".zip": + iface = &archiver.Zip{ + CompressionLevel: compressionLevel, + OverwriteExisting: overwriteExisting, + MkdirAll: mkdirAll, + SelectiveCompression: selectiveCompression, + ImplicitTopLevelFolder: implicitTopLevelFolder, + ContinueOnError: continueOnError, + } + + case ".gz": + iface = &archiver.Gz{ + CompressionLevel: compressionLevel, + } + + case ".bz2": + iface = &archiver.Bz2{ + CompressionLevel: compressionLevel, + } + + case ".lz4": + iface = &archiver.Lz4{ + CompressionLevel: compressionLevel, + } + + case ".sz": + iface = &archiver.Snappy{} + + case ".xz": + iface = &archiver.Xz{} + + default: + archiveExt := filepath.Ext(archiveName) + if archiveExt == "" { + return nil, fmt.Errorf("format missing (use file extension to specify archive/compression format)") + } + return nil, fmt.Errorf("unsupported format '%s'", archiveExt) + } + + return iface, nil +} + +func fatal(v ...interface{}) { + fmt.Fprintln(os.Stderr, v...) + os.Exit(1) +} + +func fatalf(s string, v ...interface{}) { + fmt.Fprintf(os.Stderr, s+"\n", v...) + os.Exit(1) +} + +func usageString() string { + buf := new(bytes.Buffer) + buf.WriteString(usage) + flag.CommandLine.SetOutput(buf) + flag.CommandLine.PrintDefaults() + return buf.String() +} + +// supportedFormats is the list of recognized +// file extensions. They are in an ordered slice +// because ordering is important, since some +// extensions can be substrings of others. 
+var supportedFormats = []string{ + ".tar.bz2", + ".tar.gz", + ".tar.lz4", + ".tar.sz", + ".tar.xz", + ".rar", + ".tar", + ".zip", + ".gz", + ".bz2", + ".lz4", + ".sz", + ".xz", +} + +const usage = `Usage: arc {archive|unarchive|extract|ls|compress|decompress|help} [arguments...] + archive + Create a new archive file. List the files/folders + to include in the archive; at least one required. + unarchive + Extract an archive file. Provide the archive to + open and the destination folder to extract into. + extract + Extract a single file or folder (recursively) from + an archive. First argument is the source archive, + second is the file to extract (exact path within the + archive is required), and third is destination. + ls + List the contents of the archive. + compress + Compresses a file, destination optional. + decompress + Decompresses a file, destination optional. + help + Display this help text. Also -h or --help. + + SPECIFYING THE ARCHIVE FORMAT + The format of the archive is determined by its + file extension. Supported extensions: + .zip + .tar + .tar.gz + .tgz + .tar.bz2 + .tbz2 + .tar.xz + .txz + .tar.lz4 + .tlz4 + .tar.sz + .tsz + .rar (open only) + .bz2 + .gz + .lz4 + .sz + .xz + + (DE)COMPRESSING SINGLE FILES + Some formats are compression-only, and can be used + with the compress and decompress commands on a + single file; they do not bundle multiple files. + + To replace a file when compressing, specify the + source file name for the first argument, and the + compression format (without leading dot) for the + second argument. To replace a file when decompressing, + specify only the source file and no destination. + + PASSWORD-PROTECTED RAR FILES + Export the ARCHIVE_PASSWORD environment variable + to be able to open password-protected rar archives. 
+ + GLOBAL FLAG REFERENCE + The following global flags may be used before the + sub-command (some flags are format-specific): + +` diff --git a/cmd/archiver/main.go b/cmd/archiver/main.go deleted file mode 100644 index d5e57f5c..00000000 --- a/cmd/archiver/main.go +++ /dev/null @@ -1,96 +0,0 @@ -package main - -import ( - "fmt" - "os" - - "github.com/mholt/archiver" -) - -func main() { - if len(os.Args) == 2 && os.Args[1] == "-h" { - fmt.Println(usage) - os.Exit(0) - } - if len(os.Args) < 3 { - fatal(usage) - } - - cmd, filename := os.Args[1], os.Args[2] - - ff := archiver.MatchingFormat(filename) - if ff == nil { - fatalf("%s: Unsupported file extension", filename) - } - - var err error - switch cmd { - case "make": - if len(os.Args) < 4 { - fatal(usage) - } - err = ff.Make(filename, os.Args[3:]) - case "open": - dest, osErr := os.Getwd() - if osErr != nil { - fatal(err) - } - if len(os.Args) == 4 { - dest = os.Args[3] - } else if len(os.Args) > 4 { - fatal(usage) - } - err = ff.Open(filename, dest) - default: - fatal(usage) - } - if err != nil { - fatal(err) - } -} - -func fatal(v ...interface{}) { - fmt.Fprintln(os.Stderr, v...) - os.Exit(1) -} - -func fatalf(s string, v ...interface{}) { - fmt.Fprintf(os.Stderr, s+"\n", v...) - os.Exit(1) -} - -const usage = `Usage: archiver {make|open} [files...] - make - Create a new archive file. List the files/folders - to include in the archive; at least one required. - open - Extract an archive file. Give only the archive to - open and the destination folder to extract into. - - Specifying archive format: - The format of the archive is determined by its - file extension. Supported extensions: - .zip - .tar - .tar.gz - .tgz - .tar.bz2 - .tbz2 - .tar.xz - .txz - .tar.lz4 - .tlz4 - .tar.sz - .tsz - .rar (open only) - .gz - .bz2 - - Existing files: - When creating an archive file that already exists, - archiver will overwrite the existing file. 
When - extracting files, archiver will NOT overwrite files - that already exist in the destination path; this - is treated as an error and extraction will abort. - - Use "archiver -h" to display this help message` diff --git a/filecompressor.go b/filecompressor.go new file mode 100644 index 00000000..df881468 --- /dev/null +++ b/filecompressor.go @@ -0,0 +1,66 @@ +package archiver + +import ( + "fmt" + "os" +) + +// FileCompressor can compress and decompress single files. +type FileCompressor struct { + Compressor + Decompressor + + OverwriteExisting bool +} + +// CompressFile reads the source file and compresses it to destination. +// The destination must have a matching extension. +func (fc FileCompressor) CompressFile(source, destination string) error { + if err := fc.CheckExt(destination); err != nil { + return err + } + if fc.Compressor == nil { + return fmt.Errorf("no compressor specified") + } + if !fc.OverwriteExisting && fileExists(destination) { + return fmt.Errorf("file exists: %s", destination) + } + + in, err := os.Open(source) + if err != nil { + return err + } + defer in.Close() + + out, err := os.Create(destination) + if err != nil { + return err + } + defer out.Close() + + return fc.Compress(in, out) +} + +// DecompressFile reads the source file and decompresses it to destination. 
+func (fc FileCompressor) DecompressFile(source, destination string) error { + if fc.Decompressor == nil { + return fmt.Errorf("no decompressor specified") + } + if !fc.OverwriteExisting && fileExists(destination) { + return fmt.Errorf("file exists: %s", destination) + } + + in, err := os.Open(source) + if err != nil { + return err + } + defer in.Close() + + out, err := os.Create(destination) + if err != nil { + return err + } + defer out.Close() + + return fc.Decompress(in, out) +} diff --git a/gz.go b/gz.go index a764d749..7325c0f5 100644 --- a/gz.go +++ b/gz.go @@ -4,110 +4,48 @@ import ( "compress/gzip" "fmt" "io" - "os" - "strings" + "path/filepath" ) -// Gz is for Gz format -var Gz gzFormat - -func init() { - RegisterFormat("Gz", Gz) -} - -type gzFormat struct{} - -func (gzFormat) Match(filename string) bool { - return (strings.HasSuffix(strings.ToLower(filename), ".gz") && - !strings.HasSuffix(strings.ToLower(filename), ".tar.gz") && - !strings.HasSuffix(strings.ToLower(filename), ".tgz")) || - (!isTarGz(filename) && - isGz(filename)) +// Gz facilitates gzip compression. +type Gz struct { + CompressionLevel int } -// isGz checks if the file is a valid gzip. -func isGz(gzPath string) bool { - f, err := os.Open(gzPath) +// Compress reads in, compresses it, and writes it to out. +func (gz *Gz) Compress(in io.Reader, out io.Writer) error { + w, err := gzip.NewWriterLevel(out, gz.CompressionLevel) if err != nil { - return false + return err } - defer f.Close() - - _, err = gzip.NewReader(f) - if err == gzip.ErrHeader { - return false - } - - return true + defer w.Close() + _, err = io.Copy(w, in) + return err } -// Write outputs to a Writer the gzip'd contents of the first file listed in -// filePaths. -func (gzFormat) Write(output io.Writer, filePaths []string) error { - return writeGz(filePaths, output, "") -} - -// Make creates a file at gzPath containing the gzip'd contents of the first file -// listed in filePaths. 
-func (gzFormat) Make(gzPath string, filePaths []string) error { - out, err := os.Create(gzPath) +// Decompress reads in, decompresses it, and writes it to out. +func (gz *Gz) Decompress(in io.Reader, out io.Writer) error { + r, err := gzip.NewReader(in) if err != nil { - return fmt.Errorf("error creating %s: %v", gzPath, err) + return err } - defer out.Close() - - return writeGz(filePaths, out, gzPath) + defer r.Close() + _, err = io.Copy(out, r) + return err } -func writeGz(filePaths []string, output io.Writer, dest string) error { - if len(filePaths) != 1 { - return fmt.Errorf("only one file supported for gz") - } - firstFile := filePaths[0] - - fileInfo, err := os.Stat(firstFile) - if err != nil { - return fmt.Errorf("%s: stat: %v", firstFile, err) - } - - if fileInfo.IsDir() { - return fmt.Errorf("%s is a directory", firstFile) - } - - in, err := os.Open(firstFile) - if err != nil { - return fmt.Errorf("error reading %s: %v", firstFile, err) - } - defer in.Close() - - gzw := gzip.NewWriter(output) - defer gzw.Close() - - if _, err = io.Copy(gzw, in); err != nil { - return fmt.Errorf("error writing gz: %v", err) +// CheckExt ensures the file extension matches the format. +func (gz *Gz) CheckExt(filename string) error { + if filepath.Ext(filename) != ".gz" { + return fmt.Errorf("filename must have a .gz extension") } return nil } -// Read a gzip'd file from a Reader and decompresses the contents into -// destination. -func (gzFormat) Read(input io.Reader, destination string) error { - gzr, err := gzip.NewReader(input) - if err != nil { - return fmt.Errorf("error decompressing: %v", err) - } - defer gzr.Close() +func (gz *Gz) String() string { return "gz" } - return writeNewFile(destination, gzr, 0644) -} - -// Open decompresses gzip'd source into destination. 
-func (gzFormat) Open(source, destination string) error { - f, err := os.Open(source) - if err != nil { - return fmt.Errorf("%s: failed to open archive: %v", source, err) - } - defer f.Close() - - return Gz.Read(f, destination) -} +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Compressor(new(Gz)) + _ = Decompressor(new(Gz)) +) diff --git a/lz4.go b/lz4.go new file mode 100644 index 00000000..5291791a --- /dev/null +++ b/lz4.go @@ -0,0 +1,46 @@ +package archiver + +import ( + "fmt" + "io" + "path/filepath" + + "github.com/pierrec/lz4" +) + +// Lz4 facilitates LZ4 compression. +type Lz4 struct { + CompressionLevel int +} + +// Compress reads in, compresses it, and writes it to out. +func (lz *Lz4) Compress(in io.Reader, out io.Writer) error { + w := lz4.NewWriter(out) + w.Header.CompressionLevel = lz.CompressionLevel + defer w.Close() + _, err := io.Copy(w, in) + return err +} + +// Decompress reads in, decompresses it, and writes it to out. +func (lz *Lz4) Decompress(in io.Reader, out io.Writer) error { + r := lz4.NewReader(in) + _, err := io.Copy(out, r) + return err +} + +// CheckExt ensures the file extension matches the format. +func (lz *Lz4) CheckExt(filename string) error { + if filepath.Ext(filename) != ".lz4" { + return fmt.Errorf("filename must have a .lz4 extension") + } + return nil +} + +func (lz *Lz4) String() string { return "lz4" } + +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Compressor(new(Lz4)) + _ = Decompressor(new(Lz4)) +) diff --git a/rar.go b/rar.go index 3ff61da6..ba55fecf 100644 --- a/rar.go +++ b/rar.go @@ -4,117 +4,372 @@ import ( "bytes" "fmt" "io" + "log" "os" + "path" "path/filepath" - "strings" + "time" "github.com/nwaples/rardecode" ) -// Rar is for RAR archive format -var Rar rarFormat +// Rar provides facilities for reading RAR archives. +// See https://www.rarlab.com/technote.htm. 
+type Rar struct { + // Whether to overwrite existing files; if false, + // an error is returned if the file exists. + OverwriteExisting bool -func init() { - RegisterFormat("Rar", Rar) + // Whether to make all the directories necessary + // to create a rar archive in the desired path. + MkdirAll bool + + // A single top-level folder can be implicitly + // created by the Unarchive method if the files + // to be extracted from the archive do not all + // have a common root. This roughly mimics the + // behavior of archival tools integrated into OS + // file browsers which create a subfolder to + // avoid unexpectedly littering the destination + // folder with potentially many files, causing a + // problematic cleanup/organization situation. + // This feature is available for both creation + // and extraction of archives, but may be slightly + // inefficient with lots and lots of files, + // especially on extraction. + ImplicitTopLevelFolder bool + + // If true, errors encountered during reading + // or writing a single file will be logged and + // the operation will continue on remaining files. + ContinueOnError bool + + // The password to open archives (optional). + Password string + + rr *rardecode.Reader // underlying stream reader + rc *rardecode.ReadCloser // supports multi-volume archives (files only) } -type rarFormat struct{} +// Unarchive unpacks the .rar file at source to destination. +// Destination will be treated as a folder name. It supports +// multi-volume archives. +func (r *Rar) Unarchive(source, destination string) error { + if !fileExists(destination) && r.MkdirAll { + err := mkdir(destination) + if err != nil { + return fmt.Errorf("preparing destination: %v", err) + } + } + + // if the files in the archive do not all share a common + // root, then make sure we extract to a single subfolder + // rather than potentially littering the destination... 
+ if r.ImplicitTopLevelFolder { + var err error + destination, err = r.addTopLevelFolder(source, destination) + if err != nil { + return fmt.Errorf("scanning source archive: %v", err) + } + } + + err := r.OpenFile(source) + if err != nil { + return fmt.Errorf("opening rar archive for reading: %v", err) + } + defer r.Close() -func (rarFormat) Match(filename string) bool { - return strings.HasSuffix(strings.ToLower(filename), ".rar") || isRar(filename) + for { + err := r.unrarNext(destination) + if err == io.EOF { + break + } + if err != nil { + if r.ContinueOnError { + log.Printf("[ERROR] Reading file in rar archive: %v", err) + continue + } + return fmt.Errorf("reading file in rar archive: %v", err) + } + } + + return nil } -// isRar checks the file has the RAR 1.5 or 5.0 format signature by reading its -// beginning bytes and matching it -func isRar(rarPath string) bool { - f, err := os.Open(rarPath) +// addTopLevelFolder scans the files contained inside +// the tarball named sourceArchive and returns a modified +// destination if all the files do not share the same +// top-level folder. 
+func (r *Rar) addTopLevelFolder(sourceArchive, destination string) (string, error) { + file, err := os.Open(sourceArchive) if err != nil { - return false + return "", fmt.Errorf("opening source archive: %v", err) } - defer f.Close() + defer file.Close() - buf := make([]byte, 8) - if n, err := f.Read(buf); err != nil || n < 8 { - return false + rc, err := rardecode.NewReader(file, r.Password) + if err != nil { + return "", fmt.Errorf("creating archive reader: %v", err) } - return bytes.Equal(buf[:7], []byte("Rar!\x1a\x07\x00")) || // ver 1.5 - bytes.Equal(buf, []byte("Rar!\x1a\x07\x01\x00")) // ver 5.0 + var files []string + for { + hdr, err := rc.Next() + if err == io.EOF { + break + } + if err != nil { + return "", fmt.Errorf("scanning tarball's file listing: %v", err) + } + files = append(files, hdr.Name) + } + + if multipleTopLevels(files) { + destination = filepath.Join(destination, folderNameFromFileName(sourceArchive)) + } + + return destination, nil } -// Write outputs a .rar archive, but this is not implemented because -// RAR is a proprietary format. It is here only for symmetry with -// the other archive formats in this package. -func (rarFormat) Write(output io.Writer, filePaths []string) error { - return fmt.Errorf("write: RAR not implemented (proprietary format)") +func (r *Rar) unrarNext(to string) error { + f, err := r.Read() + if err != nil { + return err // don't wrap error; calling loop must break on io.EOF + } + header, ok := f.Header.(*rardecode.FileHeader) + if !ok { + return fmt.Errorf("expected header to be *rardecode.FileHeader but was %T", f.Header) + } + return r.unrarFile(f, filepath.Join(to, header.Name)) } -// Make makes a .rar archive, but this is not implemented because -// RAR is a proprietary format. It is here only for symmetry with -// the other archive formats in this package. 
-func (rarFormat) Make(rarPath string, filePaths []string) error { - return fmt.Errorf("make %s: RAR not implemented (proprietary format)", rarPath) +func (r *Rar) unrarFile(f File, to string) error { + // do not overwrite existing files, if configured + if !f.IsDir() && !r.OverwriteExisting && fileExists(to) { + return fmt.Errorf("file already exists: %s", to) + } + + hdr, ok := f.Header.(*rardecode.FileHeader) + if !ok { + return fmt.Errorf("expected header to be *rardecode.FileHeader but was %T", f.Header) + } + + // if files come before their containing folders, then we must + // create their folders before writing the file + err := mkdir(filepath.Dir(to)) + if err != nil { + return fmt.Errorf("making parent directories: %v", err) + } + + return writeNewFile(to, r.rr, hdr.Mode()) } -// Read extracts the RAR file read from input and puts the contents -// into destination. -func (rarFormat) Read(input io.Reader, destination string) error { - rr, err := rardecode.NewReader(input, "") +// OpenFile opens filename for reading. This method supports +// multi-volume archives, whereas Open does not (but Open +// supports any stream, not just files). +func (r *Rar) OpenFile(filename string) error { + if r.rr != nil { + return fmt.Errorf("rar archive is already open for reading") + } + var err error + r.rc, err = rardecode.OpenReader(filename, r.Password) if err != nil { - return fmt.Errorf("read: failed to create reader: %v", err) + return err } + r.rr = &r.rc.Reader + return nil +} - return extract(rr, destination) +// Open opens t for reading an archive from +// in. The size parameter is not used. +func (r *Rar) Open(in io.Reader, size int64) error { + if r.rr != nil { + return fmt.Errorf("rar archive is already open for reading") + } + var err error + r.rr, err = rardecode.NewReader(in, r.Password) + return err } -// Open extracts the RAR file at source and puts the contents -// into destination. 
-func (rarFormat) Open(source, destination string) error { - rf, err := rardecode.OpenReader(source, "") +// Read reads the next file from t, which must have +// already been opened for reading. If there are no +// more files, the error is io.EOF. The File must +// be closed when finished reading from it. +func (r *Rar) Read() (File, error) { + if r.rr == nil { + return File{}, fmt.Errorf("rar archive is not open") + } + + hdr, err := r.rr.Next() if err != nil { - return fmt.Errorf("%s: failed to open file: %v", source, err) + return File{}, err // don't wrap error; preserve io.EOF } - defer rf.Close() - return extract(&rf.Reader, destination) + file := File{ + FileInfo: rarFileInfo{hdr}, + Header: hdr, + ReadCloser: ReadFakeCloser{r.rr}, + } + + return file, nil } -func extract(rr *rardecode.Reader, destination string) error { +// Close closes the rar archive(s) opened by Create and Open. +func (r *Rar) Close() error { + var err error + if r.rc != nil { + rc := r.rc + r.rc = nil + err = rc.Close() + } + if r.rr != nil { + r.rr = nil + } + return err +} + +// Walk calls walkFn for each visited item in archive. 
+func (r *Rar) Walk(archive string, walkFn WalkFunc) error { + file, err := os.Open(archive) + if err != nil { + return fmt.Errorf("opening archive file: %v", err) + } + defer file.Close() + + err = r.Open(file, 0) + if err != nil { + return fmt.Errorf("opening archive: %v", err) + } + defer r.Close() + for { - header, err := rr.Next() + f, err := r.Read() if err == io.EOF { break - } else if err != nil { - return err } - - err = sanitizeExtractPath(header.Name, destination) if err != nil { - return err + if r.ContinueOnError { + log.Printf("[ERROR] Opening next file: %v", err) + continue + } + return fmt.Errorf("opening next file: %v", err) } + err = walkFn(f) + if err != nil { + if err == ErrStopWalk { + break + } + if r.ContinueOnError { + log.Printf("[ERROR] Walking %s: %v", f.Name(), err) + continue + } + return fmt.Errorf("walking %s: %v", f.Name(), err) + } + } - destpath := filepath.Join(destination, header.Name) + return nil +} - if header.IsDir { - err = mkdir(destpath) - if err != nil { - return err - } - continue +// Extract extracts a single file from the rar archive. +// If the target is a directory, the entire folder will +// be extracted into destination. 
+func (r *Rar) Extract(source, target, destination string) error { + // target refers to a path inside the archive, which should be clean also + target = path.Clean(target) + + // if the target ends up being a directory, then + // we will continue walking and extracting files + // until we are no longer within that directory + var targetDirPath string + + return r.Walk(source, func(f File) error { + th, ok := f.Header.(*rardecode.FileHeader) + if !ok { + return fmt.Errorf("expected header to be *rardecode.FileHeader but was %T", f.Header) + } - // if files come before their containing folders, then we must - // create their folders before writing the file - err = mkdir(filepath.Dir(destpath)) - if err != nil { - return err + // importantly, cleaning the path strips trailing slash, + // which must be appended to folders within the archive + name := path.Clean(th.Name) + if f.IsDir() && target == name { + targetDirPath = path.Dir(name) } - err = writeNewFile(destpath, rr, header.Mode()) - if err != nil { - return err + if within(target, th.Name) { + // either this is the exact file we want, or is + // in the directory we want to extract + + // build the filename we will extract to + end, err := filepath.Rel(targetDirPath, th.Name) + if err != nil { + return fmt.Errorf("relativizing paths: %v", err) + } + joined := filepath.Join(destination, end) + + err = r.unrarFile(f, joined) + if err != nil { + return fmt.Errorf("extracting file %s: %v", th.Name, err) + } + + // if our target was not a directory, stop walk + if targetDirPath == "" { + return ErrStopWalk + } + } else if targetDirPath != "" { + // finished walking the entire directory + return ErrStopWalk + } + + return nil + }) +} + +// Match returns true if the format of file matches this +// type's format. It should not affect reader position. 
+func (*Rar) Match(file *os.File) (bool, error) { + currentPos, err := file.Seek(0, io.SeekCurrent) + if err != nil { + return false, err + } + _, err = file.Seek(0, 0) + if err != nil { + return false, err } + defer file.Seek(currentPos, io.SeekStart) - return nil + buf := make([]byte, 8) + if n, err := file.Read(buf); err != nil || n < 8 { + return false, nil + } + hasTarHeader := bytes.Equal(buf[:7], []byte("Rar!\x1a\x07\x00")) || // ver 1.5 + bytes.Equal(buf, []byte("Rar!\x1a\x07\x01\x00")) // ver 5.0 + return hasTarHeader, nil +} + +func (r *Rar) String() string { return "rar" } + +type rarFileInfo struct { + fh *rardecode.FileHeader +} + +func (rfi rarFileInfo) Name() string { return rfi.fh.Name } +func (rfi rarFileInfo) Size() int64 { return rfi.fh.UnPackedSize } +func (rfi rarFileInfo) Mode() os.FileMode { return rfi.fh.Mode() } +func (rfi rarFileInfo) ModTime() time.Time { return rfi.fh.ModificationTime } +func (rfi rarFileInfo) IsDir() bool { return rfi.fh.IsDir } +func (rfi rarFileInfo) Sys() interface{} { return nil } + +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Reader(new(Rar)) + _ = Unarchiver(new(Rar)) + _ = Walker(new(Rar)) + _ = Extractor(new(Rar)) + _ = Matcher(new(Rar)) + _ = os.FileInfo(rarFileInfo{}) +) + +// DefaultRar is a convenient archiver ready to use. +var DefaultRar = &Rar{ + MkdirAll: true, } diff --git a/sz.go b/sz.go new file mode 100644 index 00000000..9e9fcd19 --- /dev/null +++ b/sz.go @@ -0,0 +1,43 @@ +package archiver + +import ( + "fmt" + "io" + "path/filepath" + + "github.com/golang/snappy" +) + +// Snappy facilitates Snappy compression. +type Snappy struct{} + +// Compress reads in, compresses it, and writes it to out. +func (s *Snappy) Compress(in io.Reader, out io.Writer) error { + w := snappy.NewWriter(out) + defer w.Close() + _, err := io.Copy(w, in) + return err +} + +// Decompress reads in, decompresses it, and writes it to out. 
+func (s *Snappy) Decompress(in io.Reader, out io.Writer) error { + r := snappy.NewReader(in) + _, err := io.Copy(out, r) + return err +} + +// CheckExt ensures the file extension matches the format. +func (s *Snappy) CheckExt(filename string) error { + if filepath.Ext(filename) != ".sz" { + return fmt.Errorf("filename must have a .sz extension") + } + return nil +} + +func (s *Snappy) String() string { return "sz" } + +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Compressor(new(Snappy)) + _ = Decompressor(new(Snappy)) +) diff --git a/tar.go b/tar.go index ee0c4436..dd9cf0d2 100644 --- a/tar.go +++ b/tar.go @@ -5,240 +5,599 @@ import ( "bytes" "fmt" "io" + "log" "os" + "path" "path/filepath" "strconv" "strings" ) -// Tar is for Tar format -var Tar tarFormat - -func init() { - RegisterFormat("Tar", Tar) +// Tar provides facilities for operating TAR archives. +// See http://www.gnu.org/software/tar/manual/html_node/Standard.html. +type Tar struct { + // Whether to overwrite existing files; if false, + // an error is returned if the file exists. + OverwriteExisting bool + + // Whether to make all the directories necessary + // to create a tar archive in the desired path. + MkdirAll bool + + // A single top-level folder can be implicitly + // created by the Archive or Unarchive methods + // if the files to be added to the archive + // or the files to be extracted from the archive + // do not all have a common root. This roughly + // mimics the behavior of archival tools integrated + // into OS file browsers which create a subfolder + // to avoid unexpectedly littering the destination + // folder with potentially many files, causing a + // problematic cleanup/organization situation. + // This feature is available for both creation + // and extraction of archives, but may be slightly + // inefficient with lots and lots of files, + // especially on extraction. 
+ ImplicitTopLevelFolder bool + + // If true, errors encountered during reading + // or writing a single file will be logged and + // the operation will continue on remaining files. + ContinueOnError bool + + tw *tar.Writer + tr *tar.Reader + + readerWrapFn func(io.Reader) (io.Reader, error) + writerWrapFn func(io.Writer) (io.Writer, error) + cleanupWrapFn func() } -type tarFormat struct{} +// Archive creates a tarball file at destination containing +// the files listed in sources. The destination must end with +// ".tar". File paths can be those of regular files or +// directories; directories will be recursively added. +func (t *Tar) Archive(sources []string, destination string) error { + if t.writerWrapFn == nil && !strings.HasSuffix(destination, ".tar") { + return fmt.Errorf("output filename must have .tar extension") + } + if !t.OverwriteExisting && fileExists(destination) { + return fmt.Errorf("file already exists: %s", destination) + } -func (tarFormat) Match(filename string) bool { - return strings.HasSuffix(strings.ToLower(filename), ".tar") || isTar(filename) -} + // make the folder to contain the resulting archive + // if it does not already exist + destDir := filepath.Dir(destination) + if t.MkdirAll && !fileExists(destDir) { + err := mkdir(destDir) + if err != nil { + return fmt.Errorf("making folder for destination: %v", err) + } + } -const tarBlockSize int = 512 + out, err := os.Create(destination) + if err != nil { + return fmt.Errorf("creating %s: %v", destination, err) + } + defer out.Close() -// isTar checks the file has the Tar format header by reading its beginning -// block. 
-func isTar(tarPath string) bool { - f, err := os.Open(tarPath) + err = t.Create(out) if err != nil { - return false + return fmt.Errorf("creating tar: %v", err) } - defer f.Close() + defer t.Close() - buf := make([]byte, tarBlockSize) - if _, err = io.ReadFull(f, buf); err != nil { - return false + var topLevelFolder string + if t.ImplicitTopLevelFolder && multipleTopLevels(sources) { + topLevelFolder = folderNameFromFileName(destination) + } + + for _, source := range sources { + err := t.writeWalk(source, topLevelFolder, destination) + if err != nil { + return fmt.Errorf("walking %s: %v", source, err) + } } - return hasTarHeader(buf) + return nil } -// hasTarHeader checks passed bytes has a valid tar header or not. buf must -// contain at least 512 bytes and if not, it always returns false. -func hasTarHeader(buf []byte) bool { - if len(buf) < tarBlockSize { - return false +// Unarchive unpacks the .tar file at source to destination. +// Destination will be treated as a folder name. +func (t *Tar) Unarchive(source, destination string) error { + if !fileExists(destination) && t.MkdirAll { + err := mkdir(destination) + if err != nil { + return fmt.Errorf("preparing destination: %v", err) + } } - b := buf[148:156] - b = bytes.Trim(b, " \x00") // clean up all spaces and null bytes - if len(b) == 0 { - return false // unknown format + // if the files in the archive do not all share a common + // root, then make sure we extract to a single subfolder + // rather than potentially littering the destination... 
+ if t.ImplicitTopLevelFolder { + var err error + destination, err = t.addTopLevelFolder(source, destination) + if err != nil { + return fmt.Errorf("scanning source archive: %v", err) + } } - hdrSum, err := strconv.ParseUint(string(b), 8, 64) + + file, err := os.Open(source) if err != nil { - return false + return fmt.Errorf("opening source archive: %v", err) } + defer file.Close() - // According to the go official archive/tar, Sun tar uses signed byte - // values so this calcs both signed and unsigned - var usum uint64 - var sum int64 - for i, c := range buf { - if 148 <= i && i < 156 { - c = ' ' // checksum field itself is counted as branks - } - usum += uint64(uint8(c)) - sum += int64(int8(c)) + err = t.Open(file, 0) + if err != nil { + return fmt.Errorf("opening tar archive for reading: %v", err) } + defer t.Close() - if hdrSum != usum && int64(hdrSum) != sum { - return false // invalid checksum + for { + err := t.untarNext(destination) + if err == io.EOF { + break + } + if err != nil { + if t.ContinueOnError { + log.Printf("[ERROR] Reading file in tar archive: %v", err) + continue + } + return fmt.Errorf("reading file in tar archive: %v", err) + } } - return true + return nil } -// Write outputs a .tar file to a Writer containing the -// contents of files listed in filePaths. File paths can -// be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. -func (tarFormat) Write(output io.Writer, filePaths []string) error { - return writeTar(filePaths, output, "") +// addTopLevelFolder scans the files contained inside +// the tarball named sourceArchive and returns a modified +// destination if all the files do not share the same +// top-level folder. 
+func (t *Tar) addTopLevelFolder(sourceArchive, destination string) (string, error) { + file, err := os.Open(sourceArchive) + if err != nil { + return "", fmt.Errorf("opening source archive: %v", err) + } + defer file.Close() + + // if the reader is to be wrapped, ensure we do that now + // or we will not be able to read the archive successfully + reader := io.Reader(file) + if t.readerWrapFn != nil { + reader, err = t.readerWrapFn(reader) + if err != nil { + return "", fmt.Errorf("wrapping reader: %v", err) + } + } + if t.cleanupWrapFn != nil { + defer t.cleanupWrapFn() + } + + tr := tar.NewReader(reader) + + var files []string + for { + hdr, err := tr.Next() + if err == io.EOF { + break + } + if err != nil { + return "", fmt.Errorf("scanning tarball's file listing: %v", err) + } + files = append(files, hdr.Name) + } + + if multipleTopLevels(files) { + destination = filepath.Join(destination, folderNameFromFileName(sourceArchive)) + } + + return destination, nil } -// Make creates a .tar file at tarPath containing the -// contents of files listed in filePaths. File paths can -// be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. 
-func (tarFormat) Make(tarPath string, filePaths []string) error { - out, err := os.Create(tarPath) +func (t *Tar) untarNext(to string) error { + f, err := t.Read() if err != nil { - return fmt.Errorf("error creating %s: %v", tarPath, err) + return err // don't wrap error; calling loop must break on io.EOF } - defer out.Close() - - return writeTar(filePaths, out, tarPath) + header, ok := f.Header.(*tar.Header) + if !ok { + return fmt.Errorf("expected header to be *tar.Header but was %T", f.Header) + } + return t.untarFile(f, filepath.Join(to, header.Name)) } -func writeTar(filePaths []string, output io.Writer, dest string) error { - tarWriter := tar.NewWriter(output) - defer tarWriter.Close() +func (t *Tar) untarFile(f File, to string) error { + // do not overwrite existing files, if configured + if !f.IsDir() && !t.OverwriteExisting && fileExists(to) { + return fmt.Errorf("file already exists: %s", to) + } - return tarball(filePaths, tarWriter, dest) -} + hdr, ok := f.Header.(*tar.Header) + if !ok { + return fmt.Errorf("expected header to be *tar.Header but was %T", f.Header) + } -// tarball writes all files listed in filePaths into tarWriter, which is -// writing into a file located at dest. -func tarball(filePaths []string, tarWriter *tar.Writer, dest string) error { - for _, fpath := range filePaths { - err := tarFile(tarWriter, fpath, dest) - if err != nil { - return err - } + switch hdr.Typeflag { + case tar.TypeDir: + return mkdir(to) + case tar.TypeReg, tar.TypeRegA, tar.TypeChar, tar.TypeBlock, tar.TypeFifo: + return writeNewFile(to, f, f.Mode()) + case tar.TypeSymlink: + return writeNewSymbolicLink(to, hdr.Linkname) + case tar.TypeLink: + return writeNewHardLink(to, filepath.Join(to, hdr.Linkname)) + case tar.TypeXGlobalHeader: + return nil // ignore the pax global header from git-generated tarballs + default: + return fmt.Errorf("%s: unknown type flag: %c", hdr.Name, hdr.Typeflag) } - return nil } -// tarFile writes the file at source into tarWriter. 
It does so -// recursively for directories. -func tarFile(tarWriter *tar.Writer, source, dest string) error { - sourceInfo, err := os.Stat(source) +func (t *Tar) writeWalk(source, topLevelFolder, destination string) error { + sourceAbs, err := filepath.Abs(source) + if err != nil { + return fmt.Errorf("getting absolute path: %v", err) + } + sourceInfo, err := os.Stat(sourceAbs) if err != nil { return fmt.Errorf("%s: stat: %v", source, err) } + destAbs, err := filepath.Abs(destination) + if err != nil { + return fmt.Errorf("%s: getting absolute path of destination %s: %v", source, destination, err) + } var baseDir string + if topLevelFolder != "" { + baseDir = topLevelFolder + } if sourceInfo.IsDir() { - baseDir = filepath.Base(source) + baseDir = path.Join(baseDir, sourceInfo.Name()) } - return filepath.Walk(source, func(path string, info os.FileInfo, err error) error { - if err != nil { - return fmt.Errorf("error walking to %s: %v", path, err) + return filepath.Walk(source, func(fpath string, info os.FileInfo, err error) error { + handleErr := func(err error) error { + if t.ContinueOnError { + log.Printf("[ERROR] Walking %s: %v", fpath, err) + return nil + } + return err } - - header, err := tar.FileInfoHeader(info, path) if err != nil { - return fmt.Errorf("%s: making header: %v", path, err) + return handleErr(fmt.Errorf("traversing %s: %v", fpath, err)) } - - if baseDir != "" { - header.Name = filepath.ToSlash(filepath.Join(baseDir, strings.TrimPrefix(path, source))) + if info == nil { + return handleErr(fmt.Errorf("no file info")) } - if header.Name == dest { - // our new tar file is inside the directory being archived; skip it + // make sure we do not copy our output file into itself + fpathAbs, err := filepath.Abs(fpath) + if err != nil { + return handleErr(fmt.Errorf("%s: getting absolute path: %v", fpath, err)) + } + if within(fpathAbs, destAbs) { return nil } - if info.IsDir() { - header.Name += "/" + // build the name to be used in the archive + name, err 
:= filepath.Rel(source, fpath) + if err != nil { + return handleErr(err) } + nameInArchive := path.Join(baseDir, filepath.ToSlash(name)) - err = tarWriter.WriteHeader(header) + file, err := os.Open(fpath) if err != nil { - return fmt.Errorf("%s: writing header: %v", path, err) + return handleErr(fmt.Errorf("%s: opening: %v", fpath, err)) } - - if info.IsDir() { - return nil + defer file.Close() + + err = t.Write(File{ + FileInfo: FileInfo{ + FileInfo: info, + CustomName: nameInArchive, + }, + ReadCloser: file, + }) + if err != nil { + return handleErr(fmt.Errorf("%s: writing: %s", fpath, err)) } - if header.Typeflag == tar.TypeReg { - file, err := os.Open(path) - if err != nil { - return fmt.Errorf("%s: open: %v", path, err) - } - defer file.Close() + return nil + }) +} - _, err = io.CopyN(tarWriter, file, info.Size()) - if err != nil && err != io.EOF { - return fmt.Errorf("%s: copying contents: %v", path, err) - } +// Create opens t for writing a tar archive to out. +func (t *Tar) Create(out io.Writer) error { + if t.tw != nil { + return fmt.Errorf("tar archive is already created for writing") + } + + // wrapping writers allows us to output + // compressed tarballs, for example + if t.writerWrapFn != nil { + var err error + out, err = t.writerWrapFn(out) + if err != nil { + return fmt.Errorf("wrapping writer: %v", err) } + } + + t.tw = tar.NewWriter(out) + return nil +} + +// Write writes f to t, which must have been opened for writing first. 
+func (t *Tar) Write(f File) error { + if t.tw == nil { + return fmt.Errorf("tar archive was not created for writing first") + } + if f.FileInfo == nil { + return fmt.Errorf("no file info") + } + if f.FileInfo.Name() == "" { + return fmt.Errorf("missing file name") + } + if f.ReadCloser == nil { + return fmt.Errorf("%s: no way to read file contents", f.Name()) + } + + hdr, err := tar.FileInfoHeader(f, f.Name()) + if err != nil { + return fmt.Errorf("%s: making header: %v", f.Name(), err) + } + + err = t.tw.WriteHeader(hdr) + if err != nil { + return fmt.Errorf("%s: writing header: %v", hdr.Name, err) + } + + if f.IsDir() { return nil - }) + } + + if hdr.Typeflag == tar.TypeReg { + _, err := io.Copy(t.tw, f) + if err != nil { + return fmt.Errorf("%s: copying contents: %v", f.Name(), err) + } + } + + return nil } -// Read untars a .tar file read from a Reader and puts -// the contents into destination. -func (tarFormat) Read(input io.Reader, destination string) error { - return untar(tar.NewReader(input), destination) +// Open opens t for reading an archive from +// in. The size parameter is not used. +func (t *Tar) Open(in io.Reader, size int64) error { + if t.tr != nil { + return fmt.Errorf("tar archive is already open for reading") + } + // wrapping readers allows us to open compressed tarballs + if t.readerWrapFn != nil { + var err error + in, err = t.readerWrapFn(in) + if err != nil { + return fmt.Errorf("wrapping file reader: %v", err) + } + } + t.tr = tar.NewReader(in) + return nil } -// Open untars source and puts the contents into destination. -func (tarFormat) Open(source, destination string) error { - f, err := os.Open(source) +// Read reads the next file from t, which must have +// already been opened for reading. If there are no +// more files, the error is io.EOF. The File must +// be closed when finished reading from it. 
+func (t *Tar) Read() (File, error) { + if t.tr == nil { + return File{}, fmt.Errorf("tar archive is not open") + } + + hdr, err := t.tr.Next() if err != nil { - return fmt.Errorf("%s: failed to open archive: %v", source, err) + return File{}, err // don't wrap error; preserve io.EOF } - defer f.Close() - return Tar.Read(f, destination) + file := File{ + FileInfo: hdr.FileInfo(), + Header: hdr, + ReadCloser: ReadFakeCloser{t.tr}, + } + + return file, nil +} + +// Close closes the tar archive(s) opened by Create and Open. +func (t *Tar) Close() error { + var err error + if t.tr != nil { + t.tr = nil + } + if t.tw != nil { + tw := t.tw + t.tw = nil + err = tw.Close() + } + // make sure cleanup of "Reader/Writer wrapper" + // (say that ten times fast) happens AFTER the + // underlying stream is closed + if t.cleanupWrapFn != nil { + t.cleanupWrapFn() + } + return err } -// untar un-tarballs the contents of tr into destination. -func untar(tr *tar.Reader, destination string) error { +// Walk calls walkFn for each visited item in archive. 
+func (t *Tar) Walk(archive string, walkFn WalkFunc) error { + file, err := os.Open(archive) + if err != nil { + return fmt.Errorf("opening archive file: %v", err) + } + defer file.Close() + + err = t.Open(file, 0) + if err != nil { + return fmt.Errorf("opening archive: %v", err) + } + defer t.Close() + for { - header, err := tr.Next() + f, err := t.Read() if err == io.EOF { break - } else if err != nil { - return err } - - if err := untarFile(tr, header, destination); err != nil { - return err + if err != nil { + if t.ContinueOnError { + log.Printf("[ERROR] Opening next file: %v", err) + continue + } + return fmt.Errorf("opening next file: %v", err) + } + err = walkFn(f) + if err != nil { + if err == ErrStopWalk { + break + } + if t.ContinueOnError { + log.Printf("[ERROR] Walking %s: %v", f.Name(), err) + continue + } + return fmt.Errorf("walking %s: %v", f.Name(), err) } } + return nil } -// untarFile untars a single file from tr with header header into destination. -func untarFile(tr *tar.Reader, header *tar.Header, destination string) error { - err := sanitizeExtractPath(header.Name, destination) +// Extract extracts a single file from the tar archive. +// If the target is a directory, the entire folder will +// be extracted into destination. 
+func (t *Tar) Extract(source, target, destination string) error { + // target refers to a path inside the archive, which should be clean also + target = path.Clean(target) + + // if the target ends up being a directory, then + // we will continue walking and extracting files + // until we are no longer within that directory + var targetDirPath string + + return t.Walk(source, func(f File) error { + th, ok := f.Header.(*tar.Header) + if !ok { + return fmt.Errorf("expected header to be *tar.Header but was %T", f.Header) + } + + // importantly, cleaning the path strips the trailing slash, + // which must be appended to folders within the archive + name := path.Clean(th.Name) + if f.IsDir() && target == name { + targetDirPath = path.Dir(name) + } + + if within(target, th.Name) { + // either this is the exact file we want, or is + // in the directory we want to extract + + // build the filename we will extract to + end, err := filepath.Rel(targetDirPath, th.Name) + if err != nil { + return fmt.Errorf("relativizing paths: %v", err) + } + joined := filepath.Join(destination, end) + + err = t.untarFile(f, joined) + if err != nil { + return fmt.Errorf("extracting file %s: %v", th.Name, err) + } + + // if our target was not a directory, stop walk + if targetDirPath == "" { + return ErrStopWalk + } + } else if targetDirPath != "" { + // finished walking the entire directory + return ErrStopWalk + } + + return nil + }) +} + +// Match returns true if the format of file matches this +// type's format. It should not affect reader position. 
+func (*Tar) Match(file *os.File) (bool, error) { + currentPos, err := file.Seek(0, io.SeekCurrent) if err != nil { - return err + return false, err } + _, err = file.Seek(0, 0) + if err != nil { + return false, err + } + defer file.Seek(currentPos, io.SeekStart) - destpath := filepath.Join(destination, header.Name) + buf := make([]byte, tarBlockSize) + if _, err = io.ReadFull(file, buf); err != nil { + return false, nil + } + return hasTarHeader(buf), nil +} - switch header.Typeflag { - case tar.TypeDir: - return mkdir(destpath) - case tar.TypeReg, tar.TypeRegA, tar.TypeChar, tar.TypeBlock, tar.TypeFifo: - return writeNewFile(destpath, tr, header.FileInfo().Mode()) - case tar.TypeSymlink: - return writeNewSymbolicLink(destpath, header.Linkname) - case tar.TypeLink: - return writeNewHardLink(destpath, filepath.Join(destination, header.Linkname)) - case tar.TypeXGlobalHeader: - // ignore the pax global header from git generated tarballs - return nil - default: - return fmt.Errorf("%s: unknown type flag: %c", header.Name, header.Typeflag) +// hasTarHeader checks passed bytes has a valid tar header or not. buf must +// contain at least 512 bytes and if not, it always returns false. 
+func hasTarHeader(buf []byte) bool { + if len(buf) < tarBlockSize { + return false } + + b := buf[148:156] + b = bytes.Trim(b, " \x00") // clean up all spaces and null bytes + if len(b) == 0 { + return false // unknown format + } + hdrSum, err := strconv.ParseUint(string(b), 8, 64) + if err != nil { + return false + } + + // According to Go's official archive/tar package, Sun tar uses signed + // byte values, so this computes both the signed and unsigned sums + var usum uint64 + var sum int64 + for i, c := range buf { + if 148 <= i && i < 156 { + c = ' ' // checksum field itself is counted as blanks + } + usum += uint64(uint8(c)) + sum += int64(int8(c)) + } + + if hdrSum != usum && int64(hdrSum) != sum { + return false // invalid checksum + } + + return true +} + +func (t *Tar) String() string { return "tar" } + +const tarBlockSize = 512 + +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Reader(new(Tar)) + _ = Writer(new(Tar)) + _ = Archiver(new(Tar)) + _ = Unarchiver(new(Tar)) + _ = Walker(new(Tar)) + _ = Extractor(new(Tar)) + _ = Matcher(new(Tar)) +) + +// DefaultTar is a convenient archiver ready to use. +var DefaultTar = &Tar{ + MkdirAll: true, } diff --git a/tarbz2.go b/tarbz2.go index e0051d3c..2b44bf4b 100644 --- a/tarbz2.go +++ b/tarbz2.go @@ -3,104 +3,110 @@ package archiver import ( "fmt" "io" - "os" "strings" "github.com/dsnet/compress/bzip2" ) -// TarBz2 is for TarBz2 format -var TarBz2 tarBz2Format +// TarBz2 facilitates bzip2 compression +// (https://github.com/dsnet/compress/blob/master/doc/bzip2-format.pdf) +// of tarball archives. 
+type TarBz2 struct { + *Tar -func init() { - RegisterFormat("TarBz2", TarBz2) + CompressionLevel int } -type tarBz2Format struct{} - -func (tarBz2Format) Match(filename string) bool { - return strings.HasSuffix(strings.ToLower(filename), ".tar.bz2") || - strings.HasSuffix(strings.ToLower(filename), ".tbz2") || - isTarBz2(filename) -} - -// isTarBz2 checks the file has the bzip2 compressed Tar format header by -// reading its beginning block. -func isTarBz2(tarbz2Path string) bool { - f, err := os.Open(tarbz2Path) - if err != nil { - return false - } - defer f.Close() - - bz2r, err := bzip2.NewReader(f, nil) - if err != nil { - return false +// Archive creates a compressed tar file at destination +// containing the files listed in sources. The destination +// must end with ".tar.bz2" or ".tbz2". File paths can be +// those of regular files or directories; directories will +// be recursively added. +func (tbz2 *TarBz2) Archive(sources []string, destination string) error { + if !strings.HasSuffix(destination, ".tar.bz2") && + !strings.HasSuffix(destination, ".tbz2") { + return fmt.Errorf("output filename must have .tar.bz2 or .tbz2 extension") } - defer bz2r.Close() + tbz2.wrapWriter() + return tbz2.Tar.Archive(sources, destination) +} - buf := make([]byte, tarBlockSize) - n, err := bz2r.Read(buf) - if err != nil || n < tarBlockSize { - return false - } +// Unarchive unpacks the compressed tarball at +// source to destination. Destination will be +// treated as a folder name. +func (tbz2 *TarBz2) Unarchive(source, destination string) error { + tbz2.wrapReader() + return tbz2.Tar.Unarchive(source, destination) +} - return hasTarHeader(buf) +// Walk calls walkFn for each visited item in archive. +func (tbz2 *TarBz2) Walk(archive string, walkFn WalkFunc) error { + tbz2.wrapReader() + return tbz2.Tar.Walk(archive, walkFn) } -// Write outputs a .tar.bz2 file to a Writer containing -// the contents of files listed in filePaths. 
File paths -// can be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. -func (tarBz2Format) Write(output io.Writer, filePaths []string) error { - return writeTarBz2(filePaths, output, "") +// Create opens tbz2 for writing a compressed +// tar archive to out. +func (tbz2 *TarBz2) Create(out io.Writer) error { + tbz2.wrapWriter() + return tbz2.Tar.Create(out) } -// Make creates a .tar.bz2 file at tarbz2Path containing -// the contents of files listed in filePaths. File paths -// can be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. -func (tarBz2Format) Make(tarbz2Path string, filePaths []string) error { - out, err := os.Create(tarbz2Path) - if err != nil { - return fmt.Errorf("error creating %s: %v", tarbz2Path, err) - } - defer out.Close() +// Open opens tbz2 for reading a compressed archive from +// in. The size parameter is not used. +func (tbz2 *TarBz2) Open(in io.Reader, size int64) error { + tbz2.wrapReader() + return tbz2.Tar.Open(in, size) +} - return writeTarBz2(filePaths, out, tarbz2Path) +// Extract extracts a single file from the tar archive. +// If the target is a directory, the entire folder will +// be extracted into destination. 
+func (tbz2 *TarBz2) Extract(source, target, destination string) error { + tbz2.wrapReader() + return tbz2.Tar.Extract(source, target, destination) } -func writeTarBz2(filePaths []string, output io.Writer, dest string) error { - bz2w, err := bzip2.NewWriter(output, nil) - if err != nil { - return fmt.Errorf("error compressing bzip2: %v", err) +func (tbz2 *TarBz2) wrapWriter() { + var bz2w *bzip2.Writer + tbz2.Tar.writerWrapFn = func(w io.Writer) (io.Writer, error) { + var err error + bz2w, err = bzip2.NewWriter(w, &bzip2.WriterConfig{ + Level: tbz2.CompressionLevel, + }) + return bz2w, err + } + tbz2.Tar.cleanupWrapFn = func() { + bz2w.Close() } - defer bz2w.Close() - - return writeTar(filePaths, bz2w, dest) } -// Read untars a .tar.bz2 file read from a Reader and decompresses -// the contents into destination. -func (tarBz2Format) Read(input io.Reader, destination string) error { - bz2r, err := bzip2.NewReader(input, nil) - if err != nil { - return fmt.Errorf("error decompressing bzip2: %v", err) +func (tbz2 *TarBz2) wrapReader() { + var bz2r *bzip2.Reader + tbz2.Tar.readerWrapFn = func(r io.Reader) (io.Reader, error) { + var err error + bz2r, err = bzip2.NewReader(r, nil) + return bz2r, err + } + tbz2.Tar.cleanupWrapFn = func() { + bz2r.Close() } - defer bz2r.Close() - - return Tar.Read(bz2r, destination) } -// Open untars source and decompresses the contents into destination. -func (tarBz2Format) Open(source, destination string) error { - f, err := os.Open(source) - if err != nil { - return fmt.Errorf("%s: failed to open archive: %v", source, err) - } - defer f.Close() +func (tbz2 *TarBz2) String() string { return "tar.bz2" } + +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Reader(new(TarBz2)) + _ = Writer(new(TarBz2)) + _ = Archiver(new(TarBz2)) + _ = Unarchiver(new(TarBz2)) + _ = Walker(new(TarBz2)) + _ = Extractor(new(TarBz2)) +) - return TarBz2.Read(f, destination) +// DefaultTarBz2 is a convenient archiver ready to use. 
+var DefaultTarBz2 = &TarBz2{ + CompressionLevel: bzip2.DefaultCompression, + Tar: DefaultTar, } diff --git a/targz.go b/targz.go index 6751d49d..513e71ed 100644 --- a/targz.go +++ b/targz.go @@ -4,95 +4,107 @@ import ( "compress/gzip" "fmt" "io" - "os" "strings" ) -// TarGz is for TarGz format -var TarGz tarGzFormat +// TarGz facilitates gzip compression +// (RFC 1952) of tarball archives. +type TarGz struct { + *Tar -func init() { - RegisterFormat("TarGz", TarGz) + // The compression level to use, as described + // in the compress/gzip package. + CompressionLevel int } -type tarGzFormat struct{} - -func (tarGzFormat) Match(filename string) bool { - return strings.HasSuffix(strings.ToLower(filename), ".tar.gz") || - strings.HasSuffix(strings.ToLower(filename), ".tgz") || - isTarGz(filename) -} - -// isTarGz checks the file has the gzip compressed Tar format header by reading -// its beginning block. -func isTarGz(targzPath string) bool { - f, err := os.Open(targzPath) - if err != nil { - return false - } - defer f.Close() - - gzr, err := gzip.NewReader(f) - if err != nil { - return false +// Archive creates a compressed tar file at destination +// containing the files listed in sources. The destination +// must end with ".tar.gz" or ".tgz". File paths can be +// those of regular files or directories; directories will +// be recursively added. +func (tgz *TarGz) Archive(sources []string, destination string) error { + if !strings.HasSuffix(destination, ".tar.gz") && + !strings.HasSuffix(destination, ".tgz") { + return fmt.Errorf("output filename must have .tar.gz or .tgz extension") } - defer gzr.Close() - - buf := make([]byte, tarBlockSize) - n, err := gzr.Read(buf) - if err != nil || n < tarBlockSize { - return false - } - - return hasTarHeader(buf) + tgz.wrapWriter() + return tgz.Tar.Archive(sources, destination) } -// Write outputs a .tar.gz file to a Writer containing -// the contents of files listed in filePaths. 
It works -// the same way Tar does, but with gzip compression. -func (tarGzFormat) Write(output io.Writer, filePaths []string) error { - return writeTarGz(filePaths, output, "") +// Unarchive unpacks the compressed tarball at +// source to destination. Destination will be +// treated as a folder name. +func (tgz *TarGz) Unarchive(source, destination string) error { + tgz.wrapReader() + return tgz.Tar.Unarchive(source, destination) } -// Make creates a .tar.gz file at targzPath containing -// the contents of files listed in filePaths. It works -// the same way Tar does, but with gzip compression. -func (tarGzFormat) Make(targzPath string, filePaths []string) error { - out, err := os.Create(targzPath) - if err != nil { - return fmt.Errorf("error creating %s: %v", targzPath, err) - } - defer out.Close() +// Walk calls walkFn for each visited item in archive. +func (tgz *TarGz) Walk(archive string, walkFn WalkFunc) error { + tgz.wrapReader() + return tgz.Tar.Walk(archive, walkFn) +} - return writeTarGz(filePaths, out, targzPath) +// Create opens tgz for writing a compressed +// tar archive to out. +func (tgz *TarGz) Create(out io.Writer) error { + tgz.wrapWriter() + return tgz.Tar.Create(out) } -func writeTarGz(filePaths []string, output io.Writer, dest string) error { - gzw := gzip.NewWriter(output) - defer gzw.Close() +// Open opens tgz for reading a compressed archive from +// in. The size parameter is not used. +func (tgz *TarGz) Open(in io.Reader, size int64) error { + tgz.wrapReader() + return tgz.Tar.Open(in, size) +} - return writeTar(filePaths, gzw, dest) +// Extract extracts a single file from the tar archive. +// If the target is a directory, the entire folder will +// be extracted into destination. +func (tgz *TarGz) Extract(source, target, destination string) error { + tgz.wrapReader() + return tgz.Tar.Extract(source, target, destination) } -// Read untars a .tar.gz file read from a Reader and decompresses -// the contents into destination. 
-func (tarGzFormat) Read(input io.Reader, destination string) error { - gzr, err := gzip.NewReader(input) - if err != nil { - return fmt.Errorf("error decompressing: %v", err) +func (tgz *TarGz) wrapWriter() { + var gzw *gzip.Writer + tgz.Tar.writerWrapFn = func(w io.Writer) (io.Writer, error) { + var err error + gzw, err = gzip.NewWriterLevel(w, tgz.CompressionLevel) + return gzw, err + } + tgz.Tar.cleanupWrapFn = func() { + gzw.Close() } - defer gzr.Close() - - return Tar.Read(gzr, destination) } -// Open untars source and decompresses the contents into destination. -func (tarGzFormat) Open(source, destination string) error { - f, err := os.Open(source) - if err != nil { - return fmt.Errorf("%s: failed to open archive: %v", source, err) +func (tgz *TarGz) wrapReader() { + var gzr *gzip.Reader + tgz.Tar.readerWrapFn = func(r io.Reader) (io.Reader, error) { + var err error + gzr, err = gzip.NewReader(r) + return gzr, err + } + tgz.Tar.cleanupWrapFn = func() { + gzr.Close() } - defer f.Close() +} + +func (tgz *TarGz) String() string { return "tar.gz" } + +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Reader(new(TarGz)) + _ = Writer(new(TarGz)) + _ = Archiver(new(TarGz)) + _ = Unarchiver(new(TarGz)) + _ = Walker(new(TarGz)) + _ = Extractor(new(TarGz)) +) - return TarGz.Read(f, destination) +// DefaultTarGz is a convenient archiver ready to use. +var DefaultTarGz = &TarGz{ + CompressionLevel: gzip.DefaultCompression, + Tar: DefaultTar, } diff --git a/tarlz4.go b/tarlz4.go index 1ddc881f..10be5f26 100644 --- a/tarlz4.go +++ b/tarlz4.go @@ -3,90 +3,105 @@ package archiver import ( "fmt" "io" - "os" "strings" "github.com/pierrec/lz4" ) -// TarLz4 is for TarLz4 format -var TarLz4 tarLz4Format +// TarLz4 facilitates lz4 compression +// (https://github.com/lz4/lz4/tree/master/doc) +// of tarball archives. +type TarLz4 struct { + *Tar -func init() { - RegisterFormat("TarLz4", TarLz4) + // The compression level to use when writing. 
+ // Minimum 0 (fast compression), maximum 12 + // (most space savings). + CompressionLevel int } -type tarLz4Format struct{} +// Archive creates a compressed tar file at destination +// containing the files listed in sources. The destination +// must end with ".tar.lz4" or ".tlz4". File paths can be +// those of regular files or directories; directories will +// be recursively added. +func (tlz4 *TarLz4) Archive(sources []string, destination string) error { + if !strings.HasSuffix(destination, ".tar.lz4") && + !strings.HasSuffix(destination, ".tlz4") { + return fmt.Errorf("output filename must have .tar.lz4 or .tlz4 extension") + } + tlz4.wrapWriter() + return tlz4.Tar.Archive(sources, destination) +} -func (tarLz4Format) Match(filename string) bool { - return strings.HasSuffix(strings.ToLower(filename), ".tar.lz4") || strings.HasSuffix(strings.ToLower(filename), ".tlz4") || isTarLz4(filename) +// Unarchive unpacks the compressed tarball at +// source to destination. Destination will be +// treated as a folder name. +func (tlz4 *TarLz4) Unarchive(source, destination string) error { + tlz4.wrapReader() + return tlz4.Tar.Unarchive(source, destination) } -// isTarLz4 checks the file has the lz4 compressed Tar format header by -// reading its beginning block. -func isTarLz4(tarlz4Path string) bool { - f, err := os.Open(tarlz4Path) - if err != nil { - return false - } - defer f.Close() +// Walk calls walkFn for each visited item in archive. +func (tlz4 *TarLz4) Walk(archive string, walkFn WalkFunc) error { + tlz4.wrapReader() + return tlz4.Tar.Walk(archive, walkFn) +} - lz4r := lz4.NewReader(f) - buf := make([]byte, tarBlockSize) - n, err := lz4r.Read(buf) - if err != nil || n < tarBlockSize { - return false - } +// Create opens tlz4 for writing a compressed +// tar archive to out. +func (tlz4 *TarLz4) Create(out io.Writer) error { + tlz4.wrapWriter() + return tlz4.Tar.Create(out) +} - return hasTarHeader(buf) +// Open opens tlz4 for reading a compressed archive from +// in.
The size parameter is not used. +func (tlz4 *TarLz4) Open(in io.Reader, size int64) error { + tlz4.wrapReader() + return tlz4.Tar.Open(in, size) } -// Write outputs a .tar.lz4 file to a Writer containing -// the contents of files listed in filePaths. File paths -// can be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. -func (tarLz4Format) Write(output io.Writer, filePaths []string) error { - return writeTarLz4(filePaths, output, "") +// Extract extracts a single file from the tar archive. +// If the target is a directory, the entire folder will +// be extracted into destination. +func (tlz4 *TarLz4) Extract(source, target, destination string) error { + tlz4.wrapReader() + return tlz4.Tar.Extract(source, target, destination) } -// Make creates a .tar.lz4 file at tarlz4Path containing -// the contents of files listed in filePaths. File paths -// can be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. 
-func (tarLz4Format) Make(tarlz4Path string, filePaths []string) error { - out, err := os.Create(tarlz4Path) - if err != nil { - return fmt.Errorf("error creating %s: %v", tarlz4Path, err) +func (tlz4 *TarLz4) wrapWriter() { + var lz4w *lz4.Writer + tlz4.Tar.writerWrapFn = func(w io.Writer) (io.Writer, error) { + lz4w = lz4.NewWriter(w) + lz4w.Header.CompressionLevel = tlz4.CompressionLevel + return lz4w, nil + } + tlz4.Tar.cleanupWrapFn = func() { + lz4w.Close() } - defer out.Close() - - return writeTarLz4(filePaths, out, tarlz4Path) } -func writeTarLz4(filePaths []string, output io.Writer, dest string) error { - lz4w := lz4.NewWriter(output) - defer lz4w.Close() - - return writeTar(filePaths, lz4w, dest) +func (tlz4 *TarLz4) wrapReader() { + tlz4.Tar.readerWrapFn = func(r io.Reader) (io.Reader, error) { + return lz4.NewReader(r), nil + } } -// Read untars a .tar.xz file read from a Reader and decompresses -// the contents into destination. -func (tarLz4Format) Read(input io.Reader, destination string) error { - lz4r := lz4.NewReader(input) +func (tlz4 *TarLz4) String() string { return "tar.lz4" } - return Tar.Read(lz4r, destination) -} - -// Open untars source and decompresses the contents into destination. -func (tarLz4Format) Open(source, destination string) error { - f, err := os.Open(source) - if err != nil { - return fmt.Errorf("%s: failed to open archive: %v", source, err) - } - defer f.Close() +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Reader(new(TarLz4)) + _ = Writer(new(TarLz4)) + _ = Archiver(new(TarLz4)) + _ = Unarchiver(new(TarLz4)) + _ = Walker(new(TarLz4)) + _ = Extractor(new(TarLz4)) +) - return TarLz4.Read(f, destination) +// DefaultTarLz4 is a convenient archiver ready to use. 
+var DefaultTarLz4 = &TarLz4{ + CompressionLevel: 9, // https://github.com/lz4/lz4/blob/1b819bfd633ae285df2dfe1b0589e1ec064f2873/lib/lz4hc.h#L48 + Tar: DefaultTar, } diff --git a/tarsz.go b/tarsz.go index 2e290190..4533c3df 100644 --- a/tarsz.go +++ b/tarsz.go @@ -3,90 +3,98 @@ package archiver import ( "fmt" "io" - "os" "strings" "github.com/golang/snappy" ) -// TarSz is for TarSz format -var TarSz tarSzFormat - -func init() { - RegisterFormat("TarSz", TarSz) +// TarSz facilitates Snappy compression +// (https://github.com/google/snappy) +// of tarball archives. +type TarSz struct { + *Tar } -type tarSzFormat struct{} +// Archive creates a compressed tar file at destination +// containing the files listed in sources. The destination +// must end with ".tar.sz" or ".tsz". File paths can be +// those of regular files or directories; directories will +// be recursively added. +func (tsz *TarSz) Archive(sources []string, destination string) error { + if !strings.HasSuffix(destination, ".tar.sz") && + !strings.HasSuffix(destination, ".tsz") { + return fmt.Errorf("output filename must have .tar.sz or .tsz extension") + } + tsz.wrapWriter() + return tsz.Tar.Archive(sources, destination) +} -func (tarSzFormat) Match(filename string) bool { - return strings.HasSuffix(strings.ToLower(filename), ".tar.sz") || strings.HasSuffix(strings.ToLower(filename), ".tsz") || isTarSz(filename) +// Unarchive unpacks the compressed tarball at +// source to destination. Destination will be +// treated as a folder name. +func (tsz *TarSz) Unarchive(source, destination string) error { + tsz.wrapReader() + return tsz.Tar.Unarchive(source, destination) } -// isTarSz checks the file has the sz compressed Tar format header by -// reading its beginning block. -func isTarSz(tarszPath string) bool { - f, err := os.Open(tarszPath) - if err != nil { - return false - } - defer f.Close() +// Walk calls walkFn for each visited item in archive. 
+func (tsz *TarSz) Walk(archive string, walkFn WalkFunc) error { + tsz.wrapReader() + return tsz.Tar.Walk(archive, walkFn) +} - szr := snappy.NewReader(f) - buf := make([]byte, tarBlockSize) - n, err := szr.Read(buf) - if err != nil || n < tarBlockSize { - return false - } +// Create opens tsz for writing a compressed +// tar archive to out. +func (tsz *TarSz) Create(out io.Writer) error { + tsz.wrapWriter() + return tsz.Tar.Create(out) +} - return hasTarHeader(buf) +// Open opens tsz for reading a compressed archive from +// in. The size parameter is not used. +func (tsz *TarSz) Open(in io.Reader, size int64) error { + tsz.wrapReader() + return tsz.Tar.Open(in, size) } -// Write outputs a .tar.sz file to a Writer containing -// the contents of files listed in filePaths. File paths -// can be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. -func (tarSzFormat) Write(output io.Writer, filePaths []string) error { - return writeTarSz(filePaths, output, "") +// Extract extracts a single file from the tar archive. +// If the target is a directory, the entire folder will +// be extracted into destination. +func (tsz *TarSz) Extract(source, target, destination string) error { + tsz.wrapReader() + return tsz.Tar.Extract(source, target, destination) } -// Make creates a .tar.sz file at tarszPath containing -// the contents of files listed in filePaths. File paths -// can be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added.
-func (tarSzFormat) Make(tarszPath string, filePaths []string) error { - out, err := os.Create(tarszPath) - if err != nil { - return fmt.Errorf("error creating %s: %v", tarszPath, err) +func (tsz *TarSz) wrapWriter() { + var sw *snappy.Writer + tsz.Tar.writerWrapFn = func(w io.Writer) (io.Writer, error) { + sw = snappy.NewWriter(w) + return sw, nil + } + tsz.Tar.cleanupWrapFn = func() { + sw.Close() } - defer out.Close() - - return writeTarSz(filePaths, out, tarszPath) } -func writeTarSz(filePaths []string, output io.Writer, dest string) error { - szw := snappy.NewBufferedWriter(output) - defer szw.Close() - - return writeTar(filePaths, szw, dest) +func (tsz *TarSz) wrapReader() { + tsz.Tar.readerWrapFn = func(r io.Reader) (io.Reader, error) { + return snappy.NewReader(r), nil + } } -// Read untars a .tar.sz file read from a Reader and decompresses -// the contents into destination. -func (tarSzFormat) Read(input io.Reader, destination string) error { - szr := snappy.NewReader(input) +func (tsz *TarSz) String() string { return "tar.sz" } - return Tar.Read(szr, destination) -} - -// Open untars source and decompresses the contents into destination. -func (tarSzFormat) Open(source, destination string) error { - f, err := os.Open(source) - if err != nil { - return fmt.Errorf("%s: failed to open archive: %v", source, err) - } - defer f.Close() +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Reader(new(TarSz)) + _ = Writer(new(TarSz)) + _ = Archiver(new(TarSz)) + _ = Unarchiver(new(TarSz)) + _ = Walker(new(TarSz)) + _ = Extractor(new(TarSz)) +) - return TarSz.Read(f, destination) +// DefaultTarSz is a convenient archiver ready to use. 
+var DefaultTarSz = &TarSz{ + Tar: DefaultTar, } diff --git a/tarxz.go b/tarxz.go index e222fb4a..c1d27ea9 100644 --- a/tarxz.go +++ b/tarxz.go @@ -3,103 +3,103 @@ package archiver import ( "fmt" "io" - "os" "strings" "github.com/ulikunitz/xz" + fastxz "github.com/xi2/xz" ) -// TarXZ is for TarXZ format -var TarXZ xzFormat - -func init() { - RegisterFormat("TarXZ", TarXZ) +// TarXz facilitates xz compression +// (https://tukaani.org/xz/format.html) +// of tarball archives. +type TarXz struct { + *Tar } -type xzFormat struct{} - -// Match returns whether filename matches this format. -func (xzFormat) Match(filename string) bool { - return strings.HasSuffix(strings.ToLower(filename), ".tar.xz") || - strings.HasSuffix(strings.ToLower(filename), ".txz") || - isTarXz(filename) -} - -// isTarXz checks the file has the xz compressed Tar format header by reading -// its beginning block. -func isTarXz(tarxzPath string) bool { - f, err := os.Open(tarxzPath) - if err != nil { - return false - } - defer f.Close() - - xzr, err := xz.NewReader(f) - if err != nil { - return false +// Archive creates a compressed tar file at destination +// containing the files listed in sources. The destination +// must end with ".tar.xz" or ".txz". File paths can be +// those of regular files or directories; directories will +// be recursively added. +func (txz *TarXz) Archive(sources []string, destination string) error { + if !strings.HasSuffix(destination, ".tar.xz") && + !strings.HasSuffix(destination, ".txz") { + return fmt.Errorf("output filename must have .tar.xz or .txz extension") } + txz.wrapWriter() + return txz.Tar.Archive(sources, destination) +} - buf := make([]byte, tarBlockSize) - n, err := xzr.Read(buf) - if err != nil || n < tarBlockSize { - return false - } +// Unarchive unpacks the compressed tarball at +// source to destination. Destination will be +// treated as a folder name.
+func (txz *TarXz) Unarchive(source, destination string) error { + txz.wrapReader() + return txz.Tar.Unarchive(source, destination) +} - return hasTarHeader(buf) +// Walk calls walkFn for each visited item in archive. +func (txz *TarXz) Walk(archive string, walkFn WalkFunc) error { + txz.wrapReader() + return txz.Tar.Walk(archive, walkFn) } -// Write outputs a .tar.xz file to a Writer containing -// the contents of files listed in filePaths. File paths -// can be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. -func (xzFormat) Write(output io.Writer, filePaths []string) error { - return writeTarXZ(filePaths, output, "") +// Create opens txz for writing a compressed +// tar archive to out. +func (txz *TarXz) Create(out io.Writer) error { + txz.wrapWriter() + return txz.Tar.Create(out) } -// Make creates a .tar.xz file at xzPath containing -// the contents of files listed in filePaths. File -// paths can be those of regular files or directories. -// Regular files are stored at the 'root' of the -// archive, and directories are recursively added. -func (xzFormat) Make(xzPath string, filePaths []string) error { - out, err := os.Create(xzPath) - if err != nil { - return fmt.Errorf("error creating %s: %v", xzPath, err) - } - defer out.Close() +// Open opens txz for reading a compressed archive from +// in. The size parameter is not used. +func (txz *TarXz) Open(in io.Reader, size int64) error { + txz.wrapReader() + return txz.Tar.Open(in, size) +} - return writeTarXZ(filePaths, out, xzPath) +// Extract extracts a single file from the tar archive. +// If the target is a directory, the entire folder will +// be extracted into destination.
+func (txz *TarXz) Extract(source, target, destination string) error { + txz.wrapReader() + return txz.Tar.Extract(source, target, destination) } -func writeTarXZ(filePaths []string, output io.Writer, dest string) error { - xzw, err := xz.NewWriter(output) - if err != nil { - return fmt.Errorf("error compressing xz: %v", err) +func (txz *TarXz) wrapWriter() { + var xzw *xz.Writer + txz.Tar.writerWrapFn = func(w io.Writer) (io.Writer, error) { + var err error + xzw, err = xz.NewWriter(w) + return xzw, err + } + txz.Tar.cleanupWrapFn = func() { + xzw.Close() } - defer xzw.Close() - - return writeTar(filePaths, xzw, dest) } -// Read untars a .tar.xz file read from a Reader and decompresses -// the contents into destination. -func (xzFormat) Read(input io.Reader, destination string) error { - xzr, err := xz.NewReader(input) - if err != nil { - return fmt.Errorf("error decompressing xz: %v", err) +func (txz *TarXz) wrapReader() { + var xzr *fastxz.Reader + txz.Tar.readerWrapFn = func(r io.Reader) (io.Reader, error) { + var err error + xzr, err = fastxz.NewReader(r, 0) + return xzr, err } - - return Tar.Read(xzr, destination) } -// Open untars source and decompresses the contents into destination. -func (xzFormat) Open(source, destination string) error { - f, err := os.Open(source) - if err != nil { - return fmt.Errorf("%s: failed to open archive: %v", source, err) - } - defer f.Close() +func (txz *TarXz) String() string { return "tar.xz" } + +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Reader(new(TarXz)) + _ = Writer(new(TarXz)) + _ = Archiver(new(TarXz)) + _ = Unarchiver(new(TarXz)) + _ = Walker(new(TarXz)) + _ = Extractor(new(TarXz)) +) - return TarXZ.Read(f, destination) +// DefaultTarXz is a convenient archiver ready to use. 
+var DefaultTarXz = &TarXz{ + Tar: DefaultTar, } diff --git a/xz.go b/xz.go new file mode 100644 index 00000000..f5f5b81e --- /dev/null +++ b/xz.go @@ -0,0 +1,50 @@ +package archiver + +import ( + "fmt" + "io" + "path/filepath" + + "github.com/ulikunitz/xz" + fastxz "github.com/xi2/xz" +) + +// Xz facilitates XZ compression. +type Xz struct{} + +// Compress reads in, compresses it, and writes it to out. +func (x *Xz) Compress(in io.Reader, out io.Writer) error { + w, err := xz.NewWriter(out) + if err != nil { + return err + } + defer w.Close() + _, err = io.Copy(w, in) + return err +} + +// Decompress reads in, decompresses it, and writes it to out. +func (x *Xz) Decompress(in io.Reader, out io.Writer) error { + r, err := fastxz.NewReader(in, 0) + if err != nil { + return err + } + _, err = io.Copy(out, r) + return err +} + +// CheckExt ensures the file extension matches the format. +func (x *Xz) CheckExt(filename string) error { + if filepath.Ext(filename) != ".xz" { + return fmt.Errorf("filename must have a .xz extension") + } + return nil +} + +func (x *Xz) String() string { return "xz" } + +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Compressor(new(Xz)) + _ = Decompressor(new(Xz)) +) diff --git a/zip.go b/zip.go index 9d20bc1b..9828c630 100644 --- a/zip.go +++ b/zip.go @@ -1,238 +1,573 @@ -// Package archiver makes it super easy to create and open .zip, -// .tar.gz, and .tar.bz2 files. package archiver import ( "archive/zip" "bytes" + "compress/flate" "fmt" "io" - "io/ioutil" + "log" "os" "path" "path/filepath" "strings" ) -// Zip is for Zip format -var Zip zipFormat - -func init() { - RegisterFormat("Zip", Zip) +// Zip provides facilities for operating ZIP archives. +// See https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT. +type Zip struct { + // The compression level to use, as described + // in the compress/flate package. 
+ CompressionLevel int + + // Whether to overwrite existing files; if false, + // an error is returned if the file exists. + OverwriteExisting bool + + // Whether to make all the directories necessary + // to create a zip archive in the desired path. + MkdirAll bool + + // If enabled, selective compression will only + // compress files which are not already in a + // compressed format; this is decided based + // simply on file extension. + SelectiveCompression bool + + // A single top-level folder can be implicitly + // created by the Archive or Unarchive methods + // if the files to be added to the archive + // or the files to be extracted from the archive + // do not all have a common root. This roughly + // mimics the behavior of archival tools integrated + // into OS file browsers which create a subfolder + // to avoid unexpectedly littering the destination + // folder with potentially many files, causing a + // problematic cleanup/organization situation. + // This feature is available for both creation + // and extraction of archives, but may be slightly + // inefficient with lots and lots of files, + // especially on extraction. + ImplicitTopLevelFolder bool + + // If true, errors encountered during reading + // or writing a single file will be logged and + // the operation will continue on remaining files. + ContinueOnError bool + + zw *zip.Writer + zr *zip.Reader + ridx int } -type zipFormat struct{} +// Archive creates a .zip file at destination containing +// the files listed in sources. The destination must end +// with ".zip". File paths can be those of regular files +// or directories. Regular files are stored at the 'root' +// of the archive, and directories are recursively added. 
+func (z *Zip) Archive(sources []string, destination string) error { + if !strings.HasSuffix(destination, ".zip") { + return fmt.Errorf("output filename must have .zip extension") + } + if !z.OverwriteExisting && fileExists(destination) { + return fmt.Errorf("file already exists: %s", destination) + } + + // make the folder to contain the resulting archive + // if it does not already exist + destDir := filepath.Dir(destination) + if z.MkdirAll && !fileExists(destDir) { + err := mkdir(destDir) + if err != nil { + return fmt.Errorf("making folder for destination: %v", err) + } + } + + out, err := os.Create(destination) + if err != nil { + return fmt.Errorf("creating %s: %v", destination, err) + } + defer out.Close() + + err = z.Create(out) + if err != nil { + return fmt.Errorf("creating zip: %v", err) + } + defer z.Close() + + var topLevelFolder string + if z.ImplicitTopLevelFolder && multipleTopLevels(sources) { + topLevelFolder = folderNameFromFileName(destination) + } + + for _, source := range sources { + err := z.writeWalk(source, topLevelFolder, destination) + if err != nil { + return fmt.Errorf("walking %s: %v", source, err) + } + } -func (zipFormat) Match(filename string) bool { - return strings.HasSuffix(strings.ToLower(filename), ".zip") || isZip(filename) + return nil } -// isZip checks the file has the Zip format signature by reading its beginning -// bytes and matching it against "PK\x03\x04" -func isZip(zipPath string) bool { - f, err := os.Open(zipPath) +// Unarchive unpacks the .zip file at source to destination. +// Destination will be treated as a folder name. 
+func (z *Zip) Unarchive(source, destination string) error { + if !fileExists(destination) && z.MkdirAll { + err := mkdir(destination) + if err != nil { + return fmt.Errorf("preparing destination: %v", err) + } + } + + file, err := os.Open(source) if err != nil { - return false + return fmt.Errorf("opening source file: %v", err) } - defer f.Close() + defer file.Close() - buf := make([]byte, 4) - if n, err := f.Read(buf); err != nil || n < 4 { - return false - } - - return bytes.Equal(buf, []byte("PK\x03\x04")) -} - -// Write outputs a .zip file to the given writer with -// the contents of files listed in filePaths. File paths -// can be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. -// -// Files with an extension for formats that are already -// compressed will be stored only, not compressed. -func (zipFormat) Write(output io.Writer, filePaths []string) error { - w := zip.NewWriter(output) - for _, fpath := range filePaths { - if err := zipFile(w, fpath); err != nil { - w.Close() - return err + fileInfo, err := file.Stat() + if err != nil { + return fmt.Errorf("statting source file: %v", err) + } + + err = z.Open(file, fileInfo.Size()) + if err != nil { + return fmt.Errorf("opening zip archive for reading: %v", err) + } + defer z.Close() + + // if the files in the archive do not all share a common + // root, then make sure we extract to a single subfolder + // rather than potentially littering the destination... 
+ if z.ImplicitTopLevelFolder { + files := make([]string, len(z.zr.File)) + for i := range z.zr.File { + files[i] = z.zr.File[i].Name + } + if multipleTopLevels(files) { + destination = filepath.Join(destination, folderNameFromFileName(source)) } } - return w.Close() + for { + err := z.extractNext(destination) + if err == io.EOF { + break + } + if err != nil { + if z.ContinueOnError { + log.Printf("[ERROR] Reading file in zip archive: %v", err) + continue + } + return fmt.Errorf("reading file in zip archive: %v", err) + } + } + + return nil } -// Make creates a .zip file in the location zipPath containing -// the contents of files listed in filePaths. File paths -// can be those of regular files or directories. Regular -// files are stored at the 'root' of the archive, and -// directories are recursively added. -// -// Files with an extension for formats that are already -// compressed will be stored only, not compressed. -func (zipFormat) Make(zipPath string, filePaths []string) error { - out, err := os.Create(zipPath) +func (z *Zip) extractNext(to string) error { + f, err := z.Read() if err != nil { - return fmt.Errorf("error creating %s: %v", zipPath, err) + return err // don't wrap error; calling loop must break on io.EOF + } + defer f.Close() + header, ok := f.Header.(zip.FileHeader) + if !ok { + return fmt.Errorf("expected header to be zip.FileHeader but was %T", f.Header) + } + return z.extractFile(f, filepath.Join(to, header.Name)) +} + +func (z *Zip) extractFile(f File, to string) error { + // if a directory, no content; simply make the directory and return + if f.IsDir() { + return mkdir(to) + } + + // do not overwrite existing files, if configured + if !z.OverwriteExisting && fileExists(to) { + return fmt.Errorf("file already exists: %s", to) } - defer out.Close() - return Zip.Write(out, filePaths) + return writeNewFile(to, f, f.Mode()) } -func zipFile(w *zip.Writer, source string) error { - sourceInfo, err := os.Stat(source) +func (z *Zip) 
writeWalk(source, topLevelFolder, destination string) error { + sourceAbs, err := filepath.Abs(source) + if err != nil { + return fmt.Errorf("getting absolute path: %v", err) + } + sourceInfo, err := os.Stat(sourceAbs) if err != nil { return fmt.Errorf("%s: stat: %v", source, err) } + destAbs, err := filepath.Abs(destination) + if err != nil { + return fmt.Errorf("%s: getting absolute path of destination %s: %v", source, destination, err) + } var baseDir string + if topLevelFolder != "" { + baseDir = topLevelFolder + } if sourceInfo.IsDir() { - baseDir = filepath.Base(source) + baseDir = path.Join(baseDir, sourceInfo.Name()) } return filepath.Walk(source, func(fpath string, info os.FileInfo, err error) error { - if err != nil { - return fmt.Errorf("walking to %s: %v", fpath, err) + handleErr := func(err error) error { + if z.ContinueOnError { + log.Printf("[ERROR] Walking %s: %v", fpath, err) + return nil + } + return err } - - header, err := zip.FileInfoHeader(info) if err != nil { - return fmt.Errorf("%s: getting header: %v", fpath, err) + return handleErr(fmt.Errorf("traversing %s: %v", fpath, err)) } - - if baseDir != "" { - name, err := filepath.Rel(source, fpath) - if err != nil { - return err - } - header.Name = path.Join(baseDir, filepath.ToSlash(name)) - } - - if info.IsDir() { - header.Name += "/" - header.Method = zip.Store - } else { - ext := strings.ToLower(path.Ext(header.Name)) - if _, ok := compressedFormats[ext]; ok { - header.Method = zip.Store - } else { - header.Method = zip.Deflate - } + if info == nil { + return handleErr(fmt.Errorf("%s: no file info", fpath)) } - writer, err := w.CreateHeader(header) + // make sure we do not copy the output file into the output + // file; that results in an infinite loop and disk exhaustion! 
+ fpathAbs, err := filepath.Abs(fpath) if err != nil { - return fmt.Errorf("%s: making header: %v", fpath, err) + return handleErr(fmt.Errorf("%s: getting absolute path: %v", fpath, err)) } - - if info.IsDir() { + if within(fpathAbs, destAbs) { return nil } - if header.Mode().IsRegular() { - file, err := os.Open(fpath) - if err != nil { - return fmt.Errorf("%s: opening: %v", fpath, err) - } - defer file.Close() + // build the name to be used within the archive + name, err := filepath.Rel(source, fpath) + if err != nil { + return handleErr(err) + } + nameInArchive := path.Join(baseDir, filepath.ToSlash(name)) - _, err = io.CopyN(writer, file, info.Size()) - if err != nil && err != io.EOF { - return fmt.Errorf("%s: copying contents: %v", fpath, err) - } + file, err := os.Open(fpath) + if err != nil { + return handleErr(fmt.Errorf("%s: opening: %v", fpath, err)) + } + defer file.Close() + + err = z.Write(File{ + FileInfo: FileInfo{ + FileInfo: info, + CustomName: nameInArchive, + }, + ReadCloser: file, + }) + if err != nil { + return handleErr(fmt.Errorf("%s: writing: %s", fpath, err)) } return nil }) } -// Read unzips the .zip file read from the input Reader into destination. -func (zipFormat) Read(input io.Reader, destination string) error { - buf, err := ioutil.ReadAll(input) - if err != nil { - return err +// Create opens z for writing a ZIP archive to out. +func (z *Zip) Create(out io.Writer) error { + if z.zw != nil { + return fmt.Errorf("zip archive is already created for writing") + } + z.zw = zip.NewWriter(out) + if z.CompressionLevel != flate.DefaultCompression { + z.zw.RegisterCompressor(zip.Deflate, func(out io.Writer) (io.WriteCloser, error) { + return flate.NewWriter(out, z.CompressionLevel) + }) + } + return nil +} + +// Write writes f to z, which must have been opened for writing first. 
+func (z *Zip) Write(f File) error { + if z.zw == nil { + return fmt.Errorf("zip archive was not created for writing first") + } + if f.FileInfo == nil { + return fmt.Errorf("no file info") + } + if f.FileInfo.Name() == "" { + return fmt.Errorf("missing file name") + } + if f.ReadCloser == nil { + return fmt.Errorf("%s: no way to read file contents", f.Name()) } - rdr := bytes.NewReader(buf) - r, err := zip.NewReader(rdr, rdr.Size()) + header, err := zip.FileInfoHeader(f) if err != nil { - return err + return fmt.Errorf("%s: getting header: %v", f.Name(), err) } - return unzipAll(r, destination) -} + if f.IsDir() { + header.Name += "/" // required - strangely no mention of this in zip spec? but is in godoc... + header.Method = zip.Store + } else { + ext := strings.ToLower(path.Ext(header.Name)) + if _, ok := compressedFormats[ext]; ok && z.SelectiveCompression { + header.Method = zip.Store + } else { + header.Method = zip.Deflate + } + } -// Open unzips the .zip file at source into destination. -func (zipFormat) Open(source, destination string) error { - r, err := zip.OpenReader(source) + writer, err := z.zw.CreateHeader(header) if err != nil { - return err + return fmt.Errorf("%s: making header: %v", f.Name(), err) } - defer r.Close() - return unzipAll(&r.Reader, destination) -} + if f.IsDir() { + return nil + } -func unzipAll(r *zip.Reader, destination string) error { - for _, zf := range r.File { - if err := unzipFile(zf, destination); err != nil { - return err + if header.Mode().IsRegular() { + _, err := io.Copy(writer, f) + if err != nil { + return fmt.Errorf("%s: copying contents: %v", f.Name(), err) } } return nil } -func unzipFile(zf *zip.File, destination string) error { - err := sanitizeExtractPath(zf.Name, destination) +// Open opens z for reading an archive from in, +// which is expected to have the given size and +// which must be an io.ReaderAt. 
+func (z *Zip) Open(in io.Reader, size int64) error { + inRdrAt, ok := in.(io.ReaderAt) + if !ok { + return fmt.Errorf("reader must be io.ReaderAt") + } + if z.zr != nil { + return fmt.Errorf("zip archive is already open for reading") + } + var err error + z.zr, err = zip.NewReader(inRdrAt, size) if err != nil { - return err + return fmt.Errorf("creating reader: %v", err) } + z.ridx = 0 + return nil +} + +// Read reads the next file from z, which must have +// already been opened for reading. If there are no +// more files, the error is io.EOF. The File must +// be closed when finished reading from it. +func (z *Zip) Read() (File, error) { + if z.zr == nil { + return File{}, fmt.Errorf("zip archive is not open") + } + if z.ridx >= len(z.zr.File) { + return File{}, io.EOF + } + + // access the file and increment counter so that + // if there is an error processing this file, the + // caller can still iterate to the next file + zf := z.zr.File[z.ridx] + z.ridx++ - if strings.HasSuffix(zf.Name, "/") { - return mkdir(filepath.Join(destination, zf.Name)) + file := File{ + FileInfo: zf.FileInfo(), + Header: zf.FileHeader, } rc, err := zf.Open() if err != nil { - return fmt.Errorf("%s: open compressed file: %v", zf.Name, err) + return file, fmt.Errorf("%s: open compressed file: %v", zf.Name, err) } - defer rc.Close() + file.ReadCloser = rc - return writeNewFile(filepath.Join(destination, zf.Name), rc, zf.FileInfo().Mode()) + return file, nil +} + +// Close closes the zip archive(s) opened by Create and Open. +func (z *Zip) Close() error { + if z.zr != nil { + z.zr = nil + } + if z.zw != nil { + zw := z.zw + z.zw = nil + return zw.Close() + } + return nil } +// Walk calls walkFn for each visited item in archive. 
+func (z *Zip) Walk(archive string, walkFn WalkFunc) error { + zr, err := zip.OpenReader(archive) + if err != nil { + return fmt.Errorf("opening zip reader: %v", err) + } + defer zr.Close() + + for _, zf := range zr.File { + zfrc, err := zf.Open() + if err != nil { + if zfrc != nil { + zfrc.Close() + } + if z.ContinueOnError { + log.Printf("[ERROR] Opening %s: %v", zf.Name, err) + continue + } + return fmt.Errorf("opening %s: %v", zf.Name, err) + } + + err = walkFn(File{ + FileInfo: zf.FileInfo(), + Header: zf.FileHeader, + ReadCloser: zfrc, + }) + zfrc.Close() + if err != nil { + if err == ErrStopWalk { + break + } + if z.ContinueOnError { + log.Printf("[ERROR] Walking %s: %v", zf.Name, err) + continue + } + return fmt.Errorf("walking %s: %v", zf.Name, err) + } + } + + return nil +} + +// Extract extracts a single file from the zip archive. +// If the target is a directory, the entire folder will +// be extracted into destination. +func (z *Zip) Extract(source, target, destination string) error { + // target refers to a path inside the archive, which should also be clean + target = path.Clean(target) + + // if the target ends up being a directory, then + // we will continue walking and extracting files + // until we are no longer within that directory + var targetDirPath string + + return z.Walk(source, func(f File) error { + zfh, ok := f.Header.(zip.FileHeader) + if !ok { + return fmt.Errorf("expected header to be zip.FileHeader but was %T", f.Header) + } + + // importantly, cleaning the path strips trailing slash, + // which must be appended to folders within the archive + name := path.Clean(zfh.Name) + if f.IsDir() && target == name { + targetDirPath = path.Dir(name) + } + + if within(target, zfh.Name) { + // either this is the exact file we want, or is + // in the directory we want to extract + + // build the filename we will extract to + end, err := filepath.Rel(targetDirPath, zfh.Name) + if err != nil { + return fmt.Errorf("relativizing paths: %v", err) + } + joined :=
filepath.Join(destination, end) + + err = z.extractFile(f, joined) + if err != nil { + return fmt.Errorf("extracting file %s: %v", zfh.Name, err) + } + + // if our target was not a directory, stop walk + if targetDirPath == "" { + return ErrStopWalk + } + } else if targetDirPath != "" { + // finished walking the entire directory + return ErrStopWalk + } + + return nil + }) +} + +// Match returns true if the format of file matches this +// type's format. It should not affect reader position. +func (*Zip) Match(file *os.File) (bool, error) { + currentPos, err := file.Seek(0, io.SeekCurrent) + if err != nil { + return false, err + } + _, err = file.Seek(0, 0) + if err != nil { + return false, err + } + defer file.Seek(currentPos, io.SeekStart) + + buf := make([]byte, 4) + if n, err := file.Read(buf); err != nil || n < 4 { + return false, nil + } + return bytes.Equal(buf, []byte("PK\x03\x04")), nil +} + +func (z *Zip) String() string { return "zip" } + +// Compile-time checks to ensure type implements desired interfaces. +var ( + _ = Reader(new(Zip)) + _ = Writer(new(Zip)) + _ = Archiver(new(Zip)) + _ = Unarchiver(new(Zip)) + _ = Walker(new(Zip)) + _ = Extractor(new(Zip)) + _ = Matcher(new(Zip)) +) + // compressedFormats is a (non-exhaustive) set of lowercased // file extensions for formats that are typically already -// compressed. Compressing already-compressed files often -// results in a larger file, so when possible, we check this -// set to avoid that. +// compressed. Compressing files that are already compressed +// is inefficient, so use this set of extensions to avoid that.
var compressedFormats = map[string]struct{}{ ".7z": {}, ".avi": {}, + ".br": {}, ".bz2": {}, ".cab": {}, + ".docx": {}, ".gif": {}, ".gz": {}, ".jar": {}, ".jpeg": {}, ".jpg": {}, ".lz": {}, + ".lz4": {}, ".lzma": {}, + ".m4v": {}, ".mov": {}, ".mp3": {}, ".mp4": {}, ".mpeg": {}, ".mpg": {}, ".png": {}, + ".pptx": {}, ".rar": {}, + ".sz": {}, ".tbz2": {}, ".tgz": {}, + ".tsz": {}, ".txz": {}, + ".xlsx": {}, ".xz": {}, ".zip": {}, ".zipx": {}, } + +// DefaultZip is a convenient archiver ready to use. +var DefaultZip = &Zip{ + CompressionLevel: flate.DefaultCompression, + MkdirAll: true, + SelectiveCompression: true, +}