Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiwriteringester support #214

Merged
merged 1 commit into from
Feb 4, 2021
Merged

Conversation

deitch
Copy link
Contributor

@deitch deitch commented Jan 28, 2021

This is a first cut attempt at this. It needs proper tests, which I am willing to add once we have agreement on the idea.

@jdolitsky this is what we discussed over Slack.

Problem

The DecompressStore has support for determining that a particular blob is tarred and then untarring it. This is great, but it assumes that the tar contains only one file. It even ignores the filename:

			_, err := tr.Next()
			if err == io.EOF {
				// clear the error, since we do not pass an io.EOF
				err = nil
				break // End of archive
			}

One could argue about whether we should or should not support tar with multiple files in it, but as long as tar supports it, people will use it, break this, and we will have issues.

The heart of the issue is that the standards assume one writer per descriptor. But if a descriptor references a tar file (or any other archiving format, but really only tar is used), then it might need multiple writers.

Solution

This PR is a proposed solution. It creates two kinds of "untar writer" - one that has a single writer passed through, and another that has a "multi-writer" passed through. The multi-writer creates multiple writers, one for each filename, and untarwriter picks which one based on the filename.

It continues to have a regular Writer, so that non-tar layers can be handled.

The usage would be something like:

        multiStore := NewMultiStore(rootPath) // some store that can handle different filenames
        decompressStore := store.NewDecompressStore(multiStore, WithBlocksize(blocksize), WithMultiWriterIngester())

where multiStore implements MultiWriterIngester:

type MultiWriterIngester interface {
	ctrcontent.Ingester
	Writers(ctx context.Context, opts ...ctrcontent.WriterOpt) (map[string]ctrcontent.Writer, error)
}

Calling .Writers() returns a map of filename to content.Writer, which is passed to UntarWriter, which uses the appropriate one.

Alternate

I did consider an alternate approach, one where, instead of returning map[string]Writer, it returned a func that can be called to get the writer:

	Writers(ctx context.Context, opts ...ctrcontent.WriterOpt) (func (filename string) ctrcontent.Writer, error)

But I honestly couldn't see how the complexity added value.

Looking forward to feedback.

Signed-off-by: Avi Deitcher <[email protected]>
Copy link
Contributor

@jdolitsky jdolitsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants