
feat: handle hardlinks symmetrically #19

Open

wants to merge 4 commits into base: handle-hardlinks
Conversation


@zhijie-yang zhijie-yang commented Sep 5, 2024

  • Have you signed the CLA?


@rebornplusplus rebornplusplus left a comment

Nice changes, thank you! Left a few comments below.

Comment on lines +41 to +42
// The identifier for the hardlink is unique for each unique base file,
// which counts from 1. The 0 value is reserved for files that are not hard links.

We might need to explain the "base file" since that's something we came up with, or try to think of a better name.

Owner

If I understand it correctly, it is probably "base file" == "inode".

Author

Yes, this means "The identifier for the hard link should be unique for all the file entries in the same hard link group"
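To make the numbering scheme concrete, here is a minimal sketch of the semantics described above: every entry in the same hard link group shares one identifier counting from 1, and 0 is reserved for entries that are not hard links. All names here (`assignGroupIDs`, the pair encoding) are invented for this illustration and are not part of the PR.

```go
package main

import "fmt"

// assignGroupIDs walks tar entry names paired with their hard link target
// ("" for entries that are not hard links) and returns an ID per entry.
// Entries in the same hard link group share one ID, counting from 1;
// entries that are not hard links get the reserved value 0.
func assignGroupIDs(entries [][2]string) map[string]int {
	ids := make(map[string]int)
	groupOf := make(map[string]int) // base file path -> group ID
	next := 1
	for _, e := range entries {
		name, target := e[0], e[1]
		if target == "" {
			ids[name] = 0 // not (yet known to be) part of a group
			continue
		}
		id, ok := groupOf[target]
		if !ok {
			id = next
			next++
			groupOf[target] = id
			ids[target] = id // the base file joins its own group
		}
		ids[name] = id
	}
	return ids
}

func main() {
	ids := assignGroupIDs([][2]string{
		{"./a", ""},     // regular file, later targeted by links
		{"./b", ""},     // regular file with no links
		{"./l1", "./a"}, // hard link to ./a
		{"./l2", "./a"}, // another hard link to ./a
	})
	fmt.Println(ids["./a"], ids["./b"], ids["./l1"], ids["./l2"]) // prints: 1 0 1 1
}
```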

}

type tarMetadata struct {
HardLinkRevMap map[string]hardLinkRevMapEntry

Maybe a comment about this map would be helpful.

// getDataReader returns a ReadCloser for the data payload of the package.
// Calling the Close method must happen outside of this function to prevent
// premature closing of the underlying package reader.
// The xz.Reader is wrapper with io.NopCloser since it does not implement the Close method.

Suggested change
// The xz.Reader is wrapper with io.NopCloser since it does not implement the Close method.
// The xz.Reader is wrapped with io.NopCloser since it does not implement the Close method.

nitpick

if err != nil {
return err
}
// Fist pass over the tarball to read the header of the tarball

Suggested change
// Fist pass over the tarball to read the header of the tarball
// First pass over the tarball to read the header of the tarball

nitpick

@@ -153,6 +197,9 @@ func extractData(dataReader io.Reader, options *ExtractOptions) error {
return err
}

// targetDir is the directory where the file is extracted.
// It is either the options.TargetDir or the stagingDir.

Suggested change
// It is either the options.TargetDir or the stagingDir.
// It is either the options.TargetDir or the stagingDir depending on the file.

// Nothing to do.
continue
// Extract the hard link base file to the staging directory, when
// 1. it is not part of the target paths (len(targetPaths) == 0)

I would probably put it like "it's not listed in the slice definition file".

// 2. it is required by other hard links (exists as a key in the HardLinkRevMap)
// In case that [len(targetPaths) > 0], the hard link base file is extracted normally.
// Note that the hard link base file can also be a symlink.
// tarHeader.Name is used here since the paths in the HardLinkRevMap are relative.

If it's possible and if it makes sense, let's try to use absolute paths everywhere to avoid confusion.

Comment on lines +377 to +381
func newTarMetadata() tarMetadata {
return tarMetadata{
HardLinkRevMap: make(map[string]hardLinkRevMapEntry),
}
}

You probably don't need this function, since it is used only once and is quite minimal.

Comment on lines +229 to +231
if err != nil {
return nil, err
}

Should this be left out? Not sure which error you are checking.

"/hardlink": "hardlink 1 {test-package_myslice}",
},
}, {
summary: "Symlink is a valid hard link base file",

Suggested change
summary: "Symlink is a valid hard link base file",
summary: "Hard links can point to symlinks",

Owner

@letFunny letFunny left a comment

Left a few comments about the general approach, once we discuss those we can discuss the details. Thanks!

Comment on lines +41 to +42
// The identifier for the hardlink is unique for each unique base file,
// which counts from 1. The 0 value is reserved for files that are not hard links.
Owner

If I understand it correctly, it is probably "base file" == "inode".

Comment on lines +154 to +155
tarReader := tar.NewReader(dataReader)
tarMetadata, err := readTarMetadata(tarReader)
Owner

This is always going to read the file twice, even if there are no hardlinks. I think a better design is to do the extraction for all files in the first loop and keep track of the hardlinks that we cannot extract; then, only if necessary, we loop over the tarball again.

Author

This is not about "whether or not we can extract the hardlink", but about "how all the files extracted from a hardlink group can be represented symmetrically in the report". Think about a counter-example: when we meet the "regular file" (which is the target of other hardlinks) during the first pass over the tarball, we don't know whether it is associated with a hardlink group, so in the entry returned by fsutil.Create we can only report it as a regular file. As we proceed through the rest of the tarball and meet the hardlinks associated with this "regular file", we are no longer able to update the hardLinkId field in the entry for the "regular file", because fsutil.Create has already finished processing it.
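The two-pass shape being debated can be sketched with the standard library alone. Everything below (the `collectHardLinks` helper, the in-memory tarball) is fabricated for this example: the first pass only records hard link metadata, then the stream is rewound with Seek before the extracting pass would run.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// collectHardLinks does a metadata-only pass: it maps each hard link entry's
// name to the path it targets, without extracting anything.
func collectHardLinks(r io.Reader) (map[string]string, error) {
	tr := tar.NewReader(r)
	links := make(map[string]string)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		if hdr.Typeflag == tar.TypeLink {
			links[hdr.Name] = hdr.Linkname
		}
	}
	return links, nil
}

// makeTar builds a tiny in-memory tarball: one regular file and one hard link.
func makeTar() *bytes.Reader {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	tw.WriteHeader(&tar.Header{Name: "./a", Typeflag: tar.TypeReg, Size: 2, Mode: 0o644})
	tw.Write([]byte("hi"))
	tw.WriteHeader(&tar.Header{Name: "./l", Typeflag: tar.TypeLink, Linkname: "./a"})
	tw.Close()
	return bytes.NewReader(buf.Bytes())
}

func main() {
	r := makeTar()
	links, err := collectHardLinks(r) // first pass: metadata only
	if err != nil {
		panic(err)
	}
	r.Seek(0, io.SeekStart) // rewind for the second, extracting pass
	fmt.Println(links["./l"])
}
```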

Comment on lines +101 to +104
// getDataReader returns a ReadCloser for the data payload of the package.
// Calling the Close method must happen outside of this function to prevent
// premature closing of the underlying package reader.
// The xz.Reader is wrapper with io.NopCloser since it does not implement the Close method.
Owner

In my opinion all of the comment here is unnecessary because it is a mix of Go idiosyncrasies and implementation details. The comment on the function should not state things like "xz.Reader is wrapper with io.NopCloser...", that is something internal that the user does not care about. And the bit about "Calling the Close method must happen outside of this function..." is also something that is taken for granted in Go.

func extractData(pkgReader io.ReadSeeker, options *ExtractOptions) error {
// stagingDir is the directory where the hard link base file, which is not
// listed in the pendingPaths, to be extracted.
stagingDir, err := os.MkdirTemp("", "chisel-staging-")
Owner

Why do we need a staging dir in the first place? I think it should not be needed at all. Correct me if I am wrong but:

  1. First iteration over the tarball we extract all entries that are not hardlinks (and possibly hardlinks depending on whether the content entry was extracted).
  2. After the first iteration we have a list of hardlinks and the path they should point to according to the tar, that is a map from target -> list of hardlinks.
  3. On the second iteration we find target in the tarball. We extract target's content to hardlinks[0], then iterate over the remaining elements and create hardlinks[1..] as links to hardlinks[0].

As I said in 1., there is also the optimization that target might already be extracted, so there is no need for a second iteration. The downside is that we need to keep track of the names of the extracted files to know whether target was extracted; we already do that in the report, but exposing a good API here might be a challenge, and we have to think about that.

In fact, because this is a "niche" use case and we do not anticipate it happening a lot, and because the optimization mentioned in 1. is even more niche, I would say let's not worry about it unless it is easy to implement.
