Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add symlink files to the trusted files list #120

Merged
merged 1 commit into from
Feb 20, 2023

Conversation

jkr0103
Copy link
Contributor

@jkr0103 jkr0103 commented Jan 20, 2023

GSC doesn't add any symlink file to trusted list. In all the ubuntu:20.04 based docker images, /bin is symlink to /usr/bin and because of this GSC doesn't add any /bin file to trusted files.

Any bash script executed at run time having shebang #!/bin/bash is not able to access /bin/bash as it's not part of trusted files list, reports error: /bin/bash: bad interpreter: Permission denied.

As part of the fix, GSC will add all symlink files.

Fixes #116, fixes #25

I tested Curated-Apps (Redis, PyTorch, Sklearn and Tensorflow-serving) with this PR using gramine v1.3.1 and v1.4.

All the Curated-Apps on gramine v1.4 works without requiring any change in enclave_size and pal_internal_memory.

On gramine 1.3.1, Pytorch needed extra 64M pal_internal_memory in release mode and it did not work in debug mode even after increasing pal_internal_memory upto 2G. This seems to an issue with gramine 1.3.1 which doesn't show up in gramine latest master. Please refer below table for more details.

Workload Build Trusted File, no symlinks 1.3.1/1.4 Trusted File with symlinks 1.3.1/1.4 % increment 1.3.1/1.4 pal_internal_memory in gramine 1.3.1 / 1.4
Redis release 19115 / 19194 20950 / 21029 10% / 10% no change / no change
debug 92587 / 104188 94430 / 106031 2% / 2% no change / no change
PyTorch release 99330 / 99409 165798 / 165877 67% / 67% +64M / no change
debug 171027 / 182628 237504 / 249105 39% / 36% gramine hang / no change
Scikit-learn release 41061 / 41140 43188 / 43267 5% / 5% no change / no change
debug 115052 / 126653 115052 / 128788 2% / 2% no change / no change
TF-serving release 20248 / 20327 23076 / 23155 23% / 23% no change / no change

How to test this PR?


This change is Reviewable

@jkr0103 jkr0103 changed the title Add symlink files to the trusted files Add symlink files to the trusted files list Jan 20, 2023
@jkr0103 jkr0103 force-pushed the jkr0103/bin_bash_symlink branch from 1c68bcf to 2bbc3b4 Compare January 20, 2023 11:40
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @jkr0103)


-- commits line 3 at r1:
I would also add an explanation as the commit body message:

Since `os.walk(..., followlinks=True)` can lead to infinite recursion
due to symlinks pointing to each other, we need to keep track of
already-visited directories via the `dirs` set.

finalize_manifest.py line 64 at r1 (raw file):

                continue
            st = os.stat(dir)
            dirkey = st.st_dev, st.st_ino, dir

I know we discussed this offline, but I still don't understand why we need to use dev and ino as additional identifiers of "is it the path we already traversed?"...

What happens if we remove st_dev and st_ino? Wouldn't just dir (full path name) be enough?

Then the code will also look simpler and nicer:

        ...
        for dirname in dirnames:
            dir = os.path.join(dirpath, dirname)
            if exclude_re.match(dir.decode('UTF-8')):
                # exclude special paths from list of trusted files
                continue
            if dir not in dirs:
                dirs.add(dir)
                scandirs.append(dirname)
        dirnames[:] = scandirs

@jkr0103 jkr0103 force-pushed the jkr0103/bin_bash_symlink branch from 2bbc3b4 to 199228f Compare January 23, 2023 14:15
Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


-- commits line 3 at r1:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I would also add an explanation as the commit body message:

Since `os.walk(..., followlinks=True)` can lead to infinite recursion
due to symlinks pointing to each other, we need to keep track of
already-visited directories via the `dirs` set.

Done.


finalize_manifest.py line 64 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I know we discussed this offline, but I still don't understand why we need to use dev and ino as additional identifiers of "is it the path we already traversed?"...

What happens if we remove st_dev and st_ino? Wouldn't just dir (full path name) be enough?

Then the code will also look simpler and nicer:

        ...
        for dirname in dirnames:
            dir = os.path.join(dirpath, dirname)
            if exclude_re.match(dir.decode('UTF-8')):
                # exclude special paths from list of trusted files
                continue
            if dir not in dirs:
                dirs.add(dir)
                scandirs.append(dirname)
        dirnames[:] = scandirs

Done.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @jkr0103)

a discussion (no related file):
@jkr0103 I know you experimented with this PR a bit on different workloads. Could you share your results? Like, how many more files are added to sgx.trusted_files with this PR, and whether it requires greater enclave_size in Gramine manifest?


Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@jkr0103 I know you experimented with this PR a bit on different workloads. Could you share your results? Like, how many more files are added to sgx.trusted_files with this PR, and whether it requires greater enclave_size in Gramine manifest?

I have attached the report in PR description.


Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

a discussion (no related file):

Previously, jkr0103 (Jitender Kumar) wrote…

I have attached the report in PR description.

Thanks, looks good to me.


Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @jkr0103)

a discussion (no related file):
@jkr0103: Please don't paste screenshots / attach excel files if you just want to include a simple table. Use Markdown for this.


a discussion (no related file):
Do I understand correctly, that this PR fixes a few workloads, but breaks PyTorch? If that's right, then please debug and fix it.



finalize_manifest.py line 63 at r2 (raw file):

                # exclude special paths from list of trusted files
                continue
            if dir not in dirs:

@woju: Could you verify if this logic is correct for traversing directory tree with symlinks?


finalize_manifest.py line 66 at r2 (raw file):

                dirs.add(dir)
                scandirs.append(dirname)
        dirnames[:] = scandirs

How does this work?

Code quote:

dirnames[:] = scandirs

Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

@jkr0103: Please don't paste screenshots / attach excel files if you just want to include a simple table. Use Markdown for this.

done


a discussion (no related file):

If that's right, then please debug and fix it.
Pytorch workload, only in debug breaks on gramine 1.3.1 but works on gramine master master. Do we really need to root cause that in gramine 1.3.1 which is not seen in gramine latest master, may be as we don't have to add pal_internal_memory as it's depricated. Hence I don't see the issue is blocker here.



finalize_manifest.py line 66 at r2 (raw file):
it just replaces the list with new one, example below:

>>> p = ['P','y','t','h','o','n']
>>> p
['P', 'y', 't', 'h', 'o', 'n']
>>> p[:] = [1, 2, 3]
>>> p
[1, 2, 3]

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)

a discussion (no related file):

Previously, jkr0103 (Jitender Kumar) wrote…

If that's right, then please debug and fix it.
Pytorch workload, only in debug breaks on gramine 1.3.1 but works on gramine master master. Do we really need to root cause that in gramine 1.3.1 which is not seen in gramine latest master, may be as we don't have to add pal_internal_memory as it's depricated. Hence I don't see the issue is blocker here.

@mkow Yes, there is indeed some hang in PyTorch in the debug config and with this PR on Gramine v1.3.1.

Why this happens? This PR increases the number of sgx.trusted_files, and thus increases the TOML representation in enclave memory. Something deep inside Gramine's memory allocation breaks (the old-style one, from the time when PAL memory alloc was separate from LibOS memory alloc). We didn't root cause the exact reason, because this problem disappeared in the latest master branch.

If you insist, we can debug this issue a bit more, to have a more concrete understanding of what exactly went wrong in the old memory allocation logic.

I felt that it was just wasting time, because the only thing this PR changes is the number of sgx.trusted_files in the final manifest.sgx file.



finalize_manifest.py line 66 at r2 (raw file):
@mkow This is slice assignment. See https://stackoverflow.com/questions/10623302/how-does-assignment-work-with-list-slices

Basically, this modified the list dirnames in place, instead of creating a new list.

Why we do it instead of dirnames = scandirs? Because we want to change dirnames in os.walk(). Please read this: https://docs.python.org/3/library/os.html#os.walk

An important exerpt:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.

Sometimes Python is really unintuitive...

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 59 at r2 (raw file):

        scandirs = []
        for dirname in dirnames:
            dir = os.path.join(dirpath, dirname)

dir() is a builtin (even reviewable highlighted it), please rename this variable


finalize_manifest.py line 63 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

@woju: Could you verify if this logic is correct for traversing directory tree with symlinks?

Depending on what you're attempting to do? Do you want to deduplicate them or copy? os.walk is naive, will traverse a directory twice if there are two links pointing to it (or one subdirectory and a symlink pointing to it). dirs = set() and this if will not prevent it, because the path is what os.walk went through, not the realpath. This if is noop (always true) if that's what you're asking, and the logic of replacing dirnames looks pointless.

But it goes worse, os.walk will happily traverse link loops and will turn around the DFS when one scandir throws OSError for ELOOP (40), in which case os.walk will catch this exception and silently (!) ignore it and just won't follow further.

Try it yourself:

mkdir -p a/b
ln -s ../.. a/b/c
python3 -c "for d, _, _ in __import__('os').walk('a', followlinks=True): print(d)"

In the case from PR description (/bin/ in usrmerged system) it will probably have 2 copies of trusted files (same hashes for 2 different paths), but it's very hard to say if there aren't any unintended consequences in other FS tree layouts. In absolute sense, this is not a "correct logic" from POSIX pov, but I don't know enough about handling symlinks in Gramine and GSC to say if this should be rejected altogether. This if is certainly to be either removed, or corrected with os.path.realpath() or pathlib.Path.resolve().

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

In the case from PR description (/bin/ in usrmerged system) it will probably have 2 copies of trusted files (same hashes for 2 different paths)

@woju This is exactly what we want to achieve.

it's very hard to say if there aren't any unintended consequences in other FS tree layouts

I'm unsure what you mean by this? From the Gramine perspective, whatever happens on the host is irrelevant, as long as Gramine can access the host file using the path.

os.walk is naive, will traverse a directory twice if there are two links pointing to it (or one subdirectory and a symlink pointing to it).

Yes, and this is exactly what we want here. We want to traverse the same directory twice, if there are two links (two "path names") leading to it.

This if is noop (always true) if that's what you're asking

I did some quick tests, Woju is right.

So looks like the if should be like this:

        # instead of `if dir not in dirs:`
        st = os.stat(dir)
        dirkey = st.st_dev, st.st_ino
        if dirkey not in dirs:
            dirs.add(dirkey)
            scandirs.append(dirname)

@woju Does this check make more sense to you?


Basically, the whole point of this PR is to this scenario:

  • /bin/bash is an executable, and /bin/ is the "real" directory.
  • /usr/bin/ is a symlink to /bin/; so /usr/bin/bash is the same executable as /bin/bash.

Gramine needs to add both absolute paths to sgx.trusted_files in the manifest:

sgx.trusted_files = [
  "file:/bin/bash",
  "file:/usr/bin/bash",
]

This is literally the only thing we try to achieve with this PR. We don't care that both paths lead to the same file. We don't care that a prefix of one path is a symlink to the other path. We rely on the host to follow symlinks. (And Gramine treats these two as just normal files, Gramine has no idea/concept of symlinks.)

@dimakuv
Copy link

dimakuv commented Feb 10, 2023

finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

In the case from PR description (/bin/ in usrmerged system) it will probably have 2 copies of trusted files (same hashes for 2 different paths)

@woju This is exactly what we want to achieve.

it's very hard to say if there aren't any unintended consequences in other FS tree layouts

I'm unsure what you mean by this? From the Gramine perspective, whatever happens on the host is irrelevant, as long as Gramine can access the host file using the path.

os.walk is naive, will traverse a directory twice if there are two links pointing to it (or one subdirectory and a symlink pointing to it).

Yes, and this is exactly what we want here. We want to traverse the same directory twice, if there are two links (two "path names") leading to it.

This if is noop (always true) if that's what you're asking

I did some quick tests, Woju is right.

So looks like the if should be like this:

        # instead of `if dir not in dirs:`
        st = os.stat(dir)
        dirkey = st.st_dev, st.st_ino
        if dirkey not in dirs:
            dirs.add(dirkey)
            scandirs.append(dirname)

@woju Does this check make more sense to you?


Basically, the whole point of this PR is to this scenario:

  • /bin/bash is an executable, and /bin/ is the "real" directory.
  • /usr/bin/ is a symlink to /bin/; so /usr/bin/bash is the same executable as /bin/bash.

Gramine needs to add both absolute paths to sgx.trusted_files in the manifest:

sgx.trusted_files = [
  "file:/bin/bash",
  "file:/usr/bin/bash",
]

This is literally the only thing we try to achieve with this PR. We don't care that both paths lead to the same file. We don't care that a prefix of one path is a symlink to the other path. We rely on the host to follow symlinks. (And Gramine treats these two as just normal files, Gramine has no idea/concept of symlinks.)

Well, my proposal doesn't work too.

Here is my quick test, that contains both the scenario we want to achieve (bash accessible through two paths) and an infinite link:

mkdir -p usr/bin
touch usr/bin/bash
ln -s usr/bin bin
ln -s ../.. usr/bin/inflink

This gives us the following tree:

$ tree
.
├── bin -> usr/bin
└── usr
    └── bin
        ├── bash
        └── inflink -> ../..

This is the Python snippet that I used for testing, resembling closely what GSC wants to do:

import os
dirs = set()
for dirpath, dirnames, files in os.walk('.', followlinks=True):
    scandirs = []
    for dirname in dirnames:
        dir = os.path.join(dirpath, dirname)
        st = os.stat(dir)
        dirkey = st.st_dev, st.st_ino
        if dirkey not in dirs:
            dirs.add(dirkey)
            scandirs.append(dirname)
    dirnames[:] = scandirs
    for file in files:
        filename = os.path.join(dirpath, file)
        if not os.path.isfile(filename):
            # only regular files are added as trusted files
            continue
        print(filename)

This is the result of running this script:

./bin/bash

# there is no ./usr/bin/bash !!!

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Well, my proposal doesn't work too.

Here is my quick test, that contains both the scenario we want to achieve (bash accessible through two paths) and an infinite link:

mkdir -p usr/bin
touch usr/bin/bash
ln -s usr/bin bin
ln -s ../.. usr/bin/inflink

This gives us the following tree:

$ tree
.
├── bin -> usr/bin
└── usr
    └── bin
        ├── bash
        └── inflink -> ../..

This is the Python snippet that I used for testing, resembling closely what GSC wants to do:

import os
dirs = set()
for dirpath, dirnames, files in os.walk('.', followlinks=True):
    scandirs = []
    for dirname in dirnames:
        dir = os.path.join(dirpath, dirname)
        st = os.stat(dir)
        dirkey = st.st_dev, st.st_ino
        if dirkey not in dirs:
            dirs.add(dirkey)
            scandirs.append(dirname)
    dirnames[:] = scandirs
    for file in files:
        filename = os.path.join(dirpath, file)
        if not os.path.isfile(filename):
            # only regular files are added as trusted files
            continue
        print(filename)

This is the result of running this script:

./bin/bash

# there is no ./usr/bin/bash !!!

This is a really hard problem :( Or I'm that stupid.

The only hacky solution that I could come up is to allow X repetitions of the same path (same inode). For example, in the code below I assume 2 repetitions:

import os
dirs = set()
for dirpath, dirnames, files in os.walk('.', followlinks=True):
    scandirs = []
    for dirname in dirnames:
        dir = os.path.join(dirpath, dirname)
        st = os.stat(dir)
        dirkey1 = st.st_dev, st.st_ino, 1  # first repetition
        dirkey2 = st.st_dev, st.st_ino, 2  # second repetition
        if dirkey1 not in dirs:
            dirs.add(dirkey1)
            scandirs.append(dirname)
        elif dirkey2 not in dirs:
            dirs.add(dirkey2)
            scandirs.append(dirname)
    dirnames[:] = scandirs
    for file in files:
        filename = os.path.join(dirpath, file)
        if not os.path.isfile(filename):
            # only regular files are added as trusted files
            continue
        print(filename)

This gives me correct output to the example I posted earlier:

./usr/bin/bash
./bin/bash

But of course if we'd have one more symlink (e.g. ln -s usr/bin one-more-bin), then my script would fail to register the abs path one-more-bin/bash.

@mkow @woju Any smart ideas?

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @jkr0103, and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This is a really hard problem :( Or I'm that stupid.

The only hacky solution that I could come up is to allow X repetitions of the same path (same inode). For example, in the code below I assume 2 repetitions:

import os
dirs = set()
for dirpath, dirnames, files in os.walk('.', followlinks=True):
    scandirs = []
    for dirname in dirnames:
        dir = os.path.join(dirpath, dirname)
        st = os.stat(dir)
        dirkey1 = st.st_dev, st.st_ino, 1  # first repetition
        dirkey2 = st.st_dev, st.st_ino, 2  # second repetition
        if dirkey1 not in dirs:
            dirs.add(dirkey1)
            scandirs.append(dirname)
        elif dirkey2 not in dirs:
            dirs.add(dirkey2)
            scandirs.append(dirname)
    dirnames[:] = scandirs
    for file in files:
        filename = os.path.join(dirpath, file)
        if not os.path.isfile(filename):
            # only regular files are added as trusted files
            continue
        print(filename)

This gives me correct output to the example I posted earlier:

./usr/bin/bash
./bin/bash

But of course if we'd have one more symlink (e.g. ln -s usr/bin one-more-bin), then my script would fail to register the abs path one-more-bin/bash.

@mkow @woju Any smart ideas?

If you're after expansion of all the symlinks, then just drop the resolving logic? Let the walk, like, walk all the files incl. symlinks? Or am I missing something?

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

If you're after expansion of all the symlinks, then just drop the resolving logic? Let the walk, like, walk all the files incl. symlinks? Or am I missing something?

@woju But you're suggesting basically your snippet that goes bananas on loop-links? This one:

python3 -c "for d, _, _ in __import__('os').walk('a', followlinks=True): print(d)"

This approach is probably working (I think the current state of this PR boils down to this?), but then we'll add ~40x more files than necessary. Sounds like something nobody wants to discover in their final sgx.trusted_files.

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @jkr0103, and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@woju But you're suggesting basically your snippet that goes bananas on loop-links? This one:

python3 -c "for d, _, _ in __import__('os').walk('a', followlinks=True): print(d)"

This approach is probably working (I think the current state of this PR boils down to this?), but then we'll add ~40x more files than necessary. Sounds like something nobody wants to discover in their final sgx.trusted_files.

Either I'm gravely misunderstanding something, or this is the only correct approach. If the expectation is that you can access the underlying file by any of those paths, they should be included, right?

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

If the expectation is that you can access the underlying file by any of those paths, they should be included, right?

If we're talking about the purist approach, then you're right, this is the only correct approach.

However, no sane application should open files through these nested symlinks (a/b/c/a/b/c/a/b/c/a). So ideally, I'd like an approach that removes back edges (removes cycles in the graph of files). Unfortunately, at least some popular Docker images (iirc, PyTorch) have such nested symlinks.

But writing an algo in Python for detecting cycles in traversed files graph... sounds like a total overkill.

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @jkr0103, and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

If the expectation is that you can access the underlying file by any of those paths, they should be included, right?

If we're talking about the purist approach, then you're right, this is the only correct approach.

However, no sane application should open files through these nested symlinks (a/b/c/a/b/c/a/b/c/a). So ideally, I'd like an approach that removes back edges (removes cycles in the graph of files). Unfortunately, at least some popular Docker images (iirc, PyTorch) have such nested symlinks.

But writing an algo in Python for detecting cycles in traversed files graph... sounds like a total overkill.

I think yes, we are talking about purist approaches and POSIX correctness. The GSC is a general tool which should work with any docker doing whatever it likes, isn't it?

I think if you want to optimize this 40x, then we should think about adding symlink support to trusted files. We'd add another type of trusted file that would not only expose a symlink inside, but would also affect the resolution, i.e. we'd have some equivalent of realpath(3) in trusted PAL that would resolve the path to a canonical version and verify the checksum from that.

Either way, I think GSC is a wrong place to fix this.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

I think if you want to optimize this 40x, then we should think about adding symlink support to trusted files. We'd add another type of trusted file that would not only expose a symlink inside, but would also affect the resolution, i.e. we'd have some equivalent of realpath(3) in trusted PAL that would resolve the path to a canonical version and verify the checksum from that.

Yes, we've been thinking about this. But adding links in Gramine is a non-trivial task, requiring a lot of thought and implementation effort.

But I agree in general that GSC should just work with arbitrary Docker images, without any specific hacks.

@jkr0103 So actually what happens if we simply flip followlinks=False to followlinks=True in normal GSC (without this PR)? Why won't it just work (though it may take much more time and have much bigger manifest file size)?

Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I think if you want to optimize this 40x, then we should think about adding symlink support to trusted files. We'd add another type of trusted file that would not only expose a symlink inside, but would also affect the resolution, i.e. we'd have some equivalent of realpath(3) in trusted PAL that would resolve the path to a canonical version and verify the checksum from that.

Yes, we've been thinking about this. But adding links in Gramine is a non-trivial task, requiring a lot of thought and implementation effort.

But I agree in general that GSC should just work with arbitrary Docker images, without any specific hacks.

@jkr0103 So actually what happens if we simply flip followlinks=False to followlinks=True in normal GSC (without this PR)? Why won't it just work (though it may take much more time and have much bigger manifest file size)?

That was the first thing I tried, and os.walk keep running and files processed were 125,705,000 after 1 hour and still not over. Saw multiple loops in in /sys and one of them was
/sys/kernel/software_nodes/node0/dmi-ipmi-si.0. With the current changes, we exclude /sys etc., hence we don't see that happening. I tested this is bash example.

Ideal approach wouold be to removes cycles in the graph of files. Shell I try that?

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Ideal approach wouold be to removes cycles in the graph of files. Shell I try that?

Can you try it in another PR?

I think at this point we should do the following:

  1. Create a new PR that simply flips followlinks=True and test it extensively (just like you did with this PR)
  2. Create a new PR with item 1 plus "remove-the-cycles" logic (but honestly, this feels like a total overkill)
  3. Mark this PR as blocked, we'll most probably close it in favor of solution 1 or 2

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @jkr0103, and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Ideal approach wouold be to removes cycles in the graph of files. Shell I try that?

Can you try it in another PR?

I think at this point we should do the following:

  1. Create a new PR that simply flips followlinks=True and test it extensively (just like you did with this PR)
  2. Create a new PR with item 1 plus "remove-the-cycles" logic (but honestly, this feels like a total overkill)
  3. Mark this PR as blocked, we'll most probably close it in favor of solution 1 or 2

I think you should start with a list of directories that you certainly don't want measured, like /proc, /sys, /tmp, /run, /var/run etc. See if it solves the problem.

Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

I think you should start with a list of directories that you certainly don't want measured, like /proc, /sys, /tmp, /run, /var/run etc. See if it solves the problem.

Current approach in PR exclude the directories which we don't want to measure, but it doesn't avoid loops (cycles). Do we want to fix that as part of this PR?

Another change left is renaming dir keyword.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, jkr0103 (Jitender Kumar) wrote…

Current approach in PR exclude the directories which we don't want to measure, but it doesn't avoid loops (cycles). Do we want to fix that as part of this PR?

Another change left is renaming dir keyword.

I agree with @woju. If the simple approach works, then let's just use it.

Avoiding loops (a) requires non-trivial Python code and a lot of testing, and (b) is significantly slower than the simple approach.

That's why I suggested to create a new PR with the logic that avoids loops (if you really want to have this functionality), so we can have two approaches and can compare them in terms of number of trusted files and performance.

Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)


finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I agree with @woju. If the simple approach works, then let's just use it.

Avoiding loops (a) requires non-trivial Python code and a lot of testing, and (b) is significantly slower than the simple approach.

That's why I suggested to create a new PR with the logic that avoids loops (if you really want to have this functionality), so we can have two approaches and can compare them in terms of number of trusted files and performance.

Code here

if exclude_re.match(filename):
is for excluding the files but loop exists at directory level even before this code executes and we get stuck there. Hence we still need to loop over dirnames and exclude from this list those dirs that matches.

Only case it doesn't fix is infinite loop created using below commands, hope such a loop doesn't appear in real-world workloads.

Code snippet:

mkdir -p usr/bin
touch usr/bin/bash
ln -s usr/bin bin
ln -s ../.. usr/bin/inflink

Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @woju)


finalize_manifest.py line 59 at r2 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

dir() is a builtin (even reviewable highlighted it), please rename this variable

done

Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 59 at r2 (raw file):

Previously, jkr0103 (Jitender Kumar) wrote…

done

Resolving, but this is not really abs_: the path will be either relative or absolute, depending on what was root_dir. Or did you intend this to be absolute? Then you need to explicitly call


finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Yes, I agree with @jkr0103, we still need to loop over the sub-directories' list and filter out the "not needed ones". The current solution looks good to me.

If we're going to settle on this solution, I'd like a comment in the source about this behaviour.

Also, this won't really be an infinite loop, it will end with ELOOP, that is, when path length hits some arbitrary limit (around 256 bytes), kernel will error out any operation on such path, so no risk it will be stuck forever.


finalize_manifest.py line 50 at r3 (raw file):

                                r'|proc/.*'
                                r'|sys/.*'
                                r'|var/.*)$')

WRT this list:

  1. /var/.* is wrong, what about /var/lib, /var/db etc.?
  2. /run (and /var/run, one of them is probably a symlink to the other) is missing.

And this is just from the top of my head, it needs to be checked with FHS and some variety of real distros, as different as possible.

@woju
Copy link
Member

woju commented Feb 14, 2023

finalize_manifest.py line 59 at r2 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

Resolving, but this is not really abs_: the path will be either relative or absolute, depending on what was root_dir. Or did you intend this to be absolute? Then you need to explicitly call

... call os.path.abspath() or pathlib.Path.resolve or show that root_dir is always absolute.

dimakuv
dimakuv previously approved these changes Feb 14, 2023
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow and @woju)


finalize_manifest.py line 59 at r2 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

... call os.path.abspath() or pathlib.Path.resolve or show that root_dir is always absolute.

No, this was not supposed to be absolute. You're right, I'd rename to to_be_walked_dir.


finalize_manifest.py line 63 at r2 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

If we're going to settle on this solution, I'd like a comment in the source about this behaviour.

Also, this won't really be an infinite loop, it will end with ELOOP, that is, when path length hits some arbitrary limit (around 256 bytes), kernel will error out any operation on such path, so no risk it will be stuck forever.

@jkr0103 Could you add a comment right-before the line with os.walk() like this:

# WARNING: below loop does not remove cycles (which can happen due to symlinks), but we assume
# that Docker images (excluding paths/files listed in `excluded_paths_regex`) don't have cycles.

finalize_manifest.py line 50 at r3 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

WRT this list:

  1. /var/.* is wrong, what about /var/lib, /var/db etc.?
  2. /run (and /var/run, one of them is probably a symlink to the other) is missing.

And this is just from the top of my head, it needs to be checked with FHS and some variety of real distros, as different as possible.

We should create a separate PR to improve this list. @woju Any ideas what would be good references/links to populate this list?

Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @woju)


finalize_manifest.py line 59 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

No, this was not supposed to be absolute. You're right, I'd rename to to_be_walked_dir.

renamed.


finalize_manifest.py line 63 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@jkr0103 Could you add a comment right-before the line with os.walk() like this:

# WARNING: below loop does not remove cycles (which can happen due to symlinks), but we assume
# that Docker images (excluding paths/files listed in `excluded_paths_regex`) don't have cycles.

Done.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow and @woju)

woju
woju previously approved these changes Feb 14, 2023
Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4.
Reviewable status: all files reviewed, 3 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @mkow)


finalize_manifest.py line 50 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We should create a separate PR to improve this list. @woju Any ideas what would be good references/links to populate this list?

#128 with references for you (or reassign if you want).

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @mkow)


finalize_manifest.py line 50 at r3 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

#128 with references for you (or reassign if you want).

Thanks

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the table after changing the implementation? Just to be sure that nothing new broke because of it.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @jkr0103)

a discussion (no related file):

We didn't root cause the exact reason, because this problem disappeared in the latest master branch.

So, it works on v1.4?



finalize_manifest.py line 66 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@mkow This is slice assignment. See https://stackoverflow.com/questions/10623302/how-does-assignment-work-with-list-slices

Basically, this modified the list dirnames in place, instead of creating a new list.

Why we do it instead of dirnames = scandirs? Because we want to change dirnames in os.walk(). Please read this: https://docs.python.org/3/library/os.html#os.walk

An important exerpt:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.

Sometimes Python is really unintuitive...

Ouch, this is really underhanded, I don't know any other standard Python APIs working this way...
Maybe pathlib have something more sane?


finalize_manifest.py line 57 at r4 (raw file):

    # WARNING: below loop does not remove cycles (which can happen due to symlinks), but we assume
    # that Docker images (excluding paths/files listed in `excluded_paths_regex`) don't have cycles.

This comment doesn't match the commit message? Which one is true?

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @jkr0103, @mkow, and @woju)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

We didn't root cause the exact reason, because this problem disappeared in the latest master branch.

So, it works on v1.4?

Yes, it works on v1.4. Actually, now that we have the release v1.4, would be good to put this version in the config.yaml of GSC and try with this (instead of master).



-- commits line 7 at r4:
We need to modify the commit body message:

This is done via `os.walk(..., followlinks=True)`. This does not remove cycles
(which can happen due to symlinks), but we assume that Docker images
(excluding paths/files listed in `excluded_paths_regex`) don't have cycles.

finalize_manifest.py line 66 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Ouch, this is really underhanded, I don't know any other standard Python APIs working this way...
Maybe pathlib have something more sane?

Maybe @woju knows?


finalize_manifest.py line 57 at r4 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This comment doesn't match the commit message? Which one is true?

See my other comment, I proposed the new commit message.

Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the PR with gramine v1.4 in config.yaml with new implementation and updated the table.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @mkow and @woju)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Yes, it works on v1.4. Actually, now that we have the release v1.4, would be good to put this version in the config.yaml of GSC and try with this (instead of master).

I used gramine v1.4 in config.yaml and tested the Curated-Apps workloads in release and debug modes and have updated the table. All works fine.


dimakuv
dimakuv previously approved these changes Feb 15, 2023
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @jkr0103, @mkow, and @woju)

a discussion (no related file):

Previously, jkr0103 (Jitender Kumar) wrote…

I used gramine v1.4 in config.yaml and tested the Curated-Apps workloads in release and debug modes and have updated the table. All works fine.

Thanks!


Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 66 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Maybe @woju knows?

This is not insane, there are many places where you're supposed to modify an object that you're handed from within stdlib. The difference here is that you're handed a list, not some custom class instance to modify attributes.

pathlib doesn't have anything better. There's https://docs.python.org/3/library/pathlib.html#pathlib.Path.rglob, but you can't tell it to not descend into one subtree like we do here.

If you're after readability and you dislike slice assignment, you can do it like this:

for i in range(len(dirnames), -1, -1):
    if that_big_regex.fullmatch(os.path.join(dirpath, dirnames[i])):
        dirnames.pop(i)

But beware: if you instead do it like this, it's unsound:

for dirname in dirnames:
    if that_big_regex.fullmatch(os.path.join(dirpath, dirname)):
        dirnames.remove(dirname)

Because after first .remove() iterator is invalid (might have glitched). So in the future you need to be careful with reviews, to make sure that no-one ever makes it "more pythonic".

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 66 at r2 (raw file):

Previously, woju (Wojtek Porczyk) wrote…

This is not insane, there are many places where you're supposed to modify an object that you're handed from within stdlib. The difference here is that you're handed a list, not some custom class instance to modify attributes.

pathlib doesn't have anything better. There's https://docs.python.org/3/library/pathlib.html#pathlib.Path.rglob, but you can't tell it to not descend into one subtree like we do here.

If you're after readability and you dislike slice assignment, you can do it like this:

for i in range(len(dirnames), -1, -1):
    if that_big_regex.fullmatch(os.path.join(dirpath, dirnames[i])):
        dirnames.pop(i)

But beware: if you instead do it like this, it's unsound:

for dirname in dirnames:
    if that_big_regex.fullmatch(os.path.join(dirpath, dirname)):
        dirnames.remove(dirname)

Because after first .remove() iterator is invalid (might have glitched). So in the future you need to be careful with reviews, to make sure that no-one ever makes it "more pythonic".

Based on what @woju explained, I would keep the current code as-is. We may add a quick comment to this line, like:

# slice assignment, to modify the list `dirnames` in place

WDYT @mkow ?

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @jkr0103, and @woju)


finalize_manifest.py line 66 at r2 (raw file):

The difference here is that you're handed a list, not some custom class instance to modify attributes.

Yes, that's what I mean. My problem is not about slice assignment, but about library returning a list which you then modify during iteration to influence the future iterations.

A comment like Dmitrii suggest would be good, but should rather say that this changes how os.walk() will continue iteration.

@jkr0103 jkr0103 dismissed stale reviews from dimakuv and woju via 1aacb9d February 16, 2023 16:41
Copy link
Contributor Author

@jkr0103 jkr0103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @woju)


finalize_manifest.py line 66 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

The difference here is that you're handed a list, not some custom class instance to modify attributes.

Yes, that's what I mean. My problem is not about slice assignment, but about library returning a list which you then modify during iteration to influence the future iterations.

A comment like Dmitrii suggest would be good, but should rather say that this changes how os.walk() will continue iteration.

Done.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @jkr0103 and @mkow)


finalize_manifest.py line 66 at r2 (raw file):

Previously, jkr0103 (Jitender Kumar) wrote…

Done.

The comment looks good to me.

woju
woju previously approved these changes Feb 17, 2023
Copy link
Member

@woju woju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)


-- commits line 7 at r4:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We need to modify the commit body message:

This is done via `os.walk(..., followlinks=True)`. This does not remove cycles
(which can happen due to symlinks), but we assume that Docker images
(excluding paths/files listed in `excluded_paths_regex`) don't have cycles.

I concur.


finalize_manifest.py line 66 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

The comment looks good to me.

yeah, LGTM

mkow
mkow previously approved these changes Feb 20, 2023
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

This is done via `os.walk(..., followlinks=True)`. This does not remove
cycles (which can happen due to symlinks), but we assume that Docker
images (excluding paths/files listed in `excluded_paths_regex`) don't
have cycles.

Signed-off-by: jkr0103 <[email protected]>
@dimakuv dimakuv dismissed stale reviews from mkow and woju via c374329 February 20, 2023 12:41
@dimakuv dimakuv force-pushed the jkr0103/bin_bash_symlink branch from 1aacb9d to c374329 Compare February 20, 2023 12:41
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)

a discussion (no related file):
I did a final rebase and tested that it works correctly.



-- commits line 7 at r4:

Previously, woju (Wojtek Porczyk) wrote…

I concur.

Done

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@dimakuv dimakuv merged commit c374329 into gramineproject:master Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants