-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DONTMERGE Fix excluded_paths_regex
in finalize_manifest.py
#130
Conversation
The updated list of excluded "trusted files" paths better follows the Filesystem Hierarchy Standard (FHS). Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
This is for quick tests of non-root images. Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 3 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
finalize_manifest.py
line 56 at r1 (raw file):
r'|sys/.*' r'|tmp/.*' r'|var/.*)$')
@woju thinks we need to exclude only some subdirs of /var/
(or better, include only some subdirs of /var/
). But I didn't find any useful subdirs for "normal" apps, typically converted by GSC.
@jkr0103 @anjalirai-intel @aneessahib Does this PR make sense to you? |
yes looks good. An additional thought - would it make sense to give the flexibility to the user to add more exclude paths via a proper template? This will help users create smaller GSC images as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional thought - would it make sense to give the flexibility to the user to add more exclude paths via a proper template? This will help users create smaller GSC images as well.
Sounds like a bit too much... Unless you have a simple idea how users will provide such excluded paths in command-line args (or in some other way)?
Reviewable status: 0 of 3 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
via separate file template. like a file exclude_path.template
for example ? We are having to tell multiple people to manually edit finalize_manifest.py
which is deep in GSC. A documented interface might help. Anyways, thats a separate thing to address.
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
Documentation/index.rst
line 264 at r1 (raw file):
generate a list of trusted files. GSC excludes some system files and paths (for the exact list, see :ref:`excluded-paths`), since checksums are required which either don't exist or may vary across different deployment machines.
I think somewhere here we should point to finalize_manifest.py
. So that users can modify the exception list according to their needs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are having to tell multiple people to manually edit
finalize_manifest.py
which is deep in GSC.
But why? Why do they have to modify it? What's breaking?
via separate file template. like a file
exclude_path.template
for example?
I'm not sure how this should work. Could you give an example of such a template? Would it be a list of regexes?
Reviewable status: 0 of 3 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib)
Documentation/index.rst
line 264 at r1 (raw file):
Previously, aneessahib (Anees Sahib) wrote…
I think somewhere here we should point to
finalize_manifest.py
. So that users can modify the exception list according to their needs?
No, finalize_manifest.py
is an internal detail of GSC. We shouldn't tell users to modify the GSC source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r1.
Reviewable status: 1 of 3 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib and @dimakuv)
finalize_manifest.py
line 36 at r1 (raw file):
def generate_trusted_files(root_dir, already_added_files): # please keep this list in sync with the one in GSC documentation excluded_paths_regex = (r'^/('
While you're at it, can you reformat it?
- drop hanging indent, it's wasting too much horizontal space and makes inline comments difficult
- rewrite as multiline string for
re.VERBOSE
(https://docs.python.org/3/library/re.html#re.VERBOSE) - drop common
/
before(
(it's confusing) - drop
^
at the beginning and.*
at the end (.match()
method takes care of those) - inline
re.compile
:
exclude_re = re.compile(r'''
( /boot
| /dev
| /efi
# below files are security-critical
| /etc/\.pwd\.lock
| /gramine/app_files/finalize_manifest\.py # will be removed in final GSC image
)
''', re.VERBOSE)
exclude_re.match(...) # this is already OK below
Or alternatively you can rework this as a list of fnmatch
aka "glob" patterns (https://docs.python.org/3/library/fnmatch.html), which might be better if you want to add CLI-level configuration (--include-files "/var/lib/*"
, --exclude-files /etc/\*passwd
), because many CLI tools have fnmatch-like patterns here, not regexps (like https://manpages.debian.org/bullseye/tar/tar.1.en.html#exclude or https://manpages.debian.org/bullseye-backports/rsync/rsync.1.en.html#PATTERN_MATCHING_RULES).
exclude_globs = [
'/boot',
'/dev',
'/media/*',
#...
]
if any(fnmatch.fnmatch(to_be_walked_dir, pattern) for pattern in exclude_globs):
...
finalize_manifest.py
line 56 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
@woju thinks we need to exclude only some subdirs of
/var/
(or better, include only some subdirs of/var/
). But I didn't find any useful subdirs for "normal" apps, typically converted by GSC.
From the top of my head:
- Most HTTP servers:
/var/www
- Postgresql:
/var/lib/postgresql
(I guess no-one runs it, because shm, but it's a data point) - OpenLDAP:
/var/lib/ldap
- ...
Now some of them may not be appropriate (e.g. postfix uses /var/lib/postfix
to store lockfiles, when it should've used /run
). But certainly there are some things that are in fact needed. Esp. when they're the data that is actually served (/var/www
). But if you think about adding some configurability, you'd have to add two mechanisms, one for adding to exclude_re, and second for "ouch, I needed this directory after all".
Documentation/index.rst
line 264 at r1 (raw file):
We shouldn't tell users to modify the GSC source code.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib and @dimakuv)
finalize_manifest.py
line 56 at r1 (raw file):
@woju thinks we need to exclude only some subdirs of /var/ (or better, include only some subdirs of /var/). But I didn't find any useful subdirs for "normal" apps, typically converted by GSC.
I don't think there is a right list of "userful subdirs" for "normal" apps. And if we exclude only some subdirs of /var/
, then what about /run/
(isn't it a similar case: see https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.html)?
But if you think about adding some configurability, you'd have to add two mechanisms
I think it's useful but not sure whether we do this in this PR (or as a follow-up).
test/ubuntu18.04-hello-world.dockerfile
line 4 at r1 (raw file):
RUN apt-get -y update RUN groupadd -r user && useradd -r -g user user
Any specific considerations that we apply -r
here?
Code quote:
RUN groupadd -r user && useradd -r -g user user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib, @kailun-qin, and @woju)
finalize_manifest.py
line 36 at r1 (raw file):
Previously, woju (Wojtek Porczyk) wrote…
While you're at it, can you reformat it?
- drop hanging indent, it's wasting too much horizontal space and makes inline comments difficult
- rewrite as multiline string for
re.VERBOSE
(https://docs.python.org/3/library/re.html#re.VERBOSE)- drop common
/
before(
(it's confusing)- drop
^
at the beginning and.*
at the end (.match()
method takes care of those)- inline
re.compile
:exclude_re = re.compile(r''' ( /boot | /dev | /efi # below files are security-critical | /etc/\.pwd\.lock | /gramine/app_files/finalize_manifest\.py # will be removed in final GSC image ) ''', re.VERBOSE) exclude_re.match(...) # this is already OK belowOr alternatively you can rework this as a list of
fnmatch
aka "glob" patterns (https://docs.python.org/3/library/fnmatch.html), which might be better if you want to add CLI-level configuration (--include-files "/var/lib/*"
,--exclude-files /etc/\*passwd
), because many CLI tools have fnmatch-like patterns here, not regexps (like https://manpages.debian.org/bullseye/tar/tar.1.en.html#exclude or https://manpages.debian.org/bullseye-backports/rsync/rsync.1.en.html#PATTERN_MATCHING_RULES).exclude_globs = [ '/boot', '/dev', '/media/*', #... ] if any(fnmatch.fnmatch(to_be_walked_dir, pattern) for pattern in exclude_globs): ...
Ok, I really like @woju's proposal on fnmatch
and the new --include-files
/--exclude-files
command-line options. I think I should close this PR, as the proposal is significantly different from what this PR tries to achieve.
Someone (maybe me...) will need to implement and test Woju's proposal.
finalize_manifest.py
line 56 at r1 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
@woju thinks we need to exclude only some subdirs of /var/ (or better, include only some subdirs of /var/). But I didn't find any useful subdirs for "normal" apps, typically converted by GSC.
I don't think there is a right list of "userful subdirs" for "normal" apps. And if we exclude only some subdirs of
/var/
, then what about/run/
(isn't it a similar case: see https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.html)?But if you think about adding some configurability, you'd have to add two mechanisms
I think it's useful but not sure whether we do this in this PR (or as a follow-up).
Yeah, ok, you're both right. This all needs much better handling. See my other comment.
Documentation/index.rst
line 264 at r1 (raw file):
Previously, woju (Wojtek Porczyk) wrote…
We shouldn't tell users to modify the GSC source code.
Yes.
Done.
test/ubuntu18.04-hello-world.dockerfile
line 4 at r1 (raw file):
Previously, kailun-qin (Kailun Qin) wrote…
Any specific considerations that we apply
-r
here?
I took this example from Docker documentation, but I agree, I don't see any reason to create a "system" user. I could remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r1.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib and @kailun-qin)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib, @dimakuv, @kailun-qin, and @woju)
test/ubuntu18.04-hello-world.dockerfile
line 5 at r1 (raw file):
This is for quick tests of non-root images.
Won't this break images which assume root? I.e. aren't you fixing some images and breaking some others at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib, @kailun-qin, and @woju)
test/ubuntu18.04-hello-world.dockerfile
line 5 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This is for quick tests of non-root images.
Won't this break images which assume root? I.e. aren't you fixing some images and breaking some others at the same time?
Sorry @mkow I don't understand the question. Why did you pose the question here, in the HelloWorld example?
If you're talking about this PR as a whole, then: I tested it with both root and non-root images, and it works. The change is supposed to be minimal, and it doesn't seem to affect existing workloads.
But anyway, this PR should be considered deprecated (I'll add a DONTMERGE note now), and I would like for someone to create a new PR with @woju's proposal, see other comment.
excluded_paths_regex
in finalize_manifest.py
excluded_paths_regex
in finalize_manifest.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @aneessahib, @kailun-qin, and @woju)
test/ubuntu18.04-hello-world.dockerfile
line 5 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Sorry @mkow I don't understand the question. Why did you pose the question here, in the HelloWorld example?
If you're talking about this PR as a whole, then: I tested it with both root and non-root images, and it works. The change is supposed to be minimal, and it doesn't seem to affect existing workloads.
But anyway, this PR should be considered deprecated (I'll add a DONTMERGE note now), and I would like for someone to create a new PR with @woju's proposal, see other comment.
Ah, I misunderstood what the second commit does, resolving.
Incorrect attempt at fixing the issue, closing this PR. |
Description of the changes
The updated list of excluded "trusted files" paths better follows the Filesystem Hierarchy Standard (FHS).
Based on:
Fixes #128.
Additionally, this PR includes a quick second commit, which sets
USER
inubuntu18.04-hello-world
test. This is for quick tests of non-root images.How to test this PR?
Test some Docker images, no functional changes.
This change is