-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for indexing root filesystem #442
Conversation
Benchmark Test ResultsBenchmark results from the latest changes vs base branch
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few nits and some questions and some larger questions here too, but great work on the improvements in this pR!
A few other testing ideas I had:
- Should we have an integration test(s) for file system indexing / traversal?
- Is the addPathToIndex method missing a unit test? I didn't see any test cases for symlinks but not sure if they exist already or were implicit in the other tests you addeed?
syft/source/directory_resolver.go
Outdated
) | ||
|
||
var systemRuntimePrefixes = []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unixSystemRuntimePrefixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also will we ever want this to be configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect one day we may, for now I've opted out of doing so.
syft/source/directory_resolver.go
Outdated
// why account for multiple roots? To cover cases when there is a symlink that references above the root path, | ||
// in which case we need to additionally index where the link resolves to. it's for this reason why the filetree | ||
// must be relative to the root of the filesystem (and not just relative to the given path). | ||
roots := []string{root} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this strikes me as something that could be its own method, with a unit test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's also not really clear to me what this is doing since the roots
attribute isn't used after this loop. Is it just an error handling mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
roots is a local variable used as a scratch list of paths that should be indexed. When indexing a path, some symlink resolution may indicate that other paths outside of the current path need to be indexed as well... these paths get added to the list of paths to be indexed. One path is indexed at a time, and is popped off of the front of the list.
The roots variable isn't needed outside of this function since it's only purpose is to track things that should be indexed, and by the time we leave this function that slice should always be empty (unless there is an error).
I'll move this to a separate function, add tests for it, and rename a few of the variables to improve readability here.
syft/source/directory_resolver.go
Outdated
} | ||
|
||
// permission denied, IO error, etc... we keep track of the paths we can't see, but continue with indexing | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we be filtering the type of error here? seems pretty broad at present, which makes me question if there are any errors that should stop execution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit it is broad. I couldn't thing of an error that I would get back for a single path that would indicate there is a larger problem that should stop execution. The closest thing I can imagine is an IO error, however, that doesn't necessarily mean that there is a larger over-all problem that should indicate halting execution.
The largest risk is there being an over-arching problem with accessing the storage device and there is nothing indexed, and no error returned. There would be at least one warning (if not several) that would indicate something is wrong.
What I can do for now is restrict this to only permission denied errors, and we can adjust back if we find other cases later.
@dakaneye re: testing suggestions:
There are already integration-level tests that flex directory scanning and check for scanning correctness. An integration test shouldn't be asserting how a task was performed (in this case asserting specific indexing conditions), only that the result is shallowly correct and asserts that components are wired up correctly. I was tempted to add an integration test that would flex skipping system paths (as asserted in unit tests) but decided against it (for the above reasons). But, I think this would at least be good to capture as a regression test (assert dir scanning of root doesn't take that long). I think a CLI test here makes the most sense, but the test harness will be much different than other tests since it would need to be done within a container so I'll need a bit to add this in.
Good find, I'll update to test the subject |
Good stuff! Right now and at least to me this seems to fail on encountering a dangling symlink. Going through the source to see if I can identify a workaround:
|
Here's something quick that worked for me:
|
@bureado great find! And much appreciated for hunting down a patch too! I'll incorporate in the branch shortly 🙌 |
ab01847
to
415ae45
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me! Great work @wagoodman
…ndex Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
Signed-off-by: Alex Goodman <[email protected]>
415ae45
to
d326e70
Compare
* change directory resolver to ignore system runtime paths + drive by index Signed-off-by: Alex Goodman <[email protected]> * add event/etui support for filesystem indexing (for dir resolver) Signed-off-by: Alex Goodman <[email protected]> * add warnings for path indexing problems Signed-off-by: Alex Goodman <[email protected]> * add directory resolver index tests Signed-off-by: Alex Goodman <[email protected]> * improve testing around directory resolver Signed-off-by: Alex Goodman <[email protected]> * renamed p var to path when not conflicting with import Signed-off-by: Alex Goodman <[email protected]> * pull docker image in CLI dir scan timeout test Signed-off-by: Alex Goodman <[email protected]> * ensure file not exist errors do not stop directory resolver indexing Signed-off-by: Alex Goodman <[email protected]>
Before this PR indexing the root filesystem would typically not ever complete. This is mainly due to the fact that sensitive system dirs such as /proc, /sys, and /dev were not ignored.
Changes:
When an unreadable path is found (permission denied, io error, etc), it is tracked and ignored so that indexing may continue.
The added advantage of using the existing stereoscope
FileTree
object is now globbing idioms are the same between the directory resolver and any image resolver. Before this PR there were subtle differences (e.g. using*
vs**
would have different recursive search strategies, which could get confusing).Future changes:
Depending on the size of the filesystem, this may still take a while. The only way to improve this process further is to remove globbing and not allow for basename searches (full paths required). Another alternative to speed this up without removing these features would be to try and leverage already-created filesystem indexes, but these indexes might be stale or untrusted.
Closes #283