-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure DirectoryPathTree does not allow reading outside the root dir #23692
Make sure DirectoryPathTree does not allow reading outside the root dir #23692
Conversation
independent-projects/bootstrap/app-model/src/main/java/io/quarkus/paths/DirectoryPathTree.java
Show resolved
Hide resolved
This workflow status is outdated as a new workflow run has been triggered. Failing Jobs - Building 35794f6
Full information is available in the Build summary check run. Failures⚙️ JVM Tests - JDK 11 Windows #- Failing: independent-projects/bootstrap/app-model
! Skipped: core/deployment core/launcher core/runtime and 678 more 📦 independent-projects/bootstrap/app-model✖
|
35794f6
to
86b290d
Compare
This workflow status is outdated as a new workflow run has been triggered. Failing Jobs - Building 86b290d
Full information is available in the Build summary check run. Failures⚙️ Gradle Tests - JDK 11 Windows #- Failing: integration-tests/gradle
📦 integration-tests/gradle✖
⚙️ JVM Tests - JDK 11 Windows #- Failing: independent-projects/bootstrap/app-model
! Skipped: core/deployment core/launcher core/runtime and 678 more 📦 independent-projects/bootstrap/app-model✖
|
86b290d
to
e931687
Compare
This workflow status is outdated as a new workflow run has been triggered. Failing Jobs - Building e931687
Full information is available in the Build summary check run. Failures⚙️ Gradle Tests - JDK 11 Windows #- Failing: integration-tests/gradle
📦 integration-tests/gradle✖
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @aloubyansky, really appreciated!
While I understand this might be better performance-wise than my PR (#23672), there are a few key differences with my PR that I don't think are going in the right direction: IMO we should interpret input strings as paths to resources in the classpath, and never as paths in the filesystem. See my comments below; maybe they explain this more clearly.
Also... unfortunately, regardless of my opinions, this does not fix #23574, which was the point of my PR.
independent-projects/bootstrap/app-model/src/main/java/io/quarkus/paths/DirectoryPathTree.java
Outdated
Show resolved
Hide resolved
independent-projects/bootstrap/app-model/src/main/java/io/quarkus/paths/DirectoryPathTree.java
Outdated
Show resolved
Hide resolved
if (path.contains("..")) { | ||
final Path absolutePath = dir.resolve(path).normalize().toAbsolutePath(); | ||
if (absolutePath.startsWith(dir)) { | ||
return dir.relativize(absolutePath).toString(); | ||
} | ||
return null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't think we should do this... In my opinion, the string being passed as input should be interpreted as a paths to a resource in the classpath, i.e. something that we'd normally pass to Class#getResource
.
The goal of DirectoryPathTree
as I understand it is to convert such "path in the classpath" to an actual path on the filesystem.
With that in mind, it doesn't make sense to allow input strings to include the path to src/main/resources
on the filesystem (like you did further up, see my other comment), or to interpret things such as ..
which are a filesystem concept (and not something that is usually supported for paths to resources in the classpath).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About this one, I think I'd keep it. The thing is that the PathTree is also exposed as an actual file system path tree API in other places. However, we still should not allow reading outside the tree root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree it's ugly though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supporting ..
does raise a problem, though... But maybe one that was pre-existing.
If I pass foo/bar/..
to DirectoryPathTree
, it will automatically be resolved to foo/
. So in a build step for example, we might end up accepting the path as valid.
Later at runtime, we might pass the same string (foo/bar/..
) to Class#getRessource
. And then it will fail, because ..
has no special meaning for Class#getResource
...
Maybe a better behavior would be to simple escape ..
, so that it's interpreted as a literal folder named ..
? And if that's not possible, maybe we should just reject any path with a component named ..
?
Damn, this is starting to look dodgy. I sure hope there aren't many similar pitfalls with Windows paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there could be path element name a..b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hu. I didn't know that.
So we have a single API for multiple types of resource trees, some of which don't have the same syntax for their path strings? That looks like a problem in itself...
Ok, I'll trust you to pick what seems best to you, but to me those are the only possible solutions in the scope of this PR:
- Your implementation, which allows using
..
, which could lead to incorrectly accepting invalid paths for classpath resource trees (see my example above). - My implementation, which prevents using
..
, which could lead to incorrectly rejecting valid path for non-classpath resource trees.
Alternatively, we could provide a flag in the constructor, to switch between the two behaviors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, before PathTree
we would simple do path.resolve(childPath)
in ApplicationArchive.getChildPath(childPath)
. The primary goal for PathTree
wasn't actually to represent the classpath resources but to provide a single API to process directory and archive content with the option of applying filters. I would probably keep it that way.
However, I get your point. We actually do have PathTreeClassPathElement
which implements ClassPathElement
and its purpose is pretty much what you are asking for. But it's not yet exposed from ApplicationArchive
. I think it should be, it would have to be a separate refactoring task.
You can actually use QuarkusClassLoader.getElements(resourceName, localOnly)
which will return a list of PathTreeClassPathElement
s. But it was added kind of quickly and isn't a nice/proper API for classpath resource locating. Also, it's different from the ApplicationArchive
, in a sense that ApplicationArchive
s are a subset of all the dependencies that are identified as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I approved the PR; I'm not sure what you're going to do with this ..
handling, so I'll let you merge once you're done?
Thanks again for all the work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking I probably wouldn't mind the change you proposed. It wouldn't be a huge limitation at this point and we could reconsider supporting ..
in case we really need to one day. Would you like to add this change to your original PR? Thank you @yrodiere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Thanks.
Perhaps resource names should be validated in the QuarkusClassLoader and not get so far down the call tree. |
That way we would avoid this extra sanitization when loading classes. |
Ah, you are probably using ApplicationArchive? I'll need to check that. Which should follow the same rules for resource names. |
We're using Regardless, I don't really see what we're doing as sanitization, but rather as translation. In Even if we validated paths using |
Yes. This is the relevant piece of code: Lines 1112 to 1124 in 4f81ab8
|
Thanks @yrodiere for all the analysis. The PathTree is exposed in too many places now, so we do need to be careful about how the relative paths are resolved in it. |
e931687
to
9d726e7
Compare
independent-projects/bootstrap/app-model/src/main/java/io/quarkus/paths/DirectoryPathTree.java
Outdated
Show resolved
Hide resolved
I wonder if it's wise to generally try to load a resource both as a resource and as possibly and absolute path. Thinking about the use case expressed in #23574 - could we patch that configuration element to explicitly try the |
Personally, I'd object to retrieving files from the filesystem at build time. Especially using absolute paths. It makes the build environment-specific, and I really don't want to encourage that kind of bad practice. Something relative to the Maven project root... maybe. But then the question becomes how do we make it clear what the user means (resource path or FS path)? Is a try-and-default behavior really a good idea? And also, does that mean we will automatically add a file to the resources, even though it was not initially in Really, the more I think about it, the less I like the idea. If people want to use a file that is not a Java resource, in a place where we expect a Java resource, they can tune their build to generate resources (copying stuff to |
I'm not sure I follow. If it's retrieved at build time, how is that making the build "environment-specific" ? |
9d726e7
to
9ac9675
Compare
Let's say I set a property to |
Hi @yrodiere thanks for explaining and of course I'd agree with that - but it's not what I would expect Quarkus to do. If the resource is "fixed" at build time, I would expect it to either copy the resource from the filesystem into the resources directory, or read its content and encode it as a constant (probably depending on the expected reasonable size - I guess the scripts case could be rather large, making it better fit to be treated as resource). On top of addressing your concern of "environment-specific" it also addresses the semantic aspect of this being in fact a build-time defined constant: if other build-time elements are dependant on its content (e.g. skip some processing because the script is empty), we'd prefer it to be consistent even if you were to edit or remove the file on your local filesystem after the build. |
Yes, I mentioned that solution in a previous message. But this only addresses potential inconsistencies of the environment between build-time and static init (e.g. the file might not be present anymore during static init). Even if Quarkus copies the file at build time somehow, using an absolute path to a file in the filesystem still looks like bad practice to me: the build is still environment-specific because it relies on some file outside of the scope of your Maven/Gradle project. And even if we limit this feature to paths relative to the root of your Maven/Gradle project, there are other problems. I'll quote myself:
If we still disagree on this, maybe we can have a quick video call later. I'm repeating myself, which is a good indication I'm either failing to understand something or failing to explain something... Either way, sorry about that 😅 |
9ac9675
to
16467bc
Compare
16467bc
to
23d49ac
Compare
@yrodiere you have valid points from a perspective of "good engineering" of a stable project, which would dictate scripts need to be part of the project, and versioned together with the rest of the project code. But I do wonder if you're focusing only on the "production" mindset; we should also be pragmatic for the use case of dev-mode, in which people are exploring and experimenting and it would be at least useful to be able to directly point to some scripts which are possibly part of another project and/or being edited (or output) live with different tools that might be open in different windows. In such a scenario, we don't want to get in the way of "the flow" - if possible. I don't presume knowing what workflow people should use, but allowing options in dev-mode shouldn't be considered an anti-pattern to avoid in name of principles. On the other hand, I'm onboad with such restrictions for a normal (or even a native) build, for which I mentioned the content needs to get "made constant" during a regular build. |
My goal here is to fix a bug (see #23574 (comment)) and thus I am indeed focusing on the production mindset. People who want to do weird things and experiment are welcome to open tickets and expose their use case, and their feature request will be addressed when someone agrees with their use case and has time to implement the feature. That's pretty much what I answered here. I the meantime... let's focus on the (already complex) task of having Quarkus produce actionable error messages at build time when it encounters currently unsupported configuration, rather than ignoring it and failing at runtime because of that :) |
I agree, this discussion should probably move to quarkus-dev to include more people. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aloubyansky Almost there! LGTM, except the bit about ..
. See #23692 (comment)
Once that's addressed, I think we can merge?
Because we want the paths to be interpreted in a way that's as similar to ClassLoader#getResource as possible. With the previous behavior, if I pass foo/bar/.. to DirectoryPathTree, it will automatically be resolved to foo/. So in a build step for example, we might end up accepting the path as valid. Later at runtime, we might pass the same string (foo/bar/..) to ClassLoader#getRessource. And then it will fail, because .. has no special meaning for ClassLoader#getResource... See quarkusio#23692 (comment)
Because we want the paths to be interpreted in a way that's as similar to ClassLoader#getResource as possible. With the previous behavior, if I pass foo/bar/.. to DirectoryPathTree, it will automatically be resolved to foo/. So in a build step for example, we might end up accepting the path as valid. Later at runtime, we might pass the same string (foo/bar/..) to ClassLoader#getRessource. And then it will fail, because .. has no special meaning for ClassLoader#getResource... See quarkusio#23692 (comment) (cherry picked from commit b33f815)
An alternative to #23672