Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure DirectoryPathTree does not allow reading outside the root dir #23692

Merged
merged 1 commit into from
Feb 22, 2022

Conversation

aloubyansky
Copy link
Member

An alternative to #23672

@quarkus-bot quarkus-bot bot added the area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins label Feb 14, 2022
@Sanne Sanne requested a review from yrodiere February 14, 2022 23:03
@quarkus-bot
Copy link

quarkus-bot bot commented Feb 15, 2022

This workflow status is outdated as a new workflow run has been triggered.

Failing Jobs - Building 35794f6

Status Name Step Failures Logs Raw logs
✔️ JVM Tests - JDK 11
JVM Tests - JDK 11 Windows Build Failures Logs Raw logs
✔️ JVM Tests - JDK 17

Full information is available in the Build summary check run.

Failures

⚙️ JVM Tests - JDK 11 Windows #

- Failing: independent-projects/bootstrap/app-model 
! Skipped: core/deployment core/launcher core/runtime and 678 more

📦 independent-projects/bootstrap/app-model

io.quarkus.paths.DirectoryPathTreeTest. line 27 - More details - Source on GitHub

java.nio.file.InvalidPathException: Illegal char <:> at index 2: /D:/a/quarkus/quarkus/independent-projects/bootstrap/app-model/target/test-classes/paths/directory-path-tree
	at java.base/sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
	at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)

@aloubyansky aloubyansky force-pushed the directory-path-tree-read-outside branch from 35794f6 to 86b290d Compare February 15, 2022 08:35
@quarkus-bot
Copy link

quarkus-bot bot commented Feb 15, 2022

This workflow status is outdated as a new workflow run has been triggered.

Failing Jobs - Building 86b290d

Status Name Step Failures Logs Raw logs
Gradle Tests - JDK 11 Windows Build Failures Logs Raw logs
✔️ JVM Tests - JDK 11
JVM Tests - JDK 11 Windows Build Failures Logs Raw logs
✔️ JVM Tests - JDK 17

Full information is available in the Build summary check run.

Failures

⚙️ Gradle Tests - JDK 11 Windows #

- Failing: integration-tests/gradle 

📦 integration-tests/gradle

io.quarkus.gradle.devmode.MultiSourceProjectDevModeTest.main line 22 - More details - Source on GitHub

org.awaitility.core.ConditionTimeoutException: Condition with lambda expression in io.quarkus.test.devmode.util.DevModeTestUtils that uses java.util.function.Supplier, java.util.function.Supplierjava.util.concurrent.atomic.AtomicReference, java.util.concurrent.atomic.AtomicReferencejava.lang.String, java.lang.Stringboolean was not fulfilled within 1 minutes.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:164)
	at org.awaitility.core.CallableCondition.await(CallableCondition.java:78)

⚙️ JVM Tests - JDK 11 Windows #

- Failing: independent-projects/bootstrap/app-model 
! Skipped: core/deployment core/launcher core/runtime and 678 more

📦 independent-projects/bootstrap/app-model

io.quarkus.paths.DirectoryPathTreeTest.acceptOutsidePathAsUri line 144 - More details - Source on GitHub

java.nio.file.InvalidPathException: Illegal char <:> at index 4: file:///D:/a/quarkus/quarkus/independent-projects/bootstrap/app-model/target/test-classes/paths/directory-path-tree/external.txt
	at java.base/sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
	at java.base/sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)

@aloubyansky aloubyansky force-pushed the directory-path-tree-read-outside branch from 86b290d to e931687 Compare February 15, 2022 13:08
@quarkus-bot
Copy link

quarkus-bot bot commented Feb 15, 2022

This workflow status is outdated as a new workflow run has been triggered.

Failing Jobs - Building e931687

Status Name Step Failures Logs Raw logs
Gradle Tests - JDK 11 Windows Build Failures Logs Raw logs

Full information is available in the Build summary check run.

Failures

⚙️ Gradle Tests - JDK 11 Windows #

- Failing: integration-tests/gradle 

📦 integration-tests/gradle

io.quarkus.gradle.devmode.MultiSourceProjectDevModeTest.main line 22 - More details - Source on GitHub

org.awaitility.core.ConditionTimeoutException: Condition with lambda expression in io.quarkus.test.devmode.util.DevModeTestUtils that uses java.util.function.Supplier, java.util.function.Supplierjava.util.concurrent.atomic.AtomicReference, java.util.concurrent.atomic.AtomicReferencejava.lang.String, java.lang.Stringboolean was not fulfilled within 1 minutes.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:164)
	at org.awaitility.core.CallableCondition.await(CallableCondition.java:78)

Copy link
Member

@yrodiere yrodiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @aloubyansky, really appreciated!

While I understand this might be better performance-wise than my PR (#23672), there are a few key differences with my PR that I don't think are going in the right direction: IMO we should interpret input strings as paths to resources in the classpath, and never as paths in the filesystem. See my comments below; maybe they explain this more clearly.

Also... unfortunately, regardless of my opinions, this does not fix #23574, which was the point of my PR.

Comment on lines 87 to 92
if (path.contains("..")) {
final Path absolutePath = dir.resolve(path).normalize().toAbsolutePath();
if (absolutePath.startsWith(dir)) {
return dir.relativize(absolutePath).toString();
}
return null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think we should do this... In my opinion, the string being passed as input should be interpreted as a paths to a resource in the classpath, i.e. something that we'd normally pass to Class#getResource.

The goal of DirectoryPathTree as I understand it is to convert such "path in the classpath" to an actual path on the filesystem.

With that in mind, it doesn't make sense to allow input strings to include the path to src/main/resources on the filesystem (like you did further up, see my other comment), or to interpret things such as .. which are a filesystem concept (and not something that is usually supported for paths to resources in the classpath).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About this one, I think I'd keep it. The thing is that the PathTree is also exposed as an actual file system path tree API in other places. However, we still should not allow reading outside the tree root.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree it's ugly though.

Copy link
Member

@yrodiere yrodiere Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting .. does raise a problem, though... But maybe one that was pre-existing.

If I pass foo/bar/.. to DirectoryPathTree, it will automatically be resolved to foo/. So in a build step for example, we might end up accepting the path as valid.
Later at runtime, we might pass the same string (foo/bar/..) to Class#getRessource. And then it will fail, because .. has no special meaning for Class#getResource...

Maybe a better behavior would be to simple escape .., so that it's interpreted as a literal folder named ..? And if that's not possible, maybe we should just reject any path with a component named ..?

Damn, this is starting to look dodgy. I sure hope there aren't many similar pitfalls with Windows paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there could be path element name a..b

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hu. I didn't know that.

So we have a single API for multiple types of resource trees, some of which don't have the same syntax for their path strings? That looks like a problem in itself...

Ok, I'll trust you to pick what seems best to you, but to me those are the only possible solutions in the scope of this PR:

  • Your implementation, which allows using .., which could lead to incorrectly accepting invalid paths for classpath resource trees (see my example above).
  • My implementation, which prevents using .., which could lead to incorrectly rejecting valid path for non-classpath resource trees.

Alternatively, we could provide a flag in the constructor, to switch between the two behaviors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, before PathTree we would simple do path.resolve(childPath) in ApplicationArchive.getChildPath(childPath). The primary goal for PathTree wasn't actually to represent the classpath resources but to provide a single API to process directory and archive content with the option of applying filters. I would probably keep it that way.
However, I get your point. We actually do have PathTreeClassPathElement which implements ClassPathElement and its purpose is pretty much what you are asking for. But it's not yet exposed from ApplicationArchive. I think it should be, it would have to be a separate refactoring task.
You can actually use QuarkusClassLoader.getElements(resourceName, localOnly) which will return a list of PathTreeClassPathElements. But it was added kind of quickly and isn't a nice/proper API for classpath resource locating. Also, it's different from the ApplicationArchive, in a sense that ApplicationArchives are a subset of all the dependencies that are identified as such.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I approved the PR; I'm not sure what you're going to do with this .. handling, so I'll let you merge once you're done?

Thanks again for all the work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking I probably wouldn't mind the change you proposed. It wouldn't be a huge limitation at this point and we could reconsider supporting .. in case we really need to one day. Would you like to add this change to your original PR? Thank you @yrodiere

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Thanks.

@aloubyansky
Copy link
Member Author

Perhaps resource names should be validated in the QuarkusClassLoader and not get so far down the call tree.

@aloubyansky
Copy link
Member Author

That way we would avoid this extra sanitization when loading classes.

@aloubyansky
Copy link
Member Author

Ah, you are probably using ApplicationArchive? I'll need to check that. Which should follow the same rules for resource names.

@yrodiere
Copy link
Member

We're using DirectoryPathTree in build steps; I don't know if QuarkusClassLoader is available at that point? At least, not with access to application resources? I'm not overly familiar with that, though.

Regardless, I don't really see what we're doing as sanitization, but rather as translation. In DirectoryPathTree, we expect resource paths as input, and we need to translate them to filesystem paths. We just need to interpret them correctly, and avoid anything clearly filesystem-related (e.g. interpreting input as a path on the filesystem, interpreting .. in input, ...).

Even if we validated paths using QuarkusClassLoader beforehand, we'd still have to interpret the paths correctly afterwards (and take special care with absolute paths, in particular). Pre-validation might help with .., though.

@yrodiere
Copy link
Member

yrodiere commented Feb 16, 2022

Ah, you are probably using ApplicationArchive? I'll need to check that. Which should follow the same rules for resource names.

Yes. This is the relevant piece of code:

Path loadScriptPath = applicationArchivesBuildItem.getRootArchive().getChildPath(importFile);
if (loadScriptPath != null && !Files.isDirectory(loadScriptPath)) {
// enlist resource if present
nativeImageResources.produce(new NativeImageResourceBuildItem(importFile));
} else if (persistenceUnitConfig.sqlLoadScript.isPresent()) {
//raise exception if explicit file is not present (i.e. not the default)
throw new ConfigurationException(
"Unable to find file referenced in '"
+ HibernateOrmConfig.puPropertyKey(persistenceUnitName, "sql-load-script") + "="
+ String.join(",", persistenceUnitConfig.sqlLoadScript.get())
+ "'. Remove property or add file to your path.");
}

@aloubyansky
Copy link
Member Author

Thanks @yrodiere for all the analysis. The PathTree is exposed in too many places now, so we do need to be careful about how the relative paths are resolved in it.

@aloubyansky aloubyansky force-pushed the directory-path-tree-read-outside branch from e931687 to 9d726e7 Compare February 16, 2022 15:05
@Sanne
Copy link
Member

Sanne commented Feb 16, 2022

I wonder if it's wise to generally try to load a resource both as a resource and as possibly and absolute path.

Thinking about the use case expressed in #23574 - could we patch that configuration element to explicitly try the Path API and then try loading as resource should the first one fail?

@yrodiere
Copy link
Member

yrodiere commented Feb 16, 2022

I wonder if it's wise to generally try to load a resource both as a resource and as possibly and absolute path.

Thinking about the use case expressed in #23574 - could we patch that configuration element to explicitly try the Path API and then try loading as resource should the first one fail?

Personally, I'd object to retrieving files from the filesystem at build time. Especially using absolute paths. It makes the build environment-specific, and I really don't want to encourage that kind of bad practice.

Something relative to the Maven project root... maybe. But then the question becomes how do we make it clear what the user means (resource path or FS path)? Is a try-and-default behavior really a good idea? And also, does that mean we will automatically add a file to the resources, even though it was not initially in src/main/resources? Because the runtime (e.g. Hibernate ORM) may still expect to retrieve the file with Class#getResource...

Really, the more I think about it, the less I like the idea. If people want to use a file that is not a Java resource, in a place where we expect a Java resource, they can tune their build to generate resources (copying stuff to target/generated/...). Or maybe we can add options to Quarkus to make that easy (the copying of stuff to target/generated during the build). But it really should be explicit, IMO.

@Sanne
Copy link
Member

Sanne commented Feb 16, 2022

I wonder if it's wise to generally try to load a resource both as a resource and as possibly and absolute path.
Thinking about the use case expressed in #23574 - could we patch that configuration element to explicitly try the Path API and then try loading as resource should the first one fail?

Personally, I'd object to retrieving files from the filesystem at build time. Especially using absolute paths. It makes the build environment-specific, and I really don't want to encourage that kind of bad practice.

I'm not sure I follow. If it's retrieved at build time, how is that making the build "environment-specific" ?

@aloubyansky aloubyansky force-pushed the directory-path-tree-read-outside branch from 9d726e7 to 9ac9675 Compare February 16, 2022 21:38
@yrodiere
Copy link
Member

yrodiere commented Feb 17, 2022

I'm not sure I follow. If it's retrieved at build time, how is that making the build "environment-specific" ?

Let's say I set a property to /home/yrodiere/myfile.txt in my Quarkus application.properties. The build will work on my machine (my environment), because /home/yrodiere/myfile.txt exists. It won't work on yours, because /home/yrodiere doesn't exist. The build is environment-specific: just checking out the code of the application from the git repository is not enough to successfully build the application.

@Sanne
Copy link
Member

Sanne commented Feb 17, 2022

The build will work on my machine (my environment), because /home/yrodiere/myfile.txt exists. It won't work on yours, because

Hi @yrodiere thanks for explaining and of course I'd agree with that - but it's not what I would expect Quarkus to do.

If the resource is "fixed" at build time, I would expect it to either copy the resource from the filesystem into the resources directory, or read its content and encode it as a constant (probably depending on the expected reasonable size - I guess the scripts case could be rather large, making it better fit to be treated as resource).

On top of addressing your concern of "environment-specific" it also addresses the semantic aspect of this being in fact a build-time defined constant: if other build-time elements are dependant on its content (e.g. skip some processing because the script is empty), we'd prefer it to be consistent even if you were to edit or remove the file on your local filesystem after the build.

@yrodiere
Copy link
Member

If the resource is "fixed" at build time, I would expect it to either copy the resource from the filesystem into the resources directory, or read its content and encode it as a constant (probably depending on the expected reasonable size - I guess the scripts case could be rather large, making it better fit to be treated as resource).

On top of addressing your concern of "environment-specific" it also addresses the semantic aspect of this being in fact a build-time defined constant: if other build-time elements are dependant on its content (e.g. skip some processing because the script is empty), we'd prefer it to be consistent even if you were to edit or remove the file on your local filesystem after the build.

Yes, I mentioned that solution in a previous message. But this only addresses potential inconsistencies of the environment between build-time and static init (e.g. the file might not be present anymore during static init).

Even if Quarkus copies the file at build time somehow, using an absolute path to a file in the filesystem still looks like bad practice to me: the build is still environment-specific because it relies on some file outside of the scope of your Maven/Gradle project.

And even if we limit this feature to paths relative to the root of your Maven/Gradle project, there are other problems. I'll quote myself:

Something relative to the Maven project root... maybe. But then the question becomes how do we make it clear what the user means (resource path or FS path)? Is a try-and-default behavior really a good idea? And also, does that mean we will automatically add a file to the resources, even though it was not initially in src/main/resources? Because the runtime (e.g. Hibernate ORM) may still expect to retrieve the file with Class#getResource...

Really, the more I think about it, the less I like the idea. If people want to use a file that is not a Java resource, in a place where we expect a Java resource, they can tune their build to generate resources (copying stuff to target/generated/...). Or maybe we can add options to Quarkus to make that easy (the copying of stuff to target/generated during the build). But it really should be explicit, IMO.

If we still disagree on this, maybe we can have a quick video call later. I'm repeating myself, which is a good indication I'm either failing to understand something or failing to explain something... Either way, sorry about that 😅

@aloubyansky aloubyansky force-pushed the directory-path-tree-read-outside branch from 9ac9675 to 16467bc Compare February 17, 2022 19:59
@aloubyansky aloubyansky force-pushed the directory-path-tree-read-outside branch from 16467bc to 23d49ac Compare February 17, 2022 23:30
@Sanne
Copy link
Member

Sanne commented Feb 18, 2022

@yrodiere you have valid points from a perspective of "good engineering" of a stable project, which would dictate scripts need to be part of the project, and versioned together with the rest of the project code.

But I do wonder if you're focusing only on the "production" mindset; we should also be pragmatic for the use case of dev-mode, in which people are exploring and experimenting and it would be at least useful to be able to directly point to some scripts which are possibly part of another project and/or being edited (or output) live with different tools that might be open in different windows. In such a scenario, we don't want to get in the way of "the flow" - if possible.

I don't presume knowing what workflow people should use, but allowing options in dev-mode shouldn't be considered an anti-pattern to avoid in name of principles.

On the other hand, I'm onboad with such restrictions for a normal (or even a native) build, for which I mentioned the content needs to get "made constant" during a regular build.

@yrodiere
Copy link
Member

My goal here is to fix a bug (see #23574 (comment)) and thus I am indeed focusing on the production mindset.

People who want to do weird things and experiment are welcome to open tickets and expose their use case, and their feature request will be addressed when someone agrees with their use case and has time to implement the feature. That's pretty much what I answered here.

I the meantime... let's focus on the (already complex) task of having Quarkus produce actionable error messages at build time when it encounters currently unsupported configuration, rather than ignoring it and failing at runtime because of that :)

@aloubyansky
Copy link
Member Author

I agree, this discussion should probably move to quarkus-dev to include more people.

Copy link
Member

@yrodiere yrodiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aloubyansky Almost there! LGTM, except the bit about ... See #23692 (comment)
Once that's addressed, I think we can merge?

@gsmet gsmet merged commit 1c8994b into quarkusio:main Feb 22, 2022
@quarkus-bot quarkus-bot bot added this to the 2.8 - main milestone Feb 22, 2022
@gsmet gsmet modified the milestones: 2.8 - main, 2.7.2.Final Feb 22, 2022
yrodiere added a commit to yrodiere/quarkus that referenced this pull request Feb 22, 2022
Because we want the paths to be interpreted in a way that's as similar
to ClassLoader#getResource as possible.

With the previous behavior, if I pass foo/bar/.. to DirectoryPathTree,
it will automatically be resolved to foo/.
So in a build step for example, we might end up accepting the path as valid.
Later at runtime, we might pass the same string (foo/bar/..) to ClassLoader#getRessource.
And then it will fail, because .. has no special meaning for ClassLoader#getResource...

See quarkusio#23692 (comment)
gsmet pushed a commit to gsmet/quarkus that referenced this pull request Feb 28, 2022
Because we want the paths to be interpreted in a way that's as similar
to ClassLoader#getResource as possible.

With the previous behavior, if I pass foo/bar/.. to DirectoryPathTree,
it will automatically be resolved to foo/.
So in a build step for example, we might end up accepting the path as valid.
Later at runtime, we might pass the same string (foo/bar/..) to ClassLoader#getRessource.
And then it will fail, because .. has no special meaning for ClassLoader#getResource...

See quarkusio#23692 (comment)

(cherry picked from commit b33f815)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/devtools Issues/PR related to maven, gradle, platform and cli tooling/plugins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants