Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #10164 - MetaInfConfiguration Overhaul #10816

Closed
wants to merge 12 commits into from

Conversation

joakime
Copy link
Contributor

@joakime joakime commented Oct 31, 2023

Replacement for PR #10171 for #10164 - now that we have better testing of original behavior, this PR should be smaller that #10171 was.

@joakime joakime requested review from gregw and janbartel October 31, 2023 17:10
@joakime joakime marked this pull request as draft October 31, 2023 20:31
Copy link
Contributor

@janbartel janbartel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will need to have a hangout, including @gregw to ensure we're all on the same page.

@@ -550,7 +550,7 @@ public void parse(final Set<? extends Handler> handlers, Resource r) throws Exce
if (!r.exists())
return;

if (FileID.isJavaArchive(r.getPath()))
if (FileID.isJavaArchive(r.getPath())) // TODO: this is now always false, as all Resource objects are directories
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the TODO comment. FileID just checks if the path ends in ".jar", so this should be fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resource being passed in is never a raw path to a jar file in this method.
Even if the underlying resource was a JAR, by the time it reaches here the Resource a mounted JAR already, meaning r.getPath() is just a directory inside the JAR.

This entire method parse(handlers , r) is basically a no-op. nothing happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean that after these changes `parse(handlers, r) is a no-op, because it certainly isn't now.

@@ -968,7 +968,7 @@ else if (entry.getValue() == null) //can't work out provenance of SCI, so can't
{
for (Map.Entry<ServletContainerInitializer, Resource> entry : sciResourceMap.entrySet())
{
if (webInfJar.equals(entry.getValue()))
if (webInfJar.isContainedIn(entry.getValue()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change. We're checking here if the resource that contains the SCI is the webInfJar that we're looking at. Makes no sense to check containment.

@@ -754,7 +754,7 @@ public boolean isFromExcludedJar(WebAppContext context, ServletContainerInitiali
boolean included = false;
for (Resource r : orderedJars)
{
included = r.equals(sciResource);
included = r.isContainedIn(sciResource);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment below.

* @throws RuntimeException if unable to convert the URI to URL
* @see URI#toURL()
*/
public static URL toURL(URI uri)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for changing the checked exception into a runtime exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it can be used in a stream

}
else
{
resources.add(resourceFactory.newResource(file.toPath()));
// Treat bundle as jar file that needs to be opened (so that resources within it can be found)
resources.add(resourceFactory.newJarFileResource(file.toPath().toUri()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comments about this.


scanJars(context);
// Scan for META-INF/*.tld entries
metaInfTlds.addAll(scanTlds(streamTargets(containerTargets, webappTargets, true), metaInfTldCache));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea was not to iterate over the resources multiple times, but iterate just once and interrogate for tlds, fragments, resources in one go.

}

protected List<URI> getAllContainerJars(final WebAppContext context)
/**
* Only look for Servlet 3+ features ({@code META-INF/web-fragment.xml} and {@code META-INF/resources})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right algorithm: we only need to skip fragments iff metadata-complete or < servlet3 or not discovered configuration. The current code is correct:

        if (context.getMetaData().isMetaDataComplete() || (context.getServletContext().getEffectiveMajorVersion() < 3) && !context.isConfigurationDiscovered())
            scanTypes.remove(METAINF_FRAGMENTS);

Comment on lines +72 to +81
if (scheme.equals("jar"))
{
URI container = URIUtil.unwrapContainer(uri);
if (container.getScheme().equals("file"))
{
// Just add a reference to the
loader.addClassPath(container.getPath());
continue;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this converting things like "jar:file:/usr/jetty/lib/somedependency.jar!/" back to just "/usr/jetty/lib/somedependency.jar"?

Are we to assume that we will never get anything like: "jar:file:/usr/jetty/lib/somedependency.jar!/WEB-INF/classes" or "jar:file:/usr/jetty/lib/somedependency.jar!/WEB-INF/lib/some.jar" ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janbartel Can you give us a list of all the different uris that this method might see and what it is meant to do with each of them? i.e. what is "the brief" of this method?

@joakime
Copy link
Contributor Author

joakime commented Nov 10, 2023

@janbartel I've started testing performance of the MetaInfConfiguration in jetty 11 and jetty 12 using the war file produced by https://github.com/joakime/huge-war

That war is 433MB in size, with 319 jars in WEB-INF/lib, and over 218,000 classes (it is ultimately a nonsense war that nobody would deploy, just using it as an example of a worst case scenario of discovery / scanning)

The numbers so far.

  • Jetty 11.0.x (HEAD) - 2023-11-10 16:03:52.088:DEBUG:oejw.MetaInfConfiguration:main: preConfigure took 372ms
  • Jetty 12.0.x (HEAD) - 2023-11-10 16:09:11.018:DEBUG:oejw.MetaInfConfiguration:main: preConfigure took 1801ms
  • Jetty 12.0.x (metainf-overhaul branch) - 2023-11-10 16:13:02.526:DEBUG:oejew.MetaInfConfiguration:main: preConfigure took 575ms

I branched from jetty-12.0.x (and jetty-11.0.x) to add timing to preConfigure() method to get numbers from this effort.

While this PR is a vast improvement over jetty-12.0.x HEAD, it is not as fast as jetty-11.0.x HEAD atm.

Going back to 1 scan (with options) might bring us back down to jetty-11 numbers, but that approach messes up the stream processing behaviors this PR has (i'll noodle it this weekend)

// BEGIN - slf4j 2.x
/* SLF4J 2.0 */
// slf4j-api comes from paxexam now.
// res.add(mavenBundle().groupId("org.slf4j").artifactId("slf4j-api").versionAsInProject().noStart());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably not sync with ee9 :(
btw I noticed some issues recently and logging was a pain...
I'd like to extract some parts here which can be common and if possible really fix the logging

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olamy but not in this PR, which is only for changes to MetaInfConfiguration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janbartel sure but my point is either we don't change anything in this PR for this or we keep both files in sync (because something tell me we will forget :) ) BTW they are probably already not in sync anymore.

@gregw gregw linked an issue Nov 13, 2023 that may be closed by this pull request
@gregw
Copy link
Contributor

gregw commented Nov 13, 2023

@joakime and @lorban I think this PR is missing the issue that I raised in #10164. I'm not that concerned about efficiency in our scanning of META-INF (if users want quick start then they can use quickstart).

I'm concerned that for some reason ResourceFactoryInternals$LifeCycle holds a reference to every path that has been looked up relative to a mounted jar, even if it does not exist nor could it ever have been a mount point.

If a webapp contains foo.jar and bar.jar, then I could kind of live with seeing mounts for:

  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/
  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/bar.jar!/

but we also see mounts of :

  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/META-INF/resources/
  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/META-INF/web-fragment.xml
  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/bar.jar!/META-INF/resources/
  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/META-INF/web-fragment.xml

I do not understand:

  • Why are we mounting entries relative to another mount point? doing myOpenJarFileResource.resolve("META-INF/resources/") should not create another mount point. If we are are mounting foo.jar, then I only expect the factory to remember a single mount point: jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/bar.jar!/. Every other Resource resolved against that is not a new mount point, it is just a new Resource relative to the exiting mount point, not a new mount point.
  • Even if we are mounting new resources rather than resolving against an existing, why are we keeping mounting things that do not exist? These jars do not have META-INF/resources/ nor META-INF/web-fragment.xml within them, so how could they be mount points? We cannot keep lists of things that do not exist as that is an infinite list!
  • How could META-INF/web-fragment.xml ever be a mount point, as it is not a directory and does not end with / ?

Thus my concerns are primarily with ResourceFactory, not with MetaInfConfiguration!

I have some secondary concerns about MetaInfConfiguration:

  • Why do we need to mount these jars anyway? They should already be available via the WebAppClassloader? Can't we just ask the classloader for the list of resources (small 'r') at META-INF/resources/ and META-INF/web-fragment.xml?
  • Why, once the webapp is up and running (scan completed), are references/mounts kept to any of these resources? They should be closed after the scan.

Now I'm sure some of these problems come from bad code in MetaInfConfiguration, but I'm really REALLY cautious about any wholesale refactoring of that class as we don't want to break expected behaviours.

@gregw
Copy link
Contributor

gregw commented Nov 13, 2023

I've had a bit more of a look with @janbartel and we can see a few of the problems.

MetaInfConfiguration is currently using newResource to get something like "META-INF/resources" rather than resolving against an existing Resource. However, that still does not explain why it is mounting non-existent resources. I'm also a little puzzled as to why it can't see that it already has a mount for a parent and just use that rather than mounting again?

With regards to using the Classloader, I think we could do something like context.getClassLoader().resources("META-INF/web-fragment.xml") to discover fragments and similar for META-INF/resources/. But this would not work for TLDs as I don't think there is a list semantic in the Classloader. We can't do context.getClassLoader().resources("META-INF/*.tld"), so unless getting the actual input stream of context.getClassLoader().getResourceAsStream("META-INF") returns a directory listing that we can parse, then I think we do have to mount the jar ourselves.

However, it is wrong to use the lifespan of the context for these mounts. We only needed them during starting and not once started. So perhaps a dedicated configuration ResourceFactory that is closed once started?

@janbartel
Copy link
Contributor

@gregw a dedicated start-time only ResourceFactory would be correct for some of the things that MetaInfConfiguration looks for. The lifetime of the various resources are:

. o.e.j.u.resource.Resources in META-INF/resources: ones that actually exist need to hang around for the entire lifetime of the context as they form part of the baseResource
. o.e.j.u.resource.Resource of META-INF/web-fragment.xml: if it exists, only needs to hang around until it is parsed, and the context is started
. o.e.j.u.resource.Resource of META-INF/*.tld: if they exist, only the URL is remembered and passed on to apache jsp

@joakime
Copy link
Contributor Author

joakime commented Nov 13, 2023

@joakime and @lorban I think this PR is missing the issue that I raised in #10164. I'm not that concerned about efficiency in our scanning of META-INF (if users want quick start then they can use quickstart).

I'm concerned that for some reason ResourceFactoryInternals$LifeCycle holds a reference to every path that has been looked up relative to a mounted jar, even if it does not exist nor could it ever have been a mount point.

If a webapp contains foo.jar and bar.jar, then I could kind of live with seeing mounts for:

  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/
  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/bar.jar!/

but we also see mounts of :

  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/META-INF/resources/
  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/META-INF/web-fragment.xml
  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/bar.jar!/META-INF/resources/
  • jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/META-INF/web-fragment.xml

This PR addresses this.
The number of open mount points at runtime is reduced significantly.

I do not understand:

  • Why are we mounting entries relative to another mount point? doing myOpenJarFileResource.resolve("META-INF/resources/") should not create another mount point. If we are are mounting foo.jar, then I only expect the factory to remember a single mount point: jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/bar.jar!/. Every other Resource resolved against that is not a new mount point, it is just a new Resource relative to the exiting mount point, not a new mount point.

If you use newResource that happens, but not resolve with an existing Resource.

  • Even if we are mounting new resources rather than resolving against an existing, why are we keeping mounting things that do not exist? These jars do not have META-INF/resources/ nor META-INF/web-fragment.xml within them, so how could they be mount points? We cannot keep lists of things that do not exist as that is an infinite list!

The MetaInfConfiguration in jetty-12.0.x HEAD was the culprit here.
It does not use Resource properly.

The extra mounts on META-INF/resources/ in jetty-12.0.x HEAD are from ...

resourcesDir = resourceFactory.newResource(URIUtil.uriJarPrefix(uri, "!/META-INF/resources"));

That code causes a new mount. (and there's many such examples)

Again, we WANT to open the mount for a JAR and keep it open for the entirety of the scanning, close when done all of the scanning.
Which is what this PR does.

There's no reason to track references the Jar files as normal files during MetaInfConfiguration.
Keep them as a mounted Resource only for as long as necessary.

  • How could META-INF/web-fragment.xml ever be a mount point, as it is not a directory and does not end with / ?

You can mount files, it is not known if that String points to a file or directory during the mount process.

As for that mount, it comes from jetty-12.0.x HEAD here ...

webFrag = resourceFactory.newResource(jar.getPath().resolve("META-INF/web-fragment.xml"));

and here ...

webFrag = resourceFactory.newResource(URIUtil.uriJarPrefix(uri, "!/META-INF/web-fragment.xml"));

Again, this PR cleans all of this up, and more.

Thus my concerns are primarily with ResourceFactory, not with MetaInfConfiguration!

Nope, of the problems you have listed in this comment, they are all totally within MetaInfConfiguration (as it exists in jetty-12.0.x HEAD)

I have some secondary concerns about MetaInfConfiguration:

  • Why do we need to mount these jars anyway? They should already be available via the WebAppClassloader? Can't we just ask the classloader for the list of resources (small 'r') at META-INF/resources/ and META-INF/web-fragment.xml?

We could, but wouldn't we need to create the WebAppClassLoader earlier and use that exclusively in MetaInfConfiguration?

  • Why, once the webapp is up and running (scan completed), are references/mounts kept to any of these resources? They should be closed after the scan.

That is what #10164 brought up, and this PR fixes this.

Now I'm sure some of these problems come from bad code in MetaInfConfiguration, but I'm really REALLY cautious about any wholesale refactoring of that class as we don't want to break expected behaviours.

We greatly enhanced the unit testing of MetaInfConfiguration with PR #10282
That was done with what is currently in jetty-12.0.x HEAD and is present in HEAD now.

This PR uses that new testcase, with minimal changes (only verifying behavior against mounts while in process).

@joakime
Copy link
Contributor Author

joakime commented Nov 13, 2023

@gregw a dedicated start-time only ResourceFactory would be correct for some of the things that MetaInfConfiguration looks for. The lifetime of the various resources are:

  • .o.e.j.u.resource.Resources in META-INF/resources: ones that actually exist need to hang around for the entire lifetime of the context as they form part of the baseResource .

This PR finds the META-INF/resources references that exist (and are a directory) and makes note of it in the context attribute at the key "org.eclipse.jetty.resources".
This eventually winds up being used in MetaInfConfiguration.configure(WebAppContext context) to update the Base Resource.
Other attempts that fail to resolve those directories are ignored (and do not result in a Resource or extra mount).

  • .o.e.j.u.resource.Resource of META-INF/web-fragment.xml: if it exists, only needs to hang around until it is parsed, and the context is started

These are stored in the context attribute at the key FragmentConfiguration.FRAGMENT_RESOURCES and is used by
FragmentConfiguration.preConfigure(WebAppContext context) to add the fragments to the metadata as FragmentDescriptor (with an internal Resource used later when it is being read from)

  • .o.e.j.u.resource.Resource of META-INF/*.tld: if they exist, only the URL is remembered and passed on to apache jsp

These are stored as a context attribute at key "org.eclipse.jetty.tlds" as a Set<URL>
These are used by the JettyJasperInitializer to establish a configured org.apache.jasper.servlet.TldScanner

@joakime joakime marked this pull request as ready for review November 13, 2023 17:20
@gregw
Copy link
Contributor

gregw commented Nov 13, 2023

@joakime

This PR addresses this. The number of open mount points at runtime is reduced significantly.

I think this PR avoids the issue rather than addresses it. It might be doing so in a good way, but it is not addressing the actual issue I am concerned about.

  • Why are we mounting entries relative to another mount point? doing myOpenJarFileResource.resolve("META-INF/resources/") should not create another mount point. If we are are mounting foo.jar, then I only expect the factory to remember a single mount point: jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/bar.jar!/. Every other Resource resolved against that is not a new mount point, it is just a new Resource relative to the exiting mount point, not a new mount point.

If you use newResource that happens, but not resolve with an existing Resource.

I understand that is what is happening. But the issue is that it should not be happening.

Consider the following code:

       Resource a = factory.newResource("jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/");
       Resource b = factory.newResource("jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/bar/");
       Resource c = a.resolve("bar/");

Currently this will create 2 mount points and b & c will be against different mounts. I think this is wrong.
A single mount should be created at "jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/" and both b & c should reference that mount point and essentially be identical.

The MetaInfConfiguration in jetty-12.0.x HEAD was the culprit here. It does not use Resource properly.

I don't agree. There was never a distinction before between using newResource vs resolving of an existing one. Previously we used the underlying URL cache to cache a "mount" for the jar and the URL resulting from either approach would be identical. We have moved the cache from the JVM URL mechanism to an explicit ResourceFactory``, but that does not mean we needed to change the semantics. A call to newResource("jar:file:///tmp/foo.jar!/bar")should use/create a mount at file:///tmp/foo.jar` and internally resolve against that.

Again, I'm not saying that we cannot improve MetaInfConfiguration, and probably there is a lot of goodness in this PR.... but it is fundamentally avoiding the issue that I am concerned about.

I'll have a look at ResourceFactory and see if I can find exactly what I'm concerned about....

@joakime
Copy link
Contributor Author

joakime commented Nov 14, 2023

@joakime

This PR addresses this. The number of open mount points at runtime is reduced significantly.

I think this PR avoids the issue rather than addresses it. It might be doing so in a good way, but it is not addressing the actual issue I am concerned about.

  • Why are we mounting entries relative to another mount point? doing myOpenJarFileResource.resolve("META-INF/resources/") should not create another mount point. If we are are mounting foo.jar, then I only expect the factory to remember a single mount point: jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/bar.jar!/. Every other Resource resolved against that is not a new mount point, it is just a new Resource relative to the exiting mount point, not a new mount point.

If you use newResource that happens, but not resolve with an existing Resource.

I understand that is what is happening. But the issue is that it should not be happening.

Consider the following code:

       Resource a = factory.newResource("jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/");
       Resource b = factory.newResource("jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/bar/");
       Resource c = a.resolve("bar/");

Currently this will create 2 mount points and b & c will be against different mounts. I think this is wrong. A single mount should be created at "jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/" and both b & c should reference that mount point and essentially be identical.

That example creates 1 mount point in FileSystemPool, with 2 uses being tracked in the pool.
That FileSystemPool mount is only on jar:file:///tmp/jetty-xyz123/webapp/WEB-INF/lib/foo.jar!/
The ResourceFactory factory is tracking 2 uses of it, so that when that ResourceFactory is closed, it can notify the FileSystemPool that those usages of that mount point are done.

The MetaInfConfiguration in jetty-12.0.x HEAD was the culprit here. It does not use Resource properly.

I don't agree. There was never a distinction before between using newResource vs resolving of an existing one. Previously we used the underlying URL cache to cache a "mount" for the jar and the URL resulting from either approach would be identical. We have moved the cache from the JVM URL mechanism to an explicit ResourceFactory, but that does not mean we needed to change the semantics. A call to ``newResource("jar:file:///tmp/foo.jar!/bar")should use/create a mount at file:///tmp/foo.jar` and internally resolve against that.

And in the example of Jetty 9 / Jetty 10 / Jetty 11 that example would have 2 usages in the URL cache.
One usage for the URL jar:file:///tmp/foo.jar!/bar and another, different URL cache entry at jar:file:///tmp/foo.jar!/.

Again, I'm not saying that we cannot improve MetaInfConfiguration, and probably there is a lot of goodness in this PR.... but it is fundamentally avoiding the issue that I am concerned about.

I'll have a look at ResourceFactory and see if I can find exactly what I'm concerned about....

Keep in mind that you are seeing the ResourceFactory tracking, not the mounts themselves.
The mounts themselves are in the FileSystemPool (at the JVM level)

@janbartel
Copy link
Contributor

@joakime I'm going to close this one, as any potential changes would need to be based off of head now.

@janbartel janbartel closed this Dec 13, 2023
@janbartel janbartel deleted the fix/12.0.x/metainf-overhaul branch December 13, 2023 22:43
@joakime
Copy link
Contributor Author

joakime commented Dec 13, 2023

I agree.
This state can be reached in a subsequent series of other PRs, doing it more slowly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants