Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading manifest from jar in jar-based PathTreeWithManifest is suboptimal #41395

Open
gsmet opened this issue Jun 24, 2024 · 8 comments
Open

Comments

@gsmet
Copy link
Member

gsmet commented Jun 24, 2024

I will probably have a look at this soon but I wanted to log my findings. /cc @aloubyansky

(This is part of my overall work on class loader leaking even if it has nothing to do with it)

in ArchivePathTree, we have this method:

    @Override
    protected <T> T apply(String relativePath, Function<PathVisit, T> func, boolean manifestEnabled) {
        ensureResourcePath(relativePath);
        if (!PathFilter.isVisible(pathFilter, relativePath)) {
            return func.apply(null);
        }
        if (manifestEnabled) {
            relativePath = toMultiReleaseRelativePath(relativePath);
        }
        try (FileSystem fs = openFs()) {
            for (Path root : fs.getRootDirectories()) {
                final Path path = root.resolve(relativePath);
                if (!Files.exists(path)) {
                    continue;
                }
                return PathTreeVisit.process(archive, root, path, pathFilter, func);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read " + archive, e);
        }
        return func.apply(null);
    }

toMultiReleaseRelativePath(relativePath); will actually read the Manifest (I have a patch to avoid doing that for META-INF entries but in the end we will read the Manifest at some point).

Reading the manifest is done by using an InputStream that will in the end use java.util.zip.Inflater to read the jar. From my reports, it actually takes quite a lot of memory to do that (the quarkus-ide-launcher jar probably doesn't help).

Given we actually open a zip fs at the line below, we could actually use this zip fs to read the manifest and my guess is that it would be a bit more optimal.

Not sure yet how to rework the code to be able to do that without messing things up too much in the hierarchy.

@aloubyansky thoughts?

See the stack here:

Screenshot from 2024-06-24 18-16-28

@dmlloyd
Copy link
Member

dmlloyd commented Jun 24, 2024

I believe that the zipfs would also use an inflater, right? In fact, isn't that what the stack trace shows?

@gsmet
Copy link
Member Author

gsmet commented Jun 24, 2024

No, the stracktrace is showing this call:

try (InputStream is = Files.newInputStream(visit.getPath())) {
return new Manifest(is);

@gsmet
Copy link
Member Author

gsmet commented Jun 24, 2024

So we do this ^ to read the manifest (the manifest is cached but we have to read it once), and then we open the zipfs to read the actual file.

@aloubyansky
Copy link
Member

@gsmet you want to read the manifest in a different way or avoid reading it?
We have to read it to properly support multi-release content.

@aloubyansky
Copy link
Member

There should also be API to not consult manifest entries.

@dmlloyd
Copy link
Member

dmlloyd commented Jun 24, 2024

One simple way we could avoid inflating things is to suggest/encourage/provide a default for users to not use compression on their JARs.

@gsmet
Copy link
Member Author

gsmet commented Jun 24, 2024

I should have been more precise: given we open a Zip FS to actually read the entry, I think we should use this same Zip FS to read the manifest and not open/read the jar twice as we do at the moment. The pattern is always this one: we check if multi-release so we read the manifest and then we open the fs to actually read the entry.
Granted it only happens once but still it looks like something we could avoid.
FWIW, Eclipse MAT was complaining about this particular call using quite a lot of memory.

I created a PR with a simple improvement than would avoid reading the manifest when we read from META-INF: #41397 .

But for most jars, at some point, we will read some classes so I wonder if it would be worth trying to optimize this.
Now it might just be an artifact of Eclipse MAT. I'm trying to open issues when I stumble upon them, otherwise it's hard to have the details later.

Copy link

quarkus-bot bot commented Jun 25, 2024

/cc @Sanne (core), @radcortez (core), @stuartwdouglas (core)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants