Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

org.eclipse.jdt.internal.core.builder.ClasspathJar.knownPackageNames slow and not threadsafe #3250

Closed
jukzi opened this issue Nov 5, 2024 · 13 comments
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@jukzi
Copy link
Contributor

jukzi commented Nov 5, 2024

ClasspathJar.isPackage(String, String) seems to be used from multiple threads, uses ClasspathJar.knownPackageNames, which does not seem to have any synchronization. While ClasspathJar.findPackageSet() tries to be thread safe it returns a SimpleSet which itself has no threading guaranties.
Maybe that relates to #2877 ?
It might also be more readable if the SimpleSet would know to include only <String> in this usage. Maybe just use a some standard java Set.

I only stumbled across it when i saw ClasspathJar.isPackage() was reported to be the hotspot of hibernate annotation processing on a big project:
image
that feels inappropriate.

@iloveeclipse
Copy link
Member

ClasspathJar.knownPackageNames seem to be immutable, so ideally we properly compute it once (synchronized) and also replace it with an immutable default set.

Maybe just use a some standard java Set.

I assume replacing SimpleSet with HashSet would be right thing to do, also in terms of performance.

@srikanth-sankaran
Copy link
Contributor

Maybe that relates to #2877 ?

Sounds plausible, it would be terrific if could be confirmed and closed. Good luck.

@iloveeclipse iloveeclipse added help wanted Extra attention is needed bug Something isn't working labels Nov 5, 2024
@jukzi
Copy link
Contributor Author

jukzi commented Nov 5, 2024

In a heapdump i see, that the JDK packages are in those sets
image
and that those sets have been typically oversized (most contain a single or few element while initialized with a threshold of 42):
image

i will work on a PR

@jukzi jukzi self-assigned this Nov 5, 2024
@jukzi
Copy link
Contributor Author

jukzi commented Nov 5, 2024

performance sidenode: that org.eclipse.jdt.internal.core.builder.NameEnvironment.findClass(String, char[], LookupStrategy, String) searches in all libraries ("relevantLocations") with a loop while intuitively it would be clear in which library to search by package name. Also it feels contraproductive in LookupEnvironment.createPlainPackage(char[][]) to search for obviously not existing packages like "com" (which is the first part of the fqn) (just to make sure there is no collision with a class named like a package?)

@iloveeclipse
Copy link
Member

while intuitively it would be clear in which library to search by package name

Not sure how? Library names not always match package names, or usually they don't match.

search for obviously not existing packages like "com"

Why that? "com" is valid package name. Unusual to place classes there, yes, but you never know what crazy code users will use or write :)

@jukzi
Copy link
Contributor Author

jukzi commented Nov 5, 2024

Not sure how?

the libraries don't change frequently so there could be something like a Map<String, Set<ClasspathJar>> that tells in which libraries a package could be found instead in searching in all of them for every class.

The class that is search is for example a.b.c.D and that compiler searches most of the time for a.b.class ... why??? To me it feels more like a bug then a feature.

@stephan-herrmann
Copy link
Contributor

Not sure how?

the libraries don't change frequently so there could be something like a Map<String, Set<ClasspathJar>> that tells in which libraries a package could be found instead in searching in all of them for every class.

The class that is search is for example a.b.c.D and that compiler searches most of the time for a.D.class ... why??? To me it feels more like a bug then a feature.

So you are speaking of caching, not of changing the strategy to search in the first place?

In that way, what you call "intuitively" (which is definitely non-compiler-like) should in fact be: the name environment could remember ... similar to how LookupEnvironment puts TheNotFoundPackage into its caches?

@jukzi
Copy link
Contributor Author

jukzi commented Nov 5, 2024

Yea should be possible either somehow cache the result or precompute such a map whenever the libs change.

Here is an example Stacktrace with manual added parameters in front - which feel wrong:

Thread [Worker-39: Building] (Suspended (breakpoint at line 300 in ClasspathJar))	
"a"	ClasspathMultiReleaseJar(ClasspathJar).isPackage(String, String) line: 300	
"b.class","a"	ClasspathMultiReleaseJar.findClass(String, String, String, String, boolean, Predicate<String>) line: 127	
	ClasspathMultiReleaseJar(ClasspathLocation).findClass(char[], String, String, String, boolean, Predicate<String>) line: 76	
	NameEnvironment.findClass(String, char[], IModuleAwareNameEnvironment$LookupStrategy, String) line: 568	
"b","a"	NameEnvironment.findType(char[], char[][], char[]) line: 600	
"b","a"	NameEnvironment(IModuleAwareNameEnvironment).findType(char[], char[][]) line: 101	
"a.b.c.D"	LookupEnvironment.createPlainPackage(char[][]) line: 1149	
	CompilationUnitScope.buildTypeBindings(AccessRestriction) line: 138	
	LookupEnvironment.buildTypeBindings(CompilationUnitDeclaration, AccessRestriction) line: 505	
	Compiler.internalBeginToCompile(ICompilationUnit[], int) line: 855	
	Compiler.beginToCompile(ICompilationUnit[]) line: 393	
	Compiler.compile(ICompilationUnit[], boolean) line: 447	
	Compiler.compile(ICompilationUnit[]) line: 425	
	BatchImageBuilder(AbstractImageBuilder).compile(SourceFile[], SourceFile[], boolean) line: 411	
	BatchImageBuilder.compile(SourceFile[], SourceFile[], boolean) line: 214	
	BatchImageBuilder(AbstractImageBuilder).compile(SourceFile[]) line: 342	
	BatchImageBuilder.build() line: 79	
	JavaBuilder.buildAll() line: 286	
	JavaBuilder.build(int, Map, IProgressMonitor) line: 192	
	BuildManager$2.run() line: 1077	
	SafeRunner.run(ISafeRunnable) line: 47	
	BuildManager.basicBuild(int, IncrementalProjectBuilder, Map<String,String>, MultiStatus, IProgressMonitor) line: 296	
	BuildManager.basicBuild(IBuildConfiguration, int, IBuildContext, ICommand[], MultiStatus, IProgressMonitor) line: 352	
	BuildManager$1.run() line: 441	
	SafeRunner.run(ISafeRunnable) line: 47	
	BuildManager.basicBuild(IBuildConfiguration, int, IBuildContext, MultiStatus, IProgressMonitor) line: 444	
	BuildManager.basicBuildLoop(IBuildConfiguration[], IBuildConfiguration[], int, MultiStatus, IProgressMonitor, int, boolean) line: 555	
	BuildManager.basicBuildLoop(IBuildConfiguration[], IBuildConfiguration[], int, MultiStatus, IProgressMonitor) line: 503	
	BuildManager.build(IBuildConfiguration[], IBuildConfiguration[], int, IProgressMonitor) line: 585	
	AutoBuildJob.doBuild(IProgressMonitor) line: 207	
	AutoBuildJob.run(IProgressMonitor) line: 300	
	Worker.run() line: 63	

@stephan-herrmann
Copy link
Contributor

which feel wrong

What exactly feels wrong, and why?

@stephan-herrmann
Copy link
Contributor

Yea should be possible either somehow cache the result or precompute such a map whenever the libs change.

I wonder why you keep speaking of lib changes? Libraries typically don't change while a build is running. Are you considering to persist cached information?

@jukzi
Copy link
Contributor Author

jukzi commented Nov 5, 2024

Libraries typically don't change while a build is running

Even better, then just map all packages to libs once per build instead of per cu :-)

What exactly feels wrong, and why?

It does not feel right to first search for a non-existing class a.b just because we search for a.b.c.D - which does exist. Doesn't the existence of a.b.c.D somehow advise that a.b is a package instead of a class? I wouldn't complain if we wouldn't do it a gazillion times.

@iloveeclipse
Copy link
Member

Doesn't the existence of a.b.c.D somehow advise that a.b is a package instead of a class?

No. In the search scope you may have both at same time, multiple times :-)

@stephan-herrmann
Copy link
Contributor

What exactly feels wrong, and why?

It does not feel right to first search for a non-existing class a.b just because we search for a.b.c.D - which does exist. Doesn't the existence of a.b.c.D somehow advise that a.b is a package instead of a class? I wouldn't complain if we wouldn't do it a gazillion times.

Keep in mind that a.b.c.D could be a second-level nested class in a class a.b. So semantically each qualified name must be evaluated one segment at a time. If ever we take a shortcut, solid reasoning is required, why any different interpretations of the name have been ruled out.

The compiler has quite complex code to analyze qualified names. It's a pity that the name environment cannot use the information cached in LookupEnvironment.

Another complication: when one request for potential class a.b fails, it is not safe to assume that such a class doesn't exist. It only means that the requesting module has no such class in its dependencies. During the same build a different request could come from a different module, where such a class exists.

jukzi pushed a commit to jukzi/eclipse.jdt.core that referenced this issue Nov 5, 2024
* use Set<String>
* cache a unmodifiable copy
* simplify PackageCacheEntry to record
* add missing final/volatile modifiers
* remove ClasspathJrt.PackageCache (only module names had been used)
* don't calculate unused location of module.info
* remove closeZipFileAtEnd (was always true when zipfile!=null)

eclipse-jdt#3250
jukzi pushed a commit to jukzi/eclipse.jdt.core that referenced this issue Nov 6, 2024
* use Set<String>
* cache a unmodifiable copy
* simplify PackageCacheEntry to record
* add missing final/volatile modifiers
* remove ClasspathJrt.PackageCache (only module names had been used)
* don't calculate unused location of module.info
* remove closeZipFileAtEnd (was always true when zipfile!=null)

eclipse-jdt#3250
jukzi pushed a commit that referenced this issue Nov 12, 2024
* use Set<String>
* cache a unmodifiable copy
* simplify PackageCacheEntry to record
* add missing final/volatile modifiers
* remove ClasspathJrt.PackageCache (only module names had been used)
* don't calculate unused location of module.info
* remove closeZipFileAtEnd (was always true when zipfile!=null)

#3250
@jukzi jukzi closed this as completed Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants