Tika: image parser fails in JVM mode with java.lang.LinkageError #8375

Closed
JiriOndrusek opened this issue Apr 3, 2020 · 11 comments
Labels
area/tika kind/bug Something isn't working

Comments

@JiriOndrusek
Contributor

Describe the bug
According to the documentation of quarkus tika, almost all parsers should be native ready if they are correctly registered.

quarkus.tika.parsers

Comma separated list of the parsers which must be supported. Most of the document formats recognized by Apache Tika are supported by default but it affects the application memory and native executable sizes. One can list only the required parsers in tika-config.xml to minimize a number of parsers loaded into the memory, but using this property is recommended to achieve both optimizations. Either the abbreviated or full parser class names can be used. Only PDF and OpenDocument format parsers can be listed using the reserved 'pdf' and 'odf' abbreviations. Custom class name abbreviations have to be used for all other parsers. For example:

    // Only PDF parser is required:
    quarkus.tika.parsers = pdf
    // Only PDF and OpenDocument parsers are required:
    quarkus.tika.parsers = pdf,odf

This property will have no effect if the 'tikaConfigPath' property has been set.

There are several excluded ones (see https://github.com/quarkusio/quarkus/blob/master/extensions/tika/deployment/src/main/java/io/quarkus/tika/deployment/TikaProcessor.java#L39)

The text above implies that, e.g., imageParser should work correctly.
I tried to use imageParser in native mode, but it fails even in JVM mode.

Expected behavior
Should work in JVM and native mode.

To Reproduce
Steps to reproduce the behavior:

I've created a small reproducer:

  1. Clone https://github.com/JiriOndrusek/tika_imageParser
  2. Run the tests via Maven (e.g. mvn clean install)
  3. See an error for imageParser test.

Error log

2020-04-03 10:28:09,695 INFO  [io.quarkus] (main) Quarkus 1.3.1.Final started in 1.541s. Listening on: http://0.0.0.0:8081
2020-04-03 10:28:09,695 INFO  [io.quarkus] (main) Profile test activated. 
2020-04-03 10:28:09,695 INFO  [io.quarkus] (main) Installed features: [cdi, resteasy, tika]
2020-04-03 10:28:10,689 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (executor-thread-1) HTTP Request to /tika/parse failed, error id: a7f305d5-79c2-4b80-a59f-17fdd23268d9-1: org.jboss.resteasy.spi.UnhandledException: java.lang.LinkageError: loader constraint violation: loader (instance of <bootloader>) previously initiated loading for a different type with name "org/w3c/dom/Node"
	at org.jboss.resteasy.core.ExceptionHandler.handleApplicationException(ExceptionHandler.java:106)
	at org.jboss.resteasy.core.ExceptionHandler.handleException(ExceptionHandler.java:372)
	at org.jboss.resteasy.core.SynchronousDispatcher.writeException(SynchronousDispatcher.java:216)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:515)
	at org.jboss.resteasy.core.SynchronousDispatcher.lambda$invoke$4(SynchronousDispatcher.java:259)
	at org.jboss.resteasy.core.SynchronousDispatcher.lambda$preprocess$0(SynchronousDispatcher.java:160)
	at org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:362)
	at org.jboss.resteasy.core.SynchronousDispatcher.preprocess(SynchronousDispatcher.java:163)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:245)
	at io.quarkus.resteasy.runtime.standalone.RequestDispatcher.service(RequestDispatcher.java:73)
	at io.quarkus.resteasy.runtime.standalone.VertxRequestHandler.dispatch(VertxRequestHandler.java:122)
	at io.quarkus.resteasy.runtime.standalone.VertxRequestHandler.access$000(VertxRequestHandler.java:36)
	at io.quarkus.resteasy.runtime.standalone.VertxRequestHandler$1.run(VertxRequestHandler.java:87)
	at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
	at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:2027)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1551)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1442)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
	at java.lang.Thread.run(Thread.java:748)
	at org.jboss.threads.JBossThread.run(JBossThread.java:479)
Caused by: java.lang.LinkageError: loader constraint violation: loader (instance of <bootloader>) previously initiated loading for a different type with name "org/w3c/dom/Node"
	at com.sun.imageio.plugins.png.PNGMetadata.getNativeTree(PNGMetadata.java:468)
	at com.sun.imageio.plugins.png.PNGMetadata.getAsTree(PNGMetadata.java:457)
	at org.apache.tika.parser.image.ImageParser.loadMetadata(ImageParser.java:103)
	at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:190)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
	at io.quarkus.tika.TikaParser.parseStream(TikaParser.java:85)
	at io.quarkus.tika.TikaParser.parse(TikaParser.java:44)
	at io.quarkus.tika.TikaParser.parse(TikaParser.java:40)
	at io.quarkus.tika.TikaParser.parse(TikaParser.java:32)
	at jondruse.reproducers.tika.quickstart.imageparser.TikaParserResource.extractText(TikaParserResource.java:33)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:167)
	at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:130)
	at org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(ResourceMethodInvoker.java:621)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(ResourceMethodInvoker.java:487)
	at org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$2(ResourceMethodInvoker.java:437)
	at org.jboss.resteasy.core.interception.jaxrs.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:362)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:439)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:400)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:374)
	at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:67)
	at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:488)
	... 17 more



@JiriOndrusek JiriOndrusek added the kind/bug Something isn't working label Apr 3, 2020
@sberyozkin
Member

sberyozkin commented Apr 3, 2020

@JiriOndrusek Thanks. Unfortunately, not all parsers are native ready. More precisely, many of them probably are, but since there are so many I don't know for sure, so I'll need to commit to updating the text. We also need to keep fixing the native issues related to specific parsers. I know the OOXML parsers have a problem, and the image parser is another one.
The same applies to JVM mode. The PDF and OpenDocument parsers do well, as these are very mainstream; I think I saw some other parser mentioned as well.

@sberyozkin
Member

There are 69 parsers :-) and a huge number of supported formats.

@sberyozkin
Member

@stuartwdouglas Hi Stuart, do you have any idea why a LinkageError would be thrown in this case?

@JiriOndrusek Can you do me a favour please and try the same reproducer without configuring anything in application.properties? That configuration is mainly about optimizing the native image size, etc. For example, this project, which uses quarkus-tika to parse movies, does no Tika configuration at all.

@stuartwdouglas
Member

This happens if a deployment contains part of the XML parser API but not all of it. Some elements may be loaded from the system class loader, and others from QuarkusClassLoader.
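One way to check the situation described above is to print which class loader defined the XML API classes and where they were loaded from. The following is only a minimal diagnostic sketch, not part of Quarkus or Tika: the class name `ClassOriginCheck` and the helper `describe` are made up for illustration, and the list of classes to inspect is an assumption based on the stack trace in this issue.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Quarkus or Tika).
class ClassOriginCheck {

    /** Reports which class loader defined a class and where its bytes came from. */
    static String describe(Class<?> type) {
        // getClassLoader() returns null for classes defined by the bootstrap loader.
        ClassLoader loader = type.getClassLoader();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return type.getName()
                + " loader=" + (loader == null ? "<bootstrap>" : String.valueOf(loader))
                + " location=" + (location == null ? "<none>" : String.valueOf(location));
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Resolve the classes through the thread context class loader,
        // which is what application code usually sees.
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        String[] names = {
            "org.w3c.dom.Node",
            "org.w3c.dom.Document",
            "javax.xml.parsers.DocumentBuilderFactory"
        };
        for (String name : names) {
            System.out.println(describe(Class.forName(name, false, tccl)));
        }
    }
}
```

If some of these classes report the bootstrap loader while others report an application class loader and a jar location (for example an xml-apis jar), that would be consistent with the partial XML API explanation. Calling `describe` from inside the failing resource method would show the view of the class loader that actually handles the request.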

@sberyozkin
Member

@stuartwdouglas thanks; I wonder if the xml-apis dependency is to blame, which also causes #7359. If I exclude it in #7359, then the DOM API cannot be resolved, which is also strange as it should be in the JDK now...

@JiriOndrusek
Contributor Author

Hi @sberyozkin ,
sorry for the late response.

To be sure, I've tested it on quarkus quickstart: https://github.com/quarkusio/quarkus-quickstarts/tree/master/tika-quickstart

ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (executor-thread-1) HTTP Request to /parse/text failed, error id: 604c2d83-f830-47fa-b3b9-1365d32208b8-1: org.jboss.resteasy.spi.UnhandledException: java.lang.LinkageError: loader constraint violation: loader (instance of <bootloader>) previously initiated loading for a different type with name "org/w3c/dom/Node"

  • If I run it with only parsers pdf and odf (original values in quickstart), there is no error.

I suspect that one or more of the parsers used by default (among these 60+ parsers) are not compatible with native Quarkus. (That is why I used application.properties to select which parsers are used.) I've tried xml and office parsers, and they work correctly. Then I used the image parser and it fails, so it's possible that imageParser is one of the broken ones, but there could be more of them.

@sberyozkin
Member

sberyozkin commented Apr 9, 2020

Hi @JiriOndrusek np at all and thanks for the interesting feedback.
So things are probably not too bad for the most mainstream parsers. The native image compilation itself is not a problem (except for the parsers that I excluded; to be honest I don't know if anyone even uses those). PDF, ODF and old Excel (the tests have an Excel file) are good. For the other parsers, which I haven't even heard of :-), we can only find out if a user reports an issue.
The project I linked parses the movies - I honestly don't know what parser does it :-). I'm pretty sure I saw other users definitely not using PDF/ODT but some other format parsers. So all in all things are more healthy than not :-)
ImageParser is a problem - which is really a dependency issue at this stage as opposed to some parser issue exposed in Native.
But what is interesting, you said

I've tried xml and office parsers

Which parsers are these? See #6549, which is about OOXML parsers failing in native.

What may also be happening is that some code path in some parser is only activated if a doc is more complex etc

Thanks

@JiriOndrusek
Contributor Author

@sberyozkin
I was trying the following properties:

quarkus.tika.parsers= pdf,odf,office,xml
quarkus.tika.parser.office = org.apache.tika.parser.microsoft.OfficeParser
quarkus.tika.parser.image = org.apache.tika.parser.image.ImageParser
quarkus.tika.parser.xml = org.apache.tika.parser.xml.DcXMLParser
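To make the relationship between the parser list and the custom abbreviations concrete, here is a small sketch that resolves each entry of `quarkus.tika.parsers` to a class name, following the rules quoted from the documentation earlier in this issue. It is an illustration only and not how the Quarkus Tika extension is actually implemented: the class `ParserSelectionSketch` and its methods are hypothetical, and the class names mapped to the reserved `pdf` and `odf` abbreviations are an assumption (the usual Tika parser classes).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration of the documented resolution rules (not Quarkus code).
class ParserSelectionSketch {

    // Assumption: the reserved abbreviations map to these Tika classes.
    private static final Map<String, String> RESERVED = Map.of(
            "pdf", "org.apache.tika.parser.pdf.PDFParser",
            "odf", "org.apache.tika.parser.odf.OpenDocumentParser");

    /** Resolves every entry of quarkus.tika.parsers to a parser class name. */
    static Map<String, String> resolve(Properties config) {
        Map<String, String> selected = new LinkedHashMap<>();
        String list = config.getProperty("quarkus.tika.parsers", "");
        for (String raw : list.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue;
            }
            String className;
            if (name.contains(".")) {
                className = name;                       // full class name given directly
            } else if (RESERVED.containsKey(name)) {
                className = RESERVED.get(name);         // reserved 'pdf' / 'odf'
            } else {
                // custom abbreviation: must be declared as quarkus.tika.parser.<name>
                className = config.getProperty("quarkus.tika.parser." + name);
            }
            if (className == null) {
                throw new IllegalArgumentException("No class configured for parser: " + name);
            }
            selected.put(name, className.trim());
        }
        return selected;
    }

    /** The properties from the comment above, loaded into a Properties object. */
    static Properties sample() throws IOException {
        Properties config = new Properties();
        config.load(new StringReader(String.join("\n",
                "quarkus.tika.parsers= pdf,odf,office,xml",
                "quarkus.tika.parser.office = org.apache.tika.parser.microsoft.OfficeParser",
                "quarkus.tika.parser.image = org.apache.tika.parser.image.ImageParser",
                "quarkus.tika.parser.xml = org.apache.tika.parser.xml.DcXMLParser")));
        return config;
    }

    public static void main(String[] args) throws IOException {
        resolve(sample()).forEach((name, cls) -> System.out.println(name + " -> " + cls));
    }
}
```

Under these assumptions, an abbreviation that is declared but not listed in `quarkus.tika.parsers` (here `image`) is not selected, so the image parser would only come into play once `image` is added to the list.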

@sberyozkin
Member

@JiriOndrusek Thanks. OfficeParser is based on POI, so #6549 may be specific to certain DOCX files.

@suchwerk
Contributor

Any progress?

@JiriOndrusek
Contributor Author

This issue can be closed. I reran the original test (which had been disabled because of this error) in camel-quarkus, and it now works in both JVM and native mode. Because this issue is almost 3 years old, I suppose that some change in the meantime affected it in a positive way.

@gsmet gsmet closed this as completed Jan 4, 2023
5 participants