#6549 - Apache Tika can not parse Microsoft Docx format in native mode #7198
bom/runtime/pom.xml
Outdated
@@ -2762,6 +2775,32 @@
<artifactId>quarkus-banner</artifactId>
<version>${project.version}</version>
</dependency>

<!-- reflections -->
<dependency>
I am pretty sure we don't want to add this dependency to the core. We have basically done without it for all this time, so I don't really think it's warranted to add it now.
Hi @geoand - if you check the code, the Reflections framework is already used in more than one place; I've just moved it to a more central location. Anyway, what I'm trying to achieve here is to register all classes in a given package for reflection. Could you please advise whether there is another way to achieve this?
We use Jandex instead to query the structure of application classes; see the examples in other extensions.
Hi @geoand,
The package I want to register contains the implementations of the components for parsing the docx format (org.openxmlformats.schemas.wordprocessingml.x2006.main.impl). I will also need to include all classes in the packages for parsing xlsx and pptx components.
I've tried to search for how to do this with Jandex, but without success. If you know how to register all classes in a given package with Jandex, this will save me a lot of time and I will remove the Reflections framework :)
I don't think Jandex does this TBH. If you must register all classes in a package for reflection, then sure, go ahead and use the reflections module (if @sberyozkin agrees), but please only use it in your extension and nowhere else.
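For readers following the "all classes in a given package" requirement discussed above, here is a minimal, standard-library-only sketch of the underlying idea: walk the entries of a jar and keep the class files that sit directly in one package. It uses neither Jandex nor the Reflections library and is not the approach taken in this PR; the class name `PackageClassLister` and its method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Quarkus, Tika, Jandex, or Reflections).
final class PackageClassLister {

    private static final String CLASS_SUFFIX = ".class";

    // Maps jar entry names such as "com/acme/impl/Foo.class" to binary class
    // names, keeping only classes directly in packageName (no sub-packages).
    static List<String> classNamesInPackage(Iterable<String> entryNames, String packageName) {
        String prefix = packageName.replace('.', '/') + "/";
        List<String> result = new ArrayList<>();
        for (String entry : entryNames) {
            if (!entry.startsWith(prefix) || !entry.endsWith(CLASS_SUFFIX)) {
                continue; // different package, or not a class file
            }
            String simpleName = entry.substring(prefix.length(), entry.length() - CLASS_SUFFIX.length());
            if (simpleName.contains("/") || simpleName.equals("package-info")) {
                continue; // sub-package entry or package descriptor
            }
            result.add(packageName + "." + simpleName);
        }
        Collections.sort(result);
        return result;
    }

    // Convenience overload that reads the entry names from an opened jar.
    static List<String> classNamesInPackage(JarFile jar, String packageName) {
        List<String> names = new ArrayList<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            names.add(entries.nextElement().getName());
        }
        return classNamesInPackage(names, packageName);
    }
}
```

The resulting names could then be handed to whatever reflection-registration mechanism the extension uses. Nested classes keep their `$` binary names, which is the form `Class.forName` expects.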
Hi @geoand, @gastaldi and @sberyozkin,
The Reflections Maven artifact has now been moved under tika-deployment.
@@ -27,6 +27,17 @@
<groupId>org.eclipse.microprofile.context-propagation</groupId>
<artifactId>microprofile-context-propagation-api</artifactId>
</dependency>
<dependency>
<groupId>xalan</groupId>
Why does ArC need to depend on Xalan?
Hi @gastaldi,
If I put it anywhere else, the tika-integration-tests fail.
I've added a few more notes in the PR description.
import org.apache.commons.lang3.ArrayUtils;
import org.apache.commons.lang3.StringUtils;

import java.lang.reflect.*;
We don't use star imports, make sure to use the IDE config file as described in https://github.com/quarkusio/quarkus/blob/master/CONTRIBUTING.md#ide-config-and-code-style
You were right @gastaldi - the IDE config file wasn't included. Now it is; however, the wildcard import is still there :(
core/deployment/pom.xml
Outdated
@@ -79,6 +79,10 @@
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
As @geoand mentioned, I don't think it's a good idea to add this library to core
Hi @gastaldi,
if you check the code, the Reflections framework is already used in more than one place; I've just moved it to a more central location. Anyway, what I'm trying to achieve here is to register all classes in a given package for reflection. Could you please advise whether there is another way to achieve this?
It's not a problem to move it into the Tika deployment module only, but as I said, this library is used in more than one place. In my experience, when that happens, we should either move the dependency to a more central place or stop using it altogether and replace it with similar functionality.
<exclusions>
<exclusion>
<groupId>org.apache.xmlbeans</groupId>
<artifactId>xmlbeans</artifactId>
</exclusion>
</exclusions>
You can remove this exclusion and the dependency below (since it's resolved transitively)
Hi @gastaldi - there was a difference in the versions of XML Beans. This was the only way I found to deal with it.
Thank you @sberyozkin - I've added some comments/answers to @geoand and @gastaldi, and added the explanations to the PR. I also have some concerns (written in the PR description).
…ative mode - move reflections maven artifact under the tika-deployment module
Closing this one as out of date. Let's create a new PR if we still need it.
Fixes: #6549
A few things to point out:
@ConfigProperty in io.quarkus.it.tika.TikaEmbeddedContentTest leads to an NPE in the native build. This issue (@ConfigProperty without @Inject does not work in test #2061) is reported as fixed, but I am still hitting it.