-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate Tika XML configuration during build time #6752
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
21 changes: 0 additions & 21 deletions
21
.../tika/deployment/src/main/java/io/quarkus/tika/deployment/TikaParsersConfigBuildItem.java
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
41 changes: 0 additions & 41 deletions
41
extensions/tika/runtime/src/main/java/io/quarkus/tika/runtime/TikaParserParameter.java
This file was deleted.
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@machi1990 first of all, thanks for this effort :-)
I'm not sure about removing a build item containing this Map. My plan, with the current master, has been to make this Map visible to various steps dealing with the individual parsers. For example, I suggested to @tpenakov in the issue where he works on OOXML support to group all PDF related build steps into a single method like
preparePdfParser
and check this map if PDF parser key is available and only then do all those PDF specific native registrations, the same for OOXML.For example, this method, this one and this one are specific to PDF and the plan has been to make a single method
preparePDFParser
out of 3 of them and make all of that optional depending on whetherorg.apache.tika.parser.pdf.PDFParser
key is in the map or not.So I think that build item still has to be retained. Let me know please if it makes sense. CC @gsmet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sberyozkin thank you.
Oh sorry, I was not aware of the future plans around this class.
If I understood the description above correctly, it seems to be that these various methods (e.g
preparePdfParser
) will be consuming this build item from within theTikaProcessor
class, correct?If so, the same be achieved with an invocation of a
private
method passing the parsers' map as a parameter ?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@machi1990 sorry, had to sign off for a bit.
Yes, I actually thought about it while being offline :-), but then TikaProcessor would just have a large single method :-), calling
preparePdfParser
,prepareOOXmlParser
, etc. It just looks a bit cleaner to me to have dedicated build steps per specific parser, or does the new CL model does not allow for it as far as Tika is concerned ? Not a big deal but wonder what is the best approach hereThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @machi1990 @gsmet so what do you think; I've been thinking today, I'd like, I think, to have a build step per each parser which requires some support. The main build step is the one which generates the tika config. may be it can be called as such,
generateTikaConfig
orcreateTikaConfig
, it would return a Map with the build item, and then there would be PDF build step, OOXML build step, etc as required which will optionally register the extra bits if the map has the parser key. Otherwise it would indeed have to be this master build step keeping adding private calls topreparePDF
.etc and it would feel a bit less cool to me :-)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is okay, I missed your ping too.
Hi @sberyozkin the change is not CL related but on whether we need the BuildItem to achieve the above.
If you prefer to keep it around, I can bring it back. @gsmet WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@machi1990 sorry, missed your comment. Yeah, it would be nice, I reckon it is not really related to the class loading issue, though I can see why Guillaume saw it being redundant, because the optimizations plans I'm referring to above are not yet implemented :-) and that item was only used to facilitate the sub-optimal tika config generation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not about being redundant. We certainly cannot pass those classes from build time to runtime and expect them to work.
I don't see the need for a build item if everything is done in the same extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gsmet This PR itself has preserved a reasonable number of build steps, one generates the configuration now, there are few build steps which deal with loading some resources, common to all of the Tika parsers, and some - common to PDF, and there would be OOXML specific build step coming in, we are not going ahead with collapsing them all into a single step in this PR. So it is clean and nice. Now, if this build item is removed, then for me to implement the optimization idea I had in mind I'd have to basically collapse everything (but a single step involving the Tika-common resources) into a single build step (generate the config, optionally load PDF/OOXML specific resources).
I appreciate a build item is great for coordinating between different extensions, but in this case it would help the tika processor keep the parser specific build steps separate, keeping the native optimization in mind. Np if you don't agree, please go ahead with the merge.
@machi1990 thanks for fix, cheers