-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ingest Attachment: Upgrade Tika to 1.18 #31252
Conversation
Fixes ES from hanging when a bad zip file is loaded through Tika.
Pinging @elastic/es-core-infra |
@@ -23,7 +23,7 @@ esplugin { | |||
} | |||
|
|||
versions << [ | |||
'tika': '1.17', | |||
'tika': '1.18', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add a note here about the discrepancy between ES's dependency on jackson, and tika's?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
@@ -214,6 +214,10 @@ public void testAsciidocDocument() throws Exception { | |||
assertThat(attachmentData.get("content_type").toString(), containsString("text/plain")); | |||
} | |||
|
|||
public void testZipFileDoesNotHang() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here linking to the apache-issue?
so that it is clear where this "bad_tika" file came from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
@talevy Thanks for the feedback. Added comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if the build is happy
@talevy Thanks again. Will wait for green and then commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one comment about the transitive dependencies.
@@ -23,7 +23,7 @@ esplugin { | |||
} | |||
|
|||
versions << [ | |||
'tika': '1.17', | |||
'tika': '1.18', | |||
'pdfbox': '2.0.8', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to check if these have been bumped in tika
itself as we don't pull in transitive dependencies automatically. If tika
bumped its dependency versions here, we should too. The same goes for bouncycastle
and poi
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jackson is at 2.9.5, but it seems like we're at 2.8.10? Is there anything I can/should do about this?
For the others --
- Updated pdfbox 2.08 -> 2.09
- Bouncycastle is already at 1.55 and the compile dependency is 1.54 (I assume higher is okay?)
- poi and mime4j are already the same
- org.tukaani:xz 1.6 -> 1.8
*commons-io-commons-io 2.5 -> 2.6 - org.apache.commons:commons-compress 1.14 -> 1.16.1
I only checked against the versions in the gradle file specifically for ingest-attachment. Are there any other places I need to check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Jackson one is tricky since we inherit the dependency from core and we are version locked there. I think that means we need to leave that one as-is for now. The rest look good to be but I think we should take Bouncy Castle to 1.54 since the Tika POM does not say 1.54+. I would rather play it safe.
@jasontedor Thanks for the review. Please see the response to your comment. Updated the dependencies based on maven as best I could. |
@jasontedor I traced the dependency tree as far as I could based on your suggestions yesterday. Here's what I came up with: name expects / used
I think the dependencies as they are in this PR are the most up-to-date they can be now. Do you see anything that stands out to you? |
@jasontedor As for the build failure - looks like it happens with pdfbox 2.0.9 when built with java 10 and run against java 8. Run against java 10 it works fine. I attempted to add the permission that seemed like it was missing, but that did not fix the issue in java 8. Pdfbox 2.0.8 seems to work in all cases. Do you have any recommendations? Edit: Also tried adding the permission to the special tika permissions with no luck. Edit 2: So I don't know the security manager very well, but it appears what's happening is the top level doPrivileged domain has the correct permission. As we go further down the stack a bunch of other domains get added without the appropriate permission as it seems the first domain's permissions aren't propagated. This in turn is leading to the failure. Is there a way to ensure the permission is propagated or somehow special case this one? (Note this only is happening when run against Java 8 and Java 9. Java 10 seems to be unaffected.) Going to pause work on this for now as I need a bit of guidance to move forward. |
The commit that appears to be causing the issue: |
More info:
|
Looks like this passes in 10 because the class doesn't exist, so it never gets far enough to check permissions during Class.forName |
This LGTM and passes CI now (as well as local testing with java 10/8). I will commit and backport now. |
Fixes ES from hanging when a bad zip file is loaded through Tika.
Fixes ES from hanging when a bad zip file is loaded through Tika.
Fixes ES from hanging when a bad zip file is loaded through Tika.
Fixes ES from hanging when a bad zip file is loaded through Tika.
* elastic/master: (92 commits) Reduce number of raw types warnings (elastic#31523) Migrate scripted metric aggregation scripts to ScriptContext design (elastic#30111) turn GetFieldMappingsResponse to ToXContentObject (elastic#31544) Close xcontent parsers (partial) (elastic#31513) Ingest Attachment: Upgrade Tika to 1.18 (elastic#31252) TEST: Correct the assertion arguments order (elastic#31540) Add get field mappings to High Level REST API Client (elastic#31423) [DOCS] Updates Watcher examples for code testing (elastic#31152) TEST: Add bwc recovery tests with synced-flush index [DOCS] Move sql to docs (elastic#31474) [DOCS] Move monitoring to docs folder (elastic#31477) Core: Combine doExecute methods in TransportAction (elastic#31517) IndexShard should not return null stats (elastic#31528) fix repository update with the same settings but different type (elastic#31458) Fix Mockito trying to mock IOException that isn't thrown by method (elastic#31433) (elastic#31527) Node selector per client rather than per request (elastic#31471) Core: Combine messageRecieved methods in TransportRequestHandler (elastic#31519) Upgrade to Lucene 7.4.0. (elastic#31529) [ML] Add ML filter update API (elastic#31437) Allow multiple unicast host providers (elastic#31509) ...
@rjernst Thanks for finishing this! |
* 6.x: Fix broken backport of #31578 by adjusting constructor (#31587) ingest: Add ignore_missing property to foreach filter (#22147) (#31578) Add package pre-install check for java binary (#31343) Docs: Clarify sensitive fields watcher encryption (#31551) Watcher: Remove never executed code (#31135) Improve test times for tests using `RandomObjects::addFields` (#31556) Revert "Remove RestGetAllAliasesAction (#31308)" REST high-level client: add simulate pipeline API (#31158) Get Mapping API to honour allow_no_indices and ignore_unavailable (#31507) Fix Mockito trying to mock IOException that isn't thrown by method (#31433) (#31527) [Test] Add full cluster restart test for Rollup (#31533) Enhance thread context uniqueness assertion fix writeIndex evaluation for aliases (#31562) Add x-opaque-id to search slow logs (#31539) Watcher: Fix put watch action (#31524) [DOCS] Significantly improve SQL docs turn GetFieldMappingsResponse to ToXContentObject (#31544) TEST: Unmute testHistoryUUIDIsGenerated Ingest Attachment: Upgrade Tika to 1.18 (#31252) TEST: Correct the assertion arguments order (#31540)
* master: ingest: Add ignore_missing property to foreach filter (#22147) (#31578) Fix a formatting issue in the docvalue_fields documentation. (#31563) reduce log level at gradle configuration time [TEST] Close additional clients created while running yaml tests (#31575) Docs: Clarify sensitive fields watcher encryption (#31551) Watcher: Remove never executed code (#31135) Add support for switching distribution for all integration tests (#30874) Improve robustness of geo shape parser for malformed shapes (#31449) QA: Create xpack yaml features (#31403) Improve test times for tests using `RandomObjects::addFields` (#31556) [Test] Add full cluster restart test for Rollup (#31533) Enhance thread context uniqueness assertion [DOCS] Fix heading format errors (#31483) fix writeIndex evaluation for aliases (#31562) Add x-opaque-id to search slow logs (#31539) Watcher: Fix put watch action (#31524) Add package pre-install check for java binary (#31343) Reduce number of raw types warnings (#31523) Migrate scripted metric aggregation scripts to ScriptContext design (#30111) turn GetFieldMappingsResponse to ToXContentObject (#31544) Close xcontent parsers (partial) (#31513) Ingest Attachment: Upgrade Tika to 1.18 (#31252) TEST: Correct the assertion arguments order (#31540)
Fixes ES from hanging when a bad zip file is loaded through Tika.
Zip File found here: https://issues.apache.org/jira/browse/COMPRESS-432
Upgrades include
tika: 1.17 -> 1.18
commons-compress: 1.14 -> 1.16