Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to lucene 8.0.0-snapshot-ff9509a8df #39350

Merged
merged 5 commits into from
Feb 27, 2019

Conversation

romseygeek
Copy link
Contributor

@romseygeek romseygeek commented Feb 25, 2019

Contains the following:

  • LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in MMapDirectory
  • LUCENE-8292: TermsEnum is fully abstract
  • LUCENE-8679: Return WITHIN in EdgeTree#relateTriangle only when polygon and triangle share one edge
  • LUCENE-8676: Nori tokenizer deals correctly with large buffers
  • LUCENE-8697: GraphTokenStreamFiniteStrings better handles side paths with gaps
  • LUCENE-8664: Add equals and hashCode to TotalHits
  • LUCENE-8660: TopDocsCollector returns accurate hit counts if the total equals the threshold
  • LUCENE-8654: Polygon2D#relateTriangle fix for when the polygon is inside the triangle
  • LUCENE-8645: Intervals#fixField can merge intervals from different fields
  • LUCENE-8585: Create jump-tables for DocValues at index time

@romseygeek romseygeek added :Search/Search Search-related issues that do not fall into other categories >upgrade v7.0.0 v8.0.0 v7.2.0 labels Feb 25, 2019
@romseygeek romseygeek self-assigned this Feb 25, 2019
@romseygeek romseygeek requested review from jpountz and s1monw February 25, 2019 10:03
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@romseygeek romseygeek added the :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. label Feb 25, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@romseygeek romseygeek removed the :Search/Search Search-related issues that do not fall into other categories label Feb 25, 2019
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that FilterTermsEnum#seekExact(BytesRef, TermState) delegates to the wrapper instance rather than to FilterTermsEnum.seekExact(BytesRef), I think it'd be safer if we checked whether the term is accepted in FieldNamesTermsEnum#seekExact(BytesRef, TermState).

@romseygeek
Copy link
Contributor Author

Looks like there might be a BWC issue in the Lucene80 Codec for norms:

org.apache.lucene.index.CorruptIndexException: codec mismatch: actual codec=Lucene70NormsMetadata vs expected codec=Lucene80NormsMetadata (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-bwc/qa/full-cluster-restart/build/cluster/v7.0.0#oldClusterTestCluster node1/elasticsearch-7.0.0-SNAPSHOT/data/nodes/0/indices/zYyufqjjSJuLRCN2RdQP-Q/2/index/_b.cfs") [slice=_b.nvm]))

@romseygeek
Copy link
Contributor Author

OK, BWC tests fail here because it's targetting 7.0.0, which currently is on a previous lucene snapshot without jump tables, so the indexes are not readable across both versions. Should I disable BWC tests while this PR is merged and backported?

@romseygeek
Copy link
Contributor Author

I think it'd be safer if we checked whether the term is accepted in FieldNamesTermsEnum#seekExact(BytesRef, TermState)

I updated to throw an IllegalStateException if the term isn't accepted, as this would mean that the TermState was from the underlying reader without being wrapped in the filter.

@romseygeek
Copy link
Contributor Author

Failure in ci/2 is due to #32299

@elasticmachine run elasticsearch-ci/2

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ISE, LGTM.

@romseygeek
Copy link
Contributor Author

Packaging tests were still running with BWC enabled

@elasticmachine test elasticsearch-ci/packaging-sample

@romseygeek
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-sample

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a question.

boolean bwc_tests_enabled = true
final String bwc_tests_disabled_issue = "" /* place a PR link here when committing bwc changes */
boolean bwc_tests_enabled = false
final String bwc_tests_disabled_issue = "https://github.com/elastic/elasticsearch/pull/39350" /* place a PR link here when committing bwc changes */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for disabling all BWC tests with this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BWC tests are currently being run against the 7.0.0 branch, which is using a previous lucene 8.0.0 snapshot. There has been a change in the Codec format between these two snapshots, which means that an elasticsearch running with this change cannot read lucene indexes built with the previous snapshot; upgrade tests between the current 7.0.0 branch head and this PR fail because the indexes cannot be opened.

If there's a finer-grained BWC suspension I should be doing instead, please let me know!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also means that people using 7.0-alpha and 7.0-beta won't be able to do a rolling upgrade to 7.0-GA, because they're using unreleased lucene snapshots that do not contain this Codec change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to #38919. The issue here being that that change is still under discussion. Let me see if we can accelerate it.

This also means that people using 7.0-alpha and 7.0-beta won't be able to do a rolling upgrade to 7.0-GA, because they're using unreleased lucene snapshots that do not contain this Codec change.

That's okay, we do not support that to begin with. 😇

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged and back-ported #38919 . The discussion is still ongoing, but we should not hold anything up for it.

@romseygeek
Copy link
Contributor Author

It looks as though I need to separately disable BWC on the packaging tests?

romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Feb 27, 2019
romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Feb 27, 2019
@dadoonet
Copy link
Member

Side note, when we release a version, it would be helpful to think about updating this page? Not sure if this should be done before the release though as for example 7.0 branch has been yet released with 7.0.0-beta1 and users must use the previous Snapshot Lucene version when using HL Client 7.0.0-beta1 and not the one that this PR brings.

@romseygeek romseygeek merged commit 68c57ad into elastic:master Feb 27, 2019
romseygeek added a commit that referenced this pull request Feb 27, 2019
Backport of #39350

Contains the following:

* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
romseygeek added a commit that referenced this pull request Feb 27, 2019
Backport of #39350

Contains the following:

* LUCENE-8635: Move terms dictionary off-heap for non-primary-key fields in `MMapDirectory`
* LUCENE-8292: `TermsEnum` is fully abstract
* LUCENE-8679: Return WITHIN in `EdgeTree#relateTriangle` only when polygon and triangle 
  share one edge
* LUCENE-8676: Nori tokenizer deals correctly with large buffers
* LUCENE-8697: `GraphTokenStreamFiniteStrings` better handles side paths with gaps
* LUCENE-8664: Add `equals` and `hashCode` to `TotalHits`
* LUCENE-8660: `TopDocsCollector` returns accurate hit counts if the total equals the threshold
* LUCENE-8654: `Polygon2D#relateTriangle` fix for when the polygon is inside the triangle
* LUCENE-8645: `Intervals#fixField` can merge intervals from different fields
* LUCENE-8585: Create jump-tables for DocValues at index time
@romseygeek romseygeek deleted the upgrade-lucene branch February 27, 2019 14:38
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 27, 2019
* master:
  Upgrade to lucene 8.0.0-snapshot-ff9509a8df (elastic#39350)
romseygeek added a commit that referenced this pull request Feb 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >upgrade v7.0.0-rc1 v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants