-
-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[experiment] Disable Elasticsearch field norms #323
Conversation
f05a94e
to
b6f3d9b
Compare
🎉 Unit tests and Integration tests passing on CI for all combinations of nodejs versions |
b6f3d9b
to
4a9ba18
Compare
This is awesome, so many good changes here. |
mappings/document.js
Outdated
analyzer: 'peliasPhrase', | ||
fielddata : { | ||
loading: 'eager_global_ordinals' | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears in 2.4 fielddata is enabled by default. How about we remove this change from this PR to keep it as small as possible, and let #302 handle it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removal of the fielddata
sections has been undone via rebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
4a9ba18
to
8c77019
Compare
8c77019
to
5feb5c5
Compare
cc/ @Ashleyholt, please forward this on to your team :) |
That's a great job! |
Ran a full planet import yesterday on this branch with ES 5.6; here are some observations: Import
Performance
For context, the following services were running during the loadtest:
Interpolation was not running. |
Hey @gopi-ar, All our indications are that ES5 should be much faster (especially reverse queries, as they've improved the algorithms for We are planning an Elasticsearch 5 checkin sometime next week, perhaps you can join us and we can figure out what sort of performance to expect with ES5? |
@orangejulius yep, we were also expecting a huge bump with reverse since photon saw a drastic increase as well. Also, pre-loading the whole index into RAM didn't seem to help. RAM usage for 2 types of settings we tried for the whole build are:
|
…ards compatibility with 2.x
5feb5c5
to
a51b0f2
Compare
An update on the status of this branch: we initially tried out disabling It turns out those changes didn't translate very well to real world testing, so we split most of the work in this PR out into #328 and merged that. We'll do a little more testing, but most likely will close this particular PR. |
This is the PR to add schema support for elasticsearch 5.6.12 while still maintaining backwards compatibility with 2.4.
This feature has been long requested, so I'm happy to have a PR ready for testing 😃
About the upgrade
Elasticsearch 5.x is mostly backwards compatible with 2.x, so not many changes are required at this stage, we will make more significant changes in the migration from 5->6, but this PR is the start of a bunch of work to modernize the elasticsearch version supported by Pelias.
One of the major changes between the two versions is the 'Similarity', in 2.4 this was something called 'default' (now called 'classic'). In version 5 this was replaced by BM25 as the default similarity scoring engine.
As a result of the similarity algorthm changes, queries will score differently, the actual floating point numbers allocated by elasticsearch are different when compared to 2.4.
Pelias is only concerned with relative scoring between documents, however in some cases there are minor difference in relative scoring.
These scoring differences are fairly rare, I have disabled 'norms' in this PR which helps in mitigating this issue.
Please be aware that there will inevitably be changes to sorting order in a small amount of queries, for the most part these will be improvements, if there are regressions please open an issue to discuss.
The other major change was in query composition, for instance the error message
no [query] registered for [filtered]
is thrown for queries which begin{ "query": { "filtered"
.We started rewriting these queries in Pelias some time ago, when I ran automated tests and manual checks I was not able to trigger any query errors, I suspect that we have already completed the query migration, again we would be happy to hear if we've missed any 😃
Upgrade period & deprecation notice
I wanted to do this in a way that is also backwards compatible with [email protected], so that users don't have a 'hard cutover', giving them some time to allocate resources to get their infrastructure moved over to
5.6.12
.The plan is to maintain support for both 2.4 & 5.6 for only as long is required to confirm that the transition was successful, at which stage we deprecate support for 2.4.
We'll try to support both where we can, but sometime in 2019 we will completely drop support for 2.4 so that we can move forward and start taking advantage of new features only available in versions 5 & 6.
How to enable support for 5.6?
You'll need to run the code in this PR and ensure you have a current version of
npm elasticsearch
installed. (ie. runnpm install
)Then you will need to change the
esclient.apiVersion
in yourpelias.json
file.A quick note on versions:
apiVersion
to5.6
then it will work fine against either version of elasticsearch.apiVersion
to2.4
then it will only work against2.4
, an error will be thrown when trying to use a5.6
server with a2.4
client.pelias/config
to be5.6
Which docker images should I use?
For elasticsearch 5.6 you can use
pelias/elasticsearch:5.6.12
:For elasticsearch 2.4 you can use
pelias/elasticsearch:2.4
:Note that a PR will be coming soon for pelias/docker which updates the existing 'projects'.
What's in the PR?
npm elastictest
to support5.6
npm elasticsearch
to support new APIs5.6.12
(updated readme on how to run this locally)elastictest
so that the correct APIs are used for integration testsSchema Changes
geo_point
The geo_point properties were removed in #321
norms
norms have been disabled for all relevant fields using the syntax:
norms: { enabled: false }
norms are useful for a traditional fulltext search engine where 'smaller' fields like a title are considered more important than 'larger' fields such as the text body.
this is not relevant to geocoding so I have disabled norms, this is required in this PR because without it the BM25 algorithm takes norms in to account and scores integration test queries in a way which isn't compatible between versions.
the other benefit of disabling norms is reducing the amount of disk storage (see the documentation linked above for more info).