Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade search app to Elastic Search 5.4 #3373

Closed
wants to merge 22 commits into from
Closed

Conversation

ericholscher
Copy link
Member

@ericholscher ericholscher commented Dec 7, 2017

This upgrades elastic search, and fixes indexing so that it works on 5.4.

Also integrates the new 5.4 search code in ext. I'm not 100% convinced this shouldn't live in the .org codebase, but just following existing convention of where the docsearch code was.

Test failures are unrelated to this change.

@RichardLitt RichardLitt added the PR: work in progress Pull request is not ready for full review label Dec 7, 2017
@ericholscher ericholscher removed the PR: work in progress Pull request is not ready for full review label Dec 11, 2017
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the knowledge to judge the ES part, but it seems OK otherwise.

@@ -43,7 +43,7 @@ def get_version_compare_data(project, base_version=None):
}
if highest_version_obj:
ret_val['url'] = highest_version_obj.get_absolute_url()
ret_val['slug'] = highest_version_obj.slug,
ret_val['slug'] = highest_version_obj.slug
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this comma produces tests to fail... I had the same problem and I ended up writing

(highest_version_obj.slug, )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the test should be fixed. Its expecting a tuple but it should get the slug in sting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@safwanrahman agreed -- want to PR it to master, and I'll merge it in here?

@@ -116,6 +116,7 @@ def INSTALLED_APPS(self): # noqa
if donate:
apps.append('django_countries')
apps.append('readthedocsext.donate')
apps.append('readthedocsext.search')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this under if donate:?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged #3356, this can be rebased

groups.append([
url(r'^sustainability/', include('readthedocsext.donate.urls')),
])
groups.insert(0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Seems to be the same, anyway.

groups.insert(0,
[url(r'^sustainability/', include('readthedocsext.donate.urls'))]
)
for num, _url in enumerate(rtd_urls):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question again. This is only if the donate app is installed, why? I don't find the relationship between donate app and search app.

@safwanrahman
Copy link
Member

Is it possible to upgrade latest 6.x?

@ericholscher
Copy link
Member Author

ericholscher commented Dec 12, 2017

@safwanrahman that breaks a lot of what we do, specifically removing the ability to have multiple doctypes per index, so I want to keep things here until we have more time to do a full rewrite. This is an incremental step.

'lang': {'type': 'string', 'index': 'not_analyzed'},
'tags': {'type': 'string', 'index': 'not_analyzed'},
'privacy': {'type': 'string', 'index': 'not_analyzed'},
'id': {'type': 'keyword'},
Copy link
Member

@safwanrahman safwanrahman Dec 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is not id is integer or long?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the new ES doesn't let these vary types, or something. I was getting an error, and it doesn't really need to be an ID, so we will just use it as a string like the others.

Copy link
Member

@safwanrahman safwanrahman Dec 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericholscher Seems like they support it
https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html

Without ID, what can be here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem strange that we're coercing to string here, but I'm not familiar enough with ES to say why token/string is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that it's better, it just doesn't work otherwise. I don't fully understand it either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It throws this error: RequestError: TransportError(400, u'illegal_argument_exception', u'mapper [id] cannot be changed from type [long] to [keyword]')

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I can't comment much on the ES implementation details, but it seems reasonable. The long -> string coercion does seem like something we want to avoid if possible, but not really sure of the answer there.

'lang': {'type': 'string', 'index': 'not_analyzed'},
'tags': {'type': 'string', 'index': 'not_analyzed'},
'privacy': {'type': 'string', 'index': 'not_analyzed'},
'id': {'type': 'keyword'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem strange that we're coercing to string here, but I'm not familiar enough with ES to say why token/string is better.

@agjohnson agjohnson added the PR: work in progress Pull request is not ready for full review label Dec 14, 2017
@ericholscher ericholscher added the Status: blocked Issue is blocked on another issue label Dec 14, 2017
@xrmx
Copy link
Contributor

xrmx commented Mar 7, 2018

I've rebased this on master here: https://github.com/italia/readthedocs.org/tree/upgrade-es-rebased

It fails for me against ES 5.6.8 with the following:

{u'index':
  {u'status': 400, u'_type': u'page', u'_index': u'readthedocs', u'error': {
    u'caused_by': {
      u'reason': u'For input string: "de3a71b6edf5001ece51f4ea16ca4758"', u'type': u'number_format_exception'
    },
    u'reason': u'failed to parse [id]', u'type': u'mapper_parsing_exception'
  },

@xrmx
Copy link
Contributor

xrmx commented Mar 13, 2018

I've done some more work here: https://github.com/italia/readthedocs.org/tree/upgrade-es-rebased

Now we are indexing again, updated the search regarding filtering and faceting (now called aggs) to what ES 5 expects.

Still broken:

TransportError(400, u'search_phase_execution_exception', u'Fielddata is disabled on text fields by default. Set fielddata=true on [lang] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.')

@ericholscher
Copy link
Member Author

ericholscher commented Jun 6, 2018

Closing this, hoping to be replaced by #4183 which uses Elastic Search 6.

@stsewd stsewd deleted the upgrade-es branch September 6, 2018 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: work in progress Pull request is not ready for full review Status: blocked Issue is blocked on another issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants