-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix #4265] Port Document search API for Elasticsearch 6.x #4309
[fix #4265] Port Document search API for Elasticsearch 6.x #4309
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think most of my questions are around whether we're using proper querysets, or we have things that look like querysets, but are actually search results that we're filtering. Could use a bit more explanation, and code comments if that is what we're doing.
@@ -60,14 +61,19 @@ class Meta(object): | |||
content = fields.TextField(attr='processed_json.content') | |||
path = fields.TextField(attr='processed_json.path') | |||
|
|||
# Fields to perform search with weight | |||
search_fields = ['title^10', 'headers^5', 'content'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used by the DocType
class directly, or are we storing it here to add later? If the prior, we should probably set it in an __init__
method or somewhere else, so it doesn't get confused with the other data here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not actually used by DocType
class, but we are using it in class method. I dont know if keeping it in __init__
method will make it work in class method!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, good point. I guess it's fine for now then.
readthedocs/search/api.py
Outdated
|
||
def get_queryset(self): | ||
query = self.request.query_params.get('query') | ||
queryset = PageDocument.search(query=query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does search()
return a queryset? Seems like it would return a search object or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. it returns a Search object.
readthedocs/urls.py
Outdated
@@ -67,6 +67,7 @@ | |||
url(r'^api/', include(v1_api.urls)), | |||
url(r'^api/v2/', include('readthedocs.restapi.urls')), | |||
url(r'^api-auth/', include('rest_framework.urls', namespace='rest_framework')), | |||
url(r'^api/search/', PageSearchAPIView.as_view()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should likely continue to live in v2
for now. Perhaps as api/v2/indocsearch
or something, so it can live beside the old docsearch
during rollout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! I added it for testing purpose actually!
readthedocs/search/filters.py
Outdated
project_slug = request.query_params.get('project') | ||
project_slug_list = get_project_slug_list_or_404(project_slug=project_slug, | ||
user=request.user) | ||
return queryset.filter('terms', project=project_slug_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we're conflating querysets and Search queries here again. It feels a bit odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! I understand. Its not queryset, but a Search object but it is similar to queryset.
@ericholscher We actually have a |
@ericholscher Its ready for a final review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. We still need to wire this up to the front-end JS right?
readthedocs/restapi/urls.py
Outdated
@@ -48,7 +48,7 @@ | |||
url(r'index_search/', | |||
search_views.index_search, | |||
name='index_search'), | |||
url(r'search/$', views.search_views.search, name='api_search'), | |||
# url(r'search/$', views.search_views.search, name='api_search'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not remove this yet, until we've deployed a full version to replace it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know, but with this, the reverse
function act differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without having comment, its returns something.
resolve('/api/v2/docsearch/')
>>> ResolverMatch(func=readthedocs.restapi.views.search_views.search, args=(), kwargs={}, url_name=api_search, app_names=[], namespaces=[])
But it should return
>>> ResolverMatch(func=readthedocs.search.api.PageSearchAPIView, args=(), kwargs={}, url_name=doc_search, app_names=[], namespaces=[])
I have no idea why this is happening. maybe a bug in django?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Believe it's a bad regex. It should be r'^search/$'
-- now it's just catching anything that ends in /api/v2/<anythinghere>search/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ericholscher . I did not suspect the regex can be wrong! It worked!
# Run bool query with should, so it returns result where either of the query matches | ||
bool_query = Bool(should=all_queries) | ||
search = search.query(bool_query) | ||
search = search.query(query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we shouldn't be overwriting the object here. Will we return something invalid if there is no query
because we have the same object name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have actually a mixed feeling about this. Do you have any idea about how to do this withtout overriding the search
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean mostly just the name of the object. search = search.query(query)
-- is that similar to a queryset = queryset.filter()
? It just feels like a weird bit of syntax to rewrite the name of the object over again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. search = search.query(query)
-- is that similar to a queryset = queryset.filter()
. If we do not overwrite the same object, then another variable need to be assigned and check if that is not None
@@ -66,6 +66,8 @@ | |||
api_urls = [ | |||
url(r'^api/', include(v1_api.urls)), | |||
url(r'^api/v2/', include('readthedocs.restapi.urls')), | |||
# Keep the `doc_search` at root level, so the test does not fail for other API | |||
url(r'^api/v2/docsearch/$', PageSearchAPIView.as_view(), name='doc_search'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow. What tests were failing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is failing:
https://travis-ci.org/rtfd/readthedocs.org/jobs/398428841#L902-L911
@ericholscher Anything left for fixing? |
@ericholscher while I was working in the frontend, I realized that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with a few small nits, and fixing the tests on py3. Glad we caught the link addition from the client side, I had a feeling we'd need more to render the HTML properly :)
readthedocs/search/api.py
Outdated
context['projects_info'] = self.get_projects_info() | ||
return context | ||
|
||
def _get_all_projects(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the same logic as in get_project_slug_list_or_404
? It should probably use the util function for this I think? Perhaps it could be get_project_list_or_404
and we can pass slug_only
param to it or something?
readthedocs/search/api.py
Outdated
|
||
def get_serializer_context(self): | ||
context = super(PageSearchAPIView, self).get_serializer_context() | ||
context['projects_info'] = self.get_projects_info() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
projects_info
I think is a bad name. It seems like we're just getting the docs_url
from it for now, so we should probably be more clear about that. Perhaps context['project_urls']
or something?
@ericholscher Fixed python3 compatibility and changes as you reviewed. r? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. 👍
[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x
[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x
This is the first etaration of making a API for Documentation Search. Its not finalized yet and the work is not finished yet.
Need more work on highlighting as well as test.
Needed to do some refactor to keep it DRY.
@ericholscher Can you look for a slight review?