[fix #4265] Port Document search API for Elasticsearch 6.x #4309

safwanrahman · 2018-06-27T21:11:05Z

This is the first iteration of an API for documentation search. It is not finalized, and the work is not finished yet.
Highlighting needs more work, and tests still need to be added.
I needed to do some refactoring to keep it DRY.
@ericholscher Can you give it a quick review?

safwanrahman · 2018-06-27T21:18:06Z

Currently the API looks something like this!

ericholscher

I think most of my questions are around whether we're using proper querysets, or we have things that look like querysets, but are actually search results that we're filtering. Could use a bit more explanation, and code comments if that is what we're doing.

ericholscher · 2018-06-28T14:23:29Z

readthedocs/search/documents.py

@@ -60,14 +61,19 @@ class Meta(object):
    content = fields.TextField(attr='processed_json.content')
    path = fields.TextField(attr='processed_json.path')

+    # Fields to perform search with weight
+    search_fields = ['title^10', 'headers^5', 'content']


Is this used by the DocType class directly, or are we storing it here to add later? If the former, we should probably set it in an __init__ method or somewhere else, so it doesn't get confused with the other data here.

It's not actually used by the DocType class, but we are using it in a class method. I don't know whether setting it in an __init__ method would still make it available in the class method!

Hrm, good point. I guess it's fine for now then.
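
To illustrate the concern about class methods: an attribute assigned in __init__ exists only on instances, so a class method cannot read it through cls, while a class-level attribute can be read without creating an instance. A minimal sketch, assuming hypothetical stand-in classes (ClassLevelFields and InitLevelFields are not the real PageDocument or DocType):

```python
class ClassLevelFields:
    # Class attribute: available on the class itself, no instance needed.
    search_fields = ['title^10', 'headers^5', 'content']

    @classmethod
    def fields_for_search(cls):
        return cls.search_fields


class InitLevelFields:
    def __init__(self):
        # Instance attribute: exists only after an instance is created.
        self.search_fields = ['title^10', 'headers^5', 'content']

    @classmethod
    def fields_for_search(cls):
        # The class itself has no `search_fields`, so this raises AttributeError.
        return cls.search_fields


def init_version_fails():
    """Return True if the __init__-based variant cannot be used from a classmethod."""
    try:
        InitLevelFields.fields_for_search()
    except AttributeError:
        return True
    return False
```

This is why keeping search_fields at class level works for a class method, at the cost of sitting next to the field declarations.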

ericholscher · 2018-06-28T14:41:04Z

readthedocs/search/api.py

+
+    def get_queryset(self):
+        query = self.request.query_params.get('query')
+        queryset = PageDocument.search(query=query)


Does search() return a queryset? Seems like it would return a search object or similar.

Yeah, it returns a Search object.

ericholscher · 2018-06-28T14:44:12Z

readthedocs/urls.py

@@ -67,6 +67,7 @@
    url(r'^api/', include(v1_api.urls)),
    url(r'^api/v2/', include('readthedocs.restapi.urls')),
    url(r'^api-auth/', include('rest_framework.urls', namespace='rest_framework')),
+    url(r'^api/search/', PageSearchAPIView.as_view()),


This should likely continue to live in v2 for now. Perhaps as api/v2/indocsearch or something, so it can live beside the old docsearch during rollout?

Yeah! I actually added it for testing purposes!

ericholscher · 2018-06-28T14:48:28Z

readthedocs/search/filters.py

+        project_slug = request.query_params.get('project')
+        project_slug_list = get_project_slug_list_or_404(project_slug=project_slug,
+                                                         user=request.user)
+        return queryset.filter('terms', project=project_slug_list)


Feels like we're conflating querysets and Search queries here again. It feels a bit odd.

Yeah, I understand. It's not a queryset but a Search object, though it behaves similarly to a queryset.

safwanrahman · 2018-06-28T15:09:48Z

I think most of my questions are around whether we're using proper querysets, or we have things that look like querysets, but are actually search results that we're filtering.

@ericholscher We actually have a Search object that behaves much like a queryset, so we can use the Search object as the queryset to implement the API easily!

safwanrahman · 2018-06-30T20:42:34Z

@ericholscher It's ready for a final review.
r?

ericholscher

Looks good to me. We still need to wire this up to the front-end JS, right?

ericholscher · 2018-07-02T13:00:21Z

readthedocs/restapi/urls.py

@@ -48,7 +48,7 @@
    url(r'index_search/',
        search_views.index_search,
        name='index_search'),
-    url(r'search/$', views.search_views.search, name='api_search'),
+    # url(r'search/$', views.search_views.search, name='api_search'),


Let's not remove this yet, until we've deployed a full version to replace it.

I don't know why, but with this line in place, URL resolving acts differently.

Without commenting it out, it returns this:

>>> resolve('/api/v2/docsearch/')
ResolverMatch(func=readthedocs.restapi.views.search_views.search, args=(), kwargs={}, url_name=api_search, app_names=[], namespaces=[])

But it should return:

ResolverMatch(func=readthedocs.search.api.PageSearchAPIView, args=(), kwargs={}, url_name=doc_search, app_names=[], namespaces=[])

I have no idea why this is happening. Maybe a bug in Django?

Believe it's a bad regex. It should be r'^search/$' -- now it's just catching anything that ends in /api/v2/<anythinghere>search/

Thanks @ericholscher. I did not suspect the regex could be wrong! It worked!
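
The effect of the missing anchor can be reproduced with the standard re module alone. This is only a sketch: it assumes the pattern is applied as a regex search against the part of the path left after the api/v2/ prefix has been stripped, and the matches helper is hypothetical.

```python
import re

# The pattern as written in restapi/urls.py, and the corrected version.
UNANCHORED = re.compile(r'search/$')
ANCHORED = re.compile(r'^search/$')


def matches(pattern, remaining_path):
    """Return True if the pattern is found anywhere in the remaining path."""
    return pattern.search(remaining_path) is not None


# 'docsearch/' ends in 'search/', so the unanchored pattern catches it.
unanchored_hits_docsearch = matches(UNANCHORED, 'docsearch/')
# With the leading '^', only the exact 'search/' path matches.
anchored_hits_docsearch = matches(ANCHORED, 'docsearch/')
anchored_hits_search = matches(ANCHORED, 'search/')
```

With the unanchored pattern listed first, a request for docsearch/ is claimed by the old search view before the new doc_search route is ever tried.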

ericholscher · 2018-07-02T13:05:08Z

readthedocs/search/faceted_search.py

-            # Run bool query with should, so it returns result where either of the query matches
-            bool_query = Bool(should=all_queries)
-            search = search.query(bool_query)
+            search = search.query(query)


Feels like we shouldn't be overwriting the object here. Will we return something invalid if there is no query because we have the same object name?

I actually have mixed feelings about this. Do you have any idea how to do this without overriding the search method?

I mean mostly just the name of the object. search = search.query(query) -- is that similar to a queryset = queryset.filter()? It just feels like a weird bit of syntax to rewrite the name of the object over again.

Yeah, search = search.query(query) is similar to queryset = queryset.filter(). If we do not overwrite the same object, then another variable needs to be assigned and checked for None.
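
The rebinding pattern discussed in this thread can be sketched with a toy class. It assumes, as with Django querysets, that each chained call returns a new object instead of mutating the original; ToySearch and build_search are hypothetical names used only for illustration and are not part of this PR.

```python
class ToySearch:
    """Hypothetical stand-in for a chainable, queryset-like search object."""

    def __init__(self, clauses=()):
        self.clauses = tuple(clauses)

    def query(self, clause):
        # Return a new object; the original is left untouched.
        return ToySearch(self.clauses + (('query', clause),))

    def filter(self, name, **params):
        return ToySearch(self.clauses + (('filter', name, params),))


def build_search(query=None, project_slugs=None):
    search = ToySearch()
    if query:
        # Rebinding the name keeps a single variable valid on every path.
        search = search.query(query)
    if project_slugs:
        search = search.filter('terms', project=list(project_slugs))
    return search
```

Because the name always refers to a valid search object, the no-query case simply returns the unmodified search, and no second variable or None check is needed.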

ericholscher · 2018-07-02T13:37:54Z

readthedocs/urls.py

@@ -66,6 +66,8 @@
 api_urls = [
    url(r'^api/', include(v1_api.urls)),
    url(r'^api/v2/', include('readthedocs.restapi.urls')),
+    # Keep the `doc_search` at root level, so the test does not fail for other API
+    url(r'^api/v2/docsearch/$', PageSearchAPIView.as_view(), name='doc_search'),


Not sure I follow. What tests were failing?

This test is failing:
https://travis-ci.org/rtfd/readthedocs.org/jobs/398428841#L902-L911

safwanrahman · 2018-07-03T19:15:11Z

@ericholscher Is there anything left to fix?

safwanrahman · 2018-07-04T21:39:33Z

@ericholscher While I was working on the frontend, I realized that a link to the document is needed. I have implemented that. Can you please review it?

ericholscher

Looks good with a few small nits, and fixing the tests on py3. Glad we caught the link addition from the client side, I had a feeling we'd need more to render the HTML properly :)

ericholscher · 2018-07-05T07:40:52Z

readthedocs/search/api.py

+        context['projects_info'] = self.get_projects_info()
+        return context
+
+    def _get_all_projects(self):


Isn't this the same logic as in get_project_slug_list_or_404? It should probably use the util function for this I think? Perhaps it could be get_project_list_or_404 and we can pass slug_only param to it or something?

ericholscher · 2018-07-05T07:43:00Z

readthedocs/search/api.py

+
+    def get_serializer_context(self):
+        context = super(PageSearchAPIView, self).get_serializer_context()
+        context['projects_info'] = self.get_projects_info()


projects_info I think is a bad name. It seems like we're just getting the docs_url from it for now, so we should probably be more clear about that. Perhaps context['project_urls'] or something?

safwanrahman · 2018-07-05T14:14:55Z

@ericholscher Fixed Python 3 compatibility and made the changes you requested. r?

ericholscher

Looks great. 👍

[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x

bb4e5aa

safwanrahman requested a review from ericholscher June 27, 2018 21:11

safwanrahman self-assigned this Jun 27, 2018

safwanrahman added the PR: work in progress Pull request is not ready for full review label Jun 27, 2018

ericholscher reviewed Jun 28, 2018

safwanrahman added 4 commits June 29, 2018 02:24

fixup and adding test

4dc3e35

more fixup

fe2aef1

adding more tests

41a67a3

adding more tests and fixup

85b4686

safwanrahman force-pushed the search_api branch from 4e30b4d to 85b4686 Compare June 30, 2018 20:01

safwanrahman removed the PR: work in progress Pull request is not ready for full review label Jun 30, 2018

fixing lint

b01981f

ericholscher reviewed Jul 2, 2018

fixing regex

75de7f6

Adding link to serialized data

e2e8cbb

ericholscher reviewed Jul 5, 2018

fixing python3 compatibility

733b030

ericholscher approved these changes Jul 6, 2018

ericholscher merged commit d6638b9 into readthedocs:search_upgrade Jul 6, 2018

safwanrahman mentioned this pull request Jul 6, 2018

Port Document search API for Elasticsearch 6.x #4265

Closed

safwanrahman deleted the search_api branch July 7, 2018 19:23

safwanrahman mentioned this pull request Jul 8, 2018

[Fix #4265] Porting frontend docsearch to work with new API #4340

Merged

safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this pull request Jul 16, 2018

Merge pull request readthedocs#4309 from safwanrahman/search_api

96681e2

[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x

safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this pull request Jul 16, 2018

Merge pull request readthedocs#4309 from safwanrahman/search_api

1b47227

[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x
