Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port Document search API for Elasticsearch 6.x #4265

Closed
safwanrahman opened this issue Jun 19, 2018 · 7 comments
Closed

Port Document search API for Elasticsearch 6.x #4265

safwanrahman opened this issue Jun 19, 2018 · 7 comments
Assignees
Labels
Accepted Accepted issue on our roadmap

Comments

@safwanrahman
Copy link
Member

safwanrahman commented Jun 19, 2018

For Document search functionality, there are already an API exist in readthedocs-ext repository.
As we upgrade the search as part of GSOC project, that API need to be ported to work with the new search infrastructure. The search API should be rewritten by using the PageDocument class but the API schema should not be changed. That will be a complex work to do as the API schema need to be kept same.

Before starting work, there should be good amount of test in the current API schema because if API schema is changed, document search will be broken in the frontend and it will have major effect. See #4197 for ongoing tests.

@ericholscher @agjohnson Do you think the API schema should be kept same or we should design a new Document Search API which mataches with the new search upgrade code?

@safwanrahman safwanrahman added the Needed: design decision A core team decision is required label Jun 19, 2018
@safwanrahman safwanrahman self-assigned this Jun 19, 2018
@ericholscher
Copy link
Member

I don't think we need to heavily change the API, but we are able to if we want. We control the client code, so we can easily deploy new client code that works with the new API, if we need to.

@safwanrahman you should discuss any specific changes you think we should make, and we can talk about them specifically, instead of in the abstract.

@safwanrahman
Copy link
Member Author

safwanrahman commented Jun 19, 2018

Currently the API returns nested structure! There are a lot of information that is not needed actually. we only need some of the information like title, link, highlight and number of results. So I was thinking to design a API that provide only the needed values.

Current API Structure

{
  "results": {
    "hits": {
      "hits": [
        {
          "_type": "page",
          "_index": "readthedocs-20150526133301",
          "_score": 0.014734331,
          "fields": {
            "project": [
              "kuma"
            ],
            "path": [
              "index"
            ],
            "version": [
              "latest"
            ],
            "link": "https://kuma.readthedocs.io/en/latest/index",
            "title": [
              "Kuma"
            ]
          },
          "highlight": {
            "content": [
              "Kuma\nKuma is the platform that powers MDN (developer.<em>mozilla</em>.org)\nDevelopment\nCode:\nhttps://github",
              ".com/<em>mozilla</em>/kuma\nIssues:\nP1 Bugs (to be fixed in 90 days)\nP2 Bugs (to be fixed in 180 days)\nAll",
              " developer.<em>mozilla</em>.org bugs\nPull Request Queues\nDev Docs:\nhttps://kuma.readthedocs.io/en/latest",
              "/installation.html\nCI Server:\nhttps://travis-ci.org/<em>mozilla</em>/kuma\nForum:\nhttps://discourse.<em>mozilla</em>.org/c/mdn",
              "\nIRC:\nirc://irc.<em>mozilla</em>.org/mdndev\nhttp://logs.glob.uno/?c=<em>mozilla</em>%23mdndev (logs)\nServers:\nWhat’s"
            ]
          },
          "_id": "4ad0a08beb6ae308b1737f4c943e5210"
        },
      ],
      "total": 8,
      "max_score": 0.014734331
    },
    "_shards": {
      "successful": 1,
      "failed": 0,
      "total": 1
    },
    "took": 39,
    "facets": {
      "project": {
        "_type": "terms",
        "total": 8,
        "terms": [
          {
            "count": 8,
            "term": "kuma"
          }
        ],
        "other": 0,
        "missing": 0
      },
      "taxonomy": {
        "_type": "terms",
        "total": 0,
        "terms": [
          
        ],
        "other": 0,
        "missing": 8
      },
      "version": {
        "_type": "terms",
        "total": 8,
        "terms": [
          {
            "count": 8,
            "term": "latest"
          }
        ],
        "other": 0,
        "missing": 0
      }
    },
    "timed_out": false
  }
}

@safwanrahman
Copy link
Member Author

@ericholscher I would propose the API should be something like this Restful!

{
  "count": 37,
  "next": "https://readthedocs.org/api/v3/search/?q=foo&project=bar&page=2",
  "previous": null,
  "results": [
    {
      "id": 65,
      "title": "foo bar",
      "link": "https://docs.readthedocs.org",
      "highlight": {
        "content": [
          "bla bla"
        ]
      }
    }
  ]
}

@ericholscher
Copy link
Member

@safwanrahman I agree, we are returning too much data (and very Elastic Search specific data) from the API. We should condense it down.

I believe the original intent was to be able to build a "full" search client with the API. So you could offer faceting and other data in the API, so that the user could build a nicer interface for it than just a search box with results. I think for now, we can simplify the API down to what we need for documentation pages. We can then extend it in the future if we need to.

@ericholscher
Copy link
Member

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jun 27, 2018
ericholscher added a commit that referenced this issue Jul 6, 2018
[fix #4265] Port Document search API for Elasticsearch 6.x
@safwanrahman
Copy link
Member Author

safwanrahman commented Jul 6, 2018

The backend API has been implemented by #4309. Need to port the frontend javascript to work with the new API!

@safwanrahman safwanrahman added Accepted Accepted issue on our roadmap and removed Needed: design decision A core team decision is required labels Jul 6, 2018
safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x
safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
[fix readthedocs#4265] Port Document search API for Elasticsearch 6.x
ericholscher added a commit that referenced this issue Jul 16, 2018
[Fix #4265] Porting frontend docsearch to work with new API
safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
[Fix readthedocs#4265] Porting frontend docsearch to work with new API
safwanrahman pushed a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
[Fix readthedocs#4265] Porting frontend docsearch to work with new API
@safwanrahman
Copy link
Member Author

safwanrahman commented Jul 27, 2018

Fixed with #4309. #4340

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted Accepted issue on our roadmap
Projects
None yet
Development

No branches or pull requests

2 participants