-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
- Loading branch information
1 parent
183b176
commit 97e3ad8
Showing
59 changed files
with
1,431 additions
and
1,436 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
Search | ||
====== | ||
|
||
Read The Docs uses Elasticsearch_ instead of the built in Sphinx search for providing better search | ||
results. Documents are indexed in the Elasticsearch index and the search is made through the API. | ||
All the Search Code is open source and lives in the `GitHub Repository`_. | ||
Currently we are using the `Elasticsearch 6.3`_ version. | ||
|
||
Local Development Configuration | ||
------------------------------- | ||
|
||
Installing and running Elasticsearch | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
You need to install and run Elasticsearch_ version 6.3 on your local development machine. | ||
You can get the installation instructions | ||
`here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_. | ||
Otherwise, you can also start an Elasticsearch Docker container by running the following command:: | ||
|
||
docker run -p 9200:9200 -p 9300:9300 \ | ||
-e "discovery.type=single-node" \ | ||
docker.elastic.co/elasticsearch/elasticsearch:6.3.2 | ||
|
||
Indexing into Elasticsearch | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
For using search, you need to index data to the Elasticsearch Index. Run ``reindex_elasticsearch`` | ||
management command:: | ||
|
||
./manage.py reindex_elasticsearch | ||
|
||
For performance optimization, we implemented our own version of management command rather than | ||
the built in management command provided by the `django-elasticsearch-dsl`_ package. | ||
|
||
Auto Indexing | ||
^^^^^^^^^^^^^ | ||
By default, Auto Indexing is turned off in development mode. To turn it on, change the | ||
``ELASTICSEARCH_DSL_AUTOSYNC`` settings to `True` in the `readthedocs/settings/dev.py` file. | ||
After that, whenever a documentation successfully builds, or project gets added, | ||
the search index will update automatically. | ||
|
||
|
||
Architecture | ||
------------ | ||
The search architecture is devided into 2 parts. | ||
One part is responsible for **indexing** the documents and projects and | ||
the other part is responsible for querying the Index to show the proper results to users. | ||
We use the `django-elasticsearch-dsl`_ package mostly to the keep the search working. | ||
`django-elasticsearch-dsl`_ is a wrapper around `elasticsearch-dsl`_ for easy configuration | ||
with Django. | ||
|
||
Indexing | ||
^^^^^^^^ | ||
All the Sphinx documents are indexed into Elasticsearch after the build is successful. | ||
Currently, we do not index MkDocs documents to elasticsearch, but | ||
`any kind of help is welcome <https://github.com/rtfd/readthedocs.org/issues/1088>`_. | ||
|
||
How we index documentations | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
After any build is successfully finished, `HTMLFile` objects are created for each of the | ||
``HTML`` files and the old version's `HTMLFile` object is deleted. By default, | ||
`django-elasticsearch-dsl`_ package listens to the `post_create`/`post_delete` signals | ||
to index/delete documents, but it has performance drawbacks as it send HTTP request whenever | ||
any `HTMLFile` objects is created or deleted. To optimize the performance, `bulk_post_create` | ||
and `bulk_post_delete` Signals_ are dispatched with list of `HTMLFIle` objects so its possible | ||
to bulk index documents in elasticsearch ( `bulk_post_create` signal is dispatched for created | ||
and `bulk_post_delete` is dispatched for deleted objects). Both of the signals are dispatched | ||
with the list of the instances of `HTMLFile` in `instance_list` parameter. | ||
|
||
We listen to the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application | ||
and index/delete the documentation content from the `HTMLFile` instances. | ||
|
||
|
||
How we index projects | ||
~~~~~~~~~~~~~~~~~~~~~ | ||
We also index project information in our search index so that the user can search for projects | ||
from the main site. `django-elasticsearch-dsl`_ listen `post_create` and `post_delete` signals of | ||
`Project` model and index/delete into Elasticsearch accordingly. | ||
|
||
|
||
Elasticsearch Document | ||
~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
`elasticsearch-dsl`_ provides model-like wrapper for the `Elasticsearch document`_. | ||
As per requirements of `django-elasticsearch-dsl`_, it is stored in the | ||
`readthedocs/search/documents.py` file. | ||
|
||
**ProjectDocument:** It is used for indexing projects. Signal listener of | ||
`django-elasticsearch-dsl`_ listens to the `post_save` signal of `Project` model and | ||
then index/delete into Elasticsearch. | ||
|
||
**PageDocument**: It is used for indexing documentation of projects. By default, the auto | ||
indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`. | ||
`settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production. | ||
As mentioned above, our `Search` app listens to the `bulk_post_create` and `bulk_post_delete` | ||
signals and indexes/deleted documentation into Elasticsearch. The signal listeners are in | ||
the `readthedocs/search/signals.py` file. Both of the signals are dispatched | ||
after a successful documentation build. | ||
|
||
The fields and ES Datatypes are specified in the `PageDocument`. The indexable data is taken | ||
from `processed_json` property of `HTMLFile`. This property provides python dictionary with | ||
document data like `title`, `headers`, `content` etc. | ||
|
||
|
||
.. _Elasticsearch: https://www.elastic.co/products/elasticsearch | ||
.. _Elasticsearch 6.3: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index.html | ||
.. _GitHub Repository: https://github.com/rtfd/readthedocs.org/tree/master/readthedocs/search | ||
.. _Elasticsearch document: https://www.elastic.co/guide/en/elasticsearch/guide/current/document.html | ||
.. _django-elasticsearch-dsl: https://github.com/sabricot/django-elasticsearch-dsl | ||
.. _elasticsearch-dsl: http://elasticsearch-dsl.readthedocs.io/en/latest/ | ||
.. _Signals: https://docs.djangoproject.com/en/2.1/topics/signals/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
33 changes: 0 additions & 33 deletions
33
readthedocs/core/management/commands/provision_elasticsearch.py
This file was deleted.
Oops, something went wrong.
Oops, something went wrong.