Add search pagination and mock ES #1132

seav · 2017-02-16T08:37:45Z (edited by amplifi)

Proposed changes in this pull request

Add pagination to the search results.
- Background: There are 3 alternative approaches to pagination: ES-side, platform-side, or client-side.
  - For the client-side approach, we dump all search results to the client and DataTables handles the pagination on its own, similar to how all other DataTables on the site are currently handled. But, as discussed in Research front-end performance improvement #893, we want to move away from this approach to improve front-end performance, so this approach is not selected for this PR.
  - For the platform-side approach, the platform does the pagination by itself (generally by slicing the array of results provided by ES) and the client uses DataTables' server-side processing mode. This approach is not selected for this PR because the ES-side approach is still feasible and would lead to less processing on the platform.
  - For the ES-side approach, the platform just acts as a glorified proxy between the client and the ES API by translating DataTables' server-side processing start and length parameters into ES' from and size parameters. Note that this approach is only feasible if there is a 1-to-1 correspondence between ES' search results and the results presented in the UI. If the platform decides to insert or remove results (for example, if the platform splits an ES party-relationship document result into a party UI result and a relationship UI result), this approach breaks down completely and we would need to fall back to the platform-side approach. Currently I am still assuming a 1-to-1 correspondence.
- Improve the platform's async search endpoint so that it proxies between DataTables' server-side processing API and ES' from and size input parameters and hits.total return value. The platform endpoint and the UI are also converted from GET to POST, and the UI also provides the CSRF token for the POST AJAX request.
- Add _score sorting to the API request to ES.
- Note that ES has a (configurable) hard maximum limit of 10,000 results. I think we can stick to this default value. If the user wants more, then they should export the project data instead and then slice and dice the data using whatever tool they want.
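A minimal sketch of the ES-side translation described above, assuming ES 5.x (where hits.total is a plain integer, as the raw_results['hits']['total'] lookup later in this thread suggests). The helper names, the simple_query_string query type, and the capping of the reported totals at the result window are illustrative assumptions, not the platform's actual code.

```python
# Sketch only: helper names and the query type are assumptions, not the PR's code.
MAX_RESULT_WINDOW = 10000  # Elasticsearch's default index.max_result_window


def build_es_body(query_text, start, length):
    """Translate DataTables' start/length into an ES search body."""
    # ES rejects requests where from + size exceeds the result window,
    # so clamp both values to stay inside it.
    start = min(max(start, 0), MAX_RESULT_WINDOW)
    size = min(max(length, 0), MAX_RESULT_WINDOW - start)
    return {
        'query': {'simple_query_string': {'query': query_text}},
        'from': start,
        'size': size,
        'sort': ['_score'],  # sorting on _score defaults to descending
    }


def build_datatables_response(draw, es_response, rows):
    """Map ES hits.total onto the fields DataTables expects."""
    # Never report more rows than ES will actually let the client page to.
    total = min(es_response['hits']['total'], MAX_RESULT_WINDOW)
    return {
        'draw': draw,
        'recordsTotal': total,
        'recordsFiltered': total,
        'data': rows,
    }
```

Capping recordsTotal and recordsFiltered keeps DataTables from offering pages beyond the 10,000-result window. Reporting the same number for both fields is a simplification, since DataTables nominally uses recordsTotal for the count before filtering.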
Add a mock ES API to the platform:
- Background: Issue Implement a mock ES cluster for testing purposes #910 was submitted because we need something to mock the ES API in the platform's unit tests. That need was already met by Python's built-in unittest.mock library. But, as suggested by @oliverroick, it would also be nice to have some sort of functional search so that developers can see how the UI behaves in the dev VM without having to deploy the search code to staging. This PR addresses this need.
- Add a mock_es Django app to the platform dev environment with corresponding URLs and views. The URLs match the ES API in our deployment environments.
- The mock ES API cannot provide any fake results because the platform in some cases retrieves objects from the database to augment the results provided by ES. So the mock API uses actual entities in the current project being searched. The mock API does not do any search processing at all. It basically provides all of the project's locations, parties, relationships, and resources (interleaved) as its "search results". The mock ES API does honor the pagination parameters discussed above.
- The mock ES API has two overrides. Searching for "none" in the UI makes the mock API return no results, to mimic what happens when a search has no matches. (The alternative would be to search in a project with no records.) Searching for "error" makes the mock API return an error response, to mimic what happens when ES is down or encounters an error.
Update the UI search results table to have only one column and to hide the table header (for now).
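A rough sketch of the mock behavior described above: ignore the query except for the "none" and "error" overrides, and slice the entity list according to from/size. The function name, the error body, and the exact-match handling of the override keywords are assumptions; the actual mock_es view works against Django querysets and a transform method.

```python
# Hypothetical sketch of the mock ES search; names and error body are illustrative.
def mock_es_search(query, entities, start=0, size=10, transform=lambda e: e):
    """Return (status_code, body), loosely shaped like an ES 5.x search response."""
    if query == 'error':
        # Mimic ES failing so that the UI's error path can be exercised.
        return 500, {
            'error': {'type': 'mock_error', 'reason': 'Mock ES error'},
            'status': 500,
        }
    if query == 'none':
        entities = []  # mimic a search with no matches
    # No real search happens: every entity is a "hit"; only paging is honored.
    page = entities[start:start + size]
    return 200, {
        'hits': {
            'total': len(entities),
            'hits': [transform(entity) for entity in page],
        },
    }
```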

When should this PR be merged

Anytime.

Risks

This should be extensively tested in the dev VM (for the mock ES app) and the staging environment (for the pagination feature) to test that everything works correctly.

Follow up actions

None. This PR needs no updated packages or dependencies or ES reindexing.

Checklist (for reviewing)

General

Is this PR explained thoroughly? All code changes must be accounted for in the PR description.
Is the PR labeled correctly? It should have the migration label if a new migration is added.
Is the risk level assessment sufficient? The risks section should contain all risks that might be introduced with the PR and which actions we need to take to mitigate these risks. Possible risks are database migrations, new libraries that need to be installed or changes to deployment scripts.

Functionality

Are all requirements met? Compare implemented functionality with the requirements specification.
Does the UI work as expected? There should be no JavaScript errors in the console; all resources should load. There should be no unexpected errors. Deliberately try to break the feature to find out if there are corner cases that are not handled.

Code

Do you fully understand the introduced changes to the code? If not, ask for clarification; it might uncover ways to solve a problem in a more elegant and efficient way.
Does the PR introduce any inefficient database requests? Use the debug server to check for duplicate requests.
Are all necessary strings marked for translation? All strings that are exposed to users via the UI must be marked for translation.

Tests

Are there sufficient test cases? Ensure that all components are tested individually; models, forms, and serializers should be tested in isolation even if a test for a view covers these components.
If this is a bug fix, are tests for the issue in place? There must be a test case for the bug to ensure the issue won’t regress. Make sure that the tests break without the new code to fix the issue.
If this is a new feature or a significant change to an existing feature, has the manual testing spreadsheet been updated with instructions for manual testing?

Documentation

Are changes to the UI documented in the platform docs? If this PR introduces new platform site functionality or changes existing ones, the changes must be documented in the Cadasta Platform Documentation.
Are changes to the API documented in the API docs? If this PR introduces new API functionality or changes existing ones, the changes must be documented in the API docs.
Are reusable components documented? If this PR introduces components that are relevant to other developers (for instance a mixin for a view or a generic form) they should be documented in the Wiki.

oliverroick

The functionality for the mock part looks good. I just have a few questions and remarks.

One general question:

Instead of creating a new app (mock_es) could that functionality not live in search, maybe inside a sub-module called mock_es? It's related to search after all and it will keep our app structure neater.

oliverroick · 2017-02-17T09:57:18Z

cadasta/mock_es/views.py

+        resources = list(Resource.objects.filter(project=project))
+
+        entities = []
+        while len(locations) + len(parties) + len(rels) + len(resources) > 0:


Please correct me if I don't understand this loop properly. Inside the loop, you're removing entities from each of the entity lists one by one and adding them to the list entities until all of the original lists are empty. What's the idea behind this?

seav · 2017-02-19

The idea is to provide a deterministically ordered list of "search results" that isn't just all the locations followed by the parties, then the relationships, then the resources. So I'm interleaving all four types of entities.
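The interleaving seav describes is a deterministic round-robin over the four entity lists. A minimal sketch of the same idea using itertools instead of a while loop that removes items from each list; the function name is hypothetical, and the PR's loop may take items in a different order.

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel that cannot collide with a real entity


def interleave(*groups):
    """Round-robin merge: first item of each group, then the second, and so on."""
    # zip_longest pads the shorter groups with the sentinel; drop it afterwards.
    merged = chain.from_iterable(zip_longest(*groups, fillvalue=_MISSING))
    return [item for item in merged if item is not _MISSING]
```

For example, interleave(locations, parties, rels, resources) yields one location, one party, one relationship, and one resource per round until every list is exhausted, without mutating the input lists.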

oliverroick · 2017-02-17T09:59:22Z

cadasta/mock_es/views.py

+        num_page_results = request.data.get('size', 10)
+
+        hits = []
+        for entity in entities[start_idx:start_idx + num_page_results]:


I would write this as a list comprehension; it will be more efficient:

hits = [self.transform(entity) for entity in entities[start_idx:start_idx + num_page_results]]

oliverroick · 2017-02-17T10:06:03Z

cadasta/search/views/async.py

-        query = request.query_params.get('q')
+    def post(self, request, *args, **kwargs):
+        query = request.data.get('q')
+        start_idx = self.convert_field_to_int(request.data, 'start', 0)


Does this really need a separate method to cast this to int? In which cases would ValueError in convert_field_to_int be thrown?

seav · 2017-02-19

The idea to cast values to integers came from the DataTables recommendation for server-side processing mode:

The draw counter that this object is a response to - from the draw parameter sent as part of the data request. Note that it is strongly recommended for security reasons that you cast this parameter to an integer, rather than simply echoing back to the client what it sent in the draw parameter, in order to prevent Cross Site Scripting (XSS) attacks.

So aside from casting the draw parameter to an integer, I am casting all expected integer values (including start and length) into integers just to be safe.

oliverroick · 2017-02-20

I wasn't questioning the need for casting to int. I was wondering whether it's necessary to put this into a method when the conversion could be made explicit on this line. I notice that you catch ValueError exceptions in the method; I guess that was the reason for putting it into a method. Is it likely that a value that cannot be cast to int is provided for any of these parameters (draw, start, length)? If it's likely, under which circumstances will this happen? If it's not likely, it might be better not to catch the exception and fail loudly; otherwise there's a risk of introducing a bug that will be difficult to track down.

seav · 2017-02-20

Oh, OK, I understand. These parameters are expected to have only integer values. If they don't, then somebody is doing something malicious, and I agree that it would be better not to catch the ValueError exception in that case. So I have removed the conversion method and inlined the integer conversions.
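Inlining the casts as described could look like the sketch below. The function name and the default values (0 for draw and start, 10 for length) are assumptions; the point is that int() is applied to each parameter and any ValueError is left uncaught. Echoing back int(draw) rather than the raw string is what the DataTables recommendation quoted above asks for.

```python
def parse_paging_params(data):
    """Cast DataTables' draw/start/length to int, failing loudly on bad input."""
    # int() raises ValueError for non-numeric input such as '<script>'.
    # It is deliberately not caught: such input signals a bug or tampering.
    draw = int(data.get('draw', 0))
    start = int(data.get('start', 0))
    length = int(data.get('length', 10))
    return draw, start, length
```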

seav · 2017-02-19T08:57:00Z

Instead of creating a new app (mock_es) could that functionality not live in search, maybe inside a sub-module called mock_es? It's related to search after all and it will keep our app structure neater.

Yeah, that makes sense. OK, I'll just move everything under search.

oliverroick

Good stuff!

bjohare · 2017-02-20T14:00:13Z

cadasta/search/views/async.py

                    'error': 'unavailable',
                })

+            num_hits = raw_results['hits']['total']
            results = raw_results['hits']['hits']

            if len(results) == 0:


This seems a little strange. Is there no way to configure ES to return a timestamp even when there are no results?

seav · 2017-02-21

@bjohare, unfortunately, no. The timestamp is always stored with the records, so if no results are returned, we would need to fetch a dummy result to get the timestamp. (There's a relevant discussion on the search Slack channel.)
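To illustrate the workaround mentioned in this reply, here is a sketch of falling back to a dummy query when a search returns no hits. The field name @timestamp, the run_query callable, and the match_all body are assumptions made for illustration; they are not taken from the platform's code.

```python
TIMESTAMP_FIELD = '@timestamp'  # hypothetical field name


def extract_timestamp(raw_results, run_query):
    """Read the timestamp from the first hit, fetching a dummy hit if needed."""
    hits = raw_results['hits']['hits']
    if not hits:
        # No search hits: fetch any single document just to read its timestamp.
        dummy = run_query({'query': {'match_all': {}}, 'size': 1})
        hits = dummy['hits']['hits']
        if not hits:
            return None  # the index itself is empty
    return hits[0]['_source'].get(TIMESTAMP_FIELD)
```

The cost of this design is an extra round trip to ES on every empty search, which is presumably why it was left as a non-blocking question.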

bjohare

Looks good. Have tested in Dev VM. My inline comment about the timestamp is not a blocker.

oliverroick · 2017-02-20T14:40:07Z

@amplifi Can we get this branch on staging next to test pagination under production conditions?

amplifi · 2017-02-24T08:34:35Z

Can't verify functionality of _score sorting due to the known issue that the current search implementation only supports exact string matches; we'll need to test this as a follow-up when that bugfix is ready.

Pagination drop-down still displays options to show 10/25/50/100 results even when less than 10 results are returned; should be hidden.

Help text under "More Search Guidelines" should be edited to remove the obsolete directive that "Search results are currently capped to 10 matches." Search example prompts should avoid use of quotation marks around search terms because quotation marks are an operator with distinct meaning. Encouraging users to apply quotation marks as a default will lead to unnecessarily narrowed results.

Manual testing spreadsheet hasn't been updated to adequately test search, including changes in this PR.

Documentation needs to be updated as per PR review requirements: devwiki and docs.cadasta.org are out of date for this implementation.

Still testing mock ES

amplifi · 2017-02-24T11:07:22Z

Mock ES looks good to me; just the above.

seav · 2017-03-01T08:01:36Z

Pagination drop-down still displays options to show 10/25/50/100 results even when less than 10 results are returned; should be hidden.

This is the current behavior of all DataTables on the website. I agree that it would be nice if this option could be hidden when a DataTable has fewer than 10 rows, but I prefer that this be done in a separate PR that affects all DataTables and not just the search results table.

Help text under "More Search Guidelines" should be edited to remove the obsolete directive that "Search results are currently capped to 10 matches." Search example prompts should avoid use of quotation marks around search terms because quotation marks are an operator with distinct meaning. Encouraging users to apply quotation marks as a default will lead to unnecessarily narrowed results.

I've eliminated the quotation marks and instead used the <kbd> tag, which represents user input, and I've added custom styling for this tag. I've also removed the guideline regarding the 10-entry limit.

Manual testing spreadsheet hasn't been updated to adequately test search, including changes in this PR.

I've added test cases.

Documentation needs to be updated as per PR review requirements: devwiki and docs.cadasta.org are out of date for this implementation.

This is tricky. docs.cadasta.org is generally quite out-of-date for several recent features aside from search. I think updates for these need to be coordinated between the programs team and @bethschechter. As for devwiki, what is documented there are the feature requirements. I'm not sure if requirements need to be updated to reflect the actual implementation. @dpalomino, do you think we need to always update the requirements on the devwiki to reflect the actual implementation?

seav mentioned this pull request in Implement a mock ES cluster for testing purposes #910 (Closed) Feb 16, 2017

seav added the PR: needs review, PR: needs testing, and search labels Feb 16, 2017

oliverroick requested review from oliverroick and bjohare February 16, 2017 16:51

seav force-pushed the feature/search-paging-mock branch from ed51415 to 1b6e0e9 February 17, 2017 08:40

oliverroick requested changes Feb 17, 2017

oliverroick approved these changes Feb 20, 2017

bjohare reviewed Feb 20, 2017

bjohare approved these changes Feb 20, 2017

oliverroick added PR: approved and removed PR: needs review labels Feb 20, 2017

seav added 4 commits March 1, 2017 14:55

- Add search pagination and mock ES (e419a96)
- Resolve review feedback (1afe7ea)
- Resolve additional review feedback (62976fb)
- Tweak search guidelines (98b50bf)

seav force-pushed the feature/search-paging-mock branch from a64c01d to 98b50bf March 1, 2017 07:30

amplifi merged commit 7acb2fb into master Mar 2, 2017

amplifi deleted the feature/search-paging-mock branch March 2, 2017 13:38
