Add search pagination and mock ES #1132

Merged
merged 4 commits into from
Mar 2, 2017

Contributor

@seav seav commented Feb 16, 2017

Proposed changes in this pull request

  • Add pagination to the search results.
    • Background: There are 3 alternative approaches to pagination: ES-side, platform-side, or client-side.
      • For the client-side approach, we dump all search results to the client and DataTables handles the pagination on its own, similar to how all other DataTables on the site are currently handled. But as discussed in "Research front-end performance improvement" (#893), we want to move away from this approach to improve front-end performance, so it is not selected for this PR.
      • For the platform-side approach, the platform does the pagination by itself (generally by slicing the array of results provided by ES) and the client uses DataTables' server-side processing mode. This approach is not selected for this PR because the ES-side approach is still feasible and would lead to less processing on the platform.
      • For the ES-side approach, the platform just acts as a glorified proxy between the client and the ES API by translating DataTables' server-side processing start and length parameters into ES' from and size parameters. Note that this approach is only feasible if there is a 1-to-1 correspondence between ES' search results and the results presented in the UI. If the platform decides to insert or remove results (for example, the platform splits an ES party-relationship document result into a party UI result and a relationship UI result), this approach completely breaks down and we need to go to the platform-side approach. Currently I am still assuming a 1-to-1 correspondence.
    • Improve the platform search async endpoint to proxy between DataTables' server-side processing API and ES' from and size input parameters and hits.total return value. The platform endpoint and UI are also converted from using GET to using POST, and the UI also provides the CSRF token for the POST AJAX request.
    • Add _score sorting to the API request to ES.
    • Note that ES has a (configurable) hard maximum limit of 10,000 results. I think we can stick to this default value. If the user wants more, then they should export the project data instead and then slice and dice the data using whatever tool they want.
  • Add a mock ES API to the platform:
    • Background: Issue #910 ("Implement a mock ES cluster for testing purposes") was submitted because we need something to mock the ES API for the platform's unit tests. That was already solved by using Python's built-in unittest.mock library. But as suggested by @oliverroick, it would also be nice to have some sort of functional search so that developers can see how the UI functions while using the dev VM, without having to deploy the search code on staging. This PR addresses this need.
    • Add a mock_ES Django app to the platform dev environment with corresponding URLs and views. The URLs match the ES API in our deployment environments.
    • The mock ES API cannot provide any fake results because the platform in some cases retrieves objects from the database to augment the results provided by ES. So the mock API uses actual entities in the current project being searched. The mock API does not do any search processing at all. It basically provides all of the project's locations, parties, relationships, and resources (interleaved) as its "search results". The mock ES API does honor the pagination parameters discussed above.
    • The mock ES API has two overrides. Searching for "none" in the UI will have the mock API return no results to mimic what happens if there are no results. (The alternative would be to search in a project with no records.) Searching for "error" will have the mock API return an error response to mimic what happens if the ES is not working or encounters an error.
  • Update the UI search results table to only have 1 column and to hide the table header (for now).
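As a rough illustration of the ES-side approach described above, the translation between the two paging vocabularies could look like the sketch below. The function names, the `simple_query_string` query, and the row contents are assumptions made for illustration and are not taken from the PR. `hits.total` is treated as a plain integer here, which is also an assumption about the ES version in use.

```python
def datatables_to_es(params, query):
    """Build an ES request body from DataTables server-side parameters."""
    # DataTables sends `start` and `length`; ES expects `from` and `size`.
    start = int(params.get('start', 0))
    length = int(params.get('length', 10))
    return {
        # Hypothetical query clause; the platform's real query may differ.
        'query': {'simple_query_string': {'query': query}},
        'from': start,
        'size': length,
        'sort': ['_score'],
    }


def es_to_datatables(params, raw_results):
    """Build a DataTables server-side response from an ES search response."""
    total = raw_results['hits']['total']
    return {
        # Cast `draw` to int rather than echoing the client's raw value.
        'draw': int(params['draw']),
        'recordsTotal': total,
        'recordsFiltered': total,
        # One column per row; the cell content is a placeholder (the hit ID).
        'data': [[hit['_id']] for hit in raw_results['hits']['hits']],
    }
```

This only works while each ES hit maps to exactly one UI row; if the platform ever adds or drops rows, `recordsTotal` and the `from`/`size` window would no longer line up.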

When should this PR be merged

Anytime.

Risks

This should be extensively tested in the dev VM (for the mock ES app) and the staging environment (for the pagination feature) to test that everything works correctly.

Follow up actions

None. This PR needs no updated packages or dependencies or ES reindexing.

Checklist (for reviewing)

General

  • Is this PR explained thoroughly? All code changes must be accounted for in the PR description.
  • Is the PR labeled correctly? It should have the migration label if a new migration is added.
  • Is the risk level assessment sufficient? The risks section should contain all risks that might be introduced with the PR and which actions we need to take to mitigate these risks. Possible risks are database migrations, new libraries that need to be installed or changes to deployment scripts.

Functionality

  • Are all requirements met? Compare implemented functionality with the requirements specification.
  • Does the UI work as expected? There should be no Javascript errors in the console; all resources should load. There should be no unexpected errors. Deliberately try to break the feature to find out if there are corner cases that are not handled.

Code

  • Do you fully understand the introduced changes to the code? If not ask for clarification, it might uncover ways to solve a problem in a more elegant and efficient way.
  • Does the PR introduce any inefficient database requests? Use the debug server to check for duplicate requests.
  • Are all necessary strings marked for translation? All strings that are exposed to users via the UI must be marked for translation.

Tests

  • Are there sufficient test cases? Ensure that all components are tested individually; models, forms, and serializers should be tested in isolation even if a test for a view covers these components.
  • If this is a bug fix, are tests for the issue in place? There must be a test case for the bug to ensure the issue won’t regress. Make sure that the tests break without the new code to fix the issue.
  • If this is a new feature or a significant change to an existing feature has the manual testing spreadsheet been updated with instructions for manual testing?

Documentation

  • Are changes to the UI documented in the platform docs? If this PR introduces new platform site functionality or changes existing ones, the changes must be documented in the Cadasta Platform Documentation.
  • Are changes to the API documented in the API docs? If this PR introduces new API functionality or changes existing ones, the changes must be documented in the API docs.
  • Are reusable components documented? If this PR introduces components that are relevant to other developers (for instance a mixin for a view or a generic form) they should be documented in the Wiki.

Member

@oliverroick oliverroick left a comment

The functionality for the mock part looks good. I just have a few questions and remarks.

One general question:

Instead of creating a new app (mock_es) could that functionality not live in search, maybe inside a sub-module called mock_es? It's related to search after all and it will keep our app structure neater.

resources = list(Resource.objects.filter(project=project))

entities = []
while len(locations) + len(parties) + len(rels) + len(resources) > 0:
Member

Please correct me if I don't understand this loop properly. Inside the loop you're removing entities from each of the lists one by one and adding them to the list entities until all of the original lists are empty. What's the idea behind this?

Contributor Author

The idea is to provide a deterministic ordered list of "search results" that's not just a list of locations followed by parties, then relationships, then resources. So I'm interleaving all 4 types of entities.
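For readers following along, a round-robin interleave of this kind can also be written without an explicit `while` loop. The sketch below is one possible alternative, not the code in this PR; the `interleave` helper and the `_MISSING` sentinel are names made up for illustration, and it assumes the loop takes one item from each list per pass.

```python
from itertools import chain, zip_longest

# Sentinel used to pad the shorter lists; never appears in real data.
_MISSING = object()


def interleave(*lists):
    """Take one item from each list in turn until all lists are exhausted."""
    rounds = zip_longest(*lists, fillvalue=_MISSING)
    return [item for item in chain.from_iterable(rounds)
            if item is not _MISSING]
```

With lists of unequal length, the longer lists simply continue after the shorter ones run out, so the ordering stays deterministic for a given set of inputs.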

num_page_results = request.data.get('size', 10)

hits = []
for entity in entities[start_idx:start_idx + num_page_results]:
Member

I would write this as a list comprehension; it will be more efficient:

hits = [self.transform(entity)
        for entity in entities[start_idx:start_idx + num_page_results]]

Contributor Author

OK.
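Combining the slice from this thread with the "none" and "error" overrides from the PR description, the mock endpoint's behavior might be sketched roughly as follows. This is a simplified stand-alone version: `mock_search`, the module-level `transform`, the entity dictionaries, and the shape of the error body are all assumptions, not the PR's actual code.

```python
def transform(entity):
    # Hypothetical stand-in for the view's transform method.
    return {'_id': entity['id'], '_type': entity['type']}


def mock_search(entities, query, start_idx=0, num_page_results=10):
    """Return (status_code, body) for a mock search request."""
    if query == 'error':
        # Assumed error shape; the real mock API may respond differently.
        return 500, {'error': 'mock ES error'}
    if query == 'none':
        # Mimic a search that matches nothing.
        entities = []
    hits = [transform(entity)
            for entity in entities[start_idx:start_idx + num_page_results]]
    # `total` counts all entities, not just the current page.
    return 200, {'hits': {'total': len(entities), 'hits': hits}}
```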

query = request.query_params.get('q')
def post(self, request, *args, **kwargs):
query = request.data.get('q')
start_idx = self.convert_field_to_int(request.data, 'start', 0)
Member

Does this really need a separate method to cast this to int? In which cases would ValueError in convert_field_to_int be thrown?

Contributor Author

The idea to cast values to integer came from the recommendation of DataTables in their server-side processing mode:

The draw counter that this object is a response to - from the draw parameter sent as part of the data request. Note that it is strongly recommended for security reasons that you cast this parameter to an integer, rather than simply echoing back to the client what it sent in the draw parameter, in order to prevent Cross Site Scripting (XSS) attacks.

So aside from casting the draw parameter to an integer, I am casting all expected integer values (including start and length) into integers just to be safe.

Member

I wasn't questioning the need for casting to int. I was wondering whether it's necessary to put this into a separate method when it could be made explicit in this line. I notice that you catch ValueError exceptions in the method; I guess that was the reason for putting it into a method. Is it likely that a value that cannot be cast to int is provided to any of these parameters (draw, start, length)? If it's likely, under which circumstances will this happen? If it's not likely, it might be better not to catch the exception and fail loudly; otherwise there's a risk of introducing a bug that will be difficult to work out.

Contributor Author

Oh, OK. I understand. It is expected that these parameters have only integer values. If they aren't, then somebody is doing something malicious and I agree that it would be better not to catch the ValueError exception in that case. So, I have removed the conversion method and inlined the integer conversions.
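A minimal sketch of what inlined conversions that fail loudly might look like is shown below. The `read_paging_params` wrapper and the default values are assumptions added only to make the example self-contained; in the view itself these would presumably be plain `int(...)` calls on `request.data`.

```python
def read_paging_params(data):
    """Convert the DataTables paging parameters to integers.

    No ValueError is caught here, so a non-numeric value raises
    immediately instead of being silently replaced by a default.
    """
    draw = int(data['draw'])
    start = int(data.get('start', 0))     # assumed default
    length = int(data.get('length', 10))  # assumed default
    return draw, start, length
```

Because `draw` is converted before it is echoed back, a value such as `<script>` raises `ValueError` rather than reaching the response.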

Contributor Author

seav commented Feb 19, 2017

Instead of creating a new app (mock_es) could that functionality not live in search, maybe inside a sub-module called mock_es? It's related to search after all and it will keep our app structure neater.

Yeah, that makes sense. OK, I'll just move everything under search.

Member

@oliverroick oliverroick left a comment

Good stuff!

'error': 'unavailable',
})

num_hits = raw_results['hits']['total']
results = raw_results['hits']['hits']

if len(results) == 0:
Contributor

This seems a little strange. Is there no way to configure ES to return a timestamp even when there are no results?

Contributor Author

@bjohare, unfortunately, no. The timestamp is always stored with the records. So if no results are returned, we would need to fetch a dummy result to get the timestamp. (There's a relevant discussion on the search Slack channel.)

Contributor

@bjohare bjohare left a comment

Looks good. Have tested in Dev VM. My inline comment about the timestamp is not a blocker.

@oliverroick
Member

@amplifi Can we get this branch on staging next to test pagination under production conditions?

Contributor

amplifi commented Feb 24, 2017

Can't verify functionality of _score sorting due to known issue that current search implementation only supports exact string matches -- we'll need to test this as a follow-up when that bugfix is ready.

Pagination drop-down still displays options to show 10/25/50/100 results even when less than 10 results are returned; should be hidden.

Help text under "More Search Guidelines" should be edited to remove the obsolete directive that "Search results are currently capped to 10 matches." Search example prompts should avoid use of quotation marks around search terms because quotation marks are an operator with distinct meaning. Encouraging users to apply quotation marks as a default will lead to unnecessarily narrowed results.

Manual testing spreadsheet hasn't been updated to adequately test search, including changes in this PR.

Documentation needs to be updated as per PR review requirements: devwiki and docs.cadasta.org are out of date for this implementation.

Still testing mock ES

Contributor

amplifi commented Feb 24, 2017

Mock ES looks good to me; just the above.

@seav seav force-pushed the feature/search-paging-mock branch from a64c01d to 98b50bf Compare March 1, 2017 07:30
Contributor Author

seav commented Mar 1, 2017

Pagination drop-down still displays options to show 10/25/50/100 results even when less than 10 results are returned; should be hidden.

This is the current behavior of all DataTables on the website. I agree that it would be nice if this option could be hidden when a DataTable has fewer than 10 rows, but I would prefer that this be done in a separate PR that affects all DataTables and not just the search results table.

Help text under "More Search Guidelines" should be edited to remove the obsolete directive that "Search results are currently capped to 10 matches." Search example prompts should avoid use of quotation marks around search terms because quotation marks are an operator with distinct meaning. Encouraging users to apply quotation marks as a default will lead to unnecessarily narrowed results.

I've eliminated the quotation marks and instead used the <kbd> tag, which represents user input, and I've added custom styling for this tag. I've also removed the guideline regarding the 10-entry limit.

Manual testing spreadsheet hasn't been updated to adequately test search, including changes in this PR.

I've added test cases.

Documentation needs to be updated as per PR review requirements: devwiki and docs.cadasta.org are out of date for this implementation.

This is tricky. docs.cadasta.org is generally quite out-of-date for several recent features aside from search. I think updates for these need to be coordinated between the programs team and @bethschechter. As for devwiki, what is documented there are the feature requirements. I'm not sure if requirements need to be updated to reflect the actual implementation. @dpalomino, do you think we need to always update the requirements on the devwiki to reflect the actual implementation?

@amplifi amplifi merged commit 7acb2fb into master Mar 2, 2017
@amplifi amplifi deleted the feature/search-paging-mock branch March 2, 2017 13:38
4 participants