-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add search pagination and mock ES #1132
Conversation
ed51415
to
1b6e0e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functionality for the mock part looks good. I just have a few questions and remarks.
One general question:
Instead of creating a new app (mock_es
) could that functionality not live in search
, maybe inside a sub-module called mock_es
? It's related to search after all and it will keep our app structure neater.
cadasta/mock_es/views.py
Outdated
resources = list(Resource.objects.filter(project=project)) | ||
|
||
entities = [] | ||
while len(locations) + len(parties) + len(rels) + len(resources) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please correct me if I don't understand this loop properly. Inside the loop you're removing entities from each of the entities one-by-obe and them to the list entities
until all of the original lists are empty. What's the idea behind this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to provide a deterministic ordered list of "search results" that's not just a list of locations followed by parties, then relationships, then resources. So I'm interleaving all 4 types of entities.
cadasta/mock_es/views.py
Outdated
num_page_results = request.data.get('size', 10) | ||
|
||
hits = [] | ||
for entity in entities[start_idx:start_idx + num_page_results]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write this as a list comprehension; it will be more efficient:
hits = [self.transform(entity)
for entity in entities[start_idx:start_idx + num_page_results]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
cadasta/search/views/async.py
Outdated
query = request.query_params.get('q') | ||
def post(self, request, *args, **kwargs): | ||
query = request.data.get('q') | ||
start_idx = self.convert_field_to_int(request.data, 'start', 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really need a separate method to cast this to int
? In which cases would ValueError
in convert_field_to_int
be thrown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea to cast values to integer came from the recommendation of DataTables in their server-side processing mode:
The
draw
counter that this object is a response to - from the draw parameter sent as part of the data request. Note that it is strongly recommended for security reasons that you cast this parameter to an integer, rather than simply echoing back to the client what it sent in thedraw
parameter, in order to prevent Cross Site Scripting (XSS) attacks.
So aside from casting the draw
parameter to an integer, I am casting all expected integer values (including start
and length
) into integers just to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't questioning the need for casting to int
. I was wondering whether it's necessary to implement this into a method if it might be possible to make this explicit in this line. I notice that you catch ValueError
exceptions in the method; I guess that was the reason for putting it into a method. Is it likely that a value that cannot be cast to int
is provided to any of these parameters (draw
, start
, length
)? If it's likely under which circumstances will this happen? If it's not likely, it might be better not to catch the exception and fail loudly; otherwise there's a risk of introducing a bug that will be difficult to work out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, OK. I understand. It is expected that these parameters have only integer values. If they aren't, then somebody is doing something malicious and I agree that it would be better not to catch the ValueError
exception in that case. So, I have removed the conversion method and inlined the integer conversions.
Yeah, that makes sense. OK, I'll just move everything under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff!
'error': 'unavailable', | ||
}) | ||
|
||
num_hits = raw_results['hits']['total'] | ||
results = raw_results['hits']['hits'] | ||
|
||
if len(results) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a little strange. Is there no way to configure ES to return a timestamp even when there are no results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bjohare, unfortunately, no. The timestamp is always stored with the records. So if no results are returned, we would need to fetch a dummy result to get the timestamp. (There's a relevant discussion on the search Slack channel.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Have tested in Dev VM. My inline comment about the timestamp is not a blocker.
@amplifi Can we get this branch on staging next to test pagination under production conditions? |
Can't verify functionality of Pagination drop-down still displays options to show 10/25/50/100 results even when less than 10 results are returned; should be hidden. Help text under "More Search Guidelines" should be edited to remove the obsolete directive that "Search results are currently capped to 10 matches." Search example prompts should avoid use of quotation marks around search terms because quotation marks are an operator with distinct meaning. Encouraging users to apply quotation marks as a default will lead to unnecessarily narrowed results. Manual testing spreadsheet hasn't been updated to adequately test search, including changes in this PR. Documentation needs to be updated as per PR review requirements: devwiki and docs.cadasta.org are out of date for this implementation. Still testing mock ES |
Mock ES looks good to me; just the above. |
a64c01d
to
98b50bf
Compare
This is the current behavior of all DataTables in the website. I agree that it would be nice if this option can be hidden if DataTable has less than 10 rows, but I prefer that this be done in a separate PR that affects all DataTables and not just the search results table.
I've eliminated the quotation marks and instead used the
I've added test cases.
This is tricky. docs.cadasta.org is generally quite out-of-date for several recent features aside from search. I think updates for these need to be coordinated between the programs team and @bethschechter. As for devwiki, what is documented there are the feature requirements. I'm not sure if requirements need to be updated to reflect the actual implementation. @dpalomino, do you think we need to always update the requirements on the devwiki to reflect the actual implementation? |
Proposed changes in this pull request
start
andlength
parameters into ES'from
andsize
parameters. Note that this approach is only feasible if there is a 1-to-1 correspondence between ES' search results and the results presented in the UI. If the platform decides to insert or remove results (for example, the platform splits an ES party-relationship document result into a party UI result and a relationship UI result), this approach completely breaks down and we need to go to the platform-side approach. Currently I am still assuming a 1-to-1 correspondence.from
andsize
input parameters andhits.total
return value. The platform endpoint and UI is also converted from using GET to using POST, and the UI also provides the CSRF token for the POST AJAX request._score
sorting to the API request to ES.mock_ES
Django app to the platform dev environment with corresponding URLs and views. The URLs match the ES API in our deployment environments.When should this PR be merged
Anytime.
Risks
This should be extensively tested in the dev VM (for the mock ES app) and the staging environment (for the pagination feature) to test that everything works correctly.
Follow up actions
None. This PR needs no updated packages or dependencies or ES reindexing.
Checklist (for reviewing)
General
migration
label if a new migration is added.Functionality
Code
Tests
Documentation