-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds sanitizer for preventing certain tags to enter search index based on parameters #2993
Conversation
Hi @lonvia, I have made the required changes, please have a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite good already.
Your tests do not yet cover the use case, where only some of the arguments are given. For example, if the country_code
parameter is not specified, then the filter should not check for country codes at all.
And thanks for fixing all my bad spelling. If you want to get it out of the way while working on the code comments, feel free to split them out in a separate PR. That can be merged quickly then.
As a general comment: you are welcome to rebase the branch for the PR at any time. So if you want to fix up or squash commits, feel free to do so.
if self.filter_kind(tag.kind): | ||
|
||
if (obj.place.country_code in self.country_codes and | ||
self._in_rank_addresses(obj.place.rank_address)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two conditions apply to the whole object, not to the individual tags. That means, if they are not fulfilled, none of the tags will filtered. You can therefore have a shortcut and check those up in line 77 already.
""" Return True if the given rank address lies in any of the | ||
rank address intervals. Otherwise, returns False. | ||
""" | ||
return any(l <= rank <= r for l, r in self.rank_address_intervals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do the check for rank more efficiently here. You just need to do a bit more preprocessing: we know that there are only ranks 0-30. So during preprocessing of the incoming configuration you can create a set of allowed rank numbers. That means spell out every single number to be accepted. For example, from the default configuration of '0-30' you can create a allowed_ranks = set(range(0, 31))
and then simply check obj.place.rank_address in allowed_ranks
.
(Even more advanced and faster: An alternative to a set would be a tuple of 31 booleans, so that eventually the lookup is allowed_ranks[obj.place.rank_address]
.)
Thanks for the review @lonvia. As suggested, I have raised a separate PR which fixes typos here and reverted the commit here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good now.
One more tiny thing: I did not mean to completely replace the test which has all parameters set but rather add to it. So if you could bring back the test which has all parameters set and checks that things still go well, this PR is good to go.
Just to confirm, now we need to add tests where all parameters are set in the sanitizer configuration. |
Yes. You now have unit tests that make sure input for each single parameter is handled correctly. That is good. It gives a good code coverage for the tests. However, just because we know the single parts work, doesn't mean that they still do the right thing when combined. Testing all possible combinations of possible parameters is too much. So the second best thing is to have a test with all parameters set, tested against a couple of meaningful data points. |
…d on parameters fix: pylint error added docs for delete tags sanitizer fixed typos in docs and code comments fix: python typechecking error fixed rank address type Revert "fixed typos in docs and code comments" This reverts commit 6839eea. added default parameters and refactored code added test for all parameters
Hi @lonvia, I have added the tests. Please have a look. |
Thank you! If you are interested in a bit of a follow-up: The |
That's a great idea! I will create a new PR for this. |
References #2953.
It would prevent certain tags which are only used for referencing some place to enter the search index, as they are not part of the current object also these tags wrongly populate the search index because of which sometimes actual places are pushed to lower rankings(or even removed) in search. For Example: #2652.