-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
generalize filter function for sanitizers #3006
Conversation
Hi @lonvia, could you please review the PR? Thanks. |
or not self.allowed_ranks[obj.place.rank_address] | ||
or self.has_country_code | ||
and obj.place.country_code not in self.country_codes | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I have to complain about style here. I usually use a slash for multilines:
if not tags \
or not self.allowed_ranks[obj.place.rank_address] \
or self.has_country_code \
and obj.place.country_code not in self.country_codes:
We really should add this to the style guide.
|
||
assert filt(kind) | ||
|
||
|
||
@pytest.mark.parametrize('kind', ('de ', '123', '', 'bedece')) | ||
@pytest.mark.parametrize("kind", ("de ", "123", "", "bedece", None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please go with the existing quoting style.
The convention for quotes is:
- use single quotes for all simple strings unless the string contains a single string (so, it should be 'London' but "London's subway")
- use double quotes for format strings (i.e.
f"London {foo}"
)
I know that this convention is not followed everywhere but we should at least get it consistent in new code.
filt = SanitizerConfig({'filter-kind': '.*de'}).get_filter_kind() | ||
filt = SanitizerConfig( | ||
{ | ||
"name": "abc", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this additional parameter needed?
6f6f637
to
5b4ef44
Compare
Hi @lonvia, I have made the required changes. Also, noted the suggestions for the future. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, two more small questions.
@@ -289,7 +289,7 @@ def run_sanitizer_on(self, country_code, rank_addr, suffix, **kwargs): | |||
|
|||
def test_string_arguments_pass(self): | |||
res = self.run_sanitizer_on('de', '25-30', r'[\s\S]*', | |||
name='foo', ref='foo', name_abc='bar', ref_abc='baz') | |||
name_xyz='foo', ref_pqr='foo', name_abc='bar', ref_abc='baz') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain why this needed to be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because of the functionality of get_filter
function. If we want all suffixes to pass, we should ideally not provide the suffix parameter. If we provide a suffix parameter, it will not allow empty suffixes. This is because the function treats none
and empty string as different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. The None
case is really hard to wrap your head around. I think for non-programmers it is easier to understand if an absent suffix is handled like the empty string. It would also make the code in get_filter
easier because the function can go back to accepting str
only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this. For these scenarios, we can consider None
value as an empty string safely. I'll add the changes.
""" | ||
filters = self.get_string_list('filter-kind', default) | ||
values = self.data.get(param, None) | ||
filters = self.get_string_list(param) if values is not None else default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the motivation to change the semantics of get_string_list()
? It just makes the code here a lot more complicated. Without the change, you can simply leave the code here untouched and it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, here we want to check if both values
and default
are none
so I am explicitly checking that. With the current definition of get_string_list
, it should not return none
, because type checking would complain.
And in case we allow it to also return none
, we would explicitly have to check for none
values wherever we use it. Also, I think it would be good idea to let the function only return a list as it would make the function return value be more predictable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The None
behaviour was the intention, the type annotation is wrong. It's odd that mypy never caught that error.
I see your point about having to check for None
when using the function. However, once this PR is merged, I see only two uses of the function: here and for the country codes in the delete-tags sanitizer. Both would benefit from a None
return. For the country_code it could replace the explicit has_country_code
variable.
Oh well, up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be done, but I would prefer to be it like this.
""" | ||
filters = self.get_string_list('filter-kind', default) | ||
values = self.data.get(param, None) | ||
filters = self.get_string_list(param) if values is not None else default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The None
behaviour was the intention, the type annotation is wrong. It's odd that mypy never caught that error.
I see your point about having to check for None
when using the function. However, once this PR is merged, I see only two uses of the function: here and for the country codes in the delete-tags sanitizer. Both would benefit from a None
return. For the country_code it could replace the explicit has_country_code
variable.
Oh well, up to you.
|
||
if not filters: | ||
if filters is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the behaviour when the parameter is present but empty. It's an odd edge case but the idea was to have it behave as if the parameter was not present at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. This is less intuitive unless explicitly mentioned. What I can think of is - we can do either of two things-
- Go with these changes and let
[]
keep its meaning i.e allow nothing. Because to allow anything to pass we can just skip the parameter, which ideally makes sense. - We can revert back to the original code and explicitly mention that
[]
is just like skipping the parameter. Also with that, if I allowget_string_list
to takeNone
default and return an empty list for it, we could bring back theget_string_list(param, default)
thing. But it will not be usable for a few cases like inclean_housenumber
, an empty list forconvert-to-name
tag actually means no names.
@lonvia please let me know what you think and then I will give it a go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to untangle this. Problem here is that []
appears in two very different cases:
The first case is that somebody has given an empty field in yaml, e.g.
- step: clean-housenumbers
filter-kind:
This results in param=''´
which is converted to []
by get_String_list
. This is the case where I think it should behave as if the parameter was missing.
The second case is that a default of []
is given and the parameter is omitted. In your current implementation this is equivalent to 'let none pass', while default=None
results in 'let all pass'.
So the actual fix for the first case (which I meant in my comment) is to change a line further up:
filters = self.get_string_list(param) if values else default
That said, it would be a good idea to be more explicit in the default parameter. Allow default=PASS_ALL
, default=FAIL_ALL
and default=[<list of regexes>]
. Do not allow None or empty lists.
5b4ef44
to
8f03c80
Compare
Hi @lonvia, I have made the following changes-
Request you to please have a look. |
As discussed here filter functions are commonly used by sanitizers to process only selected tags of places.
So, instead of having separate filters for each parameter of sanitizer configuration, we can have a generalized filter function
get_filter()
which takes the parameter name as one of the arguments and creates a filter function for it.