Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get port functionality #25

Merged
merged 4 commits into from
Apr 10, 2020
Merged

Add get port functionality #25

merged 4 commits into from
Apr 10, 2020

Conversation

yabirgb
Copy link
Contributor

@yabirgb yabirgb commented Apr 9, 2020

No description provided.

@birdsarah
Copy link
Collaborator

birdsarah commented Apr 9, 2020

Are you still working on this? If yes, change the title to have [WIP] at the front and when you're ready for review, just remove the [WIP].

@yabirgb
Copy link
Contributor Author

yabirgb commented Apr 9, 2020

@birdsarah I believe everything is ready

Copy link
Collaborator

@birdsarah birdsarah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to take a slightly different approach to the one you've built here in order to reuse code and keep the behavior of the library, and API, consistent across methods.

See the following code from get_stripped_url:

    purl = urlparse(url)
    _scheme = purl.scheme

    # To handle the case where we have no scheme, but we have a port
    # we have the following heuristic. Does scheme have a . in it
    # which is stdlib behavior when not recognizing a netloc due to
    # lack of //. If TLDExtract, can find a suffix in the _scheme
    # then it's probably a domain without an http.
    if '.' in _scheme:
        # From the docs: "urlparse recognizes a netloc only
        # if it is properly introduced by ‘//’". So we
        # prepend to get results we expect.
        if extractor(_scheme).suffix != '' or is_ip_address(_scheme):
            url = '//{url}'.format(url=url)

This could be a private method, something like, _update_url_to_handle_port_and_no_scheme that returns the url either modified or not.

This new private method can be used by both get_port and get_stripped_url.

This will have the nice benefit of making get_stripped_url a little cleaner.

Get port should then have an easy time returning the port.

I would also like to omit the strict flag unless you have a strong reason for it. If you do, then that reasoning should also be incorporated into the get_stripped_url function. So I'd rather open an issue to discuss the flag and then implement in both places.

I'm thrilled to see lots of tests, but we don't need to overdo it. My general rule is that I don't test imported libraries, especially not stdlib. So you only need sufficient tests to test the logic that you're implementing. If that doesn't make sense let me know.

@birdsarah
Copy link
Collaborator

You may also be interested in the additional comment notes I wrote here: https://github.com/mozilla/domain_utils/pull/22/files#diff-01e7661674408a06f910cdb3bd537536L200-L210

@yabirgb
Copy link
Contributor Author

yabirgb commented Apr 9, 2020

I've proceeded as you mentioned abstracting this behave in a shared function. I've found the case of urls like localhost:5000 that were not properly catch under the old code and now they are.

My intention with those tests is to ensure that it behaves as I expect and not to test urllib. If you don't see the need for those they can be removed.

About the strict param my intention was to give the option to work as urllib does or under our logic. I've removed as it might be unnecessary and probably not recommended.

@birdsarah
Copy link
Collaborator

My intention with those tests is to ensure that it behaves as I expect and not to test urllib. If you don't see the need for those they can be removed.

That sounds great. It's a bit of an odd library. Lots of what seem like redundant test cases but we need to make sure we've tested the edge cases we know of. Just wanted to set expectations.

domain_utils/domain_utils.py Outdated Show resolved Hide resolved
domain_utils/domain_utils.py Outdated Show resolved Hide resolved
@@ -219,7 +231,6 @@ def get_stripped_url(url, scheme=False, drop_non_http=False, use_netloc=True, ex
else:
return url

purl = urlparse(url)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch.

tests/test_get_port.py Outdated Show resolved Hide resolved
tests/test_get_port.py Show resolved Hide resolved
@birdsarah birdsarah merged commit 934fcf9 into openwpm:master Apr 10, 2020
@birdsarah
Copy link
Collaborator

Thanks @yabirgb!

@birdsarah birdsarah mentioned this pull request Apr 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants