Add get port functionality #25

yabirgb · 2020-04-09T18:07:25Z

No description provided.

birdsarah · 2020-04-09T18:40:57Z

Are you still working on this? If yes, change the title to have [WIP] at the front and when you're ready for review, just remove the [WIP].

yabirgb · 2020-04-09T18:45:44Z

@birdsarah I believe everything is ready

birdsarah

I'd like to take a slightly different approach to the one you've built here in order to reuse code and keep the behavior of the library, and API, consistent across methods.

See the following code from get_stripped_url:

    purl = urlparse(url)
    _scheme = purl.scheme

    # To handle the case where we have no scheme, but we have a port
    # we have the following heuristic. Does scheme have a . in it
    # which is stdlib behavior when not recognizing a netloc due to
    # lack of //. If TLDExtract, can find a suffix in the _scheme
    # then it's probably a domain without an http.
    if '.' in _scheme:
        # From the docs: "urlparse recognizes a netloc only
        # if it is properly introduced by ‘//’". So we
        # prepend to get results we expect.
        if extractor(_scheme).suffix != '' or is_ip_address(_scheme):
            url = '//{url}'.format(url=url)

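As a quick illustration of the stdlib behavior that comment refers to, here is a small check. `example.com` and the port are placeholder values, not taken from the library's tests.

```python
from urllib.parse import urlparse

# Without a leading '//', urlparse does not recognize a netloc,
# so no port can be read from the result.
bare = urlparse("example.com:8080")

# With '//' prepended, the host and port land in netloc.
prefixed = urlparse("//example.com:8080")

print(repr(bare.netloc), bare.port)
print(repr(prefixed.netloc), prefixed.port)
```

How the scheme-less string is split between `scheme` and `path` has varied across Python versions, but in either case `netloc` stays empty until the `//` is added.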
This could be a private method, something like _update_url_to_handle_port_and_no_scheme, that returns the url either modified or unchanged.

This new private method can be used by both get_port and get_stripped_url.

This will have the nice benefit of making get_stripped_url a little cleaner.

get_port should then have an easy time returning the port.

I would also like to omit the strict flag unless you have a strong reason for it. If you do, then that reasoning should also be incorporated into the get_stripped_url function. So I'd rather open an issue to discuss the flag and then implement in both places.

I'm thrilled to see lots of tests, but we don't need to overdo it. My general rule is that I don't test imported libraries, especially not stdlib. So you only need sufficient tests to test the logic that you're implementing. If that doesn't make sense let me know.

birdsarah · 2020-04-09T18:58:00Z

You may also be interested in the additional comment notes I wrote here: https://github.com/mozilla/domain_utils/pull/22/files#diff-01e7661674408a06f910cdb3bd537536L200-L210

yabirgb · 2020-04-09T20:29:57Z

I've done as you suggested and abstracted this behavior into a shared function. I also found that URLs like localhost:5000 were not properly caught by the old code; they are now.

My intention with those tests is to ensure that it behaves as I expect and not to test urllib. If you don't see the need for those they can be removed.

As for the strict param, my intention was to give the option of working either as urllib does or under our own logic. I've removed it, as it might be unnecessary and is probably not recommended.

birdsarah · 2020-04-09T21:58:04Z

> My intention with those tests is to ensure that it behaves as I expect and not to test urllib. If you don't see the need for those they can be removed.

That sounds great. It's a bit of an odd library. Lots of what seem like redundant test cases but we need to make sure we've tested the edge cases we know of. Just wanted to set expectations.

birdsarah · 2020-04-09T21:59:52Z

domain_utils/domain_utils.py

@@ -219,7 +231,6 @@ def get_stripped_url(url, scheme=False, drop_non_http=False, use_netloc=True, ex
        else:
            return url

-    purl = urlparse(url)


Nice catch.

birdsarah · 2020-04-10T14:38:22Z

Thanks @yabirgb!

yabirgb added 2 commits April 9, 2020 20:07

Add get port functionality

222b10b

Fix style in tests

83512c2

birdsarah requested changes Apr 9, 2020

Update behave and share code

3313f5e

birdsarah approved these changes Apr 9, 2020

birdsarah requested changes Apr 9, 2020

birdsarah reviewed Apr 9, 2020

Pass extractor to the private function

238eb52

birdsarah merged commit 934fcf9 into openwpm:master Apr 10, 2020

birdsarah mentioned this pull request Apr 10, 2020

Add new method get_port #16

Closed
