
Unescapes html in PageParser.href_match_to_url #191

Merged 1 commit into pex-tool:master on Jan 7, 2016

Conversation

daveFNbuck
Contributor

PageParser breaks if the links contain any escaped html characters. This fixes that bug.
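
For context, a minimal sketch of the idea behind the change, not the PR's actual diff: the function name follows the PR title, and treating the href match as a plain string is an assumption. It uses xml.sax.saxutils.unescape, which the discussion below refers to.

```python
# Sketch only: decode HTML entities in a matched href before treating it
# as a URL. xml.sax.saxutils.unescape handles &amp;, &lt; and &gt; by default.
from xml.sax.saxutils import unescape


def href_match_to_url(href):
    # An href scraped from an index page may be HTML-escaped, e.g.
    # 'simple/pkg/?a=1&amp;b=2'; unescape it so the query string survives.
    return unescape(href)


print(href_match_to_url('simple/pkg/?a=1&amp;b=2'))  # -> simple/pkg/?a=1&b=2
```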

@kwlzn
Contributor

kwlzn commented Jan 1, 2016

would you mind also adding a quick test for this?

one mild concern to call out: this introduces a dependency on the xml.* class of stdlib modules which, depending on the environment python was compiled in (e.g. lacking supporting xml devel packages), may or may not be present in the stdlib on some systems/interpreters. I suppose that's sort of outside of our scope here tho - and I don't see a great portable alternative to xml.sax.saxutils.unescape short of writing our own.

@daveFNbuck
Contributor Author

Sure, I can add a quick test. https://wiki.python.org/moin/EscapingHtml suggests a short function we can use if the xml lib not existing is a problem for some people. All I really need is the s.replace('&amp;', '&') line, as the issue I'm having is with parameters being passed in the URL on my custom PyPI server.
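
For reference, the dependency-free fallback suggested on that wiki page is roughly the following; only the last replacement matters for the query-string case described above.

```python
# Sketch of the stdlib-free fallback from the wiki page: plain string
# replacements instead of xml.sax.saxutils.unescape.
def unescape(s):
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    # This one must come last so already-decoded text isn't unescaped twice.
    s = s.replace("&amp;", "&")
    return s
```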


Commit: PageParser breaks if the links contain any escaped characters. This fixes that bug.
@daveFNbuck
Contributor Author

ping

@kwlzn
Contributor

kwlzn commented Jan 7, 2016

lgtm! :shipit:

kwlzn added a commit that referenced this pull request on Jan 7, 2016: Unescape html in PageParser.href_match_to_url.
kwlzn merged commit 033707c into pex-tool:master on Jan 7, 2016
@kwlzn
Contributor

kwlzn commented Jan 7, 2016

thanks @daveFNbuck - this should be going out in the 1.1.2 release later today/tomorrow.

@daveFNbuck
Contributor Author

Awesome, thanks!

