Linkchecker internal error (apostrophe handling?) #732

Open
chrishanretty opened this issue Aug 29, 2017 · 1 comment

@chrishanretty

I'm reporting an internal error as requested. Full output is below. The error was repeated several times with pages on this site; the common factor seems to be the presence of an apostrophe in the URL.

********** Oops, I did it again. *************

You have found an internal error in LinkChecker. Please write a bug report
at https://github.com/wummel/linkchecker/issues
and include the following information:

  • the URL or file you are testing
  • the system information below

When using the commandline client:

  • your commandline arguments and any custom configuration files.
  • the output of a debug run with option "-Dall"

Not disclosing some of the information above due to privacy reasons is ok.
I will try to help you nonetheless, but you have to give me something
I can work with ;) .

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 104, in check_url
line: self.check_url_data(url_data)
locals:
self = <Checker(CheckThread-https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx?nomobile=0, started 140621508146944)>
self.check_url_data = <bound method Checker.check_url_data of <Checker(CheckThread-https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx?nomobile=0, started 140621508146944)>>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 120, in check_url_data
line: check_url(url_data, self.logger)
locals:
check_url = <function check_url at 0x7fe5007c7230>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
self = <Checker(CheckThread-https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx?nomobile=0, started 140621508146944)>
self.logger = <linkcheck.director.logger.Logger object at 0x7fe501af9e50>
File "/usr/lib/python2.7/dist-packages/linkcheck/director/checker.py", line 64, in check_url
line: parser.parse_url(url_data)
locals:
parser = <module 'linkcheck.parser' from '/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.pyc'>
parser.parse_url = <function parse_url at 0x7fe5007bf848>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
File "/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.py", line 39, in parse_url
line: globals()[funcname](url_data)
locals:
globals =
funcname = 'parse_html', len = 10
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
File "/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.py", line 48, in parse_html
line: find_links(url_data, url_data.add_url, linkparse.LinkTags)
locals:
find_links = <function find_links at 0x7fe5007bfc80>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
url_data.add_url = <bound method HttpUrl.add_url of <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, n...
linkparse = <module 'linkcheck.htmlutil.linkparse' from '/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.pyc'>
linkparse.LinkTags = {'tr': [u'background'], 'q': [u'cite'], 'meta': [u'content', u'href'], 'isindex': [u'action'], 'track': [u'src'], 'applet': [u'archive', u'src'], 'object': [u'classid', u'data', u'archive', u'usemap', u'codebase'], None: [u'style', u'itemtype'], 'layer': [u'background', u'src'], 'html': [u'manife..., len = 35
File "/usr/lib/python2.7/dist-packages/linkcheck/parser/__init__.py", line 126, in find_links
line: parser.feed(url_data.get_content())
locals:
parser = <linkcheck.HtmlParser.htmlsax.parser object at 0x7fe4cb190418>
parser.feed = <built-in method feed of linkcheck.HtmlParser.htmlsax.parser object at 0x7fe4cb190418>
url_data = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
url_data.get_content = <bound method HttpUrl.get_content of <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=...
File "/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.py", line 231, in start_element
line: self.parse_tag(tag, attr, value, name, base)
locals:
self = <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>
self.parse_tag = <bound method LinkFinder.parse_tag of <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>>
tag = u'link'
attr = u'href'
value = u'/siteelements/styles/100-system.css?version=2692258?version=2692258', len = 67
name = u''
base = u''
File "/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.py", line 277, in parse_tag
line: self.found_url(value, name, base)
locals:
self = <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>
self.found_url = <bound method LinkFinder.found_url of <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>>
value = u'/siteelements/styles/100-system.css?version=2692258?version=2692258', len = 67
name = u''
base = u''
File "/usr/lib/python2.7/dist-packages/linkcheck/htmlutil/linkparse.py", line 283, in found_url
line: column=self.parser.last_column(), name=name, base=base)
locals:
column =
self = <linkcheck.htmlutil.linkparse.LinkFinder object at 0x7fe4dcec1910>
self.parser = <linkcheck.HtmlParser.htmlsax.parser object at 0x7fe4cb190418>
self.parser.last_column = <built-in method last_column of linkcheck.HtmlParser.htmlsax.parser object at 0x7fe4cb190418>
name = u''
base = u''
File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 653, in add_url
line: page=page, name=name, parent_content_type=self.content_type)
locals:
page = 0
name = u''
parent_content_type =
self = <https link, base_url=u'?nomobile=0', parent_url=u'https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may%27s-gamble-at-the-polls-failed.aspx', base_ref=None, recursion_level=3, url_connection=None, line=613, column=49, page=0, name=u'Mobile site view', anchor=u...
self.content_type = 'text/html', len = 9
File "/usr/lib/python2.7/dist-packages/linkcheck/checker/__init__.py", line 125, in get_url_from
line: line=line, column=column, page=page, name=name, extern=extern)
locals:
line = 8
column = 422
page = 0
name = u''
extern = None
File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 117, in __init__
line: aggregate, line, column, page, name, url_encoding, extern)
locals:
aggregate = <linkcheck.director.aggregator.Aggregate object at 0x7fe501af9610>
line = 8
column = 422
page = 0
name = u''
url_encoding = None
extern = None
File "/usr/lib/python2.7/dist-packages/linkcheck/checker/urlbase.py", line 157, in __init__
line: "unquoted parent URL %r" % self.parent_url
locals:
self = <None link, base_url=u'/siteelements/styles/100-system.css?version=2692258?version=2692258', parent_url=u"https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may's-gamble-at-the-polls-failed.aspx?478490430", base_ref=None, recursion_level=4, url_connection=None, ...
self.parent_url = u"https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may's-gamble-at-the-polls-failed.aspx?478490430", len = 133
AssertionError: unquoted parent URL u"https://www.royalholloway.ac.uk/politicsandir/research/dec/blogs/articles/why-theresa-may's-gamble-at-the-polls-failed.aspx?478490430"
System info:
LinkChecker 9.3
Released on: 16.7.2014
Python 2.7.13 (default, Jan 19 2017, 14:48:08)
[GCC 6.3.0 20170118] on linux2
Requests: 2.10.0
Modules: Sqlite
Local time: 2017-08-29 12:35:20+001
sys.argv: ['/usr/bin/linkchecker', 'https://www.royalholloway.ac.uk/politicsandir/home.aspx']
LANGUAGE = 'en_GB:en'
LANG = 'en_GB.UTF-8'
Default locale: ('en', 'UTF-8')
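
In the traceback above, the outer frames show the parent URL with the apostrophe percent-encoded as `%27`, while the failing frame shows it as a literal `'`. The snippet below is only a rough sketch of that decode and re-encode behaviour, using Python 3's standard library rather than the Python 2.7 in the report. It is not LinkChecker's code, and `has_raw_apostrophe` is a hypothetical helper made up for illustration.

```python
from urllib.parse import quote, unquote

# Path segment copied from the traceback, with the apostrophe encoded as %27.
quoted_segment = "why-theresa-may%27s-gamble-at-the-polls-failed.aspx"

# Decoding turns %27 back into a literal apostrophe.
decoded_segment = unquote(quoted_segment)

# Re-encoding restores %27, because "'" is not in quote()'s default safe set.
requoted_segment = quote(decoded_segment)


def has_raw_apostrophe(url):
    """Hypothetical helper: flag URLs carrying a literal, unencoded apostrophe."""
    return "'" in url


print(decoded_segment)
print(requoted_segment == quoted_segment)
print(has_raw_apostrophe(decoded_segment), has_raw_apostrophe(requoted_segment))
```

If something along these lines is what happens, a parent URL that gets decoded somewhere between fetching the page and creating the child link would arrive with a raw apostrophe, which would match the "unquoted parent URL" assertion. That is an assumption drawn from the traceback alone.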

@dpalic

dpalic commented Oct 29, 2017

Thank you for the issue report. Sadly, this project is dead, and a new team now maintains LinkChecker at https://github.com/linkcheck/linkchecker
For more details, please see #708.
Please also close this issue and report it afresh on the new repo: https://github.com/linkcheck/linkchecker/issues
