Empty 'extract' in Wikipedia response causes 'TypeError: list indices must be integers, not str'. #32

Closed
dmirylenka opened this issue Feb 6, 2014 · 5 comments

@dmirylenka

>>> import wikipedia
>>> wikipedia.page('Fully connected network', auto_suggest=False, redirect=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 211, in page
    return WikipediaPage(title, redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 276, in load
    self.__init__(title, redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 224, in __init__
    self.load(redirect=redirect, preload=preload)
  File "/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/wikipedia/wikipedia.py", line 250, in load
    pages = request['query']['pages']
TypeError: list indices must be integers, not str
@dmirylenka
Author

This seems to be the code that causes the problem:

    extract = request['query']['pages'][pageid]['extract']

    # extract should be of the form "REDIRECT <new title>"
    # ("REDIRECT" could be translated to current language)
    title = ' '.join(extract.split('\n')[0].split()[1:]).strip()

For this particular page ("Fully connected network") the 'extract' is empty, so the title becomes empty as well. The code then requests the Wikipedia page with an empty title:

GET /w/api.php?inprop=url&format=json&ppprop=disambiguation&titles=&action=query&prop=info%7Cpageprops HTTP/1.1

This eventually causes the exception in these lines:

       request = _wiki_request(**query_params)
       pages = request['query']['pages']
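
The first half of this failure path can be checked without any network access. The sketch below copies the title-parsing expression quoted above into a function and feeds it an empty extract; the function names and the `None` guard are hypothetical illustrations, not code from the library.

```python
def title_from_redirect_extract(extract):
    """Same expression as the snippet quoted above: drop the first word
    (e.g. "REDIRECT") of the first line and keep the rest as the title."""
    return ' '.join(extract.split('\n')[0].split()[1:]).strip()


def safe_redirect_title(extract):
    """Hypothetical guard: return None instead of an empty title, so the
    caller can stop before issuing a request with titles=''."""
    title = title_from_redirect_extract(extract)
    return title or None


# A well-formed redirect extract yields the target title.
print(repr(title_from_redirect_extract('REDIRECT Recommender system')))

# An empty extract yields an empty title, which is what ends up in the request.
print(repr(title_from_redirect_extract('')))

# With the guard, the empty case is reported as None.
print(safe_redirect_title(''))
```

A guard like this would only turn the confusing TypeError into something the caller can handle; it does not explain why the extract is empty in the first place.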

@goldsmith
Owner

I'll look into it. Do you have any idea why extract would be empty? In my experience, the content of any redirect page should be of the form detailed in the comment.

@dmirylenka
Author

I am not very familiar with the Wikipedia API – just started using it.
I have only seen empty extracts for the redirect pages so far.
Another example:

http://en.wikipedia.org/w/api.php?prop=extracts&titles=Recommendation+systems&format=json&explaintext=&action=query

   {"query":{"pages":{"1648434":{"pageid":1648434,"ns":0,"title":"Recommendation systems","extract":""}}}}

Why aren't you using the 'redirects' key?
E.g.

http://en.wikipedia.org/w/api.php?prop=info&titles=Recommendation+systems&format=json&action=query&redirects

   {"query":{"redirects":[{"from":"Recommendation systems","to":"Recommender system"}],"pages":{"596646":{"pageid":596646,"ns":0,"title":"Recommender system","contentmodel":"wikitext","pagelanguage":"en","touched":"2014-02-05T06:32:28Z","lastrevid":594009289,"counter":"","length":42649}}}}
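
A minimal sketch of how the 'redirects' list in a response like the one above could be read, assuming the response has already been parsed from JSON. The function name `resolve_redirect` is hypothetical, and the sample data is the response above trimmed to the relevant keys.

```python
import json

# Response from the `redirects` query above, trimmed to the relevant keys.
SAMPLE = json.loads(
    '{"query":{"redirects":[{"from":"Recommendation systems",'
    '"to":"Recommender system"}],"pages":{"596646":{"pageid":596646,'
    '"ns":0,"title":"Recommender system"}}}}'
)


def resolve_redirect(response, title):
    """Return the redirect target for `title`, or `title` itself when the
    response lists no redirect for it (hypothetical helper)."""
    redirects = response.get('query', {}).get('redirects', [])
    for entry in redirects:
        if entry.get('from') == title:
            return entry['to']
    return title


print(resolve_redirect(SAMPLE, 'Recommendation systems'))
print(resolve_redirect(SAMPLE, 'Recommender system'))
```

One caveat: the API can also normalize the requested title and report that in a separate 'normalized' list, in which case the 'from' value may not match the string that was originally passed in. The sketch does not handle that case.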

@goldsmith
Owner

To be honest, I never knew that 'redirects' was a key you could request in the MediaWiki API, great catch! That's definitely a better solution than the hacky parsing it's doing now. I'll work on a patch this weekend.

@SuzanaK

SuzanaK commented Mar 5, 2014

I get the same error with this line:

wikipedia.page('King Cobra (malt liquor)')

The error message is:

TypeError: list indices must be integers, not str
