AttributeError: nonexistent IDs in id_lists yield invalid entries #80

Closed
lukasschwab opened this issue Aug 15, 2021 · 3 comments · Fixed by #81
Assignees
Labels
api: Issues that correspond to arXiv API behavior rather than behavior introduced by this wrapper.
bug: Deviations from documented behavior.

Comments

@lukasschwab
Owner

Description

When a specified ID doesn't correspond to an arXiv paper, the results feed includes an entry element missing expected fields (id).

The response status is 200, but feedparser chokes and the error handling in this package tries to access the nonexistent ID, yielding a raw AttributeError.

Steps to reproduce

Example API feed: http://export.arxiv.org/api/query?id_list=2208.05394

>>> import arxiv
>>> pub = next(arxiv.Search(id_list=["2208.05394"]).get())
Traceback (most recent call last):
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/feedparser/util.py", line 156, in __getattr__
    return self.__getitem__(key)
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/feedparser/util.py", line 113, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/arxiv/arxiv.py", line 586, in results
    yield Result._from_feed_entry(entry)
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/arxiv/arxiv.py", line 122, in _from_feed_entry
    entry.id
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/feedparser/util.py", line 158, in __getattr__
    raise AttributeError("object has no attribute '%s'" % key)
AttributeError: object has no attribute 'id'

Expected behavior

This package's error handling should return a neatly handleable error.
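
A minimal sketch of what a handleable error could look like, using plain dicts to stand in for parsed feed entries. The names `MissingFieldError` and `entry_id_or_raise` are hypothetical, chosen for illustration, and are not taken from this package.

```python
class MissingFieldError(Exception):
    """Hypothetical error for a feed entry that lacks a required field."""

    def __init__(self, missing_field):
        self.missing_field = missing_field
        super().__init__("Entry is missing required field: " + missing_field)


def entry_id_or_raise(entry):
    """Return the entry's id, or raise MissingFieldError if it is absent."""
    if "id" not in entry:
        raise MissingFieldError("id")
    return entry["id"]


# Assumed shape: a partial entry for a nonexistent ID simply has no 'id' key.
partial_entry = {}
try:
    entry_id_or_raise(partial_entry)
except MissingFieldError as err:
    # Callers can catch one specific exception type and inspect the field.
    caught = err.missing_field
```

A caller can then wrap iteration in `except MissingFieldError` rather than guarding against a bare AttributeError raised from inside feedparser.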

Versions

  • python version: 3.7.9
  • arxiv.py version: 1.4.1
@lukasschwab lukasschwab added the bug label Aug 15, 2021
@lukasschwab lukasschwab self-assigned this Aug 15, 2021
@lukasschwab lukasschwab added the api label Aug 16, 2021
@lukasschwab
Owner Author

lukasschwab commented Aug 16, 2021

A design problem: the feeds for id_list-only queries are ordinal matches for the IDs in the id_list. If you want to see if the nth ID exists in arXiv, check if the nth entry in the feed is well-formed or empty. See, for example, this feed.

Returning None from the generator would preserve this relationship, but forces clients to check whether entries are None when processing them.

Skipping the partial entries breaks the ordinal relationship. There's a work-around: you can still check existence by looking up in the aggregate results.

Since this usage (testing ID existence) seems less likely, I'm inclined to require some dependents to do the latter rather than requiring all projects to do the former.
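
The aggregate-lookup workaround might look roughly like this. It is a sketch under assumptions: `find_missing_ids` and `strip_version` are hypothetical helpers, returned IDs are assumed to possibly carry a version suffix such as `v1`, and `0000.00001` is a made-up placeholder ID.

```python
import re


def strip_version(arxiv_id):
    """Drop a trailing version suffix such as 'v2' from an arXiv ID."""
    return re.sub(r"v\d+$", "", arxiv_id)


def find_missing_ids(requested_ids, returned_ids):
    """Return the requested IDs that do not appear among the returned ones."""
    found = {strip_version(i) for i in returned_ids}
    return [i for i in requested_ids if strip_version(i) not in found]


# Placeholder data: one made-up ID that "exists" and the ID from this issue.
requested = ["0000.00001", "2208.05394"]
returned = ["0000.00001v1"]  # IDs collected from the skipped-entry results
missing = find_missing_ids(requested, returned)
```

Because the lookup is by value rather than by position, it keeps working when partial entries are skipped.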

If this use case turns out to be common, we can parameterize an invalid-entry handler in the Client options, e.g. lambda entry: None, to override the skipping.
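
A rough sketch of such a handler, assuming plain dicts as entries. `parse_entries` and its `on_invalid` parameter are hypothetical names, not an existing option of this package.

```python
def parse_entries(entries, on_invalid=None):
    """Yield entry IDs; skip invalid entries unless on_invalid is given."""
    for entry in entries:
        if "id" in entry:
            yield entry["id"]
        elif on_invalid is not None:
            # The handler's return value takes the invalid entry's place,
            # which preserves the ordinal match with the id_list.
            yield on_invalid(entry)
        # With no handler, the invalid entry is skipped (the default).


feed = [{"id": "a"}, {}, {"id": "b"}]
skipped = list(parse_entries(feed))
placeheld = list(parse_entries(feed, on_invalid=lambda entry: None))
```

With the default, the results are `["a", "b"]`; with `lambda entry: None`, the nonexistent ID keeps its slot as `None`.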

@lukasschwab lukasschwab changed the title from "Empty id_list results include an invalid entry" to "AttributeError: nonexistent IDs in id_lists yield invalid entries" Aug 16, 2021
@lukasschwab
Owner Author

Another risk with skipping partial results: doing so may confuse a dependent's length-checking pagination logic.
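
To illustrate the risk, here is a sketch of a dependent that stops paginating once a page comes back short. Everything in it (`fetch_all`, `fetch_page_skipping`, the `FEED` data) is hypothetical and only models the pattern.

```python
def fetch_all(fetch_page, page_size):
    """Naive pagination: stop once a page is shorter than page_size."""
    results, start = [], 0
    while True:
        page = fetch_page(start, page_size)
        results.extend(page)
        if len(page) < page_size:
            return results
        start += page_size


# Four feed entries; the second is a partial entry with no 'id'.
FEED = [{"id": "a"}, {}, {"id": "b"}, {"id": "c"}]


def fetch_page_skipping(start, size):
    """Hypothetical page fetcher that drops partial entries."""
    return [e["id"] for e in FEED[start:start + size] if "id" in e]


collected = fetch_all(fetch_page_skipping, page_size=2)
```

Here the first page holds only one result after skipping, so the loop treats it as the last page and never fetches the remaining two entries.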

@lukasschwab
Owner Author

Final consideration:

Skipping the partial entries breaks the ordinal relationship. There's a work-around: you can still check existence by looking up in the aggregate results.

Since this usage (testing ID existence) seems less likely, I'm inclined to require some dependents to do the latter rather than requiring all projects to do the former.

No dependent of this package relies on the ordinal relationship, because any request that would be impacted by this change currently fails. Skipping the results is the least disruptive option.
