AttributeError: nonexistent IDs in `id_list`s yield invalid entries #80

lukasschwab · 2021-08-15T21:24:43Z

Description

When a specified ID doesn't correspond to an arXiv paper, the results feed includes an entry element missing expected fields (id).

The status is 200, but feedparser chokes and the error handling in this package tries to access the nonexistent ID, yielding a raw AttributeError.

Steps to reproduce

Example API feed: http://export.arxiv.org/api/query?id_list=2208.05394

>>> import arxiv
>>> pub = next(arxiv.Search(id_list=["2208.05394"]).get())
Traceback (most recent call last):
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/feedparser/util.py", line 156, in __getattr__
    return self.__getitem__(key)
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/feedparser/util.py", line 113, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/arxiv/arxiv.py", line 586, in results
    yield Result._from_feed_entry(entry)
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/arxiv/arxiv.py", line 122, in _from_feed_entry
    entry.id
  File "/Users/lukas/.pyenv/versions/3.7.9/lib/python3.7/site-packages/feedparser/util.py", line 158, in __getattr__
    raise AttributeError("object has no attribute '%s'" % key)
AttributeError: object has no attribute 'id'

Expected behavior

This package's error handling should return a neatly handleable error.
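One shape a "neatly handleable" error could take is sketched below. This is only an illustration: `MissingFieldError` and `entry_id` are hypothetical names, and the entry is modeled as a plain dict rather than a real feedparser object.

```python
class MissingFieldError(Exception):
    """Hypothetical error for a feed entry that lacks a required field."""

    def __init__(self, field):
        self.field = field
        super().__init__(f"feed entry is missing required field {field!r}")


def entry_id(entry):
    """Return the entry's id, or raise MissingFieldError if it is absent."""
    try:
        return entry["id"]
    except KeyError:
        # Replace the raw lookup failure with one specific, catchable error.
        raise MissingFieldError("id") from None
```

A caller can then catch `MissingFieldError` specifically instead of a bare AttributeError that could come from anywhere.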

Versions

python version: 3.7.9

arxiv.py version: 1.4.1

lukasschwab · 2021-08-16T02:07:13Z

A design problem: the feeds for id_list-only queries are ordinal matches for the IDs in the id_list. If you want to see if the nth ID exists in arXiv, check if the nth entry in the feed is well-formed or empty. See, for example, this feed.

Returning None from the generator would preserve this relationship, but would force clients to check whether entries are None when processing them.

Skipping the partial entries breaks the ordinal relationship. There's a work-around: you can still check existence by looking up in the aggregate results.
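A rough sketch of that work-around, assuming entries are modeled as plain dicts and that well-formed entry IDs are URLs of the form `http://arxiv.org/abs/<id>v<version>`. The helper names and sample data are made up for illustration and are not part of this package.

```python
import re


def skip_partial(entries):
    """Yield only the entries that carry an id; drop partial ones."""
    for entry in entries:
        if "id" in entry:
            yield entry


def short_id(entry_url):
    """Strip the URL prefix and version suffix from an entry id."""
    tail = entry_url.split("/abs/")[-1]
    return re.sub(r"v\d+$", "", tail)


def found_ids(entries):
    """Collect the short IDs of every well-formed entry."""
    return {short_id(entry["id"]) for entry in skip_partial(entries)}


# Illustrative feed: one well-formed entry, one partial entry.
feed = [{"id": "http://arxiv.org/abs/2107.05580v1"}, {}]
requested = ["2107.05580", "2208.05394"]

present = found_ids(feed)
missing = [paper_id for paper_id in requested if paper_id not in present]
```

Existence is decided by membership in the aggregate set rather than by position, so it survives skipping. Requested IDs that include a version suffix would need the same normalization before the lookup.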

Since this usage (testing ID existence) seems less likely, I'm inclined to require some dependents to do the latter rather than requiring all projects to do the former.

If this use case turns out to be common, we can parameterize an invalid-entry handler in the Client options, e.g. lambda entry: None, to override the skipping.
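A minimal sketch of what such a handler could look like, written as a standalone generator over dict entries. The name `iter_results` and the `on_invalid` parameter are hypothetical and do not describe an existing Client option.

```python
def iter_results(entries, on_invalid=None):
    """Yield well-formed entries; delegate partial ones to on_invalid.

    With on_invalid=None (the default), partial entries are skipped.
    Otherwise the handler's return value is yielded in the entry's place.
    """
    for entry in entries:
        if "id" in entry:
            yield entry
        elif on_invalid is not None:
            yield on_invalid(entry)


feed = [{"id": "a"}, {}, {"id": "b"}]

skipped = list(iter_results(feed))
ordinal = list(iter_results(feed, on_invalid=lambda entry: None))
```

With the default, `skipped` holds only the two well-formed entries. With `lambda entry: None`, `ordinal` keeps one slot per requested ID, so the nth result still lines up with the nth ID.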

lukasschwab · 2021-08-16T02:30:01Z

Another risk with skipping partial results: doing so may confuse a dependent's length-checking pagination logic.
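To make the risk concrete, here is a sketch of the kind of dependent logic that could be confused. It assumes a client that stops paging as soon as a page comes back shorter than the requested size; the feed, page functions, and names are all invented for illustration.

```python
# Illustrative feed: the second entry is partial (no id).
FEED = [{"id": "a"}, {}, {"id": "c"}, {"id": "d"}, {"id": "e"}]


def raw_page(start, size):
    """Return one page of the feed exactly as served."""
    return FEED[start:start + size]


def skipping_page(start, size):
    """Return one page with partial entries dropped."""
    return [entry for entry in raw_page(start, size) if "id" in entry]


def fetch_all(get_page, page_size):
    """Fetch pages until one is shorter than page_size."""
    results, start = [], 0
    while True:
        page = get_page(start, page_size)
        results.extend(page)
        if len(page) < page_size:
            break  # A short page is taken to mean "no more results".
        start += page_size
    return results


with_raw = fetch_all(raw_page, 2)
with_skipping = fetch_all(skipping_page, 2)
```

Here the first page loses its partial entry, looks short, and the loop stops after a single result even though three more valid entries follow.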

lukasschwab · 2021-08-18T02:38:30Z

Final consideration:

> Skipping the partial entries breaks the ordinal relationship. There's a work-around: you can still check existence by looking up in the aggregate results.
>
> Since this usage (testing ID existence) seems less likely, I'm inclined to require some dependents to do the latter rather than requiring all projects to do the former.

No dependent of this package relies on the ordinal relationship, because any request that would be impacted by this change currently fails. Skipping the results is the least disruptive option.

lukasschwab added the bug Deviations from documented behavior. label Aug 15, 2021

lukasschwab self-assigned this Aug 15, 2021

lukasschwab mentioned this issue Aug 15, 2021

Upgrade arxiv dependency to 1.4.1 leipzig/awesome-reproducible-research#67

Merged

lukasschwab added the api Issues that correspond to arXiv API behavior rather than behavior introduced by this wrapper. label Aug 16, 2021

lukasschwab mentioned this issue Aug 16, 2021

Upgrade arxiv dependency to 1.4.1 temken/comparxiv#18

Merged

lukasschwab changed the title ~~Empty id_list results include an invalid entry~~ AttributeError: nonexistent IDs in id_lists yield invalid entries Aug 16, 2021

lukasschwab mentioned this issue Aug 16, 2021

Skip invalid entries from nonexistent id_list IDs #81

Merged

2 tasks

lukasschwab mentioned this issue Aug 18, 2021

Invalid entries in multi-member ID lists cause entry repetition #82

Open

lukasschwab closed this as completed in #81 Aug 18, 2021
