WordPress Photo Directory provider script does not stop #1372

Closed
1 task
AetherUnbound opened this issue Nov 1, 2022 · 0 comments · Fixed by WordPress/openverse-catalog#916
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟧 priority: high Stalls work on the project or its dependents 🐍 tech: python Involves Python

Comments

@AetherUnbound
Collaborator

Description

After the refactor performed in WordPress/openverse-catalog#835, the WordPress Photo Directory provider script no longer stops processing after it reaches the last page. A recent run with skip_ingestion_errors enabled (triggered as a result of #1373) got stuck in an infinite loop, repeatedly requesting pages that did not exist. We ended up forcing the run to fail, which left a 16MB log file with the following lines repeated thousands of times:

[2022-11-01T16:18:32.116+0000] {requester.py:75} WARNING - Unable to request URL: https://wordpress.org/photos/wp-json/wp/v2/photos?format=json&page=50&per_page=100&_embed=true  Status code: 400
[2022-11-01T16:18:32.116+0000] {requester.py:121} WARNING - Bad response_json:  None
[2022-11-01T16:18:32.116+0000] {requester.py:122} WARNING - Retrying:
_get_response_json(
    https://wordpress.org/photos/wp-json/wp/v2/photos,
    {'format': 'json', 'page': 50, 'per_page': 100, '_embed': 'true'},
    retries=2)

The response from that URL is:

{
    "code": "rest_post_invalid_page_number",
    "message": "The page number requested is larger than the number of pages available.",
    "data": {
        "status": 400
    }
}
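
One way to treat this response as an end-of-data signal rather than a retryable error is sketched below. This is only an illustration: `is_past_last_page` and `INVALID_PAGE_CODE` are hypothetical names, not part of the catalog code, and since the log above shows the requester reporting `None` for the response JSON on a 400, a check like this would presumably need access to the raw response body.

```python
from typing import Any, Optional

# Error code taken from the response body shown above.
INVALID_PAGE_CODE = "rest_post_invalid_page_number"


def is_past_last_page(status_code: int, body: Optional[Any]) -> bool:
    """Return True when a response means "no more pages" (hypothetical helper).

    Anything other than a 400 carrying the invalid-page-number code is
    treated as a different kind of failure.
    """
    if status_code != 400 or not isinstance(body, dict):
        return False
    return body.get("code") == INVALID_PAGE_CODE


if __name__ == "__main__":
    body = {
        "code": INVALID_PAGE_CODE,
        "message": "The page number requested is larger than the number of pages available.",
        "data": {"status": 400},
    }
    print(is_past_last_page(400, body))
    print(is_past_last_page(400, None))
```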

It looks like the logic for calculating the total number of available pages was not carried over in the refactor:

https://github.com/WordPress/openverse-catalog/pull/835/files#diff-b98978a812fb92894554fca75f318f598e0135bfda7f2d2c8397f7ae5c5d8fe8L83
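
A minimal sketch of what a restored stopping condition could look like, assuming the total page count is read from the `X-WP-TotalPages` header that the WordPress REST API sends with collection responses. The function names here are hypothetical and this is not the code from the linked diff; returning `None` to mean "stop" is also an assumption for the sketch.

```python
from typing import Mapping, Optional


def parse_total_pages(headers: Mapping[str, str]) -> Optional[int]:
    """Read the page count from response headers (hypothetical helper).

    A plain dict lookup is case-sensitive; a real HTTP client's header
    mapping may be case-insensitive.
    """
    raw = headers.get("X-WP-TotalPages")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def next_query_params(prev_params: dict, total_pages: Optional[int]) -> Optional[dict]:
    """Return params for the next page, or None once the last page is done."""
    next_page = prev_params["page"] + 1
    if total_pages is not None and next_page > total_pages:
        return None  # signal to the caller that ingestion should stop
    return {**prev_params, "page": next_page}


if __name__ == "__main__":
    params = {"format": "json", "page": 49, "per_page": 100, "_embed": "true"}
    total = parse_total_pages({"X-WP-TotalPages": "49"})
    print(next_query_params(params, total))
```

With the parameters from the log above and a total of 49 pages, the sketch stops instead of requesting page 50.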

Reproduction

  1. Trigger the DAG with the config: {"init_query_params": {"format": "json", "page": 49, "per_page": 100, "_embed": "true"}}
  2. Observe that after one page, the DAG run fails with the above error

Additional context

Resolution

  • 🙋 I would be interested in resolving this bug.
@AetherUnbound AetherUnbound added 🐍 tech: python Involves Python 💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟧 priority: high Stalls work on the project or its dependents labels Nov 1, 2022
@stacimc stacimc self-assigned this Dec 9, 2022
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Apr 17, 2023
@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023
@obulat obulat moved this from 📋 Backlog to ✅ Done in Openverse Backlog Apr 24, 2023