Error in ingestion server tests: index audio-temporary not found #1059

Closed
obulat opened this issue Mar 29, 2023 · 1 comment
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: ingestion server Related to the ingestion/data refresh server

obulat commented Mar 29, 2023

Description

While debugging the CI failure in #904, I saw an error in the logs of the ingestion server test:

integration_ingestion_server_1  | 2023-03-29 04:04:18,316 INFO indexer.py:469 - Index `audio-temporary` was deleted - data refresh complete! :tada:
integration_ingestion_server_1  | 2023-03-29 04:04:18,320 INFO indexer.py:271 - Sending callback request
integration_ingestion_server_1  | 2023-03-29 04:04:18,326 INFO indexer.py:273 - Response: {"message": "OK"}
integration_ingestion_server_1  | 2023-03-29 04:04:18,328 WARNING base.py:288 - PUT http://integration_es:9200/audio-temporary/_settings [status:404 request:0.010s]
integration_ingestion_server_1  | Process Process-10:
integration_ingestion_server_1  | 2023-03-29 04:04:18,330 INFO tasks.py:226 - Task 2331c5050a8b4972b095e652e51c9348 completed.
integration_ingestion_server_1  | Traceback (most recent call last):
integration_ingestion_server_1  |   File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
integration_ingestion_server_1  |     self.run()
integration_ingestion_server_1  |   File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
integration_ingestion_server_1  |     self._target(*self._args, **self._kwargs)
integration_ingestion_server_1  |   File "/ingestion_server/ingestion_server/indexer.py", line 259, in refresh
integration_ingestion_server_1  |     self.es.indices.put_settings(
integration_ingestion_server_1  |   File "/venv/lib/python3.10/site-packages/elasticsearch/client/utils.py", line 347, in _wrapped
integration_ingestion_server_1  |     return func(*args, params=params, headers=headers, **kwargs)
integration_ingestion_server_1  |   File "/venv/lib/python3.10/site-packages/elasticsearch/client/indices.py", line 889, in put_settings
integration_ingestion_server_1  |     return self.transport.perform_request(
integration_ingestion_server_1  |   File "/venv/lib/python3.10/site-packages/elasticsearch/transport.py", line 466, in perform_request
integration_ingestion_server_1  |     raise e
integration_ingestion_server_1  |   File "/venv/lib/python3.10/site-packages/elasticsearch/transport.py", line 427, in perform_request
integration_ingestion_server_1  |     status, headers_response, data = connection.perform_request(
integration_ingestion_server_1  |   File "/venv/lib/python3.10/site-packages/elasticsearch/connection/http_requests.py", line 216, in perform_request
integration_ingestion_server_1  |     self._raise_error(response.status_code, raw_data)
integration_ingestion_server_1  |   File "/venv/lib/python3.10/site-packages/elasticsearch/connection/base.py", line 328, in _raise_error
integration_ingestion_server_1  |     raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
integration_ingestion_server_1  | elasticsearch.exceptions.NotFoundError: NotFoundError(404, 'index_not_found_exception', 'no such index [audio-temporary]', audio-temporary, index_or_alias)
integration_ingestion_server_1  | [2023-03-29 04:04:18 +0000] [7] [DEBUG] POST /task

This error is repeated a couple of times, and you can also see it in the logs of passing CI runs. I suspect that the move to a context manager for multiprocessing during ingestion, without calling pool.join() at the end to wait for all tasks to finish, was failing because of this error.
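
For context on the pool.join() point, here is a minimal sketch of the pattern, not the actual ingestion server code. Leaving a `with` block around a `multiprocessing` pool calls `terminate()`, which stops workers without waiting for outstanding work, so tasks submitted asynchronously need an explicit `close()` and `join()` first. The names `slow_task` and `run_and_wait` are hypothetical, and the "fork" start method is an assumption that holds on a POSIX host such as the Linux CI container.

```python
import multiprocessing
import time

# Assumption: a POSIX host, where the "fork" start method is available.
_ctx = multiprocessing.get_context("fork")


def slow_task(n):
    """Hypothetical stand-in for one long-running ingestion task."""
    time.sleep(0.2)
    return n * n


def run_and_wait(values):
    """Submit tasks asynchronously and wait for all of them to finish."""
    with _ctx.Pool(processes=2) as pool:
        pending = [pool.apply_async(slow_task, (v,)) for v in values]
        # Exiting the `with` block calls pool.terminate(), which stops the
        # workers immediately. close() + join() first lets queued tasks finish.
        pool.close()
        pool.join()
    # Results were stored before the pool was torn down, so get() returns them.
    return [p.get(timeout=1) for p in pending]


if __name__ == "__main__":
    print(run_and_wait([1, 2, 3]))
```

Without the `close()` and `join()` calls, the pool could be terminated while tasks are still queued or running, and the later `get()` calls would then time out instead of returning results.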

Reproduction

  1. Open the successful CI/CD run: https://github.com/WordPress/openverse/actions/runs/4550084588/jobs/8022800726.
  2. Open the logs for the "Print ingestion test logs" step.
  3. See the error.

Additional context

I set the priority to low because the tests still pass, but please raise the priority if this is actually causing more problems.

@obulat obulat added 🟩 priority: low Low priority and doesn't need to be rushed 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository 🧱 stack: ingestion server Related to the ingestion/data refresh server labels Mar 29, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Mar 29, 2023
dhruvkb pushed a commit that referenced this issue Apr 14, 2023
* Begin re-factor json resource load

* Test new function with yield

* New json function in two more files

* Added get_json_resource() to more test files.

* Updated all tests in provider_api_scripts to use new func

* Use a json_load function that returns a function per pr feedback

* Remove FakeSource.py

* Fix test_flickr.py imports

* Split make_resource_json_func tests into separate file
sarayourfriend commented Aug 11, 2023

> without calling pool.join() at the end to wait for all tasks to end was failing because of this error.

Fixed in cf9febd. Confirming that I cannot see the same error in recent runs of the ingestion server tests in CI.

@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in Openverse Backlog Aug 11, 2023