Skip to content
This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Rename indices back to original during data refresh #415

Merged
merged 19 commits into from
Dec 20, 2021

Conversation

AetherUnbound
Copy link
Contributor

Fixes

Fixes #362 by @AetherUnbound

Description

This PR modifies the ingestion server so that it renames the indices of the target table back to the original during the go-live table swap. Previously this was not occurring, and so every subsequent run would add a temp_import_ prefix to the target table (ultimately this would result in something like temp_import_temp_import_temp_import_unique_url_idx). The ingestion server now retains information about the original index names and uses them to rename the indices after the old target table has been removed. The integration tests will now also verify that the indices are unchanged after a data refresh.

Indices must unique globally across a database, but constraints need not be. While no logic change is necessary, I've also added a constraint check during the integration test similar to the index check.

Since I added volumes in #408, we want to make sure that we remove them for integration testing (even though we're performing docker-compose down -v and removing volumes, better safe than sorry!). I've added that step to the integration test setup. The tests are ordered and they're kind of dependent on the previous test succeeding, so I've also added a fail-fast option for when pytest is run.

I also simplified the integration tests since the INGEST_UPSTREAM tests between image/audio shared so much common logic.

Lastly, I added logging of the go-live query before it's actually run. That way we can make debugging easier for ourselves on this next run 😄

Testing Instructions

just ing-testlocal! It's been flaky sometimes for me so let me know if the first data refresh test times out.

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@AetherUnbound AetherUnbound added 🟥 priority: critical Must be addressed ASAP 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository python labels Dec 17, 2021
@AetherUnbound AetherUnbound requested review from a team, zackkrida and sarayourfriend and removed request for a team December 17, 2021 03:06
@sarayourfriend
Copy link
Contributor

I tried testing this locally but I can't get the tests to pass 🙁 It seems to be an issue with the tests not being able to hit the upstream db successfully:

integration_ingestion_server_1  | 2021-12-17 12:31:33,573 INFO ingest.py:74 - Shared columns: ['last_synced_with_source', 'removed_from_source', 'tags', 'width', 'thumbnail', 'tags_list', 'updated_on', 'creator', 'meta_data', 'license_version', 'filetype', 'height', 'foreign_landing_url', 'title', 'foreign_identifier', 'identifier', 'creator_url', 'watermarked', 'provider', 'standardized_popularity', 'source', 'filesize', 'url', 'id', 'view_count', 'created_on', 'license']
integration_ingestion_server_1  | 2021-12-17 12:31:33,573 INFO ingest.py:280 - (Re)initializing foreign data wrapper
integration_ingestion_server_1  | Process Task-1:
integration_ingestion_server_1  | Traceback (most recent call last):
integration_ingestion_server_1  |   File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
integration_ingestion_server_1  |     self.run()
integration_ingestion_server_1  |   File "/ingestion_server/ingestion_server/tasks.py", line 110, in run
integration_ingestion_server_1  |     reload_upstream(self.model)
integration_ingestion_server_1  |   File "/ingestion_server/ingestion_server/ingest.py", line 289, in reload_upstream
integration_ingestion_server_1  |     downstream_cur.execute(init_fdw)
integration_ingestion_server_1  | psycopg2.errors.SqlclientUnableToEstablishSqlconnection: could not connect to server "upstream"
integration_ingestion_server_1  | DETAIL:  could not translate host name "upstream_db" to address: Name does not resolve

Not sure how to resolve this 🤔

@AetherUnbound
Copy link
Contributor Author

Hmm, that sounds really similar to #364 @sarayourfriend - do you happen to have a .env file in the ingestion_server folder? If not, do you mind sharing the contents of integration_server/integration-docker-compose.yml?

@AetherUnbound
Copy link
Contributor Author

It looks like this is also having an issue I was seeing intermittently locally where the just init step would hang on creating the image index. Looking into this more

@AetherUnbound AetherUnbound requested a review from a team as a code owner December 17, 2021 23:08
@zackkrida
Copy link
Member

fwiw I'm experiencing the same hanging, Waiting for index 'image' to be ready... issue.

@AetherUnbound
Copy link
Contributor Author

I've identified the root cause and I'm addressing it now!

@AetherUnbound
Copy link
Contributor Author

Woohoo, this is finally ready!

Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good. Appreciate the improvements and cleanups for the integration tests.

@@ -67,6 +67,26 @@ def _fixup_env(conf: dict):
conf["services"]["ingestion_server"]["env_file"] = ["env.integration"]


def _remove_volumes(conf: dict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very useful, thanks!

@krysal
Copy link
Member

krysal commented Dec 20, 2021

I am getting the same error as Sara and the content of my ingestion_server/test/integration-docker-compose.yml file is:

Auto-generated Docker Compose
# This is an auto-generated Docker Compose configuration file.
# Do not modify this file directly. Your changes will be overwritten.

services:
  integration_db:
    env_file:
    - /Users/krysal/Codes/openverse-api/postgres/env.docker
    healthcheck:
      test: pg_isready -U deploy -d openledger
    image: postgres:13.2-alpine
    ports:
    - 65432:5432
    volumes: []
  integration_es:
    environment:
    - xpack.security.enabled=false
    - discovery.type=single-node
    healthcheck:
      interval: 10s
      retries: 10
      test:
      - CMD-SHELL
      - curl -si -XGET 'localhost:9200/_cluster/health?pretty' | grep -qE 'yellow|green'
      timeout: 60s
    image: docker.elastic.co/elasticsearch/elasticsearch:7.12.0
    mem_limit: ${ES_MEM_LIMIT:-4294967296}
    ports:
    - 60200:9200
    ulimits:
      nofile:
        hard: 65536
        soft: 65536
    volumes: []
  integration_ingestion_server:
    build: ../
    command: bash -c 'sleep 20 && supervisord -c config/supervisord.conf'
    depends_on:
    - integration_db
    - integration_es
    env_file:
    - env.integration
    image: ingestion_server
    ports:
    - 60001:8001
    stdin_open: true
    tty: true
    volumes:
    - ../:/ingestion_server
  integration_upstream_db:
    env_file:
    - /Users/krysal/Codes/openverse-api/postgres/env.docker
    healthcheck:
      test: pg_isready -U deploy -d openledger
    image: postgres:13.2-alpine
    ports:
    - 65433:5432
    volumes:
    - ./mock_data:/mock_data
version: '2.4'
volumes: {}

Edit: I spoke too fast, the problem was indeed the presence of the .env file! Tests are passing ✅

Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything seems to work correctly. Thanks for the detailed explanation and the order in the tests ⭐️

It would be a good idea to add a note for the .env file with these tests somewhere (the README maybe?) so as not to forget it.

ingestion_server/ingestion_server/ingest.py Outdated Show resolved Hide resolved
Comment on lines +1 to +4
# Mock schemas

Files in this directory were created using the following command:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, it is good to know the origin and have the query at hand 👍

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟥 priority: critical Must be addressed ASAP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Indices are not renamed after go-live swap
5 participants