Add DAG for creating staging indices #3232
Conversation
Full-stack documentation: https://docs.openverse.org/_preview/3232 Please note that GitHub pages takes a little time to deploy newly pushed code. If the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments.
There is a milestone with issues related to filtered index creation. The project thread for search sandbox links to it, but I don't know if these issues are actually geared towards this. In particular, I don't know if they explicitly seek to handle filtered index re-creation in this case.
@stacimc The issue you've linked in the PR description isn't the right one. I see a note to try and find the right one, so I tried looking and also couldn't find it. Maybe we forgot to create the issues after #2358. This single PR addresses the entire implementation plan (I think?), so we could create the issue now, at least.
A couple of questions for clarification, I might just be forgetting things from the planning discussions, but they might be good to document in the code too:
- What's the significance of the "full" suffix? In what sense are those indexes "full" and others not? I'm struggling to connect the dots.
- Do we also need to deploy the staging ingestion server before the last step (maybe simultaneously with step 3) of the deployment plan?
I haven't tested this locally yet (will do so after lunch), but I love the approach you've taken. I think it makes heaps of sense to use the staging ingestion server, and gets around some nasty issues with needing to avoid collision with the actual data refresh!
 return SimpleHttpOperator(
     task_id="get_current_index",
-    http_conn_id="data_refresh",
+    http_conn_id=http_conn_id,
Very cool, this whole set of changes!
LGTM! Tests well locally.
My one concern is over the parameter naming, as I found it confusing when I was actually testing, mostly because the alias references are non-specific, so I kept mixing up whether it was the `-full` or non-full alias that I was meant to expect to move. In particular, the fact that the `audio-full` alias always moves was initially difficult to wrap my head around, and I could totally see forgetting the intricacies of this behaviour whenever I first use this, probably a little while after reviewing this PR. Clearer explanations of the alias behaviour on the DAG trigger screen would help with this, essentially clarifying that the options are for the "base" media index alias and that the `-full` alias always moves, no matter what you do.
On that last note: what is the purpose of the `-full` alias? IIRC the intention from the implementation plan was to make it easier to iterate quickly without having to modify the ES query... but given the abstract and vague nature of "full", I wonder if it doesn't make more sense to have a free-text input that lets you assign a custom alias to add or be managed by the DAG. So that if I'm testing some specific feature, I could pass `sara-cool-feature` and it would create the alias `audio-sara-cool-feature` and move that around.
I suppose the question is how to easily identify the index to start with, which I'm realising is now probably the main use case for the `-full` index? If that's the case, would `-latest-manual` be a better way to convey the purpose?
Anyway, these are just minor points that can be solved in a follow-up issue and I do not want to further delay this PR that's taken so long to get reviews for. If you think there are easy ways to clarify some of these things or like any of my concrete suggestions, I'll leave it up to you whether it's worth implementing them now or whether to create fast-follow issues to avoid prolonging this PR. Both are reasonable to me.
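For illustration only (nothing in this PR implements it), the free-text idea above could look roughly like the sketch below; `index_suffix` is a hypothetical param name.

```python
from airflow.models.param import Param

# Hypothetical params for the free-text alias suggestion above (illustrative only).
params = {
    "media_type": Param(default="audio", enum=["audio", "image"], type="string"),
    # e.g. "sara-cool-feature" -> the DAG would manage "audio-sara-cool-feature"
    "index_suffix": Param(default="full", type="string"),
}


def target_alias(media_type: str, index_suffix: str) -> str:
    # The alias the DAG would create or move for this run.
    return f"{media_type}-{index_suffix}"
```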
@sarayourfriend Ah thanks -- the issue this addressed is in the infrastructure repo which is why the link did not work (fixed now). It looks like that issue was closed because the index was just added in staging manually while this PR has been stuck waiting for review. The priority can probably be lowered now.
This comes from the IP and I believe is to distinguish it from smaller proportional indices, which the IP proposes adding in a separate DAG.
The index names and alias behavior are all taken from the IP without change, but I do think the suggestion for […]. I found a lot of additional commentary in the comments on the IP. I think the intention of having a simple […]
Drafting this momentarily since the issue was closed.
Gotcha! You're right that it makes more sense in context 👍
@task(retries=0)
def prevent_concurrency_with_staging_database_restore(**context):
I am going to make some sensor utils for doing these steps (here and in the `staging_database_restore`), since this could already be useful in several places. I did not want to increase the complexity of this PR so I'll be doing that in a follow-up.
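A rough sketch of what such a shared utility could look like, with hypothetical names and module layout (the actual follow-up may differ):

```python
from airflow.decorators import task
from airflow.exceptions import AirflowException
from airflow.models import DagRun
from airflow.utils.state import State


@task(retries=0)
def prevent_concurrency_with_dags(external_dag_ids: list[str]):
    """Fail fast if any of the given DAGs currently has a running DagRun."""
    for dag_id in external_dag_ids:
        if DagRun.find(dag_id=dag_id, state=State.RUNNING):
            raise AirflowException(f"Concurrency check failed: {dag_id} is running.")
```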
After coming back to this PR, I've made some major changes (so re-requesting review from @sarayourfriend):
All of the same use cases are possible (the default behavior of the DAG, if you don't update any options, is to just create the new `-full` index).
@stacimc I'll review this tomorrow.
Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR: @krysal. Excluding weekend days, this PR was ready for review 4 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s). @stacimc, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.
Worked perfectly! Tried first the case without any base audio index, then creating another, and then one more run deleting the old index. Very nice! 💯
Pinging @sarayourfriend and @AetherUnbound -- I think the review reminder bot isn't working because Sara approved the original version of this PR. I cleared that review because the implementation changed considerably (summary in this comment).
Apologies, I started reviewing this yesterday and ran out of time. It LGTM code-wise, I just haven't run it locally yet.
The code all looks great, I love how easy some of these compositions are becoming and the new slack utility task!
I'm running into an issue though where subsequent reruns fail at the `trigger_reindex` step 😕
Logs:
[2023-12-05, 05:09:17 UTC] {http.py:143} INFO - Calling HTTP method
[2023-12-05, 05:09:17 UTC] {base.py:73} INFO - Using connection ID 'staging_data_refresh' for task execution.
[2023-12-05, 05:09:17 UTC] {http.py:178} ERROR - HTTP error: Internal Server Error
[2023-12-05, 05:09:17 UTC] {http.py:179} ERROR - {"message": "Failed to schedule task due to an internal server error. Check scheduler logs."}
[2023-12-05, 05:09:17 UTC] {taskinstance.py:1937} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/http/hooks/http.py", line 176, in check_response
response.raise_for_status()
File "/home/airflow/.local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://ingestion_server:8001/task
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/http/operators/http.py", line 145, in execute
response = http.run(self.endpoint, self.data, self.headers, self.extra_options)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/http/hooks/http.py", line 166, in run
return self.run_and_check(session, prepped_request, extra_options)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/http/hooks/http.py", line 217, in run_and_check
self.check_response(response)
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/http/hooks/http.py", line 180, in check_response
raise AirflowException(str(response.status_code) + ":" + response.reason)
airflow.exceptions.AirflowException: 500:Internal Server Error
The ingestion server logs don't give any indication that something's wrong:
ingestion_server-1 | [2023-12-05 05:06:39,796 - elastic_transport.transport - 335][INFO] POST http://es:9200/audio-full-20231205t050631/_refresh [status:200 duration:0.136s]
ingestion_server-1 | [2023-12-05 05:06:39,819 - elastic_transport.transport - 335][INFO] PUT http://es:9200/audio-full-20231205t050631/_settings [status:200 duration:0.022s]
ingestion_server-1 | [2023-12-05 05:09:15,345 - gunicorn.error - 281][DEBUG] GET /stat/audio-full
ingestion_server-1 | [2023-12-05 05:09:15,349 - elastic_transport.transport - 335][INFO] GET http://es:9200/ [status:200 duration:0.002s]
ingestion_server-1 | [2023-12-05 05:09:15,350 - elastic_transport.transport - 335][INFO] GET http://es:9200/audio-full [status:200 duration:0.001s]
ingestion_server-1 | [2023-12-05 05:09:15,351 - gunicorn.access - 363][INFO] 172.20.0.12 - - [05/Dec/2023:05:09:15 +0000] "GET /stat/audio-full HTTP/1.1" 200 77 "-" "python-requests/2.31.0"
ingestion_server-1 | [2023-12-05 05:09:17,075 - gunicorn.error - 281][DEBUG] POST /task
ingestion_server-1 | [2023-12-05 05:09:17,088 - elastic_transport.transport - 335][INFO] GET http://es:9200/ [status:200 duration:0.003s]
ingestion_server-1 | [2023-12-05 05:09:17,088 - root - 298][INFO] Creating index audio-full-20231205t050907 for model audio from table audio.
ingestion_server-1 | [2023-12-05 05:09:17,151 - elastic_transport.transport - 335][INFO] PUT http://es:9200/audio-full-20231205t050907 [status:200 duration:0.062s]
ingestion_server-1 | [2023-12-05 05:09:17,151 - root - 307][INFO] Running distributed index using indexer workers.
ingestion_server-1 | [2023-12-05 05:09:17,157 - root - 119][INFO] Checking http://172.20.0.6:8002/healthcheck. . .
ingestion_server-1 | [2023-12-05 05:09:17,162 - root - 132][INFO] http://172.20.0.6:8002/healthcheck passed healthcheck
ingestion_server-1 | [2023-12-05 05:09:17,163 - root - 65][INFO] Assigning job {'model_name': 'audio', 'table_name': 'audio', 'start_id': 0, 'end_id': 5000, 'target_index': 'audio-full-20231205t050907'} to http://172.20.0.6:8002
ingestion_server-1 | [2023-12-05 05:09:17,170 - root - 274][INFO] Task 6e58fe46b3e247b3acaee0dfc93f1891 completed.
ingestion_server-1 | [2023-12-05 05:09:17,184 - gunicorn.access - 363][INFO] 172.20.0.12 - - [05/Dec/2023:05:09:17 +0000] "POST /task HTTP/1.1" 500 93 "-" "python-requests/2.31.0"
ingestion_server-1 | [2023-12-05 05:09:17,731 - gunicorn.error - 281][DEBUG] POST /worker_finished
ingestion_server-1 | [2023-12-05 05:09:17,732 - root - 115][INFO] Received worker_finished signal from 172.20.0.6
ingestion_server-1 | [2023-12-05 05:09:17,733 - root - 284][INFO] All indexer workers succeeded! New index: audio-full-20231205t050907
ingestion_server-1 | [2023-12-05 05:09:17,735 - elastic_transport.transport - 335][INFO] GET http://es:9200/ [status:200 duration:0.002s]
ingestion_server-1 | [2023-12-05 05:09:17,739 - gunicorn.access - 363][INFO] 172.20.0.6 - - [05/Dec/2023:05:09:17 +0000] "POST /worker_finished HTTP/1.1" 200 0 "-" "python-requests/2.31.0"
ingestion_server-1 | [2023-12-05 05:09:17,894 - elastic_transport.transport - 335][INFO] POST http://es:9200/audio-full-20231205t050907/_refresh [status:200 duration:0.154s]
ingestion_server-1 | [2023-12-05 05:09:17,914 - elastic_transport.transport - 335][INFO] PUT http://es:9200/audio-full-20231205t050907/_settings [status:200 duration:0.020s]
In fact, the new indices are even created and aliased.
Do you know why that might be failing?
config = DATA_REFRESH_CONFIGS.get(media_type)
data_refresh_timeout = config.data_refresh_timeout if config else timedelta(days=1)
Do we need to make this code resilient? Would it make sense to simplify this with the assumption that `media_type` is going to be `image` or `audio` only?
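For reference, the simplification being asked about might look something like this (a sketch; it assumes `DATA_REFRESH_CONFIGS` has an entry for every supported media type):

```python
from datetime import timedelta

# Current, defensive form: silently falls back to a default timeout when the
# media type has no config entry.
config = DATA_REFRESH_CONFIGS.get(media_type)
data_refresh_timeout = config.data_refresh_timeout if config else timedelta(days=1)

# Simplified form: assumes media_type is always "image" or "audio", so a
# missing entry fails loudly with a KeyError instead of being papered over.
data_refresh_timeout = DATA_REFRESH_CONFIGS[media_type].data_refresh_timeout
```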
LGTM! Tested locally and it all works great 👍 I made some assumptions about the instructions, places where it seemed like they hadn't been updated from the previous instructions, but according to my current understanding of this feature, it works the way I would expect it to. I did a bunch of different configurations when running the new DAG and it always behaved precisely the way I thought it would.
@AetherUnbound Can you give more details about what you mean by subsequent reruns here? If you entirely cleared a previous, successful dagrun (one that had successfully created a new index), it will attempt to create an index with the same index name and fail (although I do see the […]).
Sure! I tried to convey it with my screenshot, but I triggered one run with default settings, let that finish, and then tried to trigger another run with default settings (as in, I didn't change anything when triggering the run). Then it failed at that step. Was I supposed to trigger with something else?
Gotcha! That's how I initially interpreted it but it worked for me so I thought I might've misunderstood 🤔 I've tried again to be certain and am able to trigger multiple runs one after the other with default settings; I also tried identical non-default settings and that also worked for me. I have no idea what could cause that, especially since you say it appears to work and you only see the behavior in the UI? How bizarre!
FWIW, in my testing I was also able to run the DAG multiple times with the default settings.
Okay, I just rebuilt the images and tried again - same thing! I have no idea what's happening 🤷🏼♀️ Probably a mismatched config on my end. If other folks aren't seeing this then LGTM!
Not important, but @AetherUnbound if you get a minute I'd be curious if you have similar problems when re-running any other DAGs that deal with indices, like the data refreshes or the […].
I just tried, and I'm able to run the data refresh DAGs locally just fine!! So strange |
Fixes
Fixes #3246
Description
Note: I recommend reviewing this PR by commit (although note that the first commit was a reverted initial attempt and can be ignored).
This PR adds a `recreate_full_staging_index` DAG, which can be used to create a new Elasticsearch index in the staging cluster.

TODO: I could not find an issue for creating the DAG for the search relevancy project. Double check this and link it.

The DAGs work by connecting to the staging ingestion server. This is much simpler to implement than updating the production ingestion server to optionally connect to the staging Elasticsearch cluster, and also avoids many questions about valid connection configurations. The IP proposes using the ingestion server's REINDEX task to create the index and the `ElasticsearchPythonHook` to do the remaining steps, but since those steps are all possible to do very easily using the ingestion server, and we're already using the ingestion server to begin with, I decided to use the ingestion server for everything for ease/speed of implementation. This can of course be iterated on if there's call to do so.

I changed the behavior and params of this DAG from my original implementation, after re-reviewing the project proposal and IPs. **Expand this section** to see the original description if you're interested.

This DAG has the following conf options:

- `media_type`: media type to be used.
- `target_alias_override`: By default, the DAG will apply the `<media_type>-full` alias to the newly created index, but optionally you can supply a different value here. For example, on a DagRun with `media_type` set to 'image', you could set `target_alias_override` to 'image' as well in order to point the main media alias to your new index.
- `delete_old_index`: By default, False. When enabled, the DAG will delete the old index that was previously pointed to by the target alias (if applicable -- if no such index exists, it skips this step).

If the `staging_database_restore` DAG is running when this DAG starts, it will fail immediately. Conversely, if this DAG is running when `staging_database_restore` starts, that DAG will wait on this one to complete.

This PR also updates the ingestion server to respect the `DATA_REFRESH_LIMIT` in the `reindex` task as well as the full `ingest_upstream`. (TL;DR: this is an env variable that can currently be set to limit the number of records that are copied into the API table during a data refresh. In this PR, I've updated it so the same limit is respected when building an index.)

It was also necessary to update the ingestion server to consider a task 'active' if it has active workers. This fixes a bug where an ingestion server task is considered to be in the "errored" state by the TaskStatus when it schedules some indexer workers and then completes (because in this state, the task is no longer alive but progress has not yet reached 100%). By checking whether there are active workers associated with the task id, we can correctly determine whether the task is actually in an errored state.
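A minimal sketch of the "active if it still has workers" idea described above, using hypothetical names rather than the actual ingestion server structures:

```python
from dataclasses import dataclass, field
from threading import Thread


@dataclass
class TaskInfo:
    thread: Thread
    # Indexer workers this task scheduled that have not yet reported back.
    active_workers: set[str] = field(default_factory=set)


def task_is_active(task: TaskInfo) -> bool:
    # A task counts as active while its own thread is running OR while any of
    # its indexer workers are still outstanding. This keeps a parent task that
    # exits right after scheduling workers from being reported as errored
    # before progress reaches 100%.
    return task.thread.is_alive() or bool(task.active_workers)
```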
Testing Instructions
Run `just up` on this branch. For testing, you'll need to access your local ingestion server via Elasticvue. I used audio for tests but you could test some or all of these with image as well.

Test that it connects to the staging ingestion server

Locally we only have one ingestion server, but you can test that it is using the correct connection by first setting the `AIRFLOW_CONN_STAGING_DATA_REFRESH` env variable in your `catalog/.env` to something nonsense and verifying that the `recreate_full_audio_staging_index` DAG fails locally. Then correctly set the variable to `http://ingestion_server:8001` for subsequent tests.

Test normal flow

Trigger the DAG locally and do not enable any of the options. It should run successfully, and default to `audio`. In Elasticvue, you should see a new index with a name like `audio-full-20231019t220746` (the timestamp will differ) with the `audio-full` alias. I will refer to this as index A.

Trigger the DAG with default options a second time. Now in Elasticvue you should see another new index, with a more recent timestamp, which I will call index B. Index B should have the `audio-full` alias; index A should no longer have the alias, but should not have been deleted.

Test pointing the media alias

First, look at Elasticvue and note the name of the index that currently has the 'audio' alias. I will call this Index C. Then trigger the DAG again, but set `target_alias_override` to 'audio'. Now in Elasticvue you should see:

- […] `audio-full` alias
- […] `audio` alias
- […] `audio` alias

Test pointing the media alias and deleting the old one

Trigger the DAG again with params:

In Elasticvue you should see:

- […] `audio-full` alias
- […] `audio` alias

Test `delete_old_index` is skipped if no index previously had the target_alias

Trigger the DAG again, with this conf:

This should create a new index with a target_alias that does not currently exist, but we tell it to `delete_old_index` anyway. You should see that the `trigger_delete_index` step is skipped, and the indices look like:

- […] `audio` alias and not deleted
- […] `audio-full` alias

Test creating an index with a limit

In your `ingestion_server/.env`, set `DATA_REFRESH_LIMIT=1000`. Now run the DAG again and inspect Elasticvue. You should see that the new index has only 1000 documents (while the others will have 5000, assuming you're working off clean sample data).

I personally also tested that the limit still works with the data refresh by running an audio data refresh and verifying that there were now only 1000 audio records in my local API.

Test concurrency prevention with `staging_database_restore`

To test this, trigger both DAGs in the shell:

Since `staging_database_restore` started first, you should see `recreate_full_staging_index` fail immediately. (Note that the `staging_database_restore` DAG will pass the first couple of tasks, but most of the others will fail in the local env. That's fine for this test.)

Now try the other direction:

This time, you should see the `staging_database_restore`'s `wait_for_recreate_full_staging_index` task go up for reschedule and then pass once the `recreate_full_staging_index` DAG finishes. (Set `DATA_REFRESH_POKE_INTERVAL=5` in your `catalog/.env` if it's taking a long time.)

Other DAGs

I also ran the data refresh and popularity refresh for audio to ensure they were unaffected.

Deployment plan

In addition to merging this PR, we also need to add the staging ingestion server connection to production Airflow using an Airflow variable. Before using the DAG we'll probably want to add an official DNS name to the staging data refresh server.

To actually add the audio index in staging, we'll:

- Set the `DATA_REFRESH_LIMIT` in staging to limit the number of records to be reindexed (there are currently `2,533,832` records in the `audio` table on staging)
- Run the `recreate_full_audio_staging_index` DAG in production Airflow with `point_media_alias` enabled