
Implementation plan: Ingestion server removal #4026

Merged
merged 15 commits into main from implementation-plan/ingestion-server-removal
Apr 17, 2024

Conversation

stacimc
Collaborator

@stacimc stacimc commented Apr 3, 2024

Fixes

Fixes #3981 by @stacimc

Description

This PR adds the implementation plan for removing the ingestion server, and moving the data refresh into Airflow.

Reviewers

I have requested reviews from @sarayourfriend and @AetherUnbound. @sarayourfriend's deep knowledge of the infrastructure, and in particular the ongoing work to the catalog deployments, would be extremely beneficial. @AetherUnbound also has expertise in our infrastructure and catalog, and in orchestrating similar distributed tasks using Airflow.

Testing Instructions

Please read the IP following the decision-making process defined below.

Decision-making process

This discussion is following the Openverse decision-making process. Information about this process can be found on the Openverse documentation site. Requested reviewers or participants will be following this process. If you are being asked to give input on a specific detail, you do not need to familiarise yourself with the process and follow it.

Current round

This discussion is currently in the Decision round. The deadline for review of this round is April 15 2024. Note the deadline would typically be April 16, but is extended to account for my own AFK.

Checklist

  • My pull request has a descriptive title (not a vague title like `Update index.md`).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@stacimc stacimc added 🟨 priority: medium Not blocking but should be addressed soon 📄 aspect: text Concerns the textual material in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: ingestion server Related to the ingestion/data refresh server 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: documentation Related to Sphinx documentation 🧱 stack: infra Related to the Terraform config and other infrastructure labels Apr 3, 2024
@stacimc stacimc self-assigned this Apr 3, 2024
@stacimc stacimc requested a review from a team as a code owner April 3, 2024 18:54
@stacimc stacimc requested review from fcoveram, sarayourfriend and AetherUnbound and removed request for fcoveram April 3, 2024 18:54

github-actions bot commented Apr 3, 2024

Full-stack documentation: https://docs.openverse.org/_preview/4026

Please note that GitHub Pages takes a little time to deploy newly pushed code. If the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

New files ➕:

Collaborator

@sarayourfriend sarayourfriend left a comment

This is soooo exciting! I left a very long comment with a suggestion for an important difference we should ideally take to simplify deployments and maintenance, as well as potentially simplify the DAGs (depending on whether boto3 in a plain Python operator is considered "simpler" than the AWS Airflow operators).

I also took the time to write out the differences in hourly cost for EC2 and ECS, which you can include in the document as an appendix, if you'd like to record evidence for the cost difference. Based on the resource consumption numbers you mentioned, I also looked to see if we could reduce the allocated resources in either approach and compare the costs. Scaling down would not work for us with EC2, because for the relevant instance types anything with fewer vCPUs also halves the RAM, and there's nothing smaller than the m5.xlarge that keeps enough RAM to stay comfortably above the 52% of 16 GB consumption we saw.

ECS is more flexible, and we can scale down the amount of RAM allocated to each task, but because we still need at least 50% of the 16 GB we currently allocate, the savings from RAM alone are modest:

EC2 m5.xlarge
0.192/hour
4 vCPU
16 GiB

ECS hourly costs:

  • 0.04048/hour/vCPU
  • 0.004445/hour/GB

ECS direct comparison:
4 vCPU = 0.16192/hour
16 GB = 0.07112/hour
total = 0.23304/hour

ECS lower RAM:
4 vCPU = 0.16192/hour
10 GB = 0.04445/hour
total = 0.20637/hour

ECS lower RAM and CPU:
2 vCPU = 0.08096/hour
10 GB = 0.04445/hour
total = 0.12541/hour

So there is potentially a way to make ECS less expensive, simply by scaling down the resources we allocate for each task to try to achieve as high utilisation as possible from each one. However, we have the opportunity for much more significant cost savings if we re-write the indexer worker in a language that has tighter memory requirements than Python, by leveraging disk instead of RAM for task storage on the worker, or potentially even just by refactoring the existing Python code to be more conscientious of memory usage. If we can cut memory reliably below 50% consumption, an m4.large with 2 vCPU and 8 GB costs a mere 0.10/hour, cutting costs nearly in half from the m5.xlarge.

Even if ECS could be less expensive now, optimising the indexer workers to maximise resource utilisation with EC2 will always result in cheaper running costs than ECS, vCPU-per-vCPU and GB-per-GB. Planning this project with EC2 sets us up well for the opportunity of long-term savings through code optimisation. With the ASG approach I've outlined, I think it also brings virtually every single benefit we've come to enjoy from ECS to the EC2 based workers.

Comment on lines 394 to 415
- Use Airflow's
[EC2Hook#describe_instances](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/hooks/ec2/index.html#airflow.providers.amazon.aws.hooks.ec2.EC2Hook.describe_instances)
to get a list of internal URLs bound to the indexer workers for the
appropriate environment. In the local environment, this will instead return
the URL for the catalog-indexer-worker Docker container.
- Use
[EC2Hook#start_instances](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/hooks/ec2/index.html#airflow.providers.amazon.aws.hooks.ec2.EC2Hook.start_instances)
to start the appropriate EC2 instances. This step should skip in a local
environment.
- Use dynamic task mapping to generate a Sensor for each of the expected indexer
workers, and ping its `healthcheck` endpoint until it passes.
- Use dynamic task mapping to distribute reindexing across the indexer workers
by first calculating `start` and `end` indices that will split the records in
the media table into even portions, depending on the number of workers
available in the given environment. Then:
- POST to each worker's `reindexing_task` endpoint the `start_index` and
`end_index` it should handle
- Use a Sensor to ping the worker's `task/{task_id}` endpoint until the task
is complete, logging the progress as it goes
- Use the
[EC2Hook#stop_instances](https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/_api/airflow/providers/amazon/aws/hooks/ec2/index.html#airflow.providers.amazon.aws.hooks.ec2.EC2Hook.stop_instances)
to shut the instance down. This step should skip in a local environment.
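For reference, a minimal sketch of how the steps quoted above could map onto Airflow tasks, assuming the Amazon provider's EC2Hook and TaskFlow-style dynamic task mapping. The task names, the tag filter, and the worker API response fields are illustrative assumptions, not the final design.

```python
import requests
from airflow.decorators import task
from airflow.providers.amazon.aws.hooks.ec2 import EC2Hook


@task
def get_instance_ids(environment: str) -> list[str]:
    # The tag filter is an assumption; the real lookup would match however
    # the indexer workers end up being tagged in Terraform.
    hook = EC2Hook(api_type="client_type")
    response = hook.describe_instances(
        filters=[{"Name": "tag:Name", "Values": [f"catalog-indexer-worker-{environment}"]}]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]


@task
def start_workers(instance_ids: list[str]) -> None:
    # Skipped in the local environment, per the plan.
    EC2Hook(api_type="client_type").start_instances(instance_ids=instance_ids)


@task
def trigger_reindex(worker_url: str, start_index: int, end_index: int) -> str:
    """POST one slice of the media table to a worker; return its task id."""
    response = requests.post(
        f"{worker_url}/reindexing_task",
        json={"start_index": start_index, "end_index": end_index},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["task_id"]


@task.sensor(poke_interval=60, mode="reschedule")
def wait_for_reindex(worker_url: str, task_id: str) -> bool:
    """Poll the worker's task endpoint, logging progress, until it completes."""
    status = requests.get(f"{worker_url}/task/{task_id}", timeout=10).json()
    print(f"Reindexing progress: {status}")
    # The "finished" field is an assumption about the worker API's response shape.
    return bool(status.get("finished"))
```

In the DAG itself, `trigger_reindex` and `wait_for_reindex` would be expanded over the worker URLs and index ranges with `.expand(...)`, one mapped task per worker, with a final mapped task calling `EC2Hook#stop_instances` for non-local environments.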
Collaborator

You mention using ec2-service later on, and in thinking further about the implications of that compared to creating EC2 instances directly, I think we have an opportunity to leverage the functionality of an auto-scaling group to trivially get somewhat automatically updating indexer workers, along with simplifying the DAGs set of tasks by removing most of the need for mapped tasks.

When managing EC2 instances directly (what we do currently), we start the instances once when we provision them, and then immediately shut them down. When the instances start again, a cron job is responsible for restarting the docker compose services[1]. All of this is configured once in the user data script at instance creation. User data scripts run only on instance creation and never again (at least not automatically, I'm sure you could locate the script and run it manually if you wanted to).

With ASGs, however, we have some interesting capabilities that would make it straightforward to deploy even a bumped version tag (if we did not use a new tag to point to the "release" we want). The critical thing to know about ASGs is that they do not rely on stopped instances, but rather terminated instances. Meaning the instances literally get created and destroyed as needed, rather than started and stopped. This is important because it means the user data script runs every time an ASG instance spins up. The user data script is defined in the launch template, which in turn means that a "deployment" for a correctly leveraged ASG should mean just updating the launch template and then replacing each running instance one-by-one (or in whatever pattern makes sense for the given service). If we ran the API or frontend in an ASG again, that's how we would do it, rather than needing to create an entirely new ASG each time.

For the indexer workers, what does this mean? It means we can actually basically ignore Ansible for them, and instead completely leverage Terraform and only Terraform, significantly simplifying their management. Rather than running an Ansible playbook that would one-by-one start and then SSH into each indexer worker, we would just push an update to the ASG launch template. The next time any instance starts in that ASG, it will use the new launch template, and therefore have whatever updates we pushed to it.

This somewhat blends our old approach with the new approach, and basically sets up the best part of ECS (the need to only define a task-definition) with the ASG (by using a launch template, instead of task definition).

Then, in Airflow, rather than starting and stopping individual EC2 instances, we just do the following:

  1. Query for the relevant ASG for the environment we need workers for
  2. Issue a set-desired-capacity command to the ASG to set it to the capacity used for the data refresh
  3. Query the ASG using describe-auto-scaling-group, which returns the ASG health and a list of all the instances with health status, and wait until instances are available to accept tasks (some or all are healthy).
  4. Continue with the non-ASG approach as described in the IP, just use the ASG's instances list instead of querying for the EC2 instances matching the tag (and of course skip starting the instances).
  5. As each instance finishes everything the DAG needs it to do, tell the ASG to spin it down using manual specific instance termination, which also allows us to decrement the desired capacity at the same time. This basically tells the ASG: terminate this instance, and don't replace it.

This has some significant benefits over the EC2 approach:

  1. Deployments will take literally seconds (updating a launch template is nearly instant)
  2. Instances do not need to know how to terminate themselves, the DAG will do that for us
  3. We can take a completely "hands-free" approach to the instances, thinking only in terms of launch templates and ASGs for the most part, eliminating the need for any new Ansible playbooks.
  4. Additionally, we can use SSM parameters for selecting the AMI at launch time rather than using a static AMI ID, meaning we have one less place to worry about updating system dependencies (which is the main headache long-running EC2 instances carry, in my opinion).

In the future, we could even employ a target group with the internal load balancer and use target group stickiness configured with a custom app cookie, where each task responsible for one of the indexer workers generates a unique cookie value for the lifetime of the DAG run. Then the target group will automatically route requests to the same instance without the task needing to maintain a reference to the instance or us ever needing to even list the individual instances. That's definitely out of scope here, but I wanted to share an example of what ASGs get us.

To summarise, rather than directly interacting with EC2, we can interact with the ASG. I don't think we can use AWS operators for this, there don't appear to be ones for ASGs, but boto3 supports them, so we just need to write our own interactions (like we do for s3). This eliminates the need for the Ansible playbook, like I said, so it will meaningfully reduce the amount of work done on the infrastructure side, in addition to reducing the number of mapped tasks in the DAG.
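As a rough illustration of those interactions, the boto3 `autoscaling` client covers each of the numbered steps above directly. The ASG name and capacity here are hypothetical:

```python
import boto3

autoscaling = boto3.client("autoscaling")
ASG_NAME = "catalog-indexer-worker-prod"  # hypothetical

# Step 2: scale the ASG up to the capacity needed for the data refresh.
autoscaling.set_desired_capacity(
    AutoScalingGroupName=ASG_NAME, DesiredCapacity=6, HonorCooldown=False
)

# Step 3: poll the ASG until its instances report healthy and in service.
group = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=[ASG_NAME]
)["AutoScalingGroups"][0]
ready = [
    instance["InstanceId"]
    for instance in group["Instances"]
    if instance["LifecycleState"] == "InService"
    and instance["HealthStatus"] == "Healthy"
]

# Step 5: when a worker finishes, terminate it and shrink the group so the
# ASG does not replace it.
autoscaling.terminate_instance_in_auto_scaling_group(
    InstanceId=ready[0], ShouldDecrementDesiredCapacity=True
)
```

In the DAG these calls would live in plain Python tasks, since, as noted, there are no dedicated Airflow operators for ASGs.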

Footnotes

  1. Which, by the way, really is not necessary if we just used the docker service as intended, with an appropriate restart policy, but that's neither here nor there

Collaborator

Oh also, I forgot about this difference with ECS. We already have a process for automatically updating the ECS task definition when we cut a new release. We do not have that for launch templates, which are roughly the equivalent for EC2 instances (and the ASG approach). But, as you've said in the plan, we do not often update the code for the workers, so that isn't a big concern. Additionally, it would be good to keep updates to the launch templates manual (e.g., running tf apply) so that we make sure that we only do it when a data refresh is not running.

Collaborator Author

@sarayourfriend this is incredible 😍 I'm going to take some time to do a bit more background reading on this myself, but this is really exciting.

One possible correction is that while the indexer workers currently shut themselves down, in my proposed implementation the DAG would send the stop_instances request. (However as you've noted that's only in the context of a data refresh; they need to shut themselves down after deploying. I also had not realized that we're currently using a cron job to restart the stack when the instances start 😮)

Collaborator

+1-ing Staci's note - thank you for this @sarayourfriend! I hadn't thought about leveraging ASGs here for this, but it feels so right. Definitely on board with this approach!

Collaborator

@sarayourfriend sarayourfriend Apr 8, 2024

Oh, thanks for the clarification there, Staci.

they need to shut themselves down after deploying

That would no longer be necessary using the ASG, just to clarify 😁

I also had not realized that we're currently using a cron job to restart the stack when the instances start 😮

Me neither!

Member

@zackkrida zackkrida Apr 10, 2024

@sarayourfriend perhaps to your point above, during a data refresh we could set the ASG to use a fixed version of the template rather than latest, and then switch the config back to latest after the refresh is complete. That should be possible with boto3.

Edit: or perhaps it would be simplest to always use a fixed version and bump it whenever we deploy a new launch template definition 🤔
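A minimal sketch of that pin/unpin with boto3, where the ASG name, launch template name, and pinned version are all hypothetical:

```python
import boto3

autoscaling = boto3.client("autoscaling")


def set_launch_template_version(version: str) -> None:
    """Point the ASG at a fixed launch template version, or back at "$Latest"."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName="catalog-indexer-worker-prod",  # hypothetical
        LaunchTemplate={
            "LaunchTemplateName": "catalog-indexer-worker",  # hypothetical
            "Version": version,
        },
    )


# Before the data refresh: pin to the version currently deployed.
set_launch_template_version("42")
# After the refresh completes: point back at the newest version.
set_launch_template_version("$Latest")
```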

Collaborator Author

during a data refresh we could set the ASG to use a fixed version of the template rather than latest, and then switch the config back to latest after the refresh is complete.

Pinning the versions in Airflow is a really interesting idea 🤔 I think we could do something like this to pin/unpin the launch template version, but if the user_data script is always pulling the latest tag on the Docker image then code changes will be able to come through (for new instances) regardless, right? Although maybe those changes are okay to make mid-refresh.

One thing to note is that if an indexer worker fails for any reason, the Airflow wait_for_reindexing task will fail and won't automatically retry with a new worker instance. If the worker failed due to some issue with the reindexing, Airflow will terminate the instance to ensure it’s not left running (and prevent the ASG from making a new one). If the instance itself crashes for some reason and the ASG tears it down, then the Airflow wait_for_reindexing task will still fail because it will try to poll a terminated instance. But the ASG will spin up a new one and the Airflow task will not know how to connect to resume, or to terminate the new instance gracefully. I think we’d need an extra Airflow task at the end to account for this 😓 But in either case, if a worker fails the DAG won’t be able to retry on a new instance automatically.

This discussion has made me realize that if we use an ASG instead of the more naive EC2 approach, we lose the ability to retry a single failed reindexer worker, because when a worker is terminated and a new one is spun up, there is no way to clear a single trigger_reindex task (because it will be trying to connect to the terminated instance). The best thing I have come up with is making distributed_reindex a reusable TaskGroup that accepts a list of worker params, and then having an entirely separate DAG that accepts worker params as a conf option and can be used to manually retry parts of a failed data refresh. This feels very convoluted.

Collaborator Author

Although as we pointed out recently, if a single reindexing task fails and we're forced to retry all of them (in this case by clearing the DAG at the task that sets the ASG desired capacity), it's not any slower than retrying a single indexer worker because they run concurrently. The loss there is in the literal cost of unnecessarily running the other workers for however many hours the reindexing takes.

Collaborator Author

Sorry to post back-to-back 😅 but -- the inability to retry a single reindexing task has an advantage in that it actually fixes the problem of ensuring that all the indexer workers are running the exact same code and have the exact same configuration. The trade-off is just the cost of running the successful workers in scenarios where the configuration hadn't changed, which is perhaps fair.

This would allow us to use latest for both the launch template version and docker tag.

Collaborator

@sarayourfriend sarayourfriend Apr 16, 2024

We should generate AMIs for the indexer workers with the Docker image they should run bundled into them: https://github.com/WordPress/openverse-infrastructure/issues/843#issuecomment-2058094706

The launch template would always point to the latest AMI, and a new AMI would only be generated for production manually (we can also make it manual for staging if we want). We can configure the ASG to replace instances using the launch template they were running before, rather than using the latest launch template version (or pin the ASG to a specific launch template version rather than $Latest, and manually bump it when data refresh isn't running for that environment if a new AMI exists, maybe even just using a DAG that runs and checks if data refresh is running before bumping the launch template to the new AMI version).

The AMI approach has a lot of benefits over what I suggested before:

  1. Determinism: we know exactly what dependencies and Docker image will run
  2. Those are all pre-installed/downloaded to the AMI, meaning the instance has zero work to do at boot, just start the service
  3. No dependencies on GHCR or other external services

To avoid blocking this project, I want to spend some time looking at the AMI approach I shared above and tinker with the existing indexer worker code as the proof of concept. We need a proof of concept anyway for Airflow/Kibana/other EC2 based services, and the indexer workers are a very simple baseline to use.

Collaborator

@AetherUnbound AetherUnbound left a comment

This is an incredibly thorough implementation plan, thank you for taking the time to get this all "on paper"! I have a few questions and notes for clarification, and I'm really excited about Sara's suggestion for how to manage the EC2 instances in a way that will be more hands-off 🚀

Comment on lines 223 to 232
The ECS approach allows us to remove 100% of the server management code, and
eliminates the need for deployments in essentially every case (unless the task
definition itself changes). However it is more expensive, more difficult to
implement, and requires a special secondary implementation for running locally
that may be prone to cross-platform issues.

The EC2 approach on the other hand is cheaper, quicker to implement, and gets us
the vast majority of what we want in terms of removing complexity and reducing
the need for deployments. Consequently I argue here that the EC2 approach is the
one we should pursue.
Collaborator

I have to say, I still would like to pursue the ECS approach, even with the disadvantages you mention 😅 But, that's not a blocking concern, merely a personal desire to get rid of the last vestiges of the seemingly-unnecessary webserver aspects of the data refresh process 😆

Collaborator

I think the inability to realistically test the data refresh locally, needing an entirely separate process locally, is pretty significant.

seemingly-unnecessary webserver aspects

In what sense are they unnecessary? ECS still runs a server. It eliminates a small amount of deployment complexity, but that isn't complexity we don't have elsewhere, and we can solve it so that these deployments are as simple, or nearly as simple, as our ECS deployments (which would be good for making Airflow, Kibana, Elasticsearch, etc. all simpler and easier to understand and maintain).

The difference in long-term cost is significant, with a bit of up-front infrastructure work that will build off of and benefit the work we've done for other services. Why avoid EC2 when we need to make it simple to use anyway for the sake of our other services? We have a low level of understanding of EC2, but we need to improve that, and as we do, everything about the process improves and the relative complexity compared to ECS falls away.

Collaborator

seemingly-unnecessary webserver aspects

The indexer workers run their own webserver in order to accept tasks and issue progress on them. That code could be completely obviated if the containers which were to run on ECS received the range they needed to index from Airflow and then ran directly on that chunk. The containers could spin up, use the range supplied via CLI to run, and then complete and exit. This would remove the need for maintaining any webserver aspects for the indexer workers, since the containers would only be running as long as it takes to index everything.

To be clear, I'm comfortable with the current approach proposed here nonetheless.
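For illustration, the CLI-driven worker described above could be as small as the sketch below. The argument names and the `reindex` helper are hypothetical; the point is only that the container receives its slice from Airflow, runs it, and exits.

```python
# Illustrative only: an indexer worker that takes its slice of the media table
# as CLI arguments, does the reindexing, and exits -- no webserver required.
import argparse


def reindex(media_type: str, start_index: int, end_index: int) -> None:
    """Reindex records in [start_index, end_index) for the given media type."""
    ...  # the existing reindexing logic, minus the webserver wrapper


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Reindex one slice of a media table")
    parser.add_argument("--media-type", required=True)
    parser.add_argument("--start-index", type=int, required=True)
    parser.add_argument("--end-index", type=int, required=True)
    args = parser.parse_args()
    reindex(args.media_type, args.start_index, args.end_index)
```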

@stacimc
Collaborator Author

stacimc commented Apr 10, 2024

Drafting to address comments :)

@stacimc stacimc marked this pull request as draft April 10, 2024 19:21
@stacimc stacimc marked this pull request as ready for review April 11, 2024 22:29
@stacimc stacimc requested a review from AetherUnbound April 11, 2024 22:29
@stacimc stacimc requested a review from sarayourfriend April 11, 2024 22:29
@stacimc
Collaborator Author

stacimc commented Apr 11, 2024

Updates are up and I've re-requested reviews. Particularly interested in folks' thoughts on the discussion toward the end of this thread and the conclusions I came to here :)

Collaborator

@AetherUnbound AetherUnbound left a comment

Thanks for the revisions, I think all my concerns are assuaged! I'm so stoked for this 🤩

@stacimc stacimc mentioned this pull request Apr 15, 2024
Collaborator

@sarayourfriend sarayourfriend left a comment

This LGTM.

I've left a couple of comments describing how the AMI approach I talk about in this comment in the infrastructure repository would solve some of the edge cases in this PR introduced by the "always pull latest" Docker image approach. None of that blocks this plan, though; we can move forward with it as is, and the AMI approach would supplant it when eventually implemented. Of course, it could be worth noting the AMI approach at least in general, so that if it is available before the implementation actually reaches the step of creating the ASG, it can be used instead.

Comment on lines +468 to +469
- The `user_data` script should be updated to pull the Docker image with the
`latest` tag.
Collaborator

Just a nit to note that this should change if we use the AMI approach, which I think we should, and which I aim to have resolved so as not to block the infrastructure work of this IP. It's a conceptual/foundational piece of our ASG-based deployment puzzle that needs to be solved one way or another. It doesn't need to block this project, so you don't need to change the approach described here of using Docker to pull the latest image; the AMI approach builds on it. But just in case we manage to sort it out before we get to this point of the plan, it's worth noting the difference.

@stacimc stacimc merged commit 797b32c into main Apr 17, 2024
38 checks passed
@stacimc stacimc deleted the implementation-plan/ingestion-server-removal branch April 17, 2024 01:09