Building beanstalk for UCLDC Solr index

The Solr index that powers the Calisphere website is hosted on the AWS Elastic Beanstalk platform.

The CNAME solr.calisphere.org points to https://eb-ucldc-solr2.us-west-2.elasticbeanstalk.com; whichever Beanstalk environment is at this address serves our search requests, i.e. it is the backbone of production Calisphere.

The Beanstalk is hosted in the Oregon (us-west-2) AWS region. The application name is ucldc-solr2. Currently it runs on only one micro EC2 instance.

The process to create a new production index is as follows:

  1. Optimize the Solr index
  2. Push the index to S3
  3. Clone the existing environment
  4. In the cloned environment, set the env var INDEX_PATH to the new index sub-path in S3
  5. Rebuild the cloned environment
  6. Check that the cloned environment is serving up the new index
  7. Swap URLs from existing environment to the new cloned environment
  8. Update the eb-ucldc-solr2 environment

This puts the new index in place.

Generally, we then rebuild the original environment and swap the URLs back, so that the production environment is always named eb-ucldc-solr2-prod. This is not strictly necessary, but it makes it easier to keep track of which environment is which.

Step 1

Optimize the Solr index:
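
The exact optimize procedure isn't captured in this document. As a hedged sketch, if the freshly built index is being served by a local Solr instance, it can typically be optimized by calling Solr's update handler with optimize=true (the host, port, and core name below are placeholders, not values from this setup):

# Ask Solr to merge/optimize the index segments (placeholder host and core name)
curl 'http://<solr host>:8983/solr/<core name>/update?optimize=true'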

Step 2

To push a new index to S3:

  • Log into blackstar and sudo su - hrv-prd

  • Run snsatnow solr-index-to-s3.sh. The DATA_BRANCH is set to production in this environment. This pushes the most recently built Solr index to S3 at a location of the form shown below. The process takes a while (the new index has to be packaged and compressed before it lands on S3), but because it runs under the snsatnow wrapper, a message is sent to the dsc_harvesting_report Slack channel when it finishes. A rough sketch of what the script does appears after this list.

    solr.ucldc/indexes/<DATA_BRANCH>/YYYY/MM/solr-index.YYYY-MM-DD-HH_MM_SS.tar.bz2

  • Look at the message sent to the dsc_harvesting_report Slack channel and find the s3_file_path reported there. It will be something like: "s3_file_path": "s3://solr.ucldc/indexes/production/2023/06/solr-index.2023-06-08-16_58_10.tar.bz2"

  • This is the value to pass to the clone and update environment commands in the steps below.
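
For reference, the core of what solr-index-to-s3.sh does is roughly the following. This is a simplified sketch only; the real script lives on blackstar and its details (in particular where the built index lives) may differ:

# Sketch: package the most recently built Solr index and push it to S3 (placeholder index dir)
DATA_BRANCH=production
STAMP=$(date +%Y-%m-%d-%H_%M_%S)
TARBALL="solr-index.${STAMP}.tar.bz2"
tar -cjf "$TARBALL" -C <solr index dir> .
aws s3 cp "$TARBALL" "s3://solr.ucldc/indexes/${DATA_BRANCH}/$(date +%Y)/$(date +%m)/${TARBALL}"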

Steps 3-5

The script clone-with-new-s3-index.sh performs steps 3 through 5 above; a rough sketch of what it does appears at the end of this section.

  • First, check what environments are running. Run this from your home directory (e.g., /home/ec2-user or /home/hrv-prd):
eb list
  • If there are two environments, determine which one is serving as the production environment by running eb status [environment name] on each. Whichever environment has a CNAME value of eb-ucldc-solr2.us-west-2.elasticbeanstalk.com is serving the production index, so run the following terminate command on the OTHER environment (be sure not to terminate the production environment holding the eb-ucldc-solr2 CNAME!):
eb terminate [environment name]
  • Now run the following, where <new index path> is the s3_file_path value from Step 2 (e.g., s3://solr.ucldc/indexes/production/2023/06/solr-index.2023-06-08-16_58_10.tar.bz2). This process takes a while. By convention, we name the existing environment (<old env name>) eb-ucldc-solr2-prod and the new environment (<new env name>) eb-ucldc-solr2-clone. However, these names may be switched depending on which environment was found to be the production environment in the step above, so it may be that the production environment is eb-ucldc-solr2-clone and the new environment will be eb-ucldc-solr2-prod:
snsatnow clone-with-new-s3-index.sh <old env name> <new env name> <new index path>
  • This command will send a message to the dsc_harvesting_report Slack channel when finished
  • When it finishes, you should be able to run the following and see that INDEX_PATH has been updated to the value passed to the script:
eb printenv <new env name>
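
For reference, a simplified sketch of what clone-with-new-s3-index.sh does for steps 3 through 5 (the command options here are illustrative; the actual script on blackstar may differ):

# Sketch: clone the existing environment, point the clone at the new index, and rebuild it
OLD_ENV=$1       # e.g. eb-ucldc-solr2-prod
NEW_ENV=$2       # e.g. eb-ucldc-solr2-clone
INDEX_PATH=$3    # e.g. s3://solr.ucldc/indexes/production/2023/06/solr-index.2023-06-08-16_58_10.tar.bz2
eb clone "$OLD_ENV" --clone_name "$NEW_ENV"                              # step 3: clone the environment
eb use "$NEW_ENV"
eb setenv INDEX_PATH="$INDEX_PATH"                                       # step 4: set the new index path
aws elasticbeanstalk rebuild-environment --environment-name "$NEW_ENV"   # step 5: rebuild the clone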

Step 6

Check the new environment's URL for the proper search results:

  • Run the following to confirm the URL associated with the environment:
cname_for_env.sh <new env name>
  • You can check that the URL is up (and verify that the total object count matches the QA reports) by running the command below; a manual curl equivalent follows this list:
check_solr_api_for_env.sh <new env name>
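
To spot-check by hand, you can query the environment's URL directly. This assumes the /solr/query endpoint mentioned in Step 8 below; numFound in the response should match the total object count from the QA reports:

curl 'http://<env cname>/solr/query?q=*:*&rows=0'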

Step 7

Swap URLs from the existing environment to the new cloned environment running the updated Solr index:

  • First, check which environment currently has the eb-ucldc-solr2.us-west-2.elasticbeanstalk.com CNAME:
eb status <new env name>

Also, check the status and health of the environment. Here's an example of a happy environment:

Environment details for: eb-ucldc-solr2
 Application name: ucldc-solr2
 Region: us-west-2
 Deployed Version: new-nginx-index-html
 Environment ID: e-dmmzpvb2vj
 Platform: 64bit Amazon Linux 2016.03 v2.1.3 running Docker 1.11.1
 Tier: WebServer-Standard
 CNAME: eb-ucldc-solr2.us-west-2.elasticbeanstalk.com
 Updated: 2023-06-08 02:09:01.062000+00:00
 Status: Ready <--
 Health: Green <--
  • If both look right, swap the URLs and the new index will be live (eb swap -n <new env name> <old env name>):
eb swap -n eb-ucldc-solr2-clone eb-ucldc-solr2-prod
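
Once the swap completes, re-running eb status on the new environment should show that it now holds the eb-ucldc-solr2.us-west-2.elasticbeanstalk.com CNAME:

eb status <new env name>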

Step 8

Updating the eb-ucldc-solr2 environment:

  • After updating the eb-ucldc-solr2 environment, we want to swap the URL back so that the production environment we have up is always named eb-ucldc-solr2-prod. Note that this is not required; the important thing is that the eb-ucldc-solr2.us-west-2.elasticbeanstalk.com/solr/query URL works.

  • The update-env-with-new-s3-index.sh command will update an existing Beanstalk environment to use the new index path, e.g.:

snsatnow update-env-with-new-s3-index.sh eb-ucldc-solr2-prod s3://solr.ucldc/indexes/production/2023/06/solr-index.2023-06-08-16_58_10.tar.bz2 
  • When finished running (this takes about 40 min.), it will send a message to the dsc_harvesting_report Slack channel

  • Once that is done, you can swap CNAMEs to the updated environment.

eb swap -n eb-ucldc-solr2-clone eb-ucldc-solr2-prod
  • Now the eb-ucldc-solr2-prod environment is once again the production environment, with the CNAME eb-ucldc-solr2.us-west-2.elasticbeanstalk.com attached to it.
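
As a final check, confirm that the production URL is serving the updated index. This assumes the /solr/query endpoint noted above answers plain HTTP queries; the count returned should match the QA reports:

curl 'http://eb-ucldc-solr2.us-west-2.elasticbeanstalk.com/solr/query?q=*:*&rows=0'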