
Sean Kelly edited this page Sep 24, 2021 · 4 revisions

🤯 Running the Latest PDS API (Registry API) Locally

This little guide is mostly notes cribbed from getting the latest Planetary Data System ReST-based application programmer interface (API) server running on your local computer. Once it's running, you can make API calls—mostly to the "Registry API", as that's the only part of the ReST API that works right now.

It's a little confusing because multiple GitHub repositories comprise the API server, and the documentation on each of them makes assumptions about certain use cases—but not the case of a developer who just wants to run it locally. Hence this document.

🖥 Assumptions

This document assumes:

  • Running macOS Big Sur (sw_vers says 11.6)
  • JAVA_HOME environment variable set to the output of /usr/libexec/java_home
    • As of this writing, that's /Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home
  • Docker Desktop 3.3.3
    • "gRPC FUSE" is enabled for filesystem sharing (on the General tab)
    • The /Users volume is enabled for file sharing (on the Resources → File Sharing tab)
      • Also: everything must be running out of your home directory—not /tmp—as a result
  • Xcode 13.0
    • Xcode command line tools 13.0 also installed (Preferences → Locations)
    • This provides /usr/bin/git
  • Homebrew 3.2.13
  • PATH contains the following directories:
    • /usr/bin, to find git, java
    • /usr/local/bin, to find python3, and python3 -V prints Python 3.9.7 or so
    • /usr/local/opt/maven@3.5/bin, to find mvn
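A quick way to sanity-check these assumptions is to loop over the tools this guide relies on. This just reports what resolves from your PATH; it does not verify the specific versions listed above:

```shell
# Report which of the required tools resolve from PATH; prints the
# resolved path for each, or a MISSING line if it isn't found.
for tool in git java python3 mvn docker; do
    command -v "$tool" || echo "MISSING: $tool"
done
```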

📜 Procedure

The process of getting the PDS API (aka, the Registry API) up and running consists of these steps:

  1. Setting up Elasticsearch.
  2. Populating Elasticsearch with data.
  3. Running the PDS API (the "Registry API").
  4. Optionally, updating the PDS API Python client.

You'll also need to make a workspace for all this somewhere under your $HOME:

$ cd $HOME
$ mkdir pds
$ cd pds

And create a Docker network—but just once:

docker network create --driver bridge --label 'org.label-schema.name=PDS' pds

OK, here we go.

🩲 Elasticsearch

Set up and start an Elasticsearch server as follows:

$ mkdir es      # This will contain the Elasticsearch database
$ mkdir output  # This'll have output files after we harvest metadata to fill the database
$ docker container run \
    --detach \
    --name es \
    --env "discovery.type=single-node" \
    --network pds \
    --publish 9200:9200 \
    --publish 9300:9300 \
    --rm \
    --volume $HOME/pds/es:/usr/share/elasticsearch/data \
    elasticsearch:7.10.1

This will pull (if needed) an Elasticsearch image and start it as a container on ports 9200 and 9300 using the es subdirectory we made earlier to hold its database. It may take a few moments for Elasticsearch to start up. You can confirm it's running with:

curl --silent 'http://localhost:9200/registry/_search?q=*' | json_pp

This should give a JSON response indicating that the index named registry is not found with a status of 404.
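For reference, that 404 body looks roughly like this (abridged—the exact fields vary between Elasticsearch versions):

```json
{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index [registry]"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index [registry]"
  },
  "status": 404
}
```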

👉 Note: Elasticsearch is running in the background (--detach). If you need to stop it, run

docker container stop es

🌳 Populating Elasticsearch

To fill up Elasticsearch with data, first create an image for the "PDS Registry App" (misnamed: it's not the actual Registry App or the API—it's the thing that gathers science metadata) as follows:

$ git clone https://github.com/NASA-PDS/pds-registry-app.git
$ cd pds-registry-app
$ docker image build --build-arg version_reg_app=latest --file Dockerfile.local --tag pds-registry-app:latest .

This'll take a while (4–20 minutes). You'll have a new Docker image named pds-registry-app:latest. Don't bother with the $(git rev-parse HEAD) nonsense in the pds-registry-app's README—that just creates more cleanup work later on.

You can re-create this image if you ever do a git pull and get newer changes, of course, overwriting your last pds-registry-app:latest tag.

Now, using that pds-registry-app:latest image, we can populate Elasticsearch with the database schema we need for PDS:

docker container run \
    --rm \
    --network pds \
    pds-registry-app:latest \
        registry-manager \
            create-registry \
                -es http://es:9200 \
                -schema /var/local/registry/elastic/registry.json 

The path /var/local/registry/elastic/registry.json exists in the image, so there's nothing you need to do to edit or update it. Now if you run

curl --silent 'http://localhost:9200/registry/_search?q=*' | json_pp

you should get a new JSON response that indicates no hits (i.e., "hits": []). So the schema's set up, but we still need to fill Elasticsearch with data. That's a two-step process: step one is to "harvest" science data and put it into the $HOME/pds/output directory.
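That empty result looks roughly like this (abridged; field details vary by Elasticsearch version):

```json
{
  "took": 2,
  "timed_out": false,
  "hits": {
    "total": { "value": 0, "relation": "eq" },
    "max_score": null,
    "hits": []
  }
}
```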

For some reason I cannot fathom, test data is included in the image—and although that makes life easier, it's not the right way to build Docker images 😬. Anyway, run this:

docker container run \
    --rm \
    --network pds \
    --volume $HOME/pds/output:/var/local/harvest/output \
    pds-registry-app:latest \
    harvest \
        -c /var/local/harvest/conf/examples/bundles.xml \
        -o /var/local/harvest/output

This takes the test data in bundles.xml (in the image) and transforms it into a usable format, writing the result to $HOME/pds/output.

Step two is to "ingest" the transformed data into Elasticsearch:

docker container run \
    --rm \
    --network pds \
    --volume $HOME/pds/output:/var/local/harvest/output \
    pds-registry-app:latest \
        registry-manager \
            load-data \
                -es http://es:9200 \
                -dir /var/local/harvest/output \
                -updateSchema n

You can then confirm that Elasticsearch is populated by running:

curl --silent 'http://localhost:9200/registry/_search?q=*' | json_pp

and this time you should see the test data.
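One aside on json_pp: it comes from the Perl that ships with macOS. If you don't have it, python3 -m json.tool is an equivalent pretty-printer, so any of the curl checks in this guide can pipe into it instead—demonstrated here on a canned document rather than a live curl response:

```shell
# python3 -m json.tool pretty-prints JSON from stdin, just like json_pp.
echo '{"hits": {"hits": []}}' | python3 -m json.tool
```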

🏃‍♀️ Running the PDS API ("Registry API")

Elasticsearch is now up, running, and populated with data—but Elasticsearch is not the PDS API itself, just the query engine behind it. Now we get to start the API. To do so, try:

$ cd ..  # To leave pds-registry-app
$ git clone https://github.com/NASA-PDS/registry-api-service.git
$ cd registry-api-service
$ docker image build --build-arg version=latest --file docker/Dockerfile.local --tag registry-api-service:latest .

Again, this will take some time (2–20 minutes), but eventually you will get an image called registry-api-service:latest. And again, don't bother with the $(git rev-parse HEAD) flapdoodle suggested in the README. You'll then need to edit application.properties: make a copy in the parent directory and edit it:

$ cp src/main/resources/application.properties ..
$ cd ..
$ vi application.properties  # Or substitute your favorite editor

Change the entry

elasticSearch.host=localhost:9200

to

elasticSearch.host=es:9200
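If you'd rather not open an editor, a sed rewrite does the same job. This sketch works on a scratch copy so it's safe to paste as-is; for the real thing, run the sed line against $HOME/pds/application.properties. (The output-redirect form is used because BSD and GNU sed disagree on the -i flag.)

```shell
# Demonstrate the rewrite on a scratch copy; point sed at your real
# $HOME/pds/application.properties when doing this for real.
cd "$(mktemp -d)"
printf 'elasticSearch.host=localhost:9200\n' > application.properties

# Swap localhost for the es container's hostname on the pds network.
sed 's|^elasticSearch\.host=localhost:9200$|elasticSearch.host=es:9200|' \
    application.properties > application.properties.new
mv application.properties.new application.properties

cat application.properties   # elasticSearch.host=es:9200
```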

At last, you can then start the PDS API (the "Registry API"):

docker container run \
    --rm \
    --network pds \
    --publish 8080:8080 \
    --volume $HOME/pds/application.properties:/usr/local/registry-api-service-latest/src/main/resources/application.properties \
    registry-api-service:latest

You can stop it with ⌃C (or whatever your interrupt key is).

And you can check that it's okay with:

curl --silent http://localhost:8080/bundles | json_pp

If you get a JSON payload of test data, it works.

🐍 Updating the Client

We've got a server up and running, and since it's ReST-based it's easy to use. But chances are you'll want a Python API to make it even easier. The PDS API Client is published on PyPI—but you've gone through all this effort to run the PDS API server locally so you've got the latest-and-greatest, which means you need an up-to-date client, one newer than what's on PyPI.

Generating the Python code is a two-step process:

  1. Generate the code generator.
  2. Generate the Python code using the code generator.

⚡️ Generating the Generator

Normally, you could just brew install openapi-generator and use that, but sadly it has a bug. Thankfully, Thomas Loubrieu has come up with a forked version that fixes the problem; here's how to build his openapi-generator:

$ cd $HOME/pds
$ git clone https://github.com/tloubrieu-jpl/openapi-generator.git
$ cd openapi-generator
$ ./mvnw -Dmaven.test.skip=true package

🛃 Generating the Client

Now you can make the Python client:

$ cd ..  # To leave openapi-generator
$ git clone https://github.com/NASA-PDS/pds-api-client.git
$ cd pds-api-client
$ java -jar $HOME/pds/openapi-generator/modules/openapi-generator-cli/target/openapi-generator-cli.jar \
    generate -g python-legacy -i swagger.json --package-name pds.api_client \
    --additional-properties=packageVersion=X.Y.Z

Replace X.Y.Z with the version number you want to call your new package. Now install it into a Python virtual environment:

$ python3 -m venv venv
$ venv/bin/pip install --quiet --upgrade pip setuptools wheel
$ venv/bin/pip install --requirement requirements.txt
$ venv/bin/python setup.py install

You can see if it works by editing client-demo.py and changing the configuration.host to http://localhost:8080/; then run it:

venv/bin/python client-demo.py

If you get data, you're done. You can then manually install this package into other Python virtual environments by directory name:

pip install $HOME/pds/pds-api-client