Running the Latest PDS API (Registry API) Locally
This little guide is mostly notes cribbed from trying to get the latest Planetary Data System ReST-based server for its application programmer interface running on your local computer. Once it's running, you can make API calls, mostly to the "Registry API", as that's the only part of the ReST API that works right now.
It's a little confusing because multiple GitHub repositories comprise the API server, and the documentation in each of them makes assumptions about certain use cases but not the developer case of someone who just wants to run it locally; hence this document.
This document assumes:
- Running macOS Big Sur (`sw_vers` says 11.6)
- `JAVA_HOME` environment variable set to the output of `/usr/libexec/java_home`
    - As of this writing, that's `/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home`
- Docker Desktop 3.3.3
    - "gRPC FUSE" is enabled for filesystem sharing (on the General tab)
    - The `/Users` volume is enabled for file sharing (on the Resources → File Sharing tab)
    - Also: as a result, everything must be running out of your home directory, not `/tmp`
- Xcode 13.0
    - Xcode command line tools 13.0 also installed (Preferences → Locations)
    - This provides `/usr/bin/git`
- Homebrew 3.2.13
    - Use this to `brew install maven` (3.8.2)
    - Also use this to `brew install maven@3.5`
- `PATH` contains the following directories:
    - `/usr/bin`, to find `git` and `java`
    - `/usr/local/bin`, to find `python3`; `python3 -V` prints `Python 3.9.7` or so
    - `/usr/local/opt/maven@3.5/bin`, to find `mvn`
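Before going further, it's worth a quick sanity check that your setup matches these assumptions (the exact versions on your machine may differ slightly):

```
$ sw_vers -productVersion                        # expect 11.x
$ echo $JAVA_HOME && java -version               # the JDK noted above
$ mvn -version                                   # from Homebrew
$ python3 -V                                     # Python 3.9.x or so
$ docker version --format '{{.Server.Version}}'  # Docker Desktop's engine
$ git --version
```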
The process of getting the PDS API (aka, the Registry API) up and running consists of these steps:
- Setting up Elasticsearch.
- Populating Elasticsearch with data.
- Running the PDS API (the "Registry API").
- Optionally, updating the PDS API Python client.
You'll also need to make a workspace for all this somewhere under your `$HOME`:
$ cd $HOME
$ mkdir pds
$ cd pds
And create a Docker network—but just once:
docker network create --driver bridge --label 'org.label-schema.name=PDS' pds
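This only needs to happen once; if you're not sure whether you already created it on a previous attempt, `docker network ls` will show it:

```
$ docker network ls --filter name=pds
```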
OK, here we go.
Set up and start an Elasticsearch server as follows:
$ mkdir es # This will contain the Elasticsearch database
$ mkdir output # This'll have output files after we harvest metadata to fill the database
$ docker container run \
--detach \
--name es \
--env "discovery.type=single-node" \
--network pds \
--publish 9200:9200 \
--publish 9300:9300 \
--rm \
--volume $HOME/pds/es:/usr/share/elasticsearch/data \
elasticsearch:7.10.1
This will pull (if needed) an Elasticsearch image and start it as a container on ports 9200 and 9300, using the `es` subdirectory we made earlier to hold its database. It may take a few moments for Elasticsearch to start up. You can confirm it's running with:
curl --silent 'http://localhost:9200/registry/_search?q=*' | json_pp
This should give a JSON response indicating that the index named `registry` is not found, with a status of 404.
👉 Note: Elasticsearch is running in the background (`--detach`). If you need to stop it, run
docker container stop es
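If the curl above fails at first, Elasticsearch probably just hasn't finished starting; a small wait loop like this is purely a convenience until it answers:

```
# Poll Elasticsearch's standard health endpoint until it responds
until curl --silent --fail 'http://localhost:9200/_cluster/health' > /dev/null; do
    sleep 2
done
```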
To fill up Elasticsearch with data, first create an image for the "PDS Registry App" (misnamed: it's not the actual Registry App or the API—it's the thing that gathers science metadata) as follows:
$ git clone https://github.com/NASA-PDS/pds-registry-app.git
$ cd pds-registry-app
$ docker image build --build-arg version_reg_app=latest --file Dockerfile.local --tag pds-registry-app:latest .
This'll take a while (4–20 minutes). You'll have a new Docker image named `pds-registry-app:latest`. Don't bother with the `$(git rev-parse HEAD)` nonsense in the `pds-registry-app` README; that just creates more cleanup work later on.
You can re-create this image if you ever do a `git pull` and get newer changes, of course, overwriting your last `pds-registry-app:latest` tag.
Now, using that pds-registry-app:latest
image, we can populate Elasticsearch with the database schema we need for PDS:
docker container run \
--rm \
--network pds \
pds-registry-app:latest \
registry-manager \
create-registry \
-es http://es:9200 \
-schema /var/local/registry/elastic/registry.json
The path /var/local/registry/elastic/registry.json
exists in the image, so there's nothing you need to do to edit or update it. Now if you run
curl --silent 'http://localhost:9200/registry/_search?q=*' | json_pp
you should get a new JSON response that indicates no hits (i.e., `"hits": []`). So the schema's set up, but we still need to fill Elasticsearch with data. That's a two-step process: step one is to "harvest" science data and put it into the `$HOME/pds/output` directory.
For some reason I cannot fathom, test data is included in the image, and although that makes life easier, it's not the right way to build Docker images 😬. Anyway, run this:
docker container run \
--rm \
--network pds \
--volume $HOME/pds/output:/var/local/harvest/output \
pds-registry-app:latest \
harvest \
-c /var/local/harvest/conf/examples/bundles.xml \
-o /var/local/harvest/output
This takes the test data described by `bundles.xml` (in the image) and transforms it into a usable format in `$HOME/pds/output`.
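You can peek at what the harvester wrote; the exact file names depend on the harvest tool, but the directory should no longer be empty:

```
$ ls -l $HOME/pds/output
```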
Step two is to "ingest" the transformed data into Elasticsearch:
docker container run \
--rm \
--network pds \
--volume $HOME/pds/output:/var/local/harvest/output \
pds-registry-app:latest \
registry-manager \
load-data \
-es http://es:9200 \
-dir /var/local/harvest/output \
-updateSchema n
You can then confirm that Elasticsearch is populated by running:
curl --silent 'http://localhost:9200/registry/_search?q=*' | json_pp
and this time you should see the test data.
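If you'd rather see a simple document count than the full search payload, Elasticsearch's standard `_count` endpoint works against the same index:

```
curl --silent 'http://localhost:9200/registry/_count' | json_pp
```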
Elasticsearch is now up and running and populated with data, but Elasticsearch is not the PDS API; it's just the query engine behind the API. We now get to start the API itself. To do so, try:
$ cd .. # To leave pds-registry-app
$ git clone https://github.com/NASA-PDS/registry-api-service.git
$ cd registry-api-service
$ docker image build --build-arg version=latest --file docker/Dockerfile.local --tag registry-api-service:latest .
Again, this will take some time (2–20 minutes), but eventually you will get an image called `registry-api-service:latest`. And again, don't bother with the `$(git rev-parse HEAD)` flapdoodle as suggested in the README. You'll then need to edit `application.properties`. Make a copy in the parent directory and edit it:
$ cp src/main/resources/application.properties ..
$ cd ..
$ vi application.properties # Or substitute your favorite editor
Change the entry
elasticSearch.host=localhost:9200
to
elasticSearch.host=es:9200
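If you'd rather not open an editor, a one-line substitution does the same thing; this assumes the entry looks exactly as above and uses macOS/BSD `sed` syntax, so double-check the file afterwards:

```
# In-place edit with BSD sed (macOS); on Linux, drop the empty '' argument
sed -i '' 's/elasticSearch.host=localhost:9200/elasticSearch.host=es:9200/' application.properties
```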
At last, you can then start the PDS API (the "Registry API"):
docker container run \
--rm \
--network pds \
--publish 8080:8080 \
--volume $HOME/pds/application.properties:/usr/local/registry-api-service-latest/src/main/resources/application.properties \
registry-api-service:latest
You can stop it with ⌃C (or whatever your interrupt key is).
And you can check that it's okay with:
curl --silent http://localhost:8080/bundles | json_pp
If you get a JSON payload of test data, it works.
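Other endpoints follow the same pattern. Exactly which paths exist depends on the API version you built, so treat these as educated guesses and consult the repository's swagger.json if they 404:

```
# Assumed endpoints from the PDS API spec; adjust if your build differs
curl --silent http://localhost:8080/collections | json_pp
curl --silent http://localhost:8080/products | json_pp
```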
We've got a server up and running, and it's ReST-based so it's easy to use. But chances are you'll want a Python API to make it even easier. The PDS API Client is published on PyPI, but you've gone through all this effort to run the PDS API server locally, so you've got the latest-and-greatest—and so you need an up-to-date client—one that's newer than what's on PyPI.
Generating the Python code is a two-step process:
- Generate the code generator.
- Generate the Python code using the code generator.
Normally, you could just `brew install openapi-generator` and use that, but sadly it has a bug. Thankfully, Thomas Loubrieu has come up with a forked version that fixes the problem, so here's how you build his `openapi-generator`:
$ cd $HOME/pds
$ git clone https://github.com/tloubrieu-jpl/openapi-generator.git
$ cd openapi-generator
$ ./mvnw -Dmaven.test.skip=true package
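The build leaves the command-line generator jar under `modules/openapi-generator-cli/target/`, which is the path used in the next step; it doesn't hurt to confirm it's there and responds:

```
$ ls modules/openapi-generator-cli/target/openapi-generator-cli.jar
$ java -jar modules/openapi-generator-cli/target/openapi-generator-cli.jar version  # prints the generator's version
```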
Now you can make the Python client:
$ cd .. # To leave openapi-generator
$ git clone https://github.com/NASA-PDS/pds-api-client.git
$ cd pds-api-client
$ java -jar $HOME/pds/openapi-generator/modules/openapi-generator-cli/target/openapi-generator-cli.jar \
generate -g python-legacy -i swagger.json --package-name pds.api_client \
--additional-properties=packageVersion=X.Y.Z
Replace `X.Y.Z` with the version number you want for your new package. Now install it into a Python virtual environment:
$ python3 -m venv venv
$ venv/bin/pip install --quiet --upgrade pip setuptools wheel
$ venv/bin/pip install --requirement requirements.txt
$ venv/bin/python setup.py install
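A quick import check confirms the generated client actually landed in the virtual environment:

```
$ venv/bin/python -c 'import pds.api_client; print("pds.api_client imported OK")'
```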
You can see if it works by editing `client-demo.py`, changing `configuration.host` to `http://localhost:8080/`, and then running it:
venv/bin/python client-demo.py
If you get data, you're done. You can then manually install this package into other Python virtual environments by directory name:
pip install $HOME/pds/pds-api-client
Copyright © 2021-2024 California Institute of Technology.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.