Scripts and queries for inspecting Galaxy usage data.
Before getting started, we'll get hold of a subset of Galaxy data so it is easier and faster to work with. This setup needs to be done only once for a given set of data; once the data is available, we can start querying it. For Main, these queries tend to run about half an hour each.
Take a look at the `collect_job_data.sh` script to automate this process.
Put the following `job` table query in a file called `job_query.sql`, adjusting the desired dates:
COPY (
SELECT
id, create_time, tool_id, tool_version, user_id, destination_id
FROM
job
WHERE
create_time >= '2021-08-01'
AND create_time < '2022-09-01') TO STDOUT;
Run the query from the command shell (where `galaxy_main` is the name of the database):
psql galaxy_main < job_query.sql > job_table.txt
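Before moving on, it can help to sanity-check the export. `COPY ... TO STDOUT` writes tab-separated rows by default, so a quick `awk` pass can confirm that every row has the six selected columns. A sketch using a made-up two-row sample file standing in for the real export:

```shell
# Made-up sample rows in the export layout
# (id, create_time, tool_id, tool_version, user_id, destination_id):
printf '1\t2021-08-02 10:00:00\tcat1\t1.0.0\t42\tslurm\n' >  sample_job_table.txt
printf '2\t2021-08-03 11:30:00\tbwa\t0.7.17\t7\tslurm\n'  >> sample_job_table.txt

# Count rows and flag any row that does not have exactly 6 tab-separated fields.
awk -F'\t' 'NF != 6 { bad++ } END { printf "rows=%d bad=%d\n", NR, bad+0 }' sample_job_table.txt
```

Point `awk` at your real `job_table.txt`; a non-zero `bad` count usually means an embedded tab or a truncated transfer.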
Next, put the following metrics table query in a file called `metrics_query.sql`, matching the dates to the ones used in the job query above:
COPY (
SELECT
job.id,
job.destination_id,
job_metric_numeric.metric_name,
job_metric_numeric.metric_value
FROM
job
INNER JOIN job_metric_numeric
ON job_metric_numeric.job_id = job.id
WHERE
job.create_time >= '2021-08-01' AND
job.create_time < '2022-09-01' AND
job_metric_numeric.metric_name IN (
'galaxy_slots',
'galaxy_memory_mb',
'memory.max_usage_in_bytes',
'cpuacct.usage',
'runtime_seconds')) TO STDOUT;
Run the query from the command shell (where `galaxy_main` is the name of the database):
psql galaxy_main < metrics_query.sql > metrics_table.txt
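A quick way to eyeball this export is to count how many rows came back per metric, which also confirms the `IN` list matched what you expected. A sketch over a made-up three-row sample file; run the same pipeline against your real `metrics_table.txt`:

```shell
# Made-up sample rows in the export layout
# (job_id, destination_id, metric_name, metric_value):
printf '1\tslurm\tgalaxy_slots\t4\n'      >  sample_metrics_table.txt
printf '1\tslurm\truntime_seconds\t120\n' >> sample_metrics_table.txt
printf '2\tslurm\tgalaxy_slots\t8\n'      >> sample_metrics_table.txt

# Row count per metric name (column 3).
awk -F'\t' '{ count[$3]++ } END { for (m in count) print m, count[m] }' sample_metrics_table.txt | sort
```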
Once we have the data, we can import it into a local database for easier querying.
To get started, let's start a Postgres server using a Docker container, mapping a volume to a host path (replace `<pwd>` with an absolute path):
docker run -d --rm --name main-data-postgres -e POSTGRES_PASSWORD=changeThis -v <pwd>/data/db-files/:/var/lib/postgresql/data -e PGDATA=/var/lib/postgresql/data/db-files/pg --mount type=tmpfs,destination=/var/lib/postgresql/data/pg_stat_tmp -p 5432:5432 postgres
Place the data files (`job_table.txt` and `metrics_table.txt`) into `./data/db-files/`. The file `sql/db-files/tables.sql` contains the structure for the tables in our local database. We will make the data from the files available in our PostgreSQL container. Note that there are a couple of sample files in the `data/db-files/` folder that can be used for testing and developing queries.
Exec into the container and change into the data dir:
docker exec -it main-data-postgres bash
cd /var/lib/postgresql/data/
Next, we'll create the database and necessary tables:
createdb -U postgres -O postgres galaxy
psql -U postgres galaxy < tables.sql
Lastly, we import the data (this process will take about 10 minutes):
psql -U postgres -c "\copy job from 'job_table.txt';" galaxy
psql -U postgres -c "\copy job_metric_numeric from 'metrics_table.txt';" galaxy
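While the import runs, some of the aggregates the database will provide can already be approximated straight from the export. For example, total runtime in hours per destination, sketched here over a made-up sample file (point `awk` at your real `metrics_table.txt`):

```shell
# Made-up sample rows in the export layout
# (job_id, destination_id, metric_name, metric_value):
printf '1\tslurm\truntime_seconds\t3600\n'  >  sample_runtime_metrics.txt
printf '2\tslurm\truntime_seconds\t1800\n'  >> sample_runtime_metrics.txt
printf '3\tcondor\truntime_seconds\t7200\n' >> sample_runtime_metrics.txt

# Sum runtime_seconds per destination and convert to hours.
awk -F'\t' '$3 == "runtime_seconds" { s[$2] += $4 }
            END { for (d in s) printf "%s %.1f\n", d, s[d]/3600 }' sample_runtime_metrics.txt | sort
```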
Once the data is imported, refer to the queries in the `sql` folder. Access the database with:
psql -U postgres galaxy
To stop the container, issue:
docker stop main-data-postgres
To start the container again, reusing the already existing database, run the same `docker run` command from above. To start with a clean database, remove the `./data/db-files/pg*` directories and start a new container.