
Pipeline optimization (Sprint 34–36) #105

Merged 24 commits on Jun 14, 2021

Commits on Jun 7, 2021

  1. add_data: Use jq to simplify code

    (With revision up to 2021-05-20)
    anthonyfok committed Jun 7, 2021 (dfeb9e9)
  2. add_data: Add fetch_psra_csv_from_model and merge_csv functions

    to simplify code for PSRA CSV imports
    
    (Updated 2021-05-21)
    anthonyfok committed Jun 7, 2021 (50af463)
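
    A minimal sketch of a header-aware CSV merge along these lines, assuming the awk-based approach mentioned later in this pull request (the function body and file names are illustrative, not the exact code in add_data.sh):

    ```bash
    # Hypothetical sketch: merge CSV files that share a header, keeping the
    # header only from the first file (awk skips line 1 of every file after
    # the first).
    merge_csv() {
      local output="$1"; shift
      awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$output"
    }

    # Usage sketch (file names are made up):
    # merge_csv psra_merged.csv psra_AB.csv psra_BC.csv
    ```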
  3. add_data: New fetch_csv_xz function

    to download from xz-compressed repos for speed and cost-saving (no LFS)
    
    See #91
    anthonyfok committed Jun 7, 2021 (4fda1d5)
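
    As a rough sketch, fetching an xz-compressed CSV over plain HTTPS instead of Git LFS could look like this (the URL in the usage line is a placeholder, not one of the actual repositories):

    ```bash
    # Hypothetical sketch: download an xz-compressed CSV and decompress it
    # on the fly, avoiding Git LFS bandwidth and storage costs.
    fetch_csv_xz() {
      local url="$1" output="$2"
      curl --fail --silent --show-error --location "$url" | xz --decompress > "$output"
    }

    # Usage sketch (placeholder URL):
    # fetch_csv_xz https://example.com/model-inputs/exposure.csv.xz exposure.csv
    ```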
  4. Commit ee3c8ad
  5. Commit 6463dba
  6. Commit 7e97869
  7. add_data: Fetch pointers of CSV files for "oid sha256"

    Also, upgrade git to the latest version (2.32.0.rc0 as of this writing)
    because "git checkout" for model-inputs got stuck with git 2.28.
    
    See #83
    anthonyfok committed Jun 7, 2021 (38d59b9)
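
    For context, a Git LFS pointer file stores its object ID on an "oid sha256:<hash>" line; a sketch of extracting that hash from a fetched pointer (function and variable names are assumptions, not the exact code in add_data.sh):

    ```bash
    # Hypothetical sketch: pull the sha256 object ID out of a Git LFS pointer,
    # whose contents look like:
    #   version https://git-lfs.github.com/spec/v1
    #   oid sha256:4d7a2146...
    #   size 123456
    lfs_oid_from_pointer() {
      local pointer_file="$1"
      sed -n 's/^oid sha256:\(.*\)$/\1/p' "$pointer_file"
    }

    # Usage sketch:
    # oid=$(lfs_oid_from_pointer exposure.csv)   # prints the bare sha256 hash
    ```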
  8. add_data: Clone model-factory before the wait for postgres

    to save a little bit of time.
    anthonyfok committed Jun 7, 2021 (8491609)
  9. Commit 7dc525d
  10. Commit 7b0b921
  11. Commit 8f6467e
  12. add_data: Allow dry run for testing and debugging

    Commands are prefixed with RUN or "is_dry_run || " in add_data.sh
    for more verbose logging and to allow dry runs.

    Dry-run mode may be enabled by setting ADD_DATA_DRY_RUN=true
    in the .env file; see sample.env for an example.
    anthonyfok committed Jun 7, 2021 (12dbd02)
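
    A minimal sketch of the RUN / is_dry_run pattern described above, assuming ADD_DATA_DRY_RUN is read from the environment (the real helpers in add_data.sh may differ in detail):

    ```bash
    # Hypothetical sketch of the dry-run helpers: RUN logs the command and
    # only executes it when dry-run mode is off.
    is_dry_run() {
      [[ "${ADD_DATA_DRY_RUN:-false}" == "true" ]]
    }

    RUN() {
      LOG "Running: $*"    # LOG is the script's existing logging function
      is_dry_run || "$@"
    }

    # Usage sketch, matching the "RUN or is_dry_run ||" prefixes mentioned above:
    # RUN psql -U "$POSTGRES_USER" -f some_script.sql
    # is_dry_run || curl -X PUT "$KIBANA_ENDPOINT/some-index-pattern"
    ```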
  13. Commit 313776f
  14. add_data: Hide secrets such as POSTGRES_PASS and ES_PASS in LOG()

    unless their values are literally "password".
    
    Also fix a bug in LOG() where secrets were not hidden
    when there was only one argument.
    anthonyfok committed Jun 7, 2021 (3116de2)
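
    A sketch of what this kind of secret redaction can look like; the substitution pattern and the set of secrets are assumptions, not the exact LOG() in add_data.sh:

    ```bash
    # Hypothetical sketch: replace any occurrence of the Postgres or
    # Elasticsearch password with asterisks before logging, unless the
    # value is literally "password" (i.e. a non-secret default).
    LOG() {
      local message="$*"
      local secret
      for secret in "${POSTGRES_PASS:-}" "${ES_PASS:-}"; do
        if [[ -n "$secret" && "$secret" != "password" ]]; then
          message="${message//"$secret"/********}"
        fi
      done
      echo "[add_data] $message"
    }

    # Usage sketch:
    # LOG "Connecting as $POSTGRES_USER:$POSTGRES_PASS"   # password printed as ********
    ```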
  15. add_data: Read variables from environment instead of command-line

    Instead of manually passing the needed variables as command-line arguments
    to add_data.sh in python/Dockerfile, rely on these variables already being
    present in the environment, as defined either in the .env file for
    Docker Compose or in the task definition of Amazon ECS.
    
    Also add a quick environment variable check at the beginning of
    add_data.sh to warn if any variable is empty.
    
    This fixes the "curl: (3) URL using bad/illegal format or missing URL" error
    in "Creating PSRA Kibana Index Patterns", caused by unquoted command-line
    arguments in python/Dockerfile that left KIBANA_ENDPOINT empty
    when the optional ES_USER and ES_PASS are empty.
    anthonyfok committed Jun 7, 2021 (ffd2078)
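
    A sketch of the kind of up-front environment check described above; the variable names come from this pull request, but the exact list and behaviour in add_data.sh may differ:

    ```bash
    # Hypothetical sketch: warn early if expected environment variables are
    # empty, instead of failing later with confusing curl or psql errors.
    check_environment_variables() {
      local var
      for var in POSTGRES_PASS ES_USER ES_PASS KIBANA_ENDPOINT; do
        if [[ -z "${!var:-}" ]]; then
          echo "WARNING: environment variable $var is empty or unset" >&2
        fi
      done
    }
    ```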
  16. add_data: Move major steps into their own functions

    Put the main program in a function called "main" as the bottommost function
    from which all the major steps are called.  See the Google Shell Style Guide
    at https://google.github.io/styleguide/shellguide.html#s7.8-main
    
    Other changes include:
    
     * LOG(): Correct quoting when a single argument contains a single quote
     * ERROR(): Exit the program after printing the error message
     * check_environment_variables(): Abort if a mandatory variable is undefined
     * LOG(): Print line numbers and function names by default too,
       see ADD_DATA_PRINT_LINENO and ADD_DATA_PRINT_FUNCNAME in sample.env
    anthonyfok committed Jun 7, 2021 (15c2d18)
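
    The bottommost-main layout from the Google Shell Style Guide, sketched with placeholder step bodies (import_exposure_data is made up; the other function names appear elsewhere in this pull request):

    ```bash
    # Hypothetical sketch of the "main at the bottom" layout: each major step
    # is its own function, and main() is defined last and called at the end.
    check_environment_variables() { :; }   # placeholder bodies
    wait_for_postgres()           { :; }
    import_exposure_data()        { :; }
    export_to_elasticsearch()     { :; }

    main() {
      check_environment_variables
      wait_for_postgres
      import_exposure_data
      export_to_elasticsearch
    }

    main "$@"
    ```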
  17. add_data: Use RUN for all the main steps too

    Also: Rename import_data_from_postgis_to_elasticsearch to
    export_to_elasticsearch for shorter line lengths in the log.
    anthonyfok committed Jun 7, 2021 (de8375c)
  18. add_data: Fix doc regarding OpenQuake CSV header stripping

    I originally wanted merge_csv to handle OpenQuake CSV comment
    header stripping, but have not found a good solution yet, so that
    functionality remains in fetch_psra_csv_from_model.
    
    This commit fixes the error in the merge_csv function description.
    anthonyfok committed Jun 7, 2021 (e114c19)

Commits on Jun 10, 2021

  1. Commit 60e3e57
  2. add_data: Revert change to pg_isready port setting

    Revert my unrelated, undocumented, and buggy change to the
    -p port setting for pg_isready in wait_for_postgres().
    See the reviews at #105 for more details.
    
    Special thanks to @drotheram for catching this bug!
    anthonyfok committed Jun 10, 2021 (1bfe87d)
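
    For reference, a wait-for-PostgreSQL loop built on pg_isready generally looks like the sketch below; the default host, port, and polling interval are placeholders rather than the actual values in wait_for_postgres():

    ```bash
    # Hypothetical sketch: poll pg_isready until PostgreSQL accepts connections.
    wait_for_postgres() {
      until pg_isready -h "${POSTGRES_HOST:-db-opendrr}" -p "${POSTGRES_PORT:-5432}"; do
        echo "Waiting for PostgreSQL..."
        sleep 2
      done
    }
    ```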
  3. add_data: Fix erroneous CSV generated by merge_csv

    Remove RUN from the awk command, as RUN with '>' ended up prepending
    LOG() output into the first line of the merged CSV file.
    
    Special thanks to Drew Rotheram (@drotheram) for catching this!
    See reviews at #105 for more information.
    anthonyfok committed Jun 10, 2021 (21fb875)
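
    The underlying issue is that a redirection attached to a RUN invocation captures RUN's own LOG() output together with the command's output; a sketch of the broken and fixed forms (file names are illustrative):

    ```bash
    # Hypothetical sketch of the bug and the fix.

    # Broken: '>' redirects everything RUN prints, including the LOG() line,
    # into the merged file, so log text ends up prepended to the CSV data.
    # RUN awk 'FNR == 1 && NR != 1 { next } { print }' part_*.csv > merged.csv

    # Fixed: call awk directly so only awk's output reaches the file.
    awk 'FNR == 1 && NR != 1 { next } { print }' part_*.csv > merged.csv
    ```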

Commits on Jun 11, 2021

  1. add_data: Do not fail if /usr/bin/time does not exist

    This allows the script to be run without error in e.g. the db-opendrr
    (postgis) service container where GNU time is not pre-installed.
    
    Note: GNU time is not strictly needed because it is used mainly for
    tracking memory usage (maximum resident set size, or "maxresident").
    It may be installed in a Debian-based container (e.g. db-opendrr) with
    "apt update && apt install time".
    anthonyfok committed Jun 11, 2021 (fd9ae1d)
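
    One common way to make GNU time optional is to build the command prefix conditionally; this is a sketch under that assumption, not necessarily the exact approach taken in add_data.sh:

    ```bash
    # Hypothetical sketch: only prefix commands with GNU time when it exists,
    # so containers without /usr/bin/time (e.g. db-opendrr) still work.
    if [[ -x /usr/bin/time ]]; then
      TIME_CMD=(/usr/bin/time -v)   # -v reports "Maximum resident set size"
    else
      TIME_CMD=()
    fi

    # Usage sketch:
    # "${TIME_CMD[@]}" psql -f some_script.sql
    ```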

Commits on Jun 12, 2021

  1. add_data: Add missing YT to PT_LIST

    Special thanks to @drotheram for catching this glaring mistake of mine!
    
    The error was introduced in commit 50af463 (in the commented-out EXPECTED_PT_LIST)
    and carried over in commit 12dbd02, where I renamed it to PT_LIST to replace
    the originally fetched list without actually verifying it.
    anthonyfok committed Jun 12, 2021 (29dec2c)
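
    For reference, with YT included the full set of provincial and territorial codes is the one below; the exact formatting and ordering of PT_LIST in add_data.sh may differ:

    ```bash
    # Hypothetical sketch: all 13 province/territory codes, including YT.
    PT_LIST=(AB BC MB NB NL NS NT NU ON PE QC SK YT)

    # Usage sketch (the argument shape of fetch_psra_csv_from_model is assumed):
    # for PT in "${PT_LIST[@]}"; do
    #   fetch_psra_csv_from_model "$PT"
    # done
    ```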

Commits on Jun 14, 2021

  1. add_data: Remove double quotes around ${ES_CREDENTIALS:-}

    to avoid the "curl: (3) URL using bad/illegal format or missing URL"
    (non-fatal) error when ES_CREDENTIALS is empty, in which case curl would
    interpret the quoted empty string "" as a URL.
    anthonyfok committed Jun 14, 2021 (3b9c39c)
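
    The fix relies on how the shell expands an empty variable with and without quotes; a sketch (the curl options and endpoint path are placeholders consistent with this pull request):

    ```bash
    # Hypothetical sketch of the quoting difference.

    # Quoted: when ES_CREDENTIALS is empty, curl still receives an empty ""
    # argument and tries to treat it as a URL -> "curl: (3) URL using
    # bad/illegal format or missing URL".
    # curl "${ES_CREDENTIALS:-}" -X GET "$KIBANA_ENDPOINT/api/status"

    # Unquoted: an empty expansion disappears entirely, so curl only sees the
    # real arguments (the expansion is intentionally left unquoted here).
    curl ${ES_CREDENTIALS:-} -X GET "$KIBANA_ENDPOINT/api/status"
    ```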