
Pipeline optimization (Sprint 34–36) #105

Merged 24 commits on Jun 14, 2021

Commits on Jun 7, 2021

  1. add_data: Use jq to simplify code

    (With revision up to 2021-05-20)
    anthonyfok committed Jun 7, 2021 (dfeb9e9)
  2. add_data: Add fetch_psra_csv_from_model and merge_csv functions

    to simplify code for PSRA CSV imports
    
    (Updated 2021-05-21)
    anthonyfok committed Jun 7, 2021 (50af463)
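
    A minimal sketch of a header-aware CSV merge along these lines, assuming the awk-based approach mentioned later in this pull request (the function body and file names are illustrative, not the exact code in add_data.sh):

    ```bash
    # Hypothetical sketch: merge CSV files that share a header, keeping the
    # header only from the first file (awk skips line 1 of every file after
    # the first).
    merge_csv() {
      local output="$1"; shift
      awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$output"
    }

    # Usage sketch (file names are made up):
    # merge_csv psra_merged.csv psra_AB.csv psra_BC.csv
    ```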
  3. add_data: New fetch_csv_xz function

    to download from xz-compressed repos for speed and cost-saving (no LFS)
    
    See #91
    anthonyfok committed Jun 7, 2021 (4fda1d5)
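
    As a rough sketch, fetching an xz-compressed CSV over plain HTTPS instead of Git LFS could look like this (the URL in the usage line is a placeholder, not one of the actual repositories):

    ```bash
    # Hypothetical sketch: download an xz-compressed CSV and decompress it
    # on the fly, avoiding Git LFS bandwidth and storage costs.
    fetch_csv_xz() {
      local url="$1" output="$2"
      curl --fail --silent --show-error --location "$url" | xz --decompress > "$output"
    }

    # Usage sketch (placeholder URL):
    # fetch_csv_xz https://example.com/model-inputs/exposure.csv.xz exposure.csv
    ```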
  4. Commit ee3c8ad
  5. Commit 6463dba
  6. Commit 7e97869
  7. add_data: Fetch pointers of CSV files for "oid sha256"

    Also, upgrade git to the latest version (2.32.0.rc0 as of this writing)
    because "git checkout" for model-inputs got stuck with git 2.28.
    
    See #83
    anthonyfok committed Jun 7, 2021 (38d59b9)
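
    For context, a Git LFS pointer file stores its object ID on an "oid sha256:<hash>" line; a sketch of extracting that hash from a fetched pointer (function and variable names are assumptions, not the exact code in add_data.sh):

    ```bash
    # Hypothetical sketch: pull the sha256 object ID out of a Git LFS pointer,
    # whose contents look like:
    #   version https://git-lfs.github.com/spec/v1
    #   oid sha256:4d7a2146...
    #   size 123456
    lfs_oid_from_pointer() {
      local pointer_file="$1"
      sed -n 's/^oid sha256:\(.*\)$/\1/p' "$pointer_file"
    }

    # Usage sketch:
    # oid=$(lfs_oid_from_pointer exposure.csv)   # prints the bare sha256 hash
    ```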
  8. add_data: Clone model-factory before the wait for postgres

    to save a little bit of time.
    anthonyfok committed Jun 7, 2021 (8491609)
  9. Commit 7dc525d
  10. Commit 7b0b921
  11. Commit 8f6467e
  12. add_data: Allow dry run for testing and debugging

    Commands are prefixed with RUN or "is_dry_run || " in add_data.sh
    for more verbose logging and to allow dry runs.

    Dry-run mode may be enabled by setting ADD_DATA_DRY_RUN=true
    in the .env file; see sample.env for an example.
    anthonyfok committed Jun 7, 2021 (12dbd02)
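
    A minimal sketch of the RUN / is_dry_run pattern described above, assuming ADD_DATA_DRY_RUN is read from the environment (the real helpers in add_data.sh may differ in detail):

    ```bash
    # Hypothetical sketch of the dry-run helpers: RUN logs the command and
    # only executes it when dry-run mode is off.
    is_dry_run() {
      [[ "${ADD_DATA_DRY_RUN:-false}" == "true" ]]
    }

    RUN() {
      LOG "Running: $*"    # LOG is the script's existing logging function
      is_dry_run || "$@"
    }

    # Usage sketch, matching the "RUN or is_dry_run ||" prefixes mentioned above:
    # RUN psql -U "$POSTGRES_USER" -f some_script.sql
    # is_dry_run || curl -X PUT "$KIBANA_ENDPOINT/some-index-pattern"
    ```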
  13. Commit 313776f
  14. add_data: Hide secrets such as POSTGRES_PASS and ES_PASS in LOG()

    unless their values are literally "password".
    
    Also fix a bug in LOG() where secrets were not hidden
    when there was only one argument.
    anthonyfok committed Jun 7, 2021 (3116de2)
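
    A sketch of what this kind of secret redaction can look like; the substitution pattern and the set of secrets are assumptions, not the exact LOG() in add_data.sh:

    ```bash
    # Hypothetical sketch: replace any occurrence of the Postgres or
    # Elasticsearch password with asterisks before logging, unless the
    # value is literally "password" (i.e. a non-secret default).
    LOG() {
      local message="$*"
      local secret
      for secret in "${POSTGRES_PASS:-}" "${ES_PASS:-}"; do
        if [[ -n "$secret" && "$secret" != "password" ]]; then
          message="${message//"$secret"/********}"
        fi
      done
      echo "[add_data] $message"
    }

    # Usage sketch:
    # LOG "Connecting as $POSTGRES_USER:$POSTGRES_PASS"   # password printed as ********
    ```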
  15. add_data: Read variables from environment instead of command-line

    Instead of manually passing the needed variables as command-line arguments
    to add_data.sh in python/Dockerfile, rely on these variables already being
    present in the environment, as defined either in the .env file for
    Docker Compose or in the task definition of Amazon ECS.
    
    Also add a quick environment variable check at the beginning of
    add_data.sh to warn if any variable is empty.
    
    This fixes the "curl: (3) URL using bad/illegal format or missing URL" error
    in "Creating PSRA Kibana Index Patterns", caused by unquoted command-line
    arguments in python/Dockerfile that left KIBANA_ENDPOINT empty
    when the optional ES_USER and ES_PASS are empty.
    anthonyfok committed Jun 7, 2021 (ffd2078)
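
    A sketch of the kind of up-front environment check described above; the variable names come from this pull request, but the exact list and behaviour in add_data.sh may differ:

    ```bash
    # Hypothetical sketch: warn early if expected environment variables are
    # empty, instead of failing later with confusing curl or psql errors.
    check_environment_variables() {
      local var
      for var in POSTGRES_PASS ES_USER ES_PASS KIBANA_ENDPOINT; do
        if [[ -z "${!var:-}" ]]; then
          echo "WARNING: environment variable $var is empty or unset" >&2
        fi
      done
    }
    ```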
  16. add_data: Move major steps into their own functions

    Put the main program in a function called "main" as the bottommost function
    from which all the major steps are called.  See the Google Shell Style Guide
    at https://google.github.io/styleguide/shellguide.html#s7.8-main
    
    Other changes include:
    
     * LOG(): Correct quoting when a single argument contains a single quote
     * ERROR(): Exit the program after printing the error message
     * check_environment_variables(): Abort if a mandatory variable is undefined
     * LOG(): Print line numbers and function names by default too,
       see ADD_DATA_PRINT_LINENO and ADD_DATA_PRINT_FUNCNAME in sample.env
    anthonyfok committed Jun 7, 2021 (15c2d18)
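
    The bottommost-main layout from the Google Shell Style Guide, sketched with placeholder step bodies (import_exposure_data is made up; the other function names appear elsewhere in this pull request):

    ```bash
    # Hypothetical sketch of the "main at the bottom" layout: each major step
    # is its own function, and main() is defined last and called at the end.
    check_environment_variables() { :; }   # placeholder bodies
    wait_for_postgres()           { :; }
    import_exposure_data()        { :; }
    export_to_elasticsearch()     { :; }

    main() {
      check_environment_variables
      wait_for_postgres
      import_exposure_data
      export_to_elasticsearch
    }

    main "$@"
    ```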
  17. add_data: Use RUN for all the main steps too

    Also: Rename import_data_from_postgis_to_elasticsearch to
    export_to_elasticsearch for shorter line lengths in the log.
    anthonyfok committed Jun 7, 2021 (de8375c)
  18. add_data: Fix doc regarding OpenQuake CSV header stripping

    I originally wanted merge_csv to handle OpenQuake CSV comment
    header stripping, but have not found a good solution yet, so that
    functionality remains in fetch_psra_csv_from_model.
    
    This commit fixes the error in the merge_csv function description.
    anthonyfok committed Jun 7, 2021 (e114c19)

Commits on Jun 10, 2021

  1. Commit 60e3e57
  2. add_data: Revert change to pg_isready port setting

    Revert my unrelated, undocumented, and buggy change to the
    -p port setting for pg_isready in wait_for_postgres().
    See the reviews at #105 for more details.
    
    Special thanks to @drotheram for catching this bug!
    anthonyfok committed Jun 10, 2021 (1bfe87d)
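
    For reference, a wait-for-PostgreSQL loop built on pg_isready generally looks like the sketch below; the default host, port, and polling interval are placeholders rather than the actual values in wait_for_postgres():

    ```bash
    # Hypothetical sketch: poll pg_isready until PostgreSQL accepts connections.
    wait_for_postgres() {
      until pg_isready -h "${POSTGRES_HOST:-db-opendrr}" -p "${POSTGRES_PORT:-5432}"; do
        echo "Waiting for PostgreSQL..."
        sleep 2
      done
    }
    ```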
  3. add_data: Fix erroneous CSV generated by merge_csv

    Remove RUN from the awk command, as RUN with '>' ended up prepending
    LOG() output into the first line of the merged CSV file.
    
    Special thanks to Drew Rotheram (@drotheram) for catching this!
    See reviews at #105 for more information.
    anthonyfok committed Jun 10, 2021 (21fb875)
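
    The underlying issue is that a redirection attached to a RUN invocation captures RUN's own LOG() output together with the command's output; a sketch of the broken and fixed forms (file names are illustrative):

    ```bash
    # Hypothetical sketch of the bug and the fix.

    # Broken: '>' redirects everything RUN prints, including the LOG() line,
    # into the merged file, so log text ends up prepended to the CSV data.
    # RUN awk 'FNR == 1 && NR != 1 { next } { print }' part_*.csv > merged.csv

    # Fixed: call awk directly so only awk's output reaches the file.
    awk 'FNR == 1 && NR != 1 { next } { print }' part_*.csv > merged.csv
    ```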

Commits on Jun 11, 2021

  1. add_data: Do not fail if /usr/bin/time does not exist

    This allows the script to be run without error in e.g. the db-opendrr
    (postgis) service container where GNU time is not pre-installed.
    
    Note: GNU time is not strictly needed because it is used mainly for
    tracking memory usage (maximum resident set size, or "maxresident").
    It may be installed in a Debian-based container (e.g. db-opendrr) with
    "apt update && apt install time".
    anthonyfok committed Jun 11, 2021 (fd9ae1d)
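
    One common way to make GNU time optional is to build the command prefix conditionally; this is a sketch under that assumption, not necessarily the exact approach taken in add_data.sh:

    ```bash
    # Hypothetical sketch: only prefix commands with GNU time when it exists,
    # so containers without /usr/bin/time (e.g. db-opendrr) still work.
    if [[ -x /usr/bin/time ]]; then
      TIME_CMD=(/usr/bin/time -v)   # -v reports "Maximum resident set size"
    else
      TIME_CMD=()
    fi

    # Usage sketch:
    # "${TIME_CMD[@]}" psql -f some_script.sql
    ```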

Commits on Jun 12, 2021

  1. add_data: Add missing YT to PT_LIST

    Special thanks to @drotheram for catching this glaring mistake of mine!
    
    The error was introduced in commit 50af463 (in the commented-out EXPECTED_PT_LIST)
    and carried over in commit 12dbd02, where I renamed it to PT_LIST to replace
    the originally fetched list without actually verifying it.
    anthonyfok committed Jun 12, 2021 (29dec2c)
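
    For reference, with YT included the full set of provincial and territorial codes is the one below; the exact formatting and ordering of PT_LIST in add_data.sh may differ:

    ```bash
    # Hypothetical sketch: all 13 province/territory codes, including YT.
    PT_LIST=(AB BC MB NB NL NS NT NU ON PE QC SK YT)

    # Usage sketch (the argument shape of fetch_psra_csv_from_model is assumed):
    # for PT in "${PT_LIST[@]}"; do
    #   fetch_psra_csv_from_model "$PT"
    # done
    ```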

Commits on Jun 14, 2021

  1. add_data: Remove double quotes around ${ES_CREDENTIALS:-}

    to avoid the "curl: (3) URL using bad/illegal format or missing URL"
    (non-fatal) error when ES_CREDENTIALS is empty, in which case curl would
    interpret the quoted empty string "" as a URL.
    anthonyfok committed Jun 14, 2021 (3b9c39c)
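
    The fix relies on how the shell expands an empty variable with and without quotes; a sketch (the curl options and endpoint path are placeholders consistent with this pull request):

    ```bash
    # Hypothetical sketch of the quoting difference.

    # Quoted: when ES_CREDENTIALS is empty, curl still receives an empty ""
    # argument and tries to treat it as a URL -> "curl: (3) URL using
    # bad/illegal format or missing URL".
    # curl "${ES_CREDENTIALS:-}" -X GET "$KIBANA_ENDPOINT/api/status"

    # Unquoted: an empty expansion disappears entirely, so curl only sees the
    # real arguments (the expansion is intentionally left unquoted here).
    curl ${ES_CREDENTIALS:-} -X GET "$KIBANA_ENDPOINT/api/status"
    ```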