
Create XZ-compressed Git repos and download from them #91

Open
4 of 6 tasks
anthonyfok opened this issue Apr 29, 2021 · 1 comment
Labels: Enhancement (New feature or request), Task

anthonyfok commented Apr 29, 2021

Tasks

  • Compress repos and upload them to GitHub
  • ~~Edit the fetch_csv function~~ Add a new fetch_csv_xz function in OpenDRR/opendrr-api/python/add_data.sh to download from these compressed repos
  • Rename the original fetch_csv function as fetch_csv_lfs
  • Add a new fetch_csv function that calls fetch_csv_xz and falls back to fetch_csv_lfs
  • Deal with corner cases where add_data.sh needs to fetch historic CSV files that may no longer exist in HEAD
  • GitHub Actions for automatic verification and update
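
The fetch_csv wrapper described in the tasks might look like this minimal sketch (fetch_csv_xz and fetch_csv_lfs are the functions named above; their internals and exact arguments are assumptions here):

```shell
# Minimal sketch of the planned dispatcher: try the xz-compressed repo
# first, fall back to Git LFS on failure. fetch_csv_xz and
# fetch_csv_lfs would be defined elsewhere in add_data.sh.
fetch_csv() {
  local repo="$1" path="$2"
  if ! fetch_csv_xz "$repo" "$path"; then
    echo "xz download failed for ${path}; falling back to Git LFS" >&2
    fetch_csv_lfs "$repo" "$path"
  fi
}
```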

Description

The Git LFS file download failure (Issue #90) might have been caused by our running out of our monthly GitHub bandwidth quota, especially with my frequent runs of docker-compose up --build and docker-compose down -v in recent days.

Create compressed equivalents of LFS repos, e.g. model-inputs → model-inputs-gz or model-inputs-xz, etc. (2021-05-10 update: xz is chosen for its SHA-256 sum feature which matches oid sha256 entries in Git LFS pointer files.)

Or perhaps use our B2 or S3 bucket? (populate manually or using GitHub Actions)
Or can some kind of HTTP proxy be used? Any way to use B2 or S3 for such a proxy?
2021-05-10 update: Downloading directly from https://raw.githubusercontent.com/ seems fast enough, so the use of buckets might not be necessary.

And what about local cache?
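
For the compressed-repos option above, the mirroring step might be sketched as follows (run inside a checkout of, e.g., model-inputs after git lfs pull; the per-file .xz layout is an assumption). --check=sha256 embeds the SHA-256 of the uncompressed data, which is what allows matching against the "oid sha256:..." entries in the Git LFS pointer files:

```shell
# Compress every CSV in the working tree in place, keeping the
# originals; skip the .git directory. --check=sha256 stores the
# SHA-256 of the uncompressed data inside each .xz file.
find . -path ./.git -prune -o -type f -name '*.csv' -print0 |
  xargs -0 -r -I{} xz -9 --check=sha256 --keep -- {}
```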

@anthonyfok anthonyfok added this to the Sprint 33 milestone Apr 29, 2021
@anthonyfok anthonyfok self-assigned this Apr 29, 2021
@jvanulde jvanulde modified the milestones: Sprint 33, Sprint 34 May 6, 2021
anthonyfok commented May 10, 2021

Notes

Repos to compress

  • OpenDRR/canada-srm2 (about 34 minutes)
  • OpenDRR/model-inputs (about 11 minutes)
  • OpenDRR/scenario-catalogue (about 5 to 6 hours)
  • OpenDRR/openquake-inputs (about 1 hour)

(Times in parentheses are the rough compression times with xz -9 on a 3rd-generation Intel Core i5.)

Quickly verifying checksum

xz -lvv <xz-file> | grep -Eo '[0-9a-z]{64}'
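
Wrapping this in a small helper, the embedded checksum can be compared directly against the pointer file's oid (verify_xz_against_pointer is a hypothetical name; the pointer is assumed to be a standard Git LFS pointer containing an "oid sha256:<64 hex chars>" line):

```shell
# Compare the SHA-256 stored in an .xz file (via --check=sha256)
# against the "oid sha256:..." entry of a Git LFS pointer file.
# Returns 0 on match, non-zero otherwise.
verify_xz_against_pointer() {
  local xz_file="$1" pointer_file="$2"
  local xz_sum ptr_sum
  xz_sum=$(xz -lvv "$xz_file" | grep -Eo '[0-9a-f]{64}' | head -n1)
  ptr_sum=$(grep -Eo 'sha256:[0-9a-f]{64}' "$pointer_file" | cut -d: -f2)
  [ -n "$xz_sum" ] && [ "$xz_sum" = "$ptr_sum" ]
}
```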

Corner cases

OpenDRR/opendrr-api/python/add_data.sh currently fetches some historic CSV files that may have already been deleted in HEAD. grep -B1 '?ref' opendrr-api/python/add_data.sh gives a list of them:

fetch_csv model-inputs \
  exposure/census-ref-sauid/census-attributes-2016.csv?ref=ab1b2d58dcea80a960c079ad2aff337bc22487c5
--
fetch_csv model-inputs \
  exposure/general-building-stock/documentation/collapse_probability.csv?ref=73d15ca7e48291ee98d8a8dd7fb49ae30548f34e
--
fetch_csv model-inputs \
  exposure/general-building-stock/documentation/retrofit_costs.csv?ref=73d15ca7e48291ee98d8a8dd7fb49ae30548f34e
--
fetch_csv model-inputs \
  natural-hazards/mh-intensity-ghsl.csv?ref=ab1b2d58dcea80a960c079ad2aff337bc22487c5
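
One way to handle this corner case: since the xz mirror would only track HEAD, fetch_csv_xz could refuse any "?ref=<commit>" request and let the fetch_csv wrapper fall back to Git LFS. A hypothetical sketch (the "-xz" repo naming and the master branch are assumptions):

```shell
# Refuse historic "?ref=<commit>" fetches, which the xz mirror (HEAD
# only) cannot serve; otherwise download and decompress the .xz copy.
fetch_csv_xz() {
  local repo="$1" path="$2"
  case "$path" in
    *\?ref=*)
      echo "historic ref requested for ${path}; deferring to LFS" >&2
      return 1 ;;
  esac
  curl -fsSL -o "$(basename "$path").xz" \
    "https://raw.githubusercontent.com/OpenDRR/${repo}-xz/master/${path}.xz" &&
    xz -d -f "$(basename "$path").xz"
}
```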

@anthonyfok anthonyfok changed the title Create compressed versions of repos (and/or HTTP proxy?) Create XZ-compressed Git repos and download from them May 10, 2021
anthonyfok added a commit that referenced this issue May 20, 2021
to download from xz-compressed repos for speed and cost-saving (no LFS)

See #91
anthonyfok added a commit to anthonyfok/opendrr-api that referenced this issue May 21, 2021
to download from xz-compressed repos for speed and cost-saving (no LFS)

See OpenDRR#91
anthonyfok added a commit to anthonyfok/opendrr-api that referenced this issue May 21, 2021
to download from xz-compressed repos for speed and cost-saving (no LFS)

See OpenDRR#91
@anthonyfok anthonyfok modified the milestones: Sprint 34, Sprint 35 May 25, 2021
anthonyfok added a commit that referenced this issue Jun 3, 2021
to download from xz-compressed repos for speed and cost-saving (no LFS)

See #91
anthonyfok added a commit that referenced this issue Jun 7, 2021
to download from xz-compressed repos for speed and cost-saving (no LFS)

See #91
@anthonyfok anthonyfok pinned this issue Jun 10, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 35, Sprint 36 Jun 10, 2021
anthonyfok added a commit that referenced this issue Jun 14, 2021
to download from xz-compressed repos for speed and cost-saving (no LFS)

See #91
@anthonyfok anthonyfok modified the milestones: Sprint 36, Sprint 38, Sprint 39 Jul 13, 2021
@anthonyfok anthonyfok removed this from the Sprint 39 milestone Aug 5, 2021
@anthonyfok anthonyfok added this to the Sprint 40 milestone Aug 5, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 40, Sprint 41, Sprint 42 Sep 9, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 42, Sprint 43 Sep 23, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 43, Sprint 44 Oct 13, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 44, Sprint 45 Oct 25, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 45, Sprint 46 Nov 8, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 46, Sprint 47 Nov 22, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 47, Sprint 50 Jan 17, 2022
@anthonyfok anthonyfok modified the milestones: Sprint 50, Sprint 52 Feb 15, 2022
@anthonyfok anthonyfok modified the milestones: Sprint 52, Sprint 53 Feb 28, 2022
@anthonyfok anthonyfok modified the milestones: Sprint 53, Sprint 54 Mar 14, 2022
@anthonyfok anthonyfok modified the milestones: Sprint 54, Sprint 55 Mar 25, 2022
@anthonyfok anthonyfok unpinned this issue Apr 9, 2022