Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache for LFS #165

Closed
gordinmitya opened this issue Feb 27, 2020 · 15 comments
Closed

Cache for LFS #165

gordinmitya opened this issue Feb 27, 2020 · 15 comments
Labels
question Further information is requested

Comments

@gordinmitya
Copy link

Hello,

Can you provide some info how to use checkout action with cache properly?
I wonder because frequent builds spend all the lfs download quota very fast (1GB per month). (Because it download everything from scratch every time, right?)

Also it would be great if there will be such option from the box!
Thank you in advance.

@ericsciple
Copy link
Contributor

It looks like lfs info is under .git/lfs so you might be able to cache that directory

@ericsciple ericsciple added the question Further information is requested label Feb 28, 2020
@mknejp
Copy link

mknejp commented Mar 30, 2020

I was trying to do this just today but unfortunately this doesn't seem to work. I have a cache action for .git/lfs/objects (where all the data files are) and execute actions/checkout@v2 with lfs: true and clean: false and I get the following output

Syncing repository: owner/repo
Working directory is 'd:\a\repo\repo'
"C:\Program Files\Git\bin\git.exe" version
git version 2.25.1.windows.1
"C:\Program Files\Git\bin\git.exe" lfs version
git-lfs/2.10.0 (GitHub; windows amd64; go 1.12.7; git a526ba6b)
"C:\Program Files\Git\bin\git.exe" config --local --get remote.origin.url
##[error]fatal: --local can only be used inside a git repository
Deleting the contents of 'd:\a\repo\repo'
...

Looking at the code it seems like this wipes everything so the benefit of caching is lost.

@jcornaz
Copy link

jcornaz commented May 28, 2020

Thanks @dabo248 for the solution.

However, it is quite verbose for something that should be (In my opinion) the default behavior.

Due the github's policy to bill for git lfs download bandwidth, git lfs is not usable with github actions in practice unless cached.

So it would make sense that the option lfs: true of actions/checkout caches the LFS data by default. Wouldn't it?

@ericsciple, can you re-open the issue?

@samesfahani-tuplehealth

@dabo248 Thanks for the solution, but I was curious to know if that key is even valid? From the documentation, the key cannot be a directory so your key is being interpreted as a string; the cache would not get invalidated upon new additions to the lfs directory. Correct me if I'm wrong?

@jcornaz
Copy link

jcornaz commented Jul 13, 2020

but I was curious to know if that key is even valid

you're right @samesfahani-tuplehealth. Using .git/lfs for the key is not a good idea and will cause the cache to be useless as soon as the large files are changed.

But before calling git lfs pull, the files will be there as tiny text files containing a hash. And we can build a key based on that tiny text files.

Here's an example:

- name: Checkout repository
  uses: actions/checkout@v2

- name: Cache git lfs
  uses: actions/[email protected]
  with:
    path: .git/lfs
    key: ${{ hashFiles('**/*.zip') }} # Adapt to target the type of the files committed with git lfs

- name: Pull lfs data, if not cached
  run: git lfs pull

@samesfahani-tuplehealth
Copy link

samesfahani-tuplehealth commented Jul 13, 2020

@jcornaz That's pretty much what I came to understand as well. However, even with that approach, if you have more than just zip files or you no longer want zip files to cache, whatever the use case, then you would need to update your CI file. How about this:

- name: Checkout code
  uses: actions/checkout@v2

- name: Create LFS file list
  run: git lfs ls-files -l | cut -d' ' -f1 | sort > .lfs-assets-id

- name: Restore LFS cache
  uses: actions/cache@v2
  id: lfs-cache
  with:
    path: .git/lfs
    key: ${{ runner.os }}-lfs-${{ hashFiles('.lfs-assets-id') }}-v1

- name: Git LFS Pull
  run: git lfs pull

Source: https://www.develer.com/en/avoiding-git-lfs-bandiwdth-waste-with-github-and-circleci/

The author's method was meant for CircleCI, but the same concept still stands; we create a file that has all the hashes tracked within LFS and we run hashFiles on that. Any time a file is added or removed from LFS, this file should get invalidated. I've also added a -v1 to the end in case you ever want to invalidate the cache manually, but you shouldn't need to.

@dabo248
Copy link

dabo248 commented Jul 14, 2020

@samesfahani-tuplehealth You're absolutely right, the static key is not useful. Setting a hash as the key is the way to go!

alvarobartt pushed a commit to alvarobartt/serving-pytorch-models that referenced this issue Nov 27, 2020
StanczakDominik added a commit to StanczakDominik/PlasmaPy-NEI that referenced this issue Jan 13, 2021
namurphy added a commit to PlasmaPy/PlasmaPy-NEI that referenced this issue Jan 19, 2021
* Copy over testing workflow from PlasmaPy

* Remove tentatively added conda python package thing

* Remove Python 3.6 tests from GitHub Actions file

* Copy over tox.ini from PlasmaPy

* Remove Python 3.6 stuff from tox.ini

* Update copyright years

* Replace old pytest config with current one

The previous one referred to functionality of pytest-doctestplus,
astropy's testing plugin, which we've gotten rid of in plasmapy proper
a while ago. There was an addopts clause that added the unrecognized
flag.

* Remove obsolete azure-pipelines.yml

Fully replaced by github actions now.

* Ensure we're testing the, uh, right package.

* Add a step to pull files from Git LFS

as per actions/checkout#165 (comment)

* Typo fix

* Try to fix broken yml file

* Typo fix

* Quickly ensure docs tox env has access to sphinx

this is done through selecting all the extras, one of which has sphinx
in it. Hackish.

* Remove circleci

Co-authored-by: Nick Murphy <[email protected]>
lacava added a commit to EpistasisLab/pmlb that referenced this issue Apr 19, 2021
@jahead
Copy link

jahead commented Jun 22, 2021

Is it possible for us to get this added into the checkout action itself as a flag?

@ericsciple can we re-open this iisue?

We were just hit by this with git lfs burning through our bandwidth in one day.
It's unexpected and just leads to a poor user experience.

lmnotran added a commit to SiliconLabs/ot-efr32 that referenced this issue Jun 22, 2022
We've used ~5.3TB of bandwidth in the last month. This is because every
GitHub action is pulling the Git LFS objects unconditionally and with `lfs: true`
doesn't cache pulled LFS objects by default.

The implementation is from: actions/checkout#165
jwhui pushed a commit to openthread/ot-efr32 that referenced this issue Jun 24, 2022
We've used ~5.3TB of bandwidth in the last month. This is because
every GitHub action is pulling the Git LFS objects unconditionally and
with `lfs: true` doesn't cache pulled LFS objects by default.

The implementation is from: actions/checkout#165
bertoldi-silabs pushed a commit to SiliconLabs/ot-efr32 that referenced this issue Oct 12, 2022
We've used ~5.3TB of bandwidth in the last month. This is because
every GitHub action is pulling the Git LFS objects unconditionally and
with `lfs: true` doesn't cache pulled LFS objects by default.

The implementation is from: actions/checkout#165
@ModischFabrications
Copy link

ModischFabrications commented Oct 27, 2022

Any news or reconsiderations on reopening this issue @ericsciple? I just fell into the same trap and am forced to rely on the nschloe solution. Counting LFS usage within GitHub is unusual in itself, but it should at least be more difficult to do so.

ModischFabrications added a commit to ModischFabrications/CutSolverFrontend that referenced this issue Oct 27, 2022
atharva-2001 added a commit to atharva-2001/tardis that referenced this issue Feb 20, 2023
andrewfullard pushed a commit to tardis-sn/tardis that referenced this issue Feb 20, 2023
* Cache LFS objects (actions/checkout#165)

* Delete steps to see file size

Delete commented step which used to download LFS objects using the bash script

* Add lfs:false flag

* Add my username to .mailmap

* Cache hit and allow pytest to run all tests

* Do git lfs checkout if the cache key is found
epassaro added a commit to epassaro/tardis that referenced this issue Mar 21, 2023
* Adding rate matrix index (tardis-sn#2132)

* adding NlteIndexHelper class

* adding nlte_ionization_species to io/schemas/plasma.yml

* adding nlte_excitation_species to io/schemas/plasma.yml

* added nlte_ionization_species in assemble_plasma

* fixed the issue with getting nlte_ionization_species from config

* fixed transforming string from config to tuple for nlte ionization species

* ran black

* attempt of writing tests 1

* ran black

* test attempt 2

* black on test_hdf_plasma

* experimenting with revert

* attempt 3

* reverted tardis/plasma/tests/test_complete_plasmas.py

* reverted tardis/plasma/tests/test_hdf_plasma.py

* adding cofig for nlte

* adding fixture for tardis_cofig_verysimple_nlte

* added test_plasma_nlte_section_config

* ran black

* fixed the issue with the test, ran it locally

* ran black on necessary files

* updated description in schemas

* switching from nlte_ionization to be used from self

* ran black

* changed nlte_properties_new to nlte_solver_properties

* adding nlte_rate_equation_matrix.py

* reverting to previous commit

* Restructuring NumbaModel (tardis-sn#2136)

* Added NumbaRadial1DGeometry

* Added to_numba function

* Fixed incorrect naming of numba spec

* Updated docstring

* Updated to_numba docstring

* Updated docstring

* Updated numba model to contain only time explosion

* updated docstring formatting error

* Moved geometry to new geometry sub package

* Updated all functions where numba model was used

* Updated to specify units

* converting time_explosion to seconds specifically

* Added numba geometry fixture

* Updated tests to work with numba geometry

* Updated formatting

* Fixed issue with geometry initialization

* Fixed formatting

* Add missing cell to download atom data (tardis-sn#2146)

* Possible fix for prerelease workflow (tardis-sn#2127)

* Possible fix for prerelease workflow

* Fix double @ symbol

* Changed to v2

* Pin setuptools_scm to v6 (tardis-sn#2147)

* Pin setuptools_scm to v6

* Fix typo

* Adding nlte_rate_equation_solver.py (tardis-sn#2140)

* added nlte_rate_equation_matrix

* adding the rate_equation_matrix

* moved rate_matrix into a class NLTERateEquationSolver

* fixed a bug in rate_matrix_index, added an example for checking if rate matrix works properly

* changed the initial electron density for now

* added tests, fixed a small bug

* added some doctrings

* fixed a typo

* added some doctrings

* added some doctrings part 2

* added some docstrings part 3

* removed unnecessary import

* Update tardis/plasma/properties/nlte_rate_equation_solver.py

Co-authored-by: Christian Vogl <[email protected]>

* changed how atomic number is created from rate_matrix_index

* changed call of a function

* got rid of unnecessary if statement in set_nlte_ion_rate

* renamed last_row to charge_conservation_row

* switched 0, 1 to atomic_number and ion_number to make it more readable

* swtihed from rates to coefficients

* changed the matrix set up to only keep necessary row for nlte_ion

* ran black

* fixing some doctrings

* swtiched from using numbers to index names

* switched the return statement to NotImplementedError

* changed groupby from 0, 1 to atomic number, ion number

* fixed an issue in the test

* ran black

Co-authored-by: Christian Vogl <[email protected]>

* Set specific qgrid feedstock version (tardis-sn#2152)

* Pre-release 2022.11.17 (tardis-sn#2155)

Automated changes for pre-release 2022.11.17

* Post-release 2022.11.17 (tardis-sn#2156)

Automated changes for post-release 2022.11.17

* Pre-release 2022.11.20 (tardis-sn#2160)

Automated changes for pre-release 2022.11.20

* Fix relativistic packet initialization (tardis-sn#2159)

* Fix setting of full relativity flag

* Add relativistic packet source

Co-authored-by: Stuart Sim <[email protected]>

* Fix relativistic vpackets on inner boundary

Co-authored-by: Stuart Sim <[email protected]>

* Fixing typos for nlte ion (tardis-sn#2154)

* fixing typos on rate_matrix

* changed the np arange to use 0 instead of 0.0

* NLTE jacobian matrix (tardis-sn#2158)

* added jacobian matrix and deriv block functions

* added doctrings

* ran black

* switched from df to array

* added an initial test

* ran black

* reorganized the tests

* added doctrings

* added todo comments for future work

* ran black

* Kilonova missing zeta (tardis-sn#2150)

* solved KN ZetaData issue

* KN missing Zeta fixed

* mailmap changed

* Pre-release 2022.12.11 (tardis-sn#2172)

Automated changes for pre-release 2022.12.11

* Post-release 2022.12.12 (tardis-sn#2173)

Automated changes for post-release 2022.12.12

* Add missing __init__.py files to transport and geometry subpackages (tardis-sn#2170)

* Add missing __init__.py files to transport subpackage

* Add missing __init__.py file to geometry subpackage

* Pre-release 2022.12.25 (tardis-sn#2180)

Automated changes for pre-release 2022.12.25

* Post-release 2022.12.26 (tardis-sn#2182)

Automated changes for post-release 2022.12.26

* Pre-release 2023.01.08 (tardis-sn#2186)

Automated changes for pre-release 2023.01.08

* Post-release 2023.01.11 (tardis-sn#2188)

Automated changes for post-release 2023.01.11

* Downloading nlte_atom_data in ref data (tardis-sn#2187)

adding nlte_atom_data to be downloaded

* Adding nlte solver (tardis-sn#2171)

* adding solver part in the calculate method

* adding the objective function

* added the solution vector method

* final touches for the solver

* tests, attempt 1

* ran black

* fixed the issue with tests

* ran black

* added missing doctrings in the nlte_rate_equation_solver.py

* added doctrings to tests

* changed the test to explicitly calculate the lte solution ion number densities.

* ran black

* added the atom data file to download_reference_data.sh

* fixed download_reference_data.sh

* Revert "added the atom data file to download_reference_data.sh"

This reverts commit 58ce29e.

* got rid of index i, kept only shell

* switched DataFrame to pandas.DataFrame in the docstrings

* docstring bug fix

* got rid of the deepcopy of nlte atomic data file

* changed the number of shells to 5

* Update tardis/io/tests/data/tardis_configv1_nlte.yml

Co-authored-by: Christian Vogl <[email protected]>

* Update tardis/plasma/properties/nlte_rate_equation_solver.py

Co-authored-by: Christian Vogl <[email protected]>

* Update tardis/io/tests/data/tardis_configv1_nlte.yml

Co-authored-by: Christian Vogl <[email protected]>

* Update tardis/io/tests/data/tardis_configv1_nlte.yml

Co-authored-by: Christian Vogl <[email protected]>

* ran black

* Update tardis/plasma/properties/nlte_rate_equation_solver.py

Co-authored-by: Christian Vogl <[email protected]>

* Update tardis/plasma/tests/test_nlte_solver.py

Co-authored-by: Christian Vogl <[email protected]>

* restructured a summary in a docstring for the solution_vector_block

* ran black

* added the test for w=0 case

* Update tardis/plasma/properties/nlte_rate_equation_solver.py

Co-authored-by: Christian Vogl <[email protected]>

* Update tardis/plasma/properties/nlte_rate_equation_solver.py

Co-authored-by: Christian Vogl <[email protected]>

* Update tardis/plasma/tests/test_nlte_solver.py

Co-authored-by: Christian Vogl <[email protected]>

* fixed an issue in a docsting

* fixed an issue in a docsting

* removed a TODO comment

Co-authored-by: Christian Vogl <[email protected]>

* Rename T variables to temperature  (tardis-sn#2185)

* fixes tardis-sn#1600

* fixes tardis-sn#1600

* Update tardis/montecarlo/packet_source.py

Co-authored-by: Atharva Arya <[email protected]>

* fixes codestyle

* improve codestyle

* rename the function

* Update packet_source.py

* improve codequality

* add commit

* Update packet_source.py

Co-authored-by: Atharva Arya <[email protected]>

* Fixing test_store_runner_to_hdf (tardis-sn#2198)

* checking if the object has the attribute decode

* fixed another decode issue

* Reading nlte_excitation_species from config (tardis-sn#2195)

* implementing nlte_excitation

* ran black

* fixed a typo

* got rid of unnecessary lines

* rewrote tests

* made a change on assigning the config values to plasma properties

* fixed the tests

* Update tardis/io/tests/test_config_reader.py

Co-authored-by: Christian Vogl <[email protected]>

---------

Co-authored-by: Christian Vogl <[email protected]>

* Fix config reader tests (tardis-sn#2200)

Fix config tests

* Pre-release 2023.02.05 (tardis-sn#2205)

Automated changes for pre-release 2023.02.05

* Post-release 2023.02.06 (tardis-sn#2206)

Automated changes for post-release 2023.02.06

* Add version tag to simulation objects (tardis-sn#2190)

* Add test for versioning info

* Add version string to simulation objects

* Add name to mailmap

* Add another alias to mailmap

* Pre-release 2023.02.12 (tardis-sn#2208)

Automated changes for pre-release 2023.02.12

* Post-release 2023.02.16 (tardis-sn#2210)

Automated changes for post-release 2023.02.16

* Pre-release 2023.02.19 (tardis-sn#2211)

Automated changes for pre-release 2023.02.19

* Cache LFS objects in the tests workflow (tardis-sn#2194)

* Cache LFS objects (actions/checkout#165)

* Delete steps to see file size

Delete commented step which used to download LFS objects using the bash script

* Add lfs:false flag

* Add my username to .mailmap

* Cache hit and allow pytest to run all tests

* Do git lfs checkout if the cache key is found

* Post-release 2023.02.20 (tardis-sn#2213)

Automated changes for post-release 2023.02.20

* Correct the description of 'no_of_packets' in Monte Carlo Configuration (tardis-sn#2214)

* Correct the description of 'no_of_packets' in Monte Carlo Configuration

* Added a simple description

* Pre-release 2023.02.26 (tardis-sn#2222)

Automated changes for pre-release 2023.02.26

* Post-release 2023.02.27 (tardis-sn#2223)

Automated changes for post-release 2023.02.27

* Add docstrings to subpackages (tardis-sn#2204)

* add docstring to subpackage grid

* Add docstrings to subpackages

* Change comments to docstrings

* add docstrings to subpackages

* rename gui/__init__.py

* Reformat using black

* add contributor infos to mailmap file

* add extended summary to docstrings

* Add extended summary to plasma/properties

* Update tardis/io/parsers/__init__.py

Co-authored-by: Sona Chitchyan <[email protected]>

---------

Co-authored-by: kim <[email protected]>
Co-authored-by: Sona Chitchyan <[email protected]>

* Docs Fix: Download Atom Data in rpacket_tracking.ipynb (tardis-sn#2236)

Download atom data from the function

* Fix for automerge (tardis-sn#2242)

Use gh CLI to approve pull requests; use new token

* Fix for release dates (tardis-sn#2243)

* MonteCarlo packet progress bar completes to 100% (tardis-sn#2237)

* Refresh packet progress bar after every iteration

* Udpdated .mailmap

* Reformatted with black

* Fix team reviewers on workflows (tardis-sn#2246)

* Pre-release 2023.03.20 (tardis-sn#2247)

Automated changes for pre-release 2023.03.20

* Post-release 2023.03.20 (tardis-sn#2248)

Automated changes for post-release 2023.03.20

* Create new workflow

* Patch for arepo tests

* Restore deleted parameter

* Black format; Add missing skip reason

* Fix typo in comment

* Add more comments

* Create slash command dispatcher

* Changes to dispatcher

* Black format

---------

Co-authored-by: Sona Chitchyan <[email protected]>
Co-authored-by: Satwik Kambham <[email protected]>
Co-authored-by: Andrew <[email protected]>
Co-authored-by: Christian Vogl <[email protected]>
Co-authored-by: tardis-bot <[email protected]>
Co-authored-by: Stuart Sim <[email protected]>
Co-authored-by: gleck97 <[email protected]>
Co-authored-by: ABHISHEK PATIDAR <[email protected]>
Co-authored-by: Atharva Arya <[email protected]>
Co-authored-by: Kim <[email protected]>
Co-authored-by: Abhishek Patidar <[email protected]>
Co-authored-by: Le Truong <[email protected]>
Co-authored-by: kim <[email protected]>
Co-authored-by: Shreyas Singh <[email protected]>
@DenizUgur
Copy link

I wanted to provide my way of achieving this. Also supports change of individual files. Thanks to @nschloe for providing a way to generate a key for LFS files. If you want to verify if it only downloads the new/updated files add GIT_TRACE=1 as environment variable on ifs pull step.

lfs-cache:
  runs-on: ubuntu-latest
  steps:
    - name: Checkout
      uses: actions/checkout@v3

    - name: Create LFS file list
      run: git lfs ls-files --long | cut -d ' ' -f1 | sort > .lfs-assets-id

    - name: LFS Cache
      uses: actions/cache@v3
      with:
        path: .git/lfs/objects
        key: ${{ runner.os }}-lfs-${{ hashFiles('.lfs-assets-id') }}
        restore-keys: |
          ${{ runner.os }}-lfs-

    - name: Git LFS Pull
      run: git lfs pull

@sandstrom
Copy link

sandstrom commented Jan 11, 2024

@ericsciple Would you be willing to reopen this issue?

It's still a problem for us, and I'd suggest the following:

  • Don't count LFS usage within the Github network. If people are using LFS for binaries instead of putting huge files into Git, that's a net win for your infrastructure costs, so you'd want to encourage it.
  • Add proper support for LFS caching to this action.

Burgestrand added a commit to Burgestrand/milo-web-preview that referenced this issue Mar 5, 2024
albertziegenhagel added a commit to albertziegenhagel/snail-server that referenced this issue May 1, 2024
This is to reduce the required LFS bandwidth of the GitHub actions.
The solution is based on
actions/checkout#165 (comment)
albertziegenhagel added a commit to albertziegenhagel/snail-server that referenced this issue May 1, 2024
This is to reduce the required LFS bandwidth of the GitHub actions.
The solution is based on
actions/checkout#165 (comment)
fyears added a commit to remotely-save/remotely-save that referenced this issue May 7, 2024
@sandstrom
Copy link

sandstrom commented Jun 7, 2024

Friendly ping @cory-miller 😄

(see #165 (comment))

nocarryr added a commit to nocarryr/lupy that referenced this issue Jun 29, 2024
Remove wav files and replace with compressed npz

Apparently github has a monthly bandwidth limit for Git LFS.
That's reasonable and fair.
What I find unreasonable is that the bandwidth usage by
GH Actions workflows is also counted against the monthly
quota:
https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage#tracking-storage-and-bandwidth-use

After three workflow runs (4 jobs each), this limit was
reached.

Apparently I'm not the first to fall for this:
actions/checkout#165
actions/checkout#834
https://github.com/orgs/community/discussions/26775

To name a few
@mwestphal
Copy link

Here is action that set this up: https://github.com/marketplace/actions/copy-lfs-data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests