Problems with caching on self-hosted runners #52

Closed
MPV opened this issue Jun 4, 2020 · 26 comments · Fixed by #59

Comments

@MPV
Contributor

MPV commented Jun 4, 2020

As mentioned in #38 (comment), Scala Steward seems to do some kind of caching (which maybe isn't useful for ephemeral GitHub Actions runners), so I'm opening this issue for that.

@MPV
Contributor Author

MPV commented Jun 4, 2020

For example, here are some cache files that were created on one of my runners:

runner@runner-pod-h79x4:~$ find | grep workspace
./scala-steward/workspace
./scala-steward/workspace/repos
./scala-steward/workspace/repos/my-org
./scala-steward/workspace/store
./scala-steward/workspace/store/refresh_error
./scala-steward/workspace/store/refresh_error/v1
./scala-steward/workspace/store/refresh_error/v1/github
./scala-steward/workspace/store/refresh_error/v1/github/my-org
./scala-steward/workspace/store/refresh_error/v1/github/my-org/my-repo
./scala-steward/workspace/store/refresh_error/v1/github/my-org/my-repo/refresh_error.json

After deleting those, my Scala Steward action no longer fails with output like this:

Launching org.scala-steward:scala-steward-core_2.13:0.5.0-385-e5e4789c-SNAPSHOT
  2020-06-01 10:02:06,285 INFO   
    ____            _         ____  _                             _
   / ___|  ___ __ _| | __ _  / ___|| |_ _____      ____ _ _ __ __| |
   \___ \ / __/ _` | |/ _` | \___ \| __/ _ \ \ /\ / / _` | '__/ _` |
    ___) | (_| (_| | | (_| |  ___) | ||  __/\ V  V / (_| | | | (_| |
   |____/ \___\__,_|_|\__,_| |____/ \__\___| \_/\_/ \__,_|_|  \__,_|
   v0.5.0-385-e5e4789c-SNAPSHOT
   
  2020-06-01 10:02:06,290 INFO  Run self checks
  2020-06-01 10:02:07,163 INFO  Add global sbt plugins
  2020-06-01 10:02:07,177 INFO  Clean workspace /home/runner/scala-steward/workspace
  2020-06-01 10:02:07,196 INFO  ──────────── Steward my-org/my-repo ────────────
  2020-06-01 10:02:07,199 INFO  Check cache of my-org/my-repo
  2020-06-01 10:02:07,230 INFO  Skipping due to previous error
  2020-06-01 10:02:07,237 INFO  ──────────── Total time: Steward my-org/my-repo: 40ms ────────────
  2020-06-01 10:02:07,239 INFO  ──────────── Total time: run: 957ms ────────────

@MPV
Contributor Author

MPV commented Jun 4, 2020

A possible workaround might be to add a GHA step that removes all/some of those files...?

@bpg
Contributor

bpg commented Jun 4, 2020

Caching is an interesting topic that I'd like to discuss a bit more. I think there are some cases where you'd like to utilize results from previous action runs, for example, a big project with lots of dependencies and/or lots of open PRs from Scala Steward, or an action that handles updates for multiple projects.

Running every time from scratch will eat into your Actions minutes allotment. Also, Scala Steward has some features that use the state persisted in the workspace, like scanning frequency, etc. I was seriously considering adding a caching step to my own projects just to deal with all of this.

So, if we're talking about adding a behaviour in the action to wipe out the workspace before calling SS, I would suggest that we:

  • add a configuration input to enable/disable this behaviour
  • have it enabled by default (as I assume the majority of action users are small single project repos)
  • explain all of this in documentation and add configuration examples with explicit workspace caching

Thoughts?
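A minimal sketch of the suggested switch, assuming a hypothetical boolean input (called `clean-workspace` here; the thread does not name one). Inputs reach a JavaScript action as strings, and an unset input arrives as an empty string, so the action would need to parse the raw value and fall back to the proposed default of `true`:

```typescript
// Parses a boolean action input such as the hypothetical `clean-workspace`.
// An unset input arrives as "" (or undefined here), which selects the default.
function parseBooleanInput(raw: string | undefined, defaultValue: boolean): boolean {
  const value = (raw === undefined ? "" : raw).trim().toLowerCase();
  if (value === "") {
    return defaultValue; // input not provided: use the proposed default
  }
  if (value === "true") {
    return true;
  }
  if (value === "false") {
    return false;
  }
  throw new Error(`Invalid boolean input: "${raw}" (expected "true" or "false")`);
}

// Proposed default: wipe the workspace unless the user opts out (e.g. to cache it).
const CLEAN_WORKSPACE_DEFAULT = true;

function shouldCleanWorkspace(raw: string | undefined): boolean {
  return parseBooleanInput(raw, CLEAN_WORKSPACE_DEFAULT);
}
```

Users with larger or multi-project setups could then set the input to `false` and cache the workspace explicitly, which is where the documentation and examples from the third bullet would come in.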

@MPV
Contributor Author

MPV commented Jun 4, 2020

The current/built-in caching in Scala Steward is only per runner, and that cache might not be as helpful as when one runs Scala Steward standalone (against multiple repos). This is in contrast to this action, where the default is to run against just the same repo.

I’ll see if I can troubleshoot which of the issues I’ve gotten have been cached (preventing a rerun from succeeding). Essentially, whether any caching is performed could be less of an issue if I could rerun and get it to retry creating PRs etc. (when it fails on an error that is resolved by just retrying).

@MPV
Contributor Author

MPV commented Jun 5, 2020

Leaving a note to remember:
If I (or anyone else) run into this again, we should probably check the contents of that file (from the output above):

./scala-steward/workspace/store/refresh_error/v1/github/my-org/my-repo/refresh_error.json

Another workaround could be to just remove that file (until we've found a suitable way forward regarding caching per runner/repo/etc).
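The `find` listing earlier in this thread suggests the store is laid out as `store/refresh_error/v1/github/<owner>/<repo>/refresh_error.json`. The sketch below derives that path for one repo so a cleanup step could remove just that file. It assumes that layout is stable, which is only inferred from a single listing (the `v1` segment hints that it is versioned); the function name is hypothetical.

```typescript
// Builds the refresh_error file path for one repo, assuming the layout seen in the
// `find` output: <workspace>/store/refresh_error/v1/github/<owner>/<repo>/refresh_error.json
function refreshErrorFile(workspace: string, owner: string, repo: string): string {
  const base = workspace.replace(/\/+$/, ""); // drop trailing slashes
  return [base, "store", "refresh_error", "v1", "github", owner, repo, "refresh_error.json"].join("/");
}
```

A workflow step could then delete that path (for example with `rm -f`) before the action launches.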

@MPV
Contributor Author

MPV commented Jun 5, 2020

refresh_error aside...

Do we have any suggestions on how one might do caching using the actions/cache action?

(to cache Scala Steward workspace files between different runners/runs for the same repo)

Had you given this a try @bpg?

@MPV
Contributor Author

MPV commented Jun 5, 2020

To share, here's what I'm trying at the moment:

diff --git a/.github/workflows/scala-steward.yml b/.github/workflows/scala-steward.yml
index 9c79225..d8cbe8f 100644
--- a/.github/workflows/scala-steward.yml
+++ b/.github/workflows/scala-steward.yml
@@ -1,25 +1,29 @@
 name: Scala Steward

 on:
   # This workflow will launch at 00:00 every Sunday
   schedule:
     - cron: '0 0 * * 0'
   repository_dispatch:
     types: [scala-steward]

 jobs:
   scala-steward:
     runs-on: self-hosted
     name: Launch Scala Steward
     steps:
+      - run: rm -rf ~/scala-steward
+      - uses: actions/cache@v2
+        with:
+          path: ~/scala-steward
+          key: ${{ runner.os }}
       - name: Setup Java and Scala
         uses: olafurpg/setup-scala@v5
       - name: Launch Scala Steward
         uses: scala-steward-org/scala-steward-action@v2
         with:
           github-token: ${{ secrets.ORG_LEVEL_GITHUB_TOKEN }}
           author-email: [email protected]
           author-name: org-level-robot-user
-      - run: rm -rf ~/scala-steward

I get these results when running the above:

Before scala-steward-action:

Run actions/cache@v2
  with:
    path: ~/scala-steward
    key: Linux
Cache not found for input keys: Linux

After:

Post Run actions/cache@v2
Cache saved successfully
Post job cleanup.
/bin/tar -z -cf cache.tgz -P -C /home/runner/_work/my-repo/my-repo --files-from manifest.txt
Cache saved successfully

🎉

And in the next job:

Run actions/cache@v2
Cache Size: ~0 MB (5867 B)
/bin/tar -z -xf /home/runner/_work/_temp/1a7847a1-58fa-482c-9b00-0bcf25638667/cache.tgz -P -C /home/runner/_work/my-repo/my-repo
Cache restored from key: Linux
Post Run actions/cache@v2
Post job cleanup.
Cache hit occurred on the primary key Linux, not saving cache.

😭
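The post step skips saving because the restore matched the primary key exactly, as the log line says. The sketch below models that rule in isolation with hypothetical names; it is a simplified reading of the logged behaviour of actions/cache, not its source. Because cache entries are immutable and the key `Linux` never changes, every later run hits it and the cached workspace stays frozen at the first save.

```typescript
// Simplified model of the post-step decision logged by actions/cache@v2.
interface RestoreResult {
  primaryKey: string;             // the `key` input, e.g. "Linux"
  matchedKey: string | undefined; // key actually restored; undefined on a miss
}

// Save only when the primary key was not restored exactly: either a miss, or a
// partial match through `restore-keys`. An exact hit has nothing new to store.
function shouldSaveCache(result: RestoreResult): boolean {
  return result.matchedKey !== result.primaryKey;
}
```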

@MPV
Contributor Author

MPV commented Jun 5, 2020

Tried with caching based on *.sbt files:

diff --git a/.github/workflows/scala-steward.yml b/.github/workflows/scala-steward.yml
index d8cbe8f..43c2249 100644
--- a/.github/workflows/scala-steward.yml
+++ b/.github/workflows/scala-steward.yml
@@ -18,7 +18,7 @@ jobs:
       - uses: actions/cache@v2
         with:
           path: ~/scala-steward
-          key: ${{ runner.os }}
+          key: ${{ runner.os }}-sbt-${{ hashFiles('**/*.sbt') }}
       - name: Setup Java and Scala
         uses: olafurpg/setup-scala@v5
       - name: Launch Scala Steward

Unfortunately, that just resolved to Linux-sbt- (no hash), since I had forgotten to check out the repo (I had no actions/checkout step).

I added that now, and get:

Run actions/cache@v2
Cache Size: ~0 MB (8052 B)
/bin/tar -z -xf /home/runner/_work/_temp/4d38ab15-b49a-4a1c-ab30-7ce433cc8161/cache.tgz -P -C /home/runner/_work/sbt-services/sbt-services
Cache restored from key: Linux-sbt-b19d6d84297c3469e702a9fdf501c41235ffe8bd4b62aed495a8ca62cd1b8fd0
Post Run actions/cache@v2
Post job cleanup.
Cache hit occurred on the primary key Linux-sbt-b19d6d84297c3469e702a9fdf501c41235ffe8bd4b62aed495a8ca62cd1b8fd0, not saving cache.

However, one problem here is that even though newer versions of a dependency might have been released, the cache isn't updated: my *.sbt files have to change before the cache is saved again (note the "not saving cache" above).
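One common way around a frozen key, swapped in here rather than proposed in this thread, is to append a time bucket to the key so it changes periodically, and to list the stable prefix under the cache step's `restore-keys` so each new key still starts from the previous cache. A sketch of computing such a key; the helper names and the one-week bucket length are arbitrary assumptions:

```typescript
// Cache key that rotates once per bucket (one week by default), so a fresh entry
// gets saved at least that often even when the *.sbt hash is unchanged.
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function rotatingCacheKey(
  runnerOs: string, // e.g. the value of runner.os, "Linux"
  sbtHash: string,  // e.g. the result of hashFiles('**/*.sbt')
  nowMs: number,    // current time in epoch milliseconds
  bucketMs = WEEK_MS
): string {
  const bucket = Math.floor(nowMs / bucketMs); // constant within one bucket
  return `${runnerOs}-sbt-${sbtHash}-${bucket}`;
}

// Prefix for `restore-keys`: a new key falls back to the most recent older entry.
function restoreKeyPrefix(runnerOs: string, sbtHash: string): string {
  return `${runnerOs}-sbt-${sbtHash}-`;
}
```

In a workflow the bucket would likely have to be computed in an earlier step (for example from the current date) and passed into `key`, since `hashFiles` alone has no notion of time.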

@MPV
Contributor Author

MPV commented Jun 5, 2020

Also, note how the cache is expanded into my workspace for GitHub Actions:
(-C /home/runner/_work/my-repo/my-repo and not ~/scala-steward)

Run actions/cache@v2
  with:
    path: ~/scala-steward
    key: Linux-sbt-b19d6d84297c3469e702a9fdf501c41235ffe8bd4b62aed495a8ca62cd1b8fd0
Cache Size: ~0 MB (8052 B)
/bin/tar -z -xf /home/runner/_work/_temp/17139ec5-0305-4040-9307-8e15c1de1709/cache.tgz -P -C /home/runner/_work/my-repo/my-repo
Cache restored from key: Linux-sbt-b19d6d84297c3469e702a9fdf501c41235ffe8bd4b62aed495a8ca62cd1b8fd0

@MPV
Contributor Author

MPV commented Jun 5, 2020

Maybe we should revisit the changes made in #42 (where we moved the workspace into ~/scala-steward)...

See:

const stewarddir = `${os.homedir()}/scala-steward`

...and maybe move the Scala Steward workspace into the GitHub Actions workspace for the repo instead @alejandrohdezma (so caching can be done per repo, and not per runner)?
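A sketch of the proposed move, assuming the action keeps its `scala-steward` directory name and is handed the repo workspace path (in a real action that would come from the `GITHUB_WORKSPACE` environment variable, which GitHub sets on its runners). The home directory remains as a fallback when the variable is unset; the function name is hypothetical.

```typescript
// Chooses where the Scala Steward directory lives. Preferring the repo's Actions
// workspace lets actions/cache store it per repo instead of per runner.
function stewardDir(githubWorkspace: string | undefined, homeDir: string): string {
  const root = githubWorkspace !== undefined && githubWorkspace !== "" ? githubWorkspace : homeDir;
  return `${root.replace(/\/+$/, "")}/scala-steward`;
}
```

In the action this would be called as `stewardDir(process.env.GITHUB_WORKSPACE, os.homedir())`, replacing the fixed `${os.homedir()}/scala-steward`.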

@alejandrohdezma
Member

Hey! So sorry, I don't know why but I totally missed this issue/conversation. Thank you both so much for all this investigation, this is indeed really interesting!

I did some testing with caching on one of the alpha versions using toolkit/cache, but I remember I ran into some errors and postponed it.

A first thing we could do is use the post action to clean up the directory, so people don't run into this problem on self-hosted runners.

@alejandrohdezma
Member

alejandrohdezma commented Jun 5, 2020

Also, @fthomas could you tell us a bit about which things should be cached from the Scala Steward workspace? That way I can do some testing with it on a different branch :)

@bpg
Contributor

bpg commented Jun 5, 2020

> Do we have any suggestions on how one might do caching using the actions/cache action?
>
> (to cache Scala Steward workspace files between different runners/runs for the same repo)
>
> Had you given this a try @bpg?

I haven't got far, will try again on the weekend.

In my other project I have a custom CircleCI Scala Steward job running on multiple projects, and caching is a huge help there, taking build time from 6-8 minutes to tens of seconds. So I think it is important to have an option to keep the SS cache at the end and reuse it next time, one way or another.

@fthomas
Member

fthomas commented Jun 5, 2020

I would cache everything of the Scala Steward workspace except the store/refresh_error directory. The refresh_error store is used to temporarily ignore repos whose build can't be loaded by Scala Steward. This is useful when Scala Steward is working on a lot of repos and should not be slowed down by a few repos that have broken builds. I guess it is less useful if Scala Steward runs as GH action where it isn't desirable to ignore broken builds because the operator of the action is able to fix the builds or the action.
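The rule described above can be written as a path filter over the workspace. A sketch, assuming paths are given relative to the Scala Steward workspace directory (as in the `find` output earlier in this thread); the function name is hypothetical.

```typescript
// True for workspace entries worth caching: everything except the refresh_error
// store, which would otherwise keep skipping a repo whose build once failed to load.
function shouldCache(relativePath: string): boolean {
  const p = relativePath.replace(/^\.\//, "").replace(/\/+$/, ""); // normalise "./x/" to "x"
  const excluded = "store/refresh_error";
  return !(p === excluded || p.indexOf(excluded + "/") === 0);
}
```

Matching on the directory plus a trailing slash keeps the filter from also excluding unrelated siblings that merely share the `refresh_error` prefix.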

@alejandrohdezma
Member

@MPV I have created #57 to address the problem of files created by this action being left behind. Let me know if that would be enough :)

@alejandrohdezma
Member

Thank you very much @fthomas for that explanation! 😸

I'll try to address that in a new PR today

@MPV
Contributor Author

MPV commented Jun 6, 2020

Or, could we change it like this?

  1. The action (or Steward) has its workspace within the repo workspace.
  2. The action (or Steward) cleans any refresh_error files/directories after each run.

Then we could benefit from cross-runners caching (1 + actions/cache) and still not get blocked checks due to refresh error locks (2).

@alejandrohdezma
Member

Either way, I think we should remove all the files created by the action, since that's what GitHub encourages. Then it won't matter whether the workspace is in the home directory or the repo workspace, and it will be fixed by using actions/cache in either case.

@MPV
Contributor Author

MPV commented Jun 7, 2020

> Either way, I think we should remove all the files created by the action, since that's what GitHub encourages. Then it won't matter whether the workspace is in the home directory or the repo workspace, and it will be fixed by using actions/cache in either case.

Could you share a link/example for “encouraged by GitHub”? I searched but couldn’t find that, for example here: https://help.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners#filesystems-on-github-hosted-runners

Also, were you able to use actions/cache to cache anything outside the repo workspace?

@alejandrohdezma
Member

Hey @MPV, @bpg #59 should solve the cache problem :)

@fthomas I've noticed that Scala Steward always removes the repos directory inside the workspace upon start. Does it make sense then to cache the workspace/repos directory?

@fthomas
Member

fthomas commented Jun 8, 2020

Right, the repos directory does not need to be cached.
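Combined with the earlier advice to skip `store/refresh_error`, the cacheable part of the workspace is everything except those two directories. actions/cache accepts a multi-line `path` input in which lines starting with `!` exclude matches, so a sketch that renders such a value could look like this (the helper and its default workspace path are assumptions):

```typescript
// Renders a multi-line `path` value for actions/cache: the whole workspace, minus
// `repos` (recreated by Scala Steward on start) and `store/refresh_error`.
function cachePathInput(workspace = "~/scala-steward/workspace"): string {
  const ws = workspace.replace(/\/+$/, "");
  const excluded = ["repos", "store/refresh_error"];
  return [ws].concat(excluded.map((dir) => `!${ws}/${dir}`)).join("\n");
}
```

The three resulting lines correspond to a `path: |` block in the workflow's cache step.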

@alejandrohdezma
Member

okey dokey! Thanks @fthomas

@MPV
Contributor Author

MPV commented Jun 8, 2020

@alejandrohdezma To add another viewpoint to this:
If we enable caching as in #59, how long will dependencies be cached for?

If dependencies are cached for X days, then there won't be much value in running this action more often than every X days, no?

I have a use-case where we'd like to try using Scala Steward to create fast and semi-automatic bumping of downstream versions from an upstream repo.

Would setting a TTL for the cache be a suitable solution for this?

For example, I could have:

  • a weekly workflow with longer TTL for generic dependencies
  • a repository_dispatch workflow with a shorter TTL, triggered by changes in upstream repos

Alternatively, Scala Steward itself (@fthomas) could have a setting to drop the cache for a specific dependency and force-update just that one (similar to what I suggested in scala-steward-org/scala-steward#1470).
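A sketch of the two-workflow idea, assuming the TTL is picked from the triggering event (GitHub exposes it to steps as `GITHUB_EVENT_NAME`, e.g. `schedule` or `repository_dispatch`). The durations are placeholders; the dispatch case gets the short value here because that workflow exists to pick up upstream releases quickly.

```typescript
// Picks a cache TTL in milliseconds from the event that triggered the workflow.
// All durations are illustrative placeholders, not values from this thread.
const HOUR_MS = 60 * 60 * 1000;

function cacheTtlForEvent(eventName: string): number {
  switch (eventName) {
    case "schedule":
      return 7 * 24 * HOUR_MS; // weekly run for generic dependencies: long TTL
    case "repository_dispatch":
      return 0; // triggered by an upstream change: always re-check versions
    default:
      return 24 * HOUR_MS; // any other trigger: a middle ground
  }
}
```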

@alejandrohdezma
Member

> @alejandrohdezma To add another viewpoint to this:
> If we enable caching as in #59, how long will dependencies be cached for?
>
> If dependencies are cached for X days, then there won't be much value in running this action more often than every X days, no?

Caches are stored for a maximum of 7 days, so yes, it won't have any effect if the action is run more often than that.

> I have a use-case where we'd like to try using Scala Steward to create fast and semi-automatic bumping of downstream versions from an upstream repo.
>
> Would setting a TTL for the cache be a suitable solution for this?

As far as I know, setting TTL for actions/cache is not allowed 😿

> Alternatively, Scala Steward itself (@fthomas) could have a setting to drop the cache for a specific dependency and force-update just that one (similar to what I suggested in fthomas/scala-steward#1470).

If this is enabled in scala-steward, it shouldn't be too hard to add it to the action :)

@fthomas
Member

fthomas commented Jun 8, 2020

Scala Steward already has a --cache-ttl option that controls how often it checks for new versions. The default value is 2hours.
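To illustrate what such a TTL controls, here is a sketch (not Scala Steward's Scala implementation) of a staleness check: a cached version lookup older than the TTL is redone. The parser accepts strings shaped like the default `2hours`, assuming the value is a whole number followed by a unit name; only a few units are handled.

```typescript
// Parses durations like "2hours" or "30 minutes" into milliseconds.
// Only a handful of unit spellings are supported in this sketch.
const UNIT_MS: { [unit: string]: number } = {
  minute: 60 * 1000,
  minutes: 60 * 1000,
  hour: 60 * 60 * 1000,
  hours: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  days: 24 * 60 * 60 * 1000,
};

function parseTtl(text: string): number {
  const match = /^\s*(\d+)\s*([a-z]+)\s*$/i.exec(text);
  const unit = match === null ? "" : match[2].toLowerCase();
  // hasOwnProperty avoids matching inherited keys such as "constructor"
  if (match === null || !Object.prototype.hasOwnProperty.call(UNIT_MS, unit)) {
    throw new Error(`Unsupported duration: "${text}"`);
  }
  return parseInt(match[1], 10) * UNIT_MS[unit];
}

// A cached lookup is stale once more than `ttlMs` has passed since it was made.
function isStale(lastCheckedMs: number, nowMs: number, ttlMs: number): boolean {
  return nowMs - lastCheckedMs > ttlMs;
}
```

With the default, a run within two hours of the previous version check would reuse the cached result, while a weekly run would always re-check.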

@MPV
Contributor Author

MPV commented Jun 9, 2020

> Scala Steward already has a --cache-ttl option that controls how often it checks for new versions. The default value is 2hours.

Opened #62 to be able to set custom cache TTL.
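A sketch of how the action could forward such a setting, assuming a hypothetical `cache-ttl` input whose raw value is passed through to Scala Steward's `--cache-ttl` option only when it is set. The surrounding argument list is a placeholder, not the action's real one.

```typescript
// Appends `--cache-ttl <value>` to Scala Steward's arguments when the (hypothetical)
// input is non-empty; otherwise Scala Steward keeps its own default TTL.
function withCacheTtl(args: string[], cacheTtlInput: string | undefined): string[] {
  const value = (cacheTtlInput === undefined ? "" : cacheTtlInput).trim();
  return value === "" ? args.slice() : args.concat(["--cache-ttl", value]);
}
```

Returning a new array rather than pushing onto `args` keeps the caller's base argument list reusable.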


4 participants