-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Break cache if Gemfile.lock
or .rubocop.yml
change.
#300
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshuapinter I know you didn't ask me for a review but thought I'd leave a few comments - the changes look pretty good so far! I would add some unit tests to cli_spec.rb
or cache_spec.rb
. Right now we're not covering busting the cache very well but we could be and test the scenarios that should bust it or at least the new ones. Nice work.
lib/erb_lint/cache.rb
Outdated
@@ -76,7 +78,7 @@ def checksum(filename, file_content) | |||
mode = File.stat(filename).mode | |||
|
|||
digester.update( | |||
"#{mode}#{config.to_hash}#{ERBLint::VERSION}#{file_content}" | |||
"#{mode}#{config.to_hash}#{ERBLint::VERSION}#{@rubocop_config}#{@gemfile_lock}#{file_content}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the amount of variables in the cache key now and no actual string content, should we break it out line by line to make it a little more clear vs the current string interpolation with something like:
irb(main):023:0> str_element_1 = "hello"
=> "hello"
irb(main):024:0> str_element_2 = "hi"
=> "hi"
irb(main):025:0> rubocop_config = "some_config"
=> "some_config"
irb(main):026:0] %w[
irb(main):027:0] str_element_1
irb(main):028:0] str_element_2
irb(main):029:0] rubocop_config
irb(main):030:0> ].join
=> "str_element_1str_element_2rubocop_config"
Just a style thing but might make this more readable especially if additional keys are added later.
@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil) | |||
@cache_dir = cache_dir || CACHE_DIRECTORY | |||
@hits = [] | |||
@new_results = [] | |||
@gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I think this should effectively cache the File.read
for all calls of checksum
.
I didn’t ask but I was secretly hoping you would. Lol.
I’ll respond to your comments after the kids go to bed. Ha.
…On Jan 8, 2023 at 3:34 PM -0700, Zach Feldman ***@***.***>, wrote:
@zachfeldman commented on this pull request.
@joshuapinter I know you didn't ask me for a review but thought I'd leave a few comments - the changes look pretty good so far! I would add some unit tests to cli_spec.rb or cache_spec.rb. Right now we're not covering busting the cache very well but we could be and test the scenarios that should bust it or at least the new ones. Nice work.
In lib/erb_lint/cache.rb:
> @@ -76,7 +78,7 @@ def checksum(filename, file_content)
mode = File.stat(filename).mode
digester.update(
- "#{mode}#{config.to_hash}#{ERBLint::VERSION}#{file_content}"
+ ***@***.******@***.***_lock}#{file_content}"
With the amount of variables in the cache key now and no actual string content, should we break it out line by line to make it a little more clear vs the current string interpolation with something like:
irb(main):023:0> str_element_1 = "hello"
=> "hello"
irb(main):024:0> str_element_2 = "hi"
=> "hi"
irb(main):025:0> rubocop_config = "some_config"
=> "some_config"
irb(main):026:0] %w[
irb(main):027:0] str_element_1
irb(main):028:0] str_element_2
irb(main):029:0] rubocop_config
irb(main):030:0> ].join
=> "str_element_1str_element_2rubocop_config"
Just a style thing but might make this more readable especially if additional keys are added later.
In lib/erb_lint/cache.rb:
> @@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
@cache_dir = cache_dir || CACHE_DIRECTORY
@hits = []
@new_results = []
+ @gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")
Nice, I think this should effectively cache the File.read for all calls of checksum.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you were mentioned.Message ID: ***@***.***>
|
Can I pick it up from here or are you going to get back around to it @joshuapinter? The new(ish?) cache feature brings my app’s linting w/ |
@mculp Please do! The only thing that should be needed are proper specs. We've been using this in development and in CI for the last couple months and it's been working perfectly, always breaking cache when gems or configs have been updated. Makes using it very quick most of the time but also very reliable when things change. |
ae61b11
to
2ca4dee
Compare
Hmmm, just upgraded our fork from 0.3.1 to 0.4.0 and the caching doesn't seem to be working. Every time I run
I also tried using the main repo to see if it was related to this fork but seeing it there too. If I revert to 0.3.1 on either our fork or the main gem, the cache is properly used. Anybody else seeing this? |
@joshuapinter we've been running 0.4.0 for a while now and I haven't noticed the cache not working. I just tried locally and it appears to be working. I tried moving the |
@zachfeldman Thanks for the quick response! So strange... I removed the $ erblint
====
Running ERB Lint...
Cache mode is on
Linting 628 files with 9 linters...
Cache being created for the first time, skipping prune
No errors were found in ERB files
$ erblint
====
Running ERB Lint...
Cache mode is on
Linting 628 files with 9 linters...
Cache being created for the first time, skipping prune
No errors were found in ERB files
$ erblint
====
Running ERB Lint...
Cache mode is on
Linting 628 files with 9 linters...
Cache being created for the first time, skipping prune
No errors were found in ERB files |
@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil) | |||
@cache_dir = cache_dir || CACHE_DIRECTORY | |||
@hits = [] | |||
@new_results = [] | |||
@gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this will also bust the cache whenever any project dependencies change, even if the rubocop version did not. So, while I think this will work to achieve the desired effect not reusing the cache between different rubocop versions, it will also make running the erblint slower than it could/should be by not utilizing the cache when it could.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I was aware and okay with that tradeoff, erring on the side of caution.
Alternatively, you would have to hash the versions for rubocop
and all rubocop
related plugins in Gemfile.lock
. Doable but I didn't want to spend any more time on it.
I find this simple and effective, with the caveat that non-rubocop updates will break the cache but if you want to write up the alternative solution that would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @joshuapinter. Makes sense; just wanted to make sure it was known.
I agree with the tradeoffs here and that the priority should be addressing the issue over complete optimization. (I came here after hitting this same issue in our own CI workflow and being surprised by failures.)
Just for comparison / completeness, it looks like rubocop itself uses the whole Gemfile.lock in one case as a checksum, and does a plugin-aware checksum of the rubocop scour in another:
RuboCop::Server::Cache.restart_key
: https://github.com/rubocop/rubocop/blob/7fa4e5ad62c2d6e081bcf7352f172282f3285de9/lib/rubocop/server/cache.rb#L51RuboCop::ResultCache#rubocop_checksum
: https://github.com/rubocop/rubocop/blob/7fa4e5ad62c2d6e081bcf7352f172282f3285de9/lib/rubocop/result_cache.rb#L174
@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil) | |||
@cache_dir = cache_dir || CACHE_DIRECTORY | |||
@hits = [] | |||
@new_results = [] | |||
@gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @joshuapinter. Makes sense; just wanted to make sure it was known.
I agree with the tradeoffs here and that the priority should be addressing the issue over complete optimization. (I came here after hitting this same issue in our own CI workflow and being surprised by failures.)
Just for comparison / completeness, it looks like rubocop itself uses the whole Gemfile.lock in one case as a checksum, and does a plugin-aware checksum of the rubocop scour in another:
RuboCop::Server::Cache.restart_key
: https://github.com/rubocop/rubocop/blob/7fa4e5ad62c2d6e081bcf7352f172282f3285de9/lib/rubocop/server/cache.rb#L51RuboCop::ResultCache#rubocop_checksum
: https://github.com/rubocop/rubocop/blob/7fa4e5ad62c2d6e081bcf7352f172282f3285de9/lib/rubocop/result_cache.rb#L174
@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil) | |||
@cache_dir = cache_dir || CACHE_DIRECTORY | |||
@hits = [] | |||
@new_results = [] | |||
@gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what information is available here, but is it worth doing this only if the Rubocop linter is enabled? (If that's possible, it seems like a meaningful improvement.)
@@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil) | |||
@cache_dir = cache_dir || CACHE_DIRECTORY | |||
@hits = [] | |||
@new_results = [] | |||
@gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock") | |||
@rubocop_config = File.read(".rubocop.yml") if File.exist?(".rubocop.yml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, if there's a way to fetch this from linters.Rubocop.rubocop_config.inherit_from
, that seems worthwhile, since it is parameterized in the first place: https://github.com/goatapp/rubocop/blob/97e4ffc8a71e9e5239a927c6a534dfc1e0da917f/manual/configuration.md?plain=1#L81
Also, I don't know how common it is, but it looks like this library does allow for URIs there also: https://github.com/goatapp/rubocop/blob/97e4ffc8a71e9e5239a927c6a534dfc1e0da917f/manual/configuration.md?plain=1#L86
Actually, given the complexity / variability there, if there's a way to access an instance of the Rubocop linter here, it may be worth removing dependence on the source of that configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, pardon the commentary. I think this is a good PR, and I don't want to let solving for all the cases get in the way of anything; just reading through the code and thinking aloud.
It looks like runner context is available in run_using_cache
, so maybe some "runner checksum" could be passed an optional arg to cache.get()
and cache.set()
? That solves for:
- through the runner, putting linters in charge of their own checksum logic
- not being coupled to / duplicating awareness of how linters are configured
I'll give this some thought over the weekend and see if I come up with anything useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put up something small for feedback here: #373
Fixes #299.
Wrote in the addition of the
Gemfile.lock
and the.rubocop.yml
files into the cache so that if either of those change, the cache will break.I noticed you were already doing this with
.erb-lint.yml
so that's great, no need to make any changes there. I just followed a similar pattern.I placed the reading of both of these files in the
Cache#initialize
method so it doesn't get run every timechecksum
is called - maybe a little more performant.Besides that, I tested this out extensively in development by disabling the pruning and making various changes to see if the caches were hit or not. Everything seemed to work great.
I attempted writing specs but got a little lost with how to make it work correctly. I couldn't see any pattern laid down with how you handle when
.erb-lint.yml
config is changed so I left that for now. Open to comment and suggestion.We're gonna use our fork in "production" to test it more thoroughly and ensure any config or Gemfile changes won't produce false negatives.
TODOs