Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added cache_dir and cache_workers options to cache results #115

Merged
merged 1 commit into from
Jul 16, 2016

Conversation

gpakosz
Copy link
Contributor

@gpakosz gpakosz commented Nov 25, 2015

💥

implemented #83

@toy
Copy link
Owner

toy commented Dec 2, 2015

Hi Gregory, sorry for the wait and thank you for contribution!
I've looked through commit, but did not yet have time to properly check it and I'll try to find time for that soon.

@gpakosz
Copy link
Contributor Author

gpakosz commented Feb 26, 2016

Hi, I rebased the branch onto the master branch.

ImagePath::Optimized.new(result, original)
return unless optimized

@cache.write(optimized, original)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would switch the order of the arguments as original works as a key

@options_by_format = Hash[workers_by_format.map do |k, v|
[k, Marshal.dump(v.map(&:inspect))]
end]
@workers_by_format = Hash[workers_by_format.map do |k, v|
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name @workers_by_format is confusing as inside are bins and not workers

@toy
Copy link
Owner

toy commented Mar 16, 2016

Hi Gregory, I am sorry for such a long wait.

Few things to discuss:
I think it is better to create one cache file and not 2 or 3, otherwise it is not atomic, there are too many files and if running with different options the cache will be overwritten.
The name can be combined or concatenated from options and binaries digests.
Also I think it is better to use sub directories as in .git/objects instead of putting everything in one directory.

@gpakosz
Copy link
Contributor Author

gpakosz commented Mar 30, 2016

I followed every single remark you made except using FSPath because it didn't feel like an improvement to use it just to join paths and I need FileUtils.copy_file anyways.

By now, files are stored in #{@cache_dir}/"#{digest[0..1]}/#{digest[2..-1]} just like Git. Also, as suggested, there is now a single cached file per image which digest is computed by hashing the original file content, then updated with options and worker bins.

I made a separate commit. Once reviewed I suggest I squash the two commits before force pushing the branch one last time.

@gpakosz
Copy link
Contributor Author

gpakosz commented Mar 30, 2016

Well CI has errors unrelated to my changes, can you have a look?

Installing rack 1.6.4
An error occurred while installing mime-types-data (3.2016.0221), and Bundler

@gpakosz
Copy link
Contributor Author

gpakosz commented May 1, 2016

Hi Ivan, when do you think we can roll another round of review for this PR?

@toy
Copy link
Owner

toy commented May 2, 2016

Hi Gregory, thanks for poking me about this.
I've already looked at it and checked how it works, but did not manage to find time to write everything. Points to discuss:

  1. Currently an already optimised file will not be cached as write will return if optimized is not present. Simple solution would be to just write the original file to cache, but it looks like wasting disk space. Two other solutions if using the same path are: to create an empty file (then atomic creation of cache is a must) or create a symlink to self. What do you think?
  2. Is binary sha1 really better or the bin version is enough?
  3. There is no need for Marshal, as strings can be just joined together.
  4. Why FSPath:
cached = @cache_dir / digest(original)
return unless cached.exist?

instead of

cached = File.join(@cache_dir, digest(original))
return unless File.exist?(cached)

Also FileUtils.copy_file is not atomic. A method similar to ImagePath#replace, which would create a temporary file in cache directory, copy optimized file over it and rename it to cache path would make it atomic.

@gpakosz
Copy link
Contributor Author

gpakosz commented May 2, 2016

Hi Ivan,

Here are my answers:

  1. I think I would go for a symlink
  2. Using the binary SHA1 seems more robust in case a binary fails to update its version information properly. Are you concerned about performance when computing SHA1s?
  3. Marshal seems like a robust way of dumping a state to a string. Again are you concerned about the performance? In which case Marshal will be faster than manually flattening hashes / arrays and joining strings.
  4. I'll look into FSPath

@toy
Copy link
Owner

toy commented May 2, 2016

  1. Agreed.
  2. I was thinking more about two bins of same version compiled differently. So to me it seems like deciding between been a bit too precise and a bit not enough precise.
  3. I'm concerned about marshal version, but also has not a lot of purpose when there are already just strings. It is also important to ensure that the final strings to digest can't be different for identical options/bins due to order.

toy added a commit that referenced this pull request May 3, 2016
…stly to add it to the list of worker options, closes #130, related to #115
toy added a commit that referenced this pull request May 3, 2016
… it will be in the list of worker options, closes #130, related to #115
toy added a commit that referenced this pull request May 5, 2016
… it will be in the list of worker options, closes #130, related to #115
@gpakosz gpakosz force-pushed the cache branch 2 times, most recently from 84ce463 to 5524362 Compare May 5, 2016 17:30
@gpakosz
Copy link
Contributor Author

gpakosz commented May 5, 2016

I implemented the changes discussed so far.

Note that the symlink only serves the purpose of avoiding going through workers again, but optimize_image will still return nil. This is because I don't think it's a good idea to have the behavior of optimize_image depend on whether caching is enabled or not.

@@ -204,6 +204,19 @@ def wrap_regex(width)
end

op.separator nil
op.separator ' Caching:'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this section higher, after main options and before disabling workers

@gpakosz gpakosz force-pushed the cache branch 2 times, most recently from b9ad8f7 to fcd86a7 Compare June 21, 2016 21:47
@toy
Copy link
Owner

toy commented Jun 21, 2016

Please ignore AppVeyor failing, I will try to test how far is image_optim from working on windows.

@gpakosz
Copy link
Contributor Author

gpakosz commented Jun 21, 2016

Yep will do. I noticed it's not configured yet.

@toy
Copy link
Owner

toy commented Jun 21, 2016

Few last points about tests:

  • Please add a test for CachePath#replace (you can adapt one for Path#replace)
  • Test groups for "when cache is enabled" with and without worker digests are very similar, maybe a shared test group?
  • Better to check that only CachePath or nil can be returned and never just Path
  • Marking cached image as already optimised is not tested
  • And behaviour when zero size cache file exists

@gpakosz
Copy link
Contributor Author

gpakosz commented Jun 22, 2016

I'll look into it. I don't know RSpec well though so I'm not sure about what shared test groups are and how to use them

@toy
Copy link
Owner

toy commented Jun 22, 2016

Shared examples in short:

shared_examples "does something" do
  it  do
  end
end

describe  do
  let 
  before 
  it_behaves_like "does something"
end

@gpakosz
Copy link
Contributor Author

gpakosz commented Jun 22, 2016

Yep I finally found my way through it this afternoon :)

@toy
Copy link
Owner

toy commented Jul 9, 2016

Hi Gregory, can you please rebase the branch on master, sqash-split to one or more atomic/logical commits and check if it works on appveyor?

@toy
Copy link
Owner

toy commented Jul 10, 2016

I'll extract capabilities checkers for easy reusing, so failing windows checks can be ignored

@toy
Copy link
Owner

toy commented Jul 10, 2016

Have a look at CapabilityCheckHelpers, any_file_modes_allowed? and inodes_supported? in path_spec.rb

@gpakosz
Copy link
Contributor Author

gpakosz commented Jul 10, 2016

I updated cache_path_spec.rb from path_spec.rb.

@gpakosz
Copy link
Contributor Author

gpakosz commented Jul 10, 2016

What do you prefer between keeping the 10 commits from the cache branch, and squashing everything into a single commit?

For the record, tests and rubocop are all good in every of the 10 commits. I let you decide how you prefer history to look like

@toy
Copy link
Owner

toy commented Jul 10, 2016

Those are development commits, so better 1 commit unless you can and would like to split them into several logical separate ones.
Also please add the link to this PR in changelog entry.

@gpakosz
Copy link
Contributor Author

gpakosz commented Jul 10, 2016

Is it important to link the PR? I linked #83.

By the way, why are you using [#XX](https://github.com/toy/image_optim/issues/XX) in CHANGELOG.markdown? Using #XX is enough, GitHub automatically creates links.

@toy
Copy link
Owner

toy commented Jul 10, 2016

Already not, as important, as you've just linked them by mentioning ;)

I already wanted to check it for some time and thank you for pushing me to do it :), but unfortunately it works only in comments/descriptions and some other stuff, but not in markdown files:
https://github.com/toy/image_optim/blob/a4c2767f7b088f9ca091fd33ba04f9e772f805a7/CHANGELOG.markdown
https://github.com/toy/image_optim/blob/62c85009e3eea006ad47bd5cf0d032d3215ece5f/CHANGELOG.markdown
https://github.com/toy/image_optim/blob/657ed3f1d634608a4b02ef54eeaa50f92b029eb0/CHANGELOG.markdown

@gpakosz
Copy link
Contributor Author

gpakosz commented Jul 11, 2016

Oh indeed you're right

@toy toy merged commit 05920d9 into toy:master Jul 16, 2016
@toy
Copy link
Owner

toy commented Jul 16, 2016

Thank you for finishing this long PR, I'll release a new version soon

@gpakosz
Copy link
Contributor Author

gpakosz commented Jul 17, 2016

You're welcome. Thanks for the review and the suggestions. That was nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants