Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support br encoding/decoding in response bank #63

Merged
merged 4 commits into from
Apr 4, 2023
Merged

Conversation

drinkbeer
Copy link

@drinkbeer drinkbeer commented Mar 14, 2023

Description

This pull request changes the default encoding in Shopify page cache to use brotli, as it outperforms zlib (gzip) in compression ratio, compression time, decompression time, and memory usage. A comparison between the two encodings can be found here. Additionally, end-to-end testing of various brotli-encoded websites has shown that it outperforms zlib in compression ratio and decompression time.

This change is completely backward compatible, and existing page caches will remain for one hour before being evicted. During the cut-over, the cache hit rate may temporarily decrease as all page caches need to be refilled. Cloudflare only supports one encoding type (gzip today, and we will make it brotli), so if a client requests an encoding other than gzip or brotli (e.g. deflate, or empty accept-encoding), SFR will respond with plain text.

Test

Tested using spin test shop in SFR, and ModHeader chrome plugin (spin test shop does not have Cloudflare between browser and server, so we can manually control the Accept-Encoding header sent from Chrome). We tested 6 scenarios.

click me for details

Scenario 1: server cache miss, client cache miss, Accept-Encoding: gzip

Run echo FLUSHALL | redis-cli -h localhost -p 12091 command in spin to clean up server cache, clean up the browser cache, in ModHeader plugin, set up a rule for Accept-Encoding: gzip in request header.

Run bundle install && dev debug in spin to start the server, and open https://shop1.shopify.sfr.jason-chen.us.spin.dev/ in Chrome:

image

Successfully opened the page with 200 HTTP code, we saw x-cache: miss in the response which means server cache miss for gzip encoding.

Scenario 2: client cache hit, Accept-Encoding: gzip

Refresh the link, the page responsed with 304 which means the client cache hit:
image

In the response header, we saw x-cache: hit, client which also indicates, it's client cache hit.

Scenario 3: server cache hit, client cache miss, Accept-Encoding: gzip

Clear the Chrome cache, and refresh the page

image

HTTP code 200 with x-cache: hit, server which means server cache hit.

Scenario 4: server cache miss, client cache miss, Accept-Encoding: br

Clear browser and server cache, in ModHeader, set up a rule Accept-Encoding: br in request header. Open the test shop in chrome:

image

Successfully opened the page with 200 HTTP code, we saw x-cache: miss in the response which means server cache miss for br encoding.

Scenario 5: client cache hit, Accept-Encoding: br

Refresh the page, and we received a 304 http code:
image

HTTP code 304 indicates it's client cache hit. The response has x-cache: hit, client which also indicates it's a client cache hit.

Scenario 6: server cache hit, client cache miss, Accept-Encoding: gzip

Clear the Chrome cache, and refresh the page

image

HTTP code 200 with x-cache: hit, server which means server cache hit.

@drinkbeer drinkbeer force-pushed the support_br branch 4 times, most recently from 751a6c6 to 003fbb2 Compare March 17, 2023 17:01
@drinkbeer drinkbeer marked this pull request as ready for review March 17, 2023 17:28
@drinkbeer drinkbeer requested a review from a team as a code owner March 17, 2023 17:29
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank/response_cache_handler.rb Outdated Show resolved Hide resolved
test/middleware_test.rb Show resolved Hide resolved
Copy link

@colinbendell colinbendell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to support multiple incoming scenarios of Accept-Encoding.

  1. Today, cloudflare sends Accept-Encoding: gzip regardless of what the UA sends
  2. After this PR lands, it will still be like {1}
  3. We will get cloudflare to enable brotli to origin which is when cloudflare will send Accept-Encoding: br
  4. If we revert cloudflare's change it will return to {1}

This PR should be agnostic to the setup in cloudflare. The only thing that should happen is that there will be a period of time where the cache might include 2x the keys because it will include br and gzip, but the other will quickly LRU away an hour later.

lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank/middleware.rb Outdated Show resolved Hide resolved
lib/response_bank/middleware.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
test/test_helper.rb Outdated Show resolved Hide resolved
@drinkbeer
Copy link
Author

Hey @colinbendell , @casperisfine , @grcooper , I updated the PR description with a Test section which has the screenshots of 6 scenarios I tested in spin test shop. Let me know if there are any other scenarios you want me to cover. I am glad to do more tests.

test/middleware_test.rb Outdated Show resolved Hide resolved
@drinkbeer drinkbeer requested review from grcooper and alexcoco and removed request for a team March 24, 2023 17:14
@drinkbeer drinkbeer force-pushed the support_br branch 2 times, most recently from 96dd8a3 to bd9057d Compare March 27, 2023 17:11
Copy link

@colinbendell colinbendell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small changes in the flow for better readability, otherwise lgtm

lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
test/middleware_test.rb Outdated Show resolved Hide resolved
test/middleware_test.rb Outdated Show resolved Hide resolved
test/middleware_test.rb Show resolved Hide resolved
test/response_cache_handler_test.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Outdated Show resolved Hide resolved
lib/response_bank.rb Show resolved Hide resolved
lib/response_bank/middleware.rb Outdated Show resolved Hide resolved
lib/response_bank/response_cache_handler.rb Outdated Show resolved Hide resolved
@@ -52,7 +52,12 @@ def unversioned_key_hash
private

def key_hash(key)
"cacheable:#{Digest::MD5.hexdigest(key)}"
encoding = @env['response_bank.server_cache_encoding']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
encoding = @env['response_bank.server_cache_encoding']
encoding = @env["response_bank.server_cache_encoding"]

lib/response_bank/response_cache_handler.rb Outdated Show resolved Hide resolved
@@ -74,7 +75,9 @@ GEM
minitest (>= 5.1)
tzinfo (~> 2.0)
ast (2.4.2)
brotli (0.4.0)
Copy link

@ianks ianks Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel great about adding this as a dependency...

  1. There are known segfaults in the issue tracker that have been left unaddressed for a year+
  2. After grepping, there does not seem to be any testing performed GC.stress = true to ensure GC safety

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any alternatives you'd recommend?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but those segfaults are just type checking on the parameters. Since we control this side, is there a risk? Or is your concern more broad stability?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At risk of sounding Rust-pilled, it's more of a general concern about C code that we consider safe to adopt widely in production. At a bare minimum, I think we should set some basic rules:

  1. Should be tested against latest stable Ruby (this gem only tests up to 3.0)
  2. Leverages ruby_memcheck to test for memory issues
  3. Runs test suite with GC.stress = true to smoke out GC related issues before they make it into production

FWIW, all of the above applies to all native extensions, both C and Rust.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the gem is small enough that we could easily fix it. What is more worrying to me is ownership. It's easy to produce the fixes, but will they be merged / published?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the fix is here: miyucy/brotli#44. Let's see if the owners merge it. But either way, it's the same owner than the snappy gem we use extensively across shopify, so if they're AWOL, we're fucked either way.

For the GC.stress = true tests, honestly that gem defines a single type, and it has a single reference, and we don't even use that codepath, so I'm not too worried here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the fix was merged in less than 20 minutes, so there's that.

Copy link

@ianks ianks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through and GC stress tested myself manually, and things seemed fine. As a follow up, it would be good to add ruby_memcheck to CI for the brotli gem, as well as test on 3.2.

@drinkbeer
Copy link
Author

drinkbeer commented Apr 4, 2023

@colinbendell @ianks @casperisfine @grcooper

I fully tested this PR along with https://github.com/Shopify/storefront-renderer/pull/17817, and the results are all expected. I added the screenshot of the tests in this comment. I am going to merge the PR now.

click me for details

Scenario 1 - accept_encoding gzip, server cache miss, client cache miss

scenario_1_accept_encoding_gzip_server_miss_client_miss

Scenario 2 - accept_encoding gzip, server cache miss, client cache hit

scenario_2_accept_encoding_gzip_server_miss_client_hit

Scenario 3 - accept_encoding gzip, server cache hit, client cache miss

scenario_3_accept_encoding_gzip_server_hit_client_miss

Scenario 4 - accept_encoding br, server cache miss, client cache miss

scenario_4_accept_encoding_br_server_miss_client_miss

Scenario 5 - accept_encoding br, server cache miss, client cache hit

scenario_5_accept_encoding_br_server_miss_client_hit

Scenario 6 - accept_encoding br, server cache hit, client cache miss

scenario_6_accept_encoding_br_server_hit_client_miss

Scenario 7 - accept_encoding multiple encoding, server cache miss, client cache miss

scenario_7_accept_encoding_multiple_server_miss_client_miss

Scenario 8 - accept_encoding multiple encoding, server cache miss, client cache hit

scenario_8_accept_encoding_multiple_server_miss_client_hit

Scenario 9 - accept_encoding multiple encoding, server cache hit, client cache miss

scenario_9_accept_encoding_multiple_server_hit_client_miss

Scenario 10 - accept_encoding empty encoding, server cache miss, client cache miss

scenario_10_accept_encoding_empty_server_miss_client_miss

Scenario 11 - accept_encoding empty encoding, server cache miss, client cache hit

scenario_11_accept_encoding_empty_server_miss_client_hit

Scenario 12 - accept_encoding empty encoding, server cache hit, client cache miss

scenario_12_accept_encoding_empty_server_hit_client_miss

Scenario 13 - accept_encoding not supported encoding, server cache miss, client cache miss

scenario_13_accept_encoding_deflate_server_miss_client_miss

Scenario 14 - accept_encoding not supported encoding, server cache miss, client cache hit

scenario_14_accept_encoding_deflate_server_miss_client_hit

Scenario 15 - accept_encoding not supported encoding, server cache hit, client cache miss

scenario_15_accept_encoding_deflate_server_hit_client_miss

@drinkbeer drinkbeer merged commit 92ad9b5 into main Apr 4, 2023
@drinkbeer drinkbeer deleted the support_br branch April 4, 2023 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants