[Flaky] rspec ./spec/system/consumer/caching/shops_caching_spec.rb:17 #11010

Closed · Tracked by #8293
filipefurtad0 opened this issue Jun 14, 2023 · 25 comments · Fixed by #11019, #11284 or #12585

@filipefurtad0
Contributor

What we should change and why (this is tech debt)

rspec ./spec/system/consumer/caching/shops_caching_spec.rb:17

Failures:

  1) Shops caching caching enterprises AMS data caches data for all enterprises, with the provided options
     Failure/Error: expect(Rails.cache.exist?(key, options)).to be true
     
       expected true
            got false
     
     [Screenshot Image]: /home/runner/work/openfoodnetwork/openfoodnetwork/tmp/capybara/screenshots/failures_r_spec_example_groups_shops_caching_caching_enterprises_ams_data_caches_data_for_all_enterprises__with_the_provided_options_155.png

Context

https://github.com/openfoodfoundation/openfoodnetwork/actions/runs/5263043188/jobs/9512817149
https://openfoodnetwork.slack.com/archives/C012LE8LLDS/p1686718490274599?thread_ts=1686601349.929099&cid=C012LE8LLDS

Impact and timeline

@filipefurtad0
Contributor Author

@mkllnk
Member

mkllnk commented Jun 15, 2023

Looks like it got introduced by merging:

Confusingly, the next merge introduced another failing spec:

@mkllnk
Member

mkllnk commented Jun 15, 2023

I can't reproduce this. 🤔

@filipefurtad0
Contributor Author

Hmm, this one always fails for me locally - which is a good thing, I guess 😁

@mkllnk
Member

mkllnk commented Jun 16, 2023

This error still appears after merging your pull request, @filipefurtad0.

@mkllnk mkllnk reopened this Jun 16, 2023
@rioug rioug self-assigned this Jun 16, 2023
@rioug
Collaborator

rioug commented Jun 16, 2023

I managed to make it fail once in 100 tries...
I noticed that we use the file system as the cache store for the test environment:

config.cache_store = :file_store, Rails.root.join("tmp", "cache", "paralleltests#{ENV['TEST_ENV_NUMBER']}")

The only thing I can think of is maybe the hard drive being overloaded and the cache entry not being written yet by the time we check for it. Adding a sleep might fix the issue? @mkllnk @filipefurtad0 any other ideas?
Another potential avenue: the cache being cleared by another process?
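If a slow write really is the culprit, one variant of the "add a sleep" idea would be to poll for the entry instead of sleeping blindly. A minimal sketch only - the helper name and timeout below are made up for illustration and are not part of the spec:

# Hypothetical helper, not part of the spec: poll for the cache entry for a
# short while instead of asserting immediately, in case the file store write
# lands a moment after the page render.
def expect_cached_eventually(key, options, timeout: 5)
  deadline = Time.now + timeout
  sleep 0.1 until Rails.cache.exist?(key, options) || Time.now > deadline
  expect(Rails.cache.exist?(key, options)).to be true
end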

@mkllnk
Member

mkllnk commented Jun 16, 2023

Interesting. It did fail for Filipe almost every time. I don't think a file system would be that slow. I would also expect a filesystem to keep track of pending writes and block reads if needed.

All environments except test use Redis. And we do set up Redis on CI because it's used by several parts of the application. Maybe we should try to switch test to Redis as well? It may even be faster, who knows. Definitely more realistic.

@mkllnk
Member

mkllnk commented Jun 16, 2023

This patch works for me, but I have no idea whether it will still be flaky on CI:

diff --git a/config/environments/test.rb b/config/environments/test.rb
index 71228c5e3..70ebf9678 100644
--- a/config/environments/test.rb
+++ b/config/environments/test.rb
@@ -14,7 +14,11 @@ Openfoodnetwork::Application.configure do
   config.public_file_server.headers = { 'Cache-Control' => 'public, max-age=3600' }
 
   # Separate cache stores when running in parallel
-  config.cache_store = :file_store, Rails.root.join("tmp", "cache", "paralleltests#{ENV['TEST_ENV_NUMBER']}")
+  config.cache_store = :redis_cache_store, {
+    driver: :hiredis,
+    url: ENV.fetch("OFN_REDIS_URL", "redis://localhost:6379/1"),
+    reconnect_attempts: 1
+  }
 
   # Show full error reports and disable caching
   config.consider_all_requests_local       = true

@rioug
Collaborator

rioug commented Jun 16, 2023

I think I found the issue. I managed to reproduce it by pausing shops_caching_spec.rb before the expect and then running another test that calls Rails.cache.clear; I used spec/models/spree/preferences/store_spec.rb.
Rails.cache.clear just clears everything, so I don't think moving to Redis would fix the issue. I'll see if I can remove the Rails.cache.clear and replace it with something more granular.
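For illustration, "more granular" could mean deleting only the entries a spec actually writes instead of wiping the whole store. The key names below are hypothetical, not OFN's real cache keys:

# Hypothetical sketch: target the preference entries rather than everything.
# (Key names are illustrative; the real keys would come from Spree's preference store.)
Rails.cache.delete("spree/app_configuration/currency")
Rails.cache.delete("spree/app_configuration/default_country_id")
# Rails.cache.delete_matched is another option, but its matcher semantics
# differ between the file store (regexp) and the Redis store (glob string).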

@mkllnk
Member

mkllnk commented Jun 16, 2023

I tried the Redis version on CI and it still failed. So your finding is much better.

@mkllnk
Member

mkllnk commented Jun 16, 2023

CI is running many containers with groups of specs, but each container runs only one rspec process, not in parallel, as far as I know. Is Redis shared on GitHub Actions? Is cache clearing executed asynchronously?

We had the flaky spec before Filipe introduced the explicit cache clearing in the two caching specs. Is the cache automatically cleared by RSpec? In that case it would be difficult to be more granular.

Can we isolate our cache with a Thread id or something similar? Others must have had this problem before...

@rioug
Collaborator

rioug commented Jun 16, 2023

Interestingly, I just found this bit of code in spec/base_spec_helper.rb:

 config.before(:each) do
    reset_spree_preferences do |spree_config|
      # These are all settings that differ from Spree's defaults
      spree_config.default_country_id = default_country_id
      spree_config.checkout_zone = checkout_zone
      spree_config.currency = currency
      spree_config.shipping_instructions = true
    end
  end

reset_spree_preferences does Rails.cache.clear, so the cache gets cleared before each example. But as you mentioned, rspec processes are not running in parallel. Feels like I'm looking at the wrong thing.
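For context, the helper presumably behaves roughly like this - a paraphrase inferred from the thread, not the actual OFN/Spree source:

# Rough paraphrase of reset_spree_preferences as described above: it wipes the
# whole Rails cache and then yields Spree's configuration for the per-spec
# overrides shown in the before(:each) block.
def reset_spree_preferences
  Rails.cache.clear # the global clear that races with the caching specs
  yield(Spree::Config) if block_given?
end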

@rioug
Collaborator

rioug commented Jun 16, 2023

Can we isolate our cache with a Thread id or something similar? Others must have had this problem before...

Indeed: rails/rails#48341, but by the look of it Rails doesn't offer any solution.

@rioug
Collaborator

rioug commented Jun 16, 2023

I am out of ideas. The only thing I can say is that @filipefurtad0's fix isn't changing anything, because a Rails.cache.clear already happens before each example (see comment above).

@filipefurtad0
Contributor Author

Thanks for that investigation @rioug. I guess that explains why it still keeps failing...

Locally, it went from constantly failing to passing; hence, I was confident that the change would bring an improvement. Let's revert it once we have another approach for this 👍

@rioug
Collaborator

rioug commented Jun 19, 2023

No worries, this one seems to fail consistently on my fork. But it's a weird one, because we put something in the cache and check the cache straight after, and the entry is missing 😕
Anyway, @mkllnk made some config changes to use Redis for test as well: #11075. Let's see if this still occurs once it's merged.

@rioug rioug removed their assignment Jun 20, 2023
@rbroemeling

Can we isolate our cache with a Thread id or something similar?

@mkllnk Sorry that this is biting you folks as well. I'm dropping in because I noticed the link from rails/rails#48341 back to this.

I was able to isolate my cache by thread identifier using this sort of code as part of the parallelization spin-up in test/test_helper.rb:

# Protect from collisions during parallelized testing by namespacing our caching prefix by our worker identifier.
# Without this protection, all test executors share the same cache, which can lead to transient failures due to cache collisions.
# This protection is partial -- even with it, cache state still travels across tests executing in the same worker, which the author must be cautious of.
parallelize_setup do |worker|
  Rails.cache.options[:namespace] += "#{Time.now.to_i}.#{worker}:"
end

I used #{worker} because that gives the worker number in Rails test parallelization, but you could easily use a thread identifier (Thread.current.object_id) as well. Just be very careful that the adjustment to the :namespace takes place after parallelization (i.e., if you are running threads you need to ensure that it executes after the threads have split off; if it runs before the threads are created then it won't achieve the desired ends).

Not sure if this will help you, but I thought maybe it would so I'd drop it in here in case it was useful -- hope that you find a relatively easy way past this contention!
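For an RSpec suite like this one, which parallelizes with parallel_tests (hence the TEST_ENV_NUMBER in the cache store path above) rather than Rails' built-in test parallelization, the same idea might be adapted along these lines. This is only a sketch and hasn't been tried here; the hook location is an assumption:

# Sketch for spec/spec_helper.rb: namespace the cache per parallel_tests
# process so executors never share entries. TEST_ENV_NUMBER is set by
# parallel_tests; the thread id fallback is the variant mentioned above.
RSpec.configure do |config|
  config.before(:suite) do
    worker = ENV.fetch("TEST_ENV_NUMBER") { Thread.current.object_id.to_s }
    Rails.cache.options[:namespace] = "test.#{Time.now.to_i}.#{worker}"
  end
end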

@abdellani
Member

Another new occurrence (just FYI).
./spec/system/consumer/caching/shops_caching_spec.rb:22

filipefurtad0 added a commit to filipefurtad0/openfoodnetwork that referenced this issue Jul 26, 2023
As discussed here (openfoodfoundation#11010 (comment)), reset_spree_preferences already does Rails.cache.clear
@mkllnk mkllnk self-assigned this Jul 26, 2023
@mkllnk
Member

mkllnk commented Jul 26, 2023

I've been thinking about this spec some more. The cache entry expires after 15 seconds, which seems like plenty, but there may be a reason why the system is waiting somewhere. It is the first spec in the spec run, which can trigger extra boot time, like compiling assets. So here are ways it can fail:

      visit shops_path

      # wait until cache expires
      sleep 16

      key, options = CacheService::FragmentCaching.ams_shops
      expect_cached "views/#{key}", options
      visit shops_path
      sleep 10
      visit shops_path
      sleep 6

      key, options = CacheService::FragmentCaching.ams_shops
      expect_cached "views/#{key}", options

I tried the second example to see whether another spec could influence this, but having the visits in two different it blocks doesn't fail the spec, because the cache is reset between specs. So I'm looking for a delay between the write to the cache and the rendering of the page.

Locally, I'm using Spring to run tests more quickly, but to simulate CI conditions I stopped Spring and tried again. It usually passes on my machine. But if I add sleep 12 then it fails. The whole test takes 17 seconds, so there's a delay of at least 3 seconds on my machine. The CI machines can be a lot slower, and there we may see a delay of 15 seconds, which causes the cache to expire before we can test for it.

Then I had the idea of clearing the cache with rails tmp:clear and running the spec again. Suddenly it took ages. It's running a nodejs process, compiling JavaScript, I guess. Now the result is even worse:

Ferrum::PendingConnectionsError
 RuntimeError: Requests did not finish in 60 seconds

After that, I ran the test again. Some JS had been compiled and the rest wasn't taking as long. The test ran for 36 seconds and failed. I tried to surround the example with Timecop.freeze but it still failed. I guess Redis has its own clock. Maybe that would have worked with the file system cache. So my last idea is to preload the page once so that all the JS is compiled when we do the real test. Seems to work. I'll open a PR for that.
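A sketch of that last idea: warm the page up once before the timed part of the example, so JS/asset compilation doesn't eat into the 15-second cache lifetime (the exact hook used in the PR may differ):

# Sketch: preload the shop front once so first-request work (asset/JS
# compilation) happens before the example starts counting cache expiry.
before do
  visit shops_path
end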

@rioug
Collaborator

rioug commented Jul 27, 2023

Nice find!

@mkllnk
Member

mkllnk commented Jan 15, 2024

This came back:

The previous PR fixed one spec example but if the execution order of examples changed then this could happen again. I'm looking into a more sustainable solution.

@filipefurtad0
Contributor Author

Some failures were also tracked in this issue.

@github-project-automation github-project-automation bot moved this to All the things in OFN Delivery board Feb 15, 2024
@RachL RachL moved this from All the things to In Dev 💪 in OFN Delivery board Feb 15, 2024
@sigmundpetersen
Contributor

@filipefurtad0 @mkllnk are any of you planning to work on this issue?
If not, we should move it back to All the things.

@filipefurtad0
Contributor Author

filipefurtad0 commented Jun 3, 2024

Thank you for pinging, @sigmundpetersen. From my side, we can move it to All the things. It would be great to fix it, but I don't think this is a priority.

@mkllnk
Member

mkllnk commented Jun 4, 2024

I might continue this again this week. Otherwise I'll move it back.
