Clean both databases between every test #12725
In order to determine whether these changes will have an effect on the running time of our rspec tests in Circle CI, we need to collect some benchmarks. We collected the benchmarks by visiting the public webpage that listed the results of our builds and running the following snippet in the browser console:

```js
buildRows = document.getElementsByClassName("build");
for (let idx = 0; idx < buildRows.length; idx++) {
  let row = buildRows.item(idx);
  // Figured these paths out by walking the DOM manually.
  let buildName = row.children[2].children[1].children[1].textContent;
  if (buildName === "rspec") {
    let rawDate = row.children[3].children[0].children[0].title;
    let duration = row.children[3].children[0].children[1].children[1].textContent;
    let result = row.classList[1];
    console.log(`"${rawDate}", "${duration}", "${result}"`);
  }
}
```

We pasted the results of each of these into a text editor, finessed them into CSV format, uploaded them to Google Docs, and created the graph below.
From this we can see that our rspec builds have consistently taken ~17 minutes to run on Circle CI over the past month.

Follow-up work:
Gathered data for the month before and after the changes in #11470 went live on 30 July 2019 (https://docs.google.com/spreadsheets/d/1ORnvebEnhbAPLfgA8XUYWPKALUdjGcVLu_Dpg3TWLgM/edit#gid=1433492087).
The failure rate of rspec builds on CircleCI was 27% for the month before the change took place and held steady at 27% for the month after the change.
Looking at the first 3 days of data, it looks like we've reliably increased the runtime of our rspec tests by ~90 seconds (a ~9% increase in runtime). The failure rate was 4/24 (~17%) over that time, which is slightly less than the 23% failure rate of the two months prior, though probably not a meaningful number yet because there are only 24 examples so far. Let's gather some more data (2 weeks?) before making any decisions about how to proceed here (whether we roll back the changes in #12745 or not).
Handing off future work here to @kevmo.
Connects #12725

### Description

Consolidate the db cleaner config into one file, which is autoloaded by rails_helper. If we want to make the VACOLS db cleaner optional by tag in the future, we can modify this single file. The `require` and the spec tagging are separate changes. This also adds a db cleaner for the new `:etl` specs. These require explicit tagging, but they occupy a small, known corner of the spec suite and should not leak across specs the way the FACOLS and Caseflow dbs do.
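For context, a consolidated single-file config along these lines could look roughly like the sketch below. The file path, the use of truncation, and the `around` hook are assumptions for illustration, not the exact Caseflow code; the real file also wires up the VACOLS/FACOLS and `:etl` connections.

```ruby
# spec/support/database_cleaner.rb -- illustrative sketch only.
require "database_cleaner"

RSpec.configure do |config|
  config.before(:suite) do
    # Start each suite run from known-empty databases.
    DatabaseCleaner.clean_with(:truncation)
  end

  config.around(:each) do |example|
    # Truncation (rather than a wrapping transaction) is the safer default when
    # an example touches more than one database connection.
    DatabaseCleaner.strategy = :truncation
    DatabaseCleaner.cleaning { example.run }
  end
end
```

The point of a single file is that any future per-tag behaviour (for example, making the VACOLS cleaner optional) only has to be changed in one place.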
### Context

Estimating at 2
This is great work! I'd love to know if there have been any updates since November.

One thing I noticed is that y'all are measuring the total build runtime, which includes various setup steps, such as spinning up the environment and the pre-test steps. For better accuracy, I would recommend only looking at the RSpec portion of the build and picking the longest time of the 5 different nodes. I'm not sure how easy that would be to obtain programmatically.

In addition, I would recommend determining whether or not to roll back the changes based on specific numeric goals, as opposed to subjective terms like "dramatically" or "significantly". The minimum loss of time can be measured by multiplying the average increase in build time by the average number of daily builds. It looks like there were about 88 RSpec builds over the past 24 hours that actually ran. I'm not sure if that's typical, so let's round down to 80. A 90-second increase in build time would represent a loss of 57 work days per year! That's an entire quarter!
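As a sanity check on that estimate, the arithmetic works out as follows (the 230 working days per year is my assumption; the other inputs are the figures quoted above):

```ruby
# Back-of-the-envelope check of the "57 work days per year" figure.
extra_seconds_per_build = 90
builds_per_day          = 80
working_days_per_year   = 230   # assumption: weekday builds minus holidays/leave

extra_hours_per_day  = extra_seconds_per_build * builds_per_day / 3600.0  # => 2.0
extra_hours_per_year = extra_hours_per_day * working_days_per_year        # => 460.0
puts extra_hours_per_year / 8  # ~57.5 eight-hour work days lost per year
```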
By the way, the reason I'm interested in these developments is that I'm speaking on this subject at a Ruby conference in Paris next month: https://2020.rubyparis.org/#speakers 😄

A few more thoughts came to mind. I think we can all agree that the selective DB cleaning sped up the test suite. The question is, at what cost? I think the most accurate and least time-consuming way to determine whether the selective DB cleaning is causing test failures is to run the test suite without parallelization. I could be wrong, but I believe the tests are run in the same order every time, and because of that, if there is a DB cleaning issue, running the tests serially multiple times should result in the same errors every time. To narrow down the offending test, RSpec provides the bisect option (I have not tested this, and there may be an option faster than bisect).

If it turns out the failures are not due to selective cleaning, then it's probably safe to assume the process has been working since July and that it's safe to reintroduce it. I would run this research as soon as possible, because if selective cleaning is working fine, every day that it's not enabled results in the loss of 2 person-hours per day. If selective cleaning is causing failures, then I would be interested to know whether it was due to a missing include or tag.

As for the failure rate measurement described above in this issue, if I'm reading it correctly, it seems flawed because it is not comparing apples to apples. If you're looking at daily builds over a period of time, there will be code differences between certain builds that will skew the results. A more accurate measurement would be to take the code right before the removal of the selective cleaning I introduced and run x number of builds against that branch. Then take the code that reverts my changes and run x number of builds against that branch, then measure the difference in failures. I'm not sure what the best number of builds is for the results to be statistically significant. My guess is that it has to be high enough that it will take longer to measure than the first options I mentioned.
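A minimal sketch of that approach, assuming the suite is not already forcing a random order somewhere else (the file name and spec path below are placeholders):

```ruby
# spec/support/deterministic_order.rb -- hypothetical helper, for illustration only.
RSpec.configure do |config|
  # Run examples in the order they are defined so that a DB-cleaning leak
  # reproduces at the same spot on every serial run.
  config.order = :defined
end

# Once a failure reproduces consistently, RSpec's built-in bisect mode can
# shrink the run down to the minimal set of examples that trigger it, e.g.:
#
#   bundle exec rspec --bisect spec/models/appeal_spec.rb
```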
Here's a script that should surface any DB cleanup issues:

```ruby
RSpec.configure do |config|
  config.before(:each) do |example|
    path = example.metadata[:file_path]
    line_number = example.metadata[:line_number]
    ActiveRecord::Base.connection.tables.each do |table|
      next if %w[schema_migrations ar_internal_metadata vftypes issref].include?(table)
      klass = table.singularize.camelize.constantize
      count = klass.count
      next if count.zero?
      puts "#{klass.name} has #{count} records"
      puts "current test is: #{path}:#{line_number}"
      exit! 1
    end
  end
end
```

I've only tested this on another project that uses Postgres, and I'm not sure if the Oracle table names (vftypes and issref) are correct; you can verify by running `ActiveRecord::Base.connection.tables`. Since all tests are run in the same order every time, in alphabetical order by folder, if you can get the script to fail at the same place every time, it will mean that the test before the one labeled as "current test" is the one that didn't clean the DB.

cc @kevmo
Checking this off because we have made improvements to the increased build times, which don't seem connected to this PR but rather to an increase in the number of tests.
This is the last part of this work, which should be fairly minor, so I'm adding the label.
During the engineering huddle last week, the team decided to explore rolling back the change to selectively clean the test databases (introduced in #11470), because we believe the selective cleaning has introduced additional "flaky" tests and has not reduced the runtime of our rspec test suite in Circle CI.
### Acceptance criteria

- `rspec` build runtimes have not increased dramatically.
- If `rspec` build runtimes do not increase significantly: remove the `database_cleaner` includes and the `:all_dbs` and `:postgres` arguments from test files that include `rails_helper` (see the sketch after this list). The following commands, run against `origin/master`, list the files that require one of the cleaner support files but not `rails_helper`:

  ```sh
  $> diff <(git grep -l 'require "support/vacols_database_cleaner"' | sort) <(git grep -l 'require "rails_helper"' | sort) | grep "<" | sed -e "s/< //g"
  $> diff <(git grep -l 'require "support/database_cleaner"' | sort) <(git grep -l 'require "rails_helper"' | sort) | grep "<" | sed -e "s/< //g"
  ```

- If `rspec` build runtimes increase significantly:
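For reference, here is a sketch of what one of those spec files looks like under the selective-cleaning setup being rolled back. The file path and example group are hypothetical; the `require` path and the `:postgres` tag are the ones named in the criteria above.

```ruby
# spec/controllers/example_spec.rb -- hypothetical spec file.
require "support/database_cleaner"
require "rails_helper"

# The :postgres tag tells the selective cleaner to clean only the Caseflow
# (Postgres) database around these examples; :all_dbs would clean VACOLS too.
RSpec.describe "an example workflow", :postgres do
  # ...
end
```

Rolling the change back means dropping the `require "support/database_cleaner"` line and the tag, since both databases would then be cleaned for every example regardless.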