Clean both databases between every test #12725

Closed · 3 of 5 tasks
lowellrex opened this issue Nov 15, 2019 · 10 comments
Labels
• Eng: Backend Work Group (backend engineering task)
• Eng: Beginner Friendly (Good tickets for new team members)
• Priority: Medium (Blocking issue w/workaround, or "second in" priority for new work)
• Team: Echo 🐬
• Type: Tech-Improvement

Comments

@lowellrex (Contributor) commented Nov 15, 2019

During engineering huddle last week, the team decided to explore rolling back the change to selectively clean the test databases (introduced in #11470), because we believe the selective cleaning has introduced additional "flaky" tests without reducing the runtime of our rspec test suite in Circle CI.

Acceptance criteria

If rspec build runtimes do not increase significantly:

  • Remove explicit ...database_cleaner includes and :all_dbs and :postgres arguments from test files that include rails_helper (a before/after sketch of a spec file header follows this list)
    • We can make those changes in bulk with sed across all files, and then check out the files that should be excluded from that set from origin/master
    • $> diff <(git grep -l 'require "support/vacols_database_cleaner"' | sort) <(git grep -l 'require "rails_helper"' | sort) | grep "<" | sed -e "s/< //g"
    • $> diff <(git grep -l 'require "support/database_cleaner"' | sort) <(git grep -l 'require "rails_helper"' | sort) | grep "<" | sed -e "s/< //g"

If rspec build runtimes increase significantly:

@lowellrex (Contributor, Author) commented Nov 15, 2019

In order to determine whether these changes will have an effect on the running time of our rspec tests in Circle CI, we need to collect some benchmarks. We collected them by visiting the public webpage that lists the results of our master branch builds: https://circleci.com/gh/department-of-veterans-affairs/caseflow/tree/master?page=0. We ran the following script in the browser's JavaScript console on that page and the 14 subsequent pages to collect data about each rspec build from the past month.

const buildRows = document.getElementsByClassName("build");
for (let idx = 0; idx < buildRows.length; idx++) {
  const row = buildRows.item(idx);

  // Figured these paths out by walking the DOM manually.
  const buildName = row.children[2].children[1].children[1].textContent;
  if (buildName === "rspec") {
    const rawDate = row.children[3].children[0].children[0].title;
    const duration = row.children[3].children[0].children[1].children[1].textContent;
    const result = row.classList[1];

    console.log(`"${rawDate}", "${duration}", "${result}"`);
  }
}

We pasted the results of each of these into a text editor, finessed them into a CSV format, uploaded them to Google Docs, and created the graph below.

[Image: graph of Circle CI rspec build durations. Rolling 5-iteration average in red. 16 October through 14 November.]

From this we can see that our rspec builds have consistently taken ~17 minutes to run on Circle CI over the past month.

Follow-up work:

@lowellrex (Contributor, Author) commented:

Gathered data for the month before and after the changes in #11470 went live on 30 July 2019 (https://docs.google.com/spreadsheets/d/1ORnvebEnhbAPLfgA8XUYWPKALUdjGcVLu_Dpg3TWLgM/edit#gid=1433492087).

[Image: rolling 5-iteration average of successful rspec builds between 1 July and 31 August.]

@lowellrex (Contributor, Author) commented:

The failure rate of rspec builds on CircleCI was 27% for the month before the change took place and held steady at 27% for the month after the change.

@lowellrex (Contributor, Author) commented Nov 22, 2019

Looking at the first 3 days of data, it looks like we've reliably increased the runtime of our rspec tests by ~90 seconds (a ~9% increase on a ~17-minute build). The failure rate was 4/24 (~17%) over that time, which is slightly less than the 23% failure rate of the two months prior, though probably not a meaningful number yet because there are only 24 examples so far. Let's gather some more data (2 weeks?) before making any decisions about how to proceed here (whether we roll back the changes in #12745 or not).

[Image: graph of rspec build durations. Note: all data to the right of the yellow line happened after the change in #12745 was made. The x-axis has been corrected so time progresses left-to-right. (Source)]

@lowellrex (Contributor, Author) commented:

Handing off future work here to @kevmo

lowellrex assigned kevmo and unassigned lowellrex Nov 22, 2019
va-bot pushed a commit that referenced this issue Nov 25, 2019
Connects #12725 

### Description
Consolidate db cleaner config into one file, which is autoloaded by rails_helper.

If we want to make VACOLS db cleaner optional by tag in the future, we can modify this single file. The `require` and the spec tagging are separate changes.

This also adds db cleaner for the new `:etl` specs. These require explicit tagging, but they occupy a small known corner of the spec suite and should not leak across specs like FACOLS and Caseflow dbs do.
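
As a concrete illustration of the description above, here is a minimal sketch of what a single consolidated cleaner file could look like. The path, strategy choices, and the :etl connection selector are assumptions (and the multi-database selector syntax depends on the database_cleaner version); the actual file in the PR may differ.

# spec/support/database_cleaner.rb (illustrative path), autoloaded via rails_helper,
# so individual spec files no longer need an explicit require of a cleaner helper.
RSpec.configure do |config|
  config.before(:suite) do
    # Start the suite from a known-empty default (Caseflow Postgres) database.
    DatabaseCleaner.clean_with(:truncation)
    DatabaseCleaner.strategy = :truncation
  end

  config.around(:each) do |example|
    # Clean the default database around every example.
    DatabaseCleaner.cleaning { example.run }
  end

  config.around(:each, :etl) do |example|
    # ETL specs opt in with the :etl tag; `db: :etl` names the second connection
    # and is one documented form of database_cleaner's multi-database selector.
    DatabaseCleaner[:active_record, db: :etl].cleaning { example.run }
  end
end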

@lomky (Contributor) commented Nov 26, 2019

What is this graph?

1 | ||
2 | |||||||
3 | |||
5 | 
8 | 

Context
Currently in data gather mode; analysis and some scriptable changes to follow, possibly spinning off a discussion if results are mixed.

Estimating at 2

jimruggiero added the Team: Echo 🐬 and Priority: Medium labels Dec 11, 2019
@monfresh (Contributor) commented Jan 16, 2020

This is great work! I'd love to know if there have been any updates since November. One thing I noticed is that y'all are measuring the total build runtime, which includes various setup steps, such as spinning up the environment and the pre-test steps. For better accuracy, I would recommend only looking at the RSpec portion of the build, and picking the longest time of the 5 different nodes. I'm not sure how easy that would be to obtain programmatically.
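
On the question of obtaining the rspec job times programmatically: below is a rough sketch against the public CircleCI v1.1 REST API. The endpoint path and the JSON fields used (build_time_millis, status, start_time, workflows.job_name) are assumptions that would need verifying against real responses, and per-step timings (to isolate just the RSpec step on the slowest of the 5 nodes) would require the single-build detail endpoint instead.

require "net/http"
require "json"
require "uri"

# Recent builds on master for the project (public projects should need no token).
project = "github/department-of-veterans-affairs/caseflow"
uri = URI("https://circleci.com/api/v1.1/project/#{project}/tree/master?limit=100")

builds = JSON.parse(Net::HTTP.get(uri))

builds.each do |build|
  # Only look at the rspec job within the workflow.
  next unless build.dig("workflows", "job_name") == "rspec"

  minutes = build["build_time_millis"].to_f / 60_000
  puts format("%s,%s,%.1f", build["start_time"], build["status"], minutes)
end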

In addition, I would recommend determining whether or not to roll back the changes based on specific numeric goals, as opposed to subjective terms like "dramatically" or "significantly". The minimum loss of time can be measured by multiplying the average increase in build time by the average number of daily builds.

It looks like there were about 88 RSpec builds over the past 24 hours that actually ran. I'm not sure if that's typical, so let's round down to 80. A 90-second increase in build time would represent a loss of 57 work days per year! That's an entire quarter!

1.5 minutes × 80 builds per day × 5 days per week × 46 weeks per year (to account for time off) ÷ 60 minutes per hour ÷ 8 hours per workday = 57.5 work days
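
The same back-of-the-envelope estimate written out as a small Ruby calculation (all inputs are the assumptions stated above, not measured values):

minutes_lost_per_build = 1.5
builds_per_day         = 80
work_days_per_week     = 5
weeks_per_year         = 46   # accounts for time off
hours_per_work_day     = 8.0

hours_lost_per_year = minutes_lost_per_build * builds_per_day *
                      work_days_per_week * weeks_per_year / 60.0
puts hours_lost_per_year / hours_per_work_day  # => 57.5 work days per year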

@monfresh (Contributor) commented:

By the way, the reason I'm interested in these developments is that I'm speaking on this subject at a Ruby conference in Paris next month: https://2020.rubyparis.org/#speakers 😄

A few more thoughts came to mind:

I think we can all agree that the selective DB cleaning sped up the test suite. The question is, at what cost? I think the most accurate and least time-consuming way to determine if the selective DB cleaning is causing test failures is to run the test suite without parallelization. I could be wrong, but I believe that the tests are run in the same order every time, and because of that, if there is a DB cleaning issue, running the tests serially multiple times should result in the same errors every time. To narrow down the offending test, RSpec provides the bisect option.

I have not tested this, but an option faster than bisect could be to add a before(:each) block that checks if the DB is empty, and if it isn't, outputs to the log the name of the test and the state of the DB, and then aborts the test suite. Remember to ignore any results that mention the vftypes and issref tables, because those are excluded from cleaning.

If it turns out failures are not due to selective cleaning, then it's probably safe to assume the process has been working since July and that it's safe to reintroduce it. I would run this research as soon as possible, because if selective cleaning is working fine, every day that it's not enabled results in the loss of 2 person-hours per day.

If selective cleaning is causing failures, then I would be interested to know if it was due to a missing require statement and/or tag, and if it was in a file from around the time the feature was introduced, or a more recent one. The latter would indicate that the process is prone to human error. To that end, I thought that perhaps there's a way to remove the human from the equation. I wonder if it's possible to write a script that can run an RSpec test and determine if it uses a DB, and then automatically add the appropriate require statement and tag to it.

As for the failure rate measurement described above in this issue, if I'm reading it correctly, it seems flawed because it is not comparing apples to apples. If you're looking at daily builds over a period of time, there will be code differences between certain builds that will skew the results. A more accurate measurement would be to take the code right before the removal of the selective cleaning I introduced and run some number (x) of builds against that branch. Then take the code that reverts my changes, run x builds against that branch, and measure the difference in failures. I'm not sure how many builds are needed for the results to be statistically significant. My guess is that it has to be high enough that it will take longer to measure than the first options I mentioned.

@monfresh (Contributor) commented:

Here's a script that should surface any DB cleanup issues:

RSpec.configure do |config|
  config.before(:each) do |example|
    path = example.metadata[:file_path]
    line_number = example.metadata[:line_number]

    ActiveRecord::Base.connection.tables.each do |table|
      # Skip Rails bookkeeping tables and the VACOLS tables excluded from cleaning.
      next if %w[schema_migrations ar_internal_metadata vftypes issref].include?(table)

      # Not every table maps to a model (e.g. join tables); skip those rather than raising.
      klass = table.singularize.camelize.safe_constantize
      next if klass.nil?

      count = klass.count
      next if count.zero?

      # A non-empty table at the start of an example means the previous test leaked records.
      puts "#{klass.name} has #{count} records"
      puts "current test is: #{path}:#{line_number}"
      exit! 1
    end
  end
end

I've only tested this on another project that uses Postgres. I'm not sure if the Oracle table names (vftypes and issref) are correct. You can verify by running ActiveRecord::Base.connection.tables in the Rails console to see what the Oracle table names are.

Since all tests are run in the same order every time, in alphabetical order by folder, if you can get the script to fail at the same place every time, it will mean that the test before the one labeled as "current test" is the one that didn't clean the DB.
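
One caveat worth making explicit: the approach relies on a deterministic example order. If the suite ever runs with random ordering, pinning the order (a standard RSpec setting, sketched below) keeps the failure reproducing at the same "current test" every run:

RSpec.configure do |config|
  # Run examples in the order they are defined/loaded rather than randomly,
  # so a leaked record is always detected at the same spot.
  config.order = :defined
end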

cc @kevmo

@ferristseng (Contributor) commented:

> Monitor change after live to ensure Circle CI rspec build runtimes have not increased dramatically.

Checking this off because we have made improvements to address the increased build times, which don't seem connected to this PR but rather to an increase in the number of tests.

> Remove explicit ...database_cleaner includes and :all_dbs and :postgres arguments from test files that include rails_helper

This is the last remaining part of this issue and should be fairly minor, so I'm adding the Beginner Friendly tag.

ferristseng added the Eng: Beginner Friendly label Oct 27, 2020