Clean both databases between every test #12725
In order to determine whether these changes will have an effect on the running time of our rspec tests in Circle CI, we need to collect some benchmarks. We collected the benchmarks by visiting the public webpage that listed the results of our builds and running the following snippet in the browser console:

```js
buildRows = document.getElementsByClassName("build");
for (let idx = 0; idx < buildRows.length; idx++) {
  let row = buildRows.item(idx);
  // Figured these paths out by walking the DOM manually.
  let buildName = row.children[2].children[1].children[1].textContent;
  if (buildName === "rspec") {
    let rawDate = row.children[3].children[0].children[0].title;
    let duration = row.children[3].children[0].children[1].children[1].textContent;
    let result = row.classList[1];
    console.log(`"${rawDate}", "${duration}", "${result}"`);
  }
}
```

We pasted the results of each of these into a text editor, finessed them into CSV format, uploaded them to Google Docs, and created the graph below.
From this we can see that our rspec builds have consistently taken ~17 minutes to run on Circle CI over the past month.

Follow-up work:
Gathered data for the month before and after the changes in #11470 went live on 30 July 2019 (https://docs.google.com/spreadsheets/d/1ORnvebEnhbAPLfgA8XUYWPKALUdjGcVLu_Dpg3TWLgM/edit#gid=1433492087).
The failure rate of rspec builds on CircleCI was 27% for the month before the change took place and held steady at 27% for the month after the change.
Looking at the first 3 days of data, it looks like we've reliably increased the runtime of our rspec tests by ~90 seconds (a ~9% increase in runtime). The failure rate was 4/24 (~17%) over that time, which is slightly less than the 23% failure rate of the two months prior, though probably not a meaningful number yet because there are only 24 examples so far. Let's gather some more data (2 weeks?) before making any decisions about how to proceed here (whether we roll back the changes in #12745 or not).
Handing off future work here to @kevmo.
Connects #12725

### Description

Consolidate the db cleaner config into one file, which is autoloaded by rails_helper. If we want to make the VACOLS db cleaner optional by tag in the future, we can modify this single file. The `require` and the spec tagging are separate changes. This also adds a db cleaner for the new `:etl` specs. These require explicit tagging, but they occupy a small, known corner of the spec suite and should not leak across specs the way the FACOLS and Caseflow dbs do.
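For context, a consolidated single-file config along these lines could look roughly like the sketch below. The file path, the use of truncation, and the `around` hook are assumptions for illustration, not the exact Caseflow code; the real file also wires up the VACOLS/FACOLS and `:etl` connections.

```ruby
# spec/support/database_cleaner.rb -- illustrative sketch only.
require "database_cleaner"

RSpec.configure do |config|
  config.before(:suite) do
    # Start each suite run from known-empty databases.
    DatabaseCleaner.clean_with(:truncation)
  end

  config.around(:each) do |example|
    # Truncation (rather than a wrapping transaction) is the safer default when
    # an example touches more than one database connection.
    DatabaseCleaner.strategy = :truncation
    DatabaseCleaner.cleaning { example.run }
  end
end
```

The point of a single file is that any future per-tag behaviour (for example, making the VACOLS cleaner optional) only has to be changed in one place.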
### Context

Estimating at 2
This is great work! I'd love to know if there have been any updates since November.

One thing I noticed is that y'all are measuring the total build runtime, which includes various setup steps, such as spinning up the environment and the pre-test steps. For better accuracy, I would recommend only looking at the RSpec portion of the build and picking the longest time of the 5 different nodes. I'm not sure how easy that would be to obtain programmatically.

In addition, I would recommend determining whether or not to roll back the changes based on specific numeric goals, as opposed to subjective terms like "dramatically" or "significantly". The minimum loss of time can be measured by multiplying the average increase in build time by the average number of daily builds. It looks like there were about 88 RSpec builds over the past 24 hours that actually ran. I'm not sure if that's typical, so let's round down to 80. A 90-second increase in build time would represent a loss of 57 work days per year! That's an entire quarter!
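As a sanity check on that estimate, the arithmetic works out as follows (the 230 working days per year is my assumption; the other inputs are the figures quoted above):

```ruby
# Back-of-the-envelope check of the "57 work days per year" figure.
extra_seconds_per_build = 90
builds_per_day          = 80
working_days_per_year   = 230   # assumption: weekday builds minus holidays/leave

extra_hours_per_day  = extra_seconds_per_build * builds_per_day / 3600.0  # => 2.0
extra_hours_per_year = extra_hours_per_day * working_days_per_year        # => 460.0
puts extra_hours_per_year / 8  # ~57.5 eight-hour work days lost per year
```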
By the way, the reason I'm interested in these developments is that I'm speaking on this subject at a Ruby conference in Paris next month: https://2020.rubyparis.org/#speakers 😄

A few more thoughts came to mind. I think we can all agree that the selective DB cleaning sped up the test suite. The question is, at what cost? I think the most accurate and least time-consuming way to determine whether the selective DB cleaning is causing test failures is to run the test suite without parallelization. I could be wrong, but I believe the tests are run in the same order every time, and because of that, if there is a DB cleaning issue, running the tests serially multiple times should result in the same errors every time. To narrow down the offending test, RSpec provides the bisect option (I have not tested this, and there may be an option faster than bisect).

If it turns out the failures are not due to selective cleaning, then it's probably safe to assume the process has been working since July and that it's safe to reintroduce it. I would run this research as soon as possible, because if selective cleaning is working fine, every day that it's not enabled results in the loss of 2 person-hours per day. If selective cleaning is causing failures, then I would be interested to know whether it was due to a missing include or tag.

As for the failure rate measurement described above in this issue, if I'm reading it correctly, it seems flawed because it is not comparing apples to apples. If you're looking at daily builds over a period of time, there will be code differences between certain builds that will skew the results. A more accurate measurement would be to take the code right before the removal of the selective cleaning I introduced and run x number of builds against that branch. Then take the code that reverts my changes and run x number of builds against that branch, then measure the difference in failures. I'm not sure what the best number of builds is for the results to be statistically significant. My guess is that it has to be high enough that it will take longer to measure than the first options I mentioned.
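A minimal sketch of that approach, assuming the suite is not already forcing a random order somewhere else (the file name and spec path below are placeholders):

```ruby
# spec/support/deterministic_order.rb -- hypothetical helper, for illustration only.
RSpec.configure do |config|
  # Run examples in the order they are defined so that a DB-cleaning leak
  # reproduces at the same spot on every serial run.
  config.order = :defined
end

# Once a failure reproduces consistently, RSpec's built-in bisect mode can
# shrink the run down to the minimal set of examples that trigger it, e.g.:
#
#   bundle exec rspec --bisect spec/models/appeal_spec.rb
```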
Here's a script that should surface any DB cleanup issues:

```ruby
RSpec.configure do |config|
  config.before(:each) do |example|
    path = example.metadata[:file_path]
    line_number = example.metadata[:line_number]
    ActiveRecord::Base.connection.tables.each do |table|
      next if %w[schema_migrations ar_internal_metadata vftypes issref].include?(table)
      klass = table.singularize.camelize.constantize
      count = klass.count
      next if count.zero?
      puts "#{klass.name} has #{count} records"
      puts "current test is: #{path}:#{line_number}"
      exit! 1
    end
  end
end
```

I've only tested this on another project that uses Postgres, and I'm not sure if the Oracle table names (vftypes and issref) are correct; you can verify by running `ActiveRecord::Base.connection.tables`. Since all tests are run in the same order every time, in alphabetical order by folder, if you can get the script to fail at the same place every time, it will mean that the test before the one labeled as "current test" is the one that didn't clean the DB.

cc @kevmo
Checking this off because we have made improvements to the increased build times, which don't seem connected to this PR but rather to an increase in the number of tests.
This is the last part of this work, which should be fairly minor, so I'm adding the label.
During the engineering huddle last week, the team decided to explore rolling back the change to selectively clean the test databases (introduced in #11470), because we believe the selective cleaning has introduced additional "flaky" tests and has not reduced the runtime of our rspec test suite in Circle CI.
### Acceptance criteria

- `rspec` build runtimes have not increased dramatically.
- If `rspec` build runtimes do not increase significantly: remove the `database_cleaner` includes and the `:all_dbs` and `:postgres` arguments from test files that include `rails_helper` (see the sketch after this list). The following commands, run against `origin/master`, list the files that require one of the cleaner support files but not `rails_helper`:

  ```sh
  $> diff <(git grep -l 'require "support/vacols_database_cleaner"' | sort) <(git grep -l 'require "rails_helper"' | sort) | grep "<" | sed -e "s/< //g"
  $> diff <(git grep -l 'require "support/database_cleaner"' | sort) <(git grep -l 'require "rails_helper"' | sort) | grep "<" | sed -e "s/< //g"
  ```

- If `rspec` build runtimes increase significantly:
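For reference, here is a sketch of what one of those spec files looks like under the selective-cleaning setup being rolled back. The file path and example group are hypothetical; the `require` path and the `:postgres` tag are the ones named in the criteria above.

```ruby
# spec/controllers/example_spec.rb -- hypothetical spec file.
require "support/database_cleaner"
require "rails_helper"

# The :postgres tag tells the selective cleaner to clean only the Caseflow
# (Postgres) database around these examples; :all_dbs would clean VACOLS too.
RSpec.describe "an example workflow", :postgres do
  # ...
end
```

Rolling the change back means dropping the `require "support/database_cleaner"` line and the tag, since both databases would then be cleaned for every example regardless.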