Engineering topics for discussion #10569

Closed
7 tasks done
pkarman opened this issue Apr 26, 2019 · 17 comments

@pkarman
Contributor

pkarman commented Apr 26, 2019

A running agenda of things to discuss:

  • FACOLS
  • Guidelines for when it is okay to ignore GitHub's pre-merge status checks
  • Code Climate
  • Kibana space recycling automation (https://dsva.slack.com/archives/C2ZAMLK88/p1556580882067200)
  • Coding standards (design patterns, naming, etc.)
  • Property-based testing
  • How to make transitioning to unfamiliar parts of the codebase easier
@lomky
Contributor

lomky commented Jun 14, 2019

  • Bat Team discussion
    • Now that we've all been through it once, what are the team's overall thoughts?
    • Tabled for a discussion post-recompete

@lowellrex
Contributor

lowellrex commented Jul 15, 2019

@pkarman
Contributor Author

pkarman commented Jul 24, 2019

  • ZenHub/GitHub practices
    • Assign yourself to your PRs
    • Attach screenshot/GIF when appropriate
    • In ZenHub, tickets in the Ready for Dev pipeline may need estimation.

@lpciferri

lpciferri commented Jul 25, 2019

Revisiting which Slack channels Sentry alerts go to. See also #10493, #10538, and #3638.

@monfresh
Contributor

monfresh commented Jul 25, 2019

  • Testing best practices
    • Things I noticed while speeding up the test suite that we could do going forward to keep tests fast.

I'm happy to do a WW instead if the huddle is not the place for this.

@anyakhvost

anyakhvost commented Jul 25, 2019

  • Code reviews - what are the reviewer's responsibilities?

@jcq
Contributor

jcq commented Jul 25, 2019

  • Obtaining dev environment DB images without needing access to official AWS (due to delays in obtaining access)

@lowellrex
Contributor

lowellrex commented Aug 9, 2019

@monfresh
Contributor

To add on to Lowell's questions:

  • How do we ensure each new commit does not decrease performance?

@jcq
Contributor

jcq commented Aug 20, 2019

  • Discuss upgrades to frontend (runtime)
  • Discuss possible upgrades to frontend (build tooling)

@pkarman
Contributor Author

pkarman commented Aug 22, 2019

  • Sentry alerts: they are not slowing down. Can we prevent them by fixing the code?

@lowellrex
Contributor

lowellrex commented Oct 28, 2019

  • Should we clean VACOLS and postgres between test runs by default?
    • Not cleaning by default regularly causes test failures, because our human-intensive approach to identifying when databases need cleaning leaks state between tests.
    • The runtime of our rspec builds in CircleCI has not decreased meaningfully as a result of these changes. The initial discussion expected a reduction in runtime from 18 minutes to ~6.5 minutes; runtimes on CircleCI continue to be around 17 minutes.
    • I propose rolling back PR #11470 (Speed up test suite). A sketch of what cleaning by default could look like follows below.

Created ticket to do this work: #12725
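
For reference, a minimal sketch (not Caseflow's actual configuration) of what always-on cleaning could look like in rails_helper.rb with the database_cleaner gem; the strategy choices here are assumptions:

```ruby
# Illustrative only -- not the project's current setup.
require "database_cleaner"

RSpec.configure do |config|
  # Truncate everything once before the suite so no state leaks in from a
  # previous run.
  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation)
  end

  # Wrap every example in a cleaning block by default, instead of relying on
  # per-spec :postgres/:all_dbs tags. A multi-database setup like ours would
  # configure DatabaseCleaner separately for each connection.
  config.around(:each) do |example|
    DatabaseCleaner.strategy = :transaction
    DatabaseCleaner.cleaning { example.run }
  end
end
```

Always-on cleaning trades some of the per-spec speedup for predictability, which is exactly the trade-off in question here.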

@monfresh
Contributor

@lowellrex I saw your comment pop up in my email. What a coincidence! I recently finished a blog post about the work I did to speed up the test suite (not yet published). I spent time measuring the time savings in CircleCI by taking the average of the longest RSpec runs for 5 builds before and after my changes, and the difference was 0.65 minutes. I also looked at the average number of builds over the past week, which was around 50. The highest number of builds I counted on a particular day was 83. Using an average of 50 builds per day and 0.65 minutes saved per build, that's 32.5 minutes saved every day, which I would consider meaningful. Over the course of a year, that's about 3 work weeks saved.
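
For concreteness, the arithmetic behind that estimate (the 250 work days per year and 40-hour work week are assumptions):

```ruby
# Back-of-the-envelope check of the savings described above.
minutes_saved_per_build = 0.65
builds_per_day          = 50
work_days_per_year      = 250  # assumed
hours_per_work_week     = 40   # assumed

minutes_per_day  = minutes_saved_per_build * builds_per_day        # => 32.5
hours_per_year   = minutes_per_day * work_days_per_year / 60.0     # => ~135
work_weeks_saved = hours_per_year / hours_per_work_week            # => ~3.4
```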

This doesn't take into account the time saved locally when running tests. Every engineer saves 5 seconds every time they run a single test that doesn't use VACOLS. In addition, every new test that doesn't use VACOLS will run faster than it would have. If this change is reverted, the test suite will get slower over time at a faster rate, as evidenced by the speed difference observed locally.

Some questions I would ask:

  • Exactly how often do these failures happen that are definitely due to missing :postgres/:all_dbs tags and/or missing require statements?
  • Are the failures due to existing specs or new ones?
  • What is the most common issue?
    • Specs are not tagged at all with either :postgres or :all_dbs
    • The wrong tag was used
    • Missing require statements
    • The wrong require statement
  • Who is not properly tagging specs?
    • Mostly new team members?
    • An even mix of team members?
    • Mostly seasoned members?
    • Mostly the same folks?
  • Is there a way to rewrite the specs that didn't clean the DB such that they don't need one or both DBs at all?

One suggestion I would make: when writing a new test or set of tests in a file, run them twice in a row. That is a reliable way I have found to identify DB cleaning issues. Looking at test.log also helps to see which DBs are used.
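
For reference, a minimal sketch of the tagging convention in question; the spec subjects are hypothetical and the exact tag semantics are as I understand them from this thread:

```ruby
# Illustrative only. A spec that needs postgres is tagged :postgres; a spec
# that also touches VACOLS is tagged :all_dbs so both databases get cleaned.
# Untagged specs take the fast path and skip database cleaning.
describe "a postgres-only workflow", :postgres do
  it "persists records to postgres" do
    # ... exercises postgres-backed models only ...
  end
end

describe "a legacy appeal workflow", :all_dbs do
  it "reads appeal data from VACOLS" do
    # ... exercises both postgres and VACOLS ...
  end
end
```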

@monfresh
Contributor

Another thing I would recommend looking into is Knapsack Pro, to see how it compares to the current parallelization implementation. I was planning on trying it out, but didn't have time before I left. I hooked it up on login.gov and had a good experience with it. It's been improved since then, and they now have a dynamic queue mode that balances work so each node finishes around the same time, as opposed to one node taking a minute longer than the rest.
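
For anyone evaluating it, a rough sketch of what wiring up Knapsack Pro usually involves (not set up in this repo; their docs are the authoritative reference):

```ruby
# Gemfile (test group): gem "knapsack_pro"

# spec_helper.rb or rails_helper.rb -- bind the RSpec adapter so Knapsack Pro
# can record example timings and distribute specs across CI nodes.
require "knapsack_pro"
KnapsackPro::Adapters::RSpecAdapter.bind
```

Each CI node then runs the gem's queue-mode rake task (e.g. `bundle exec rake knapsack_pro:queue:rspec`) with the API token exported; in queue mode, nodes pull batches of specs dynamically so they all finish at roughly the same time.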

@yoomlam
Contributor

yoomlam commented Nov 6, 2019

I notice some recurring themes. Personally, I've spent way too many hours babysitting CircleCI as I rerun 15-minute rspec tests to determine if failures are caused by me, a flakey test, or a new non-deterministically-failing test.

Some considerations:

  1. If developers are copying existing test files to create new tests, we should point them to well-written tests in order to promote performant, well-structured tests.
  2. I noticed this Testing Best Practices page. Can we automate some of these guidelines as checks or linting rules? (A sketch follows this list.)
  3. Can we identify relevant problems with current tests (including pitfalls and desired spec file changes), present solutions to those problems to all the engineers, and make a concerted, time-boxed effort to make those changes?
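
For example (a hypothetical cop, assuming we would enforce such rules with RuboCop), a guideline like "don't sleep in specs" could be automated roughly like this:

```ruby
# Hypothetical custom cop -- illustrative only, not an existing Caseflow rule.
require "rubocop"

module RuboCop
  module Cop
    module Caseflow
      class NoSleepInSpecs < Cop
        MSG = "Avoid `sleep` in specs; wait on an explicit condition instead.".freeze

        def on_send(node)
          # Flag bare calls to Kernel#sleep (no explicit receiver).
          return unless node.method_name == :sleep && node.receiver.nil?

          add_offense(node, message: MSG)
        end
      end
    end
  end
end
```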

@pkarman
Contributor Author

pkarman commented Nov 7, 2019

@yoomlam I think it's fine to revisit our approach to flakey tests as part of Huddle. Some things to be aware of:

@pkarman pkarman removed their assignment Nov 9, 2019
@alisan16
Contributor

alisan16 commented Dec 5, 2019

Closing this issue in favor of using a Google doc to track discussion

@alisan16 alisan16 closed this as completed Feb 6, 2020