Engineering topics for discussion #10569

Closed
7 tasks done
pkarman opened this issue Apr 26, 2019 · 17 comments

@pkarman
Contributor

pkarman commented Apr 26, 2019

A running agenda of things to discuss:

  • FACOLS
  • Guidelines for when it is okay to ignore GitHub's pre-merge status checks
  • Code Climate
  • Kibana space recycling automation (https://dsva.slack.com/archives/C2ZAMLK88/p1556580882067200)
  • Coding standards (design patterns, naming, etc.)
  • Property-based testing
  • How to make transitioning to unfamiliar parts of the codebase easier
@lomky
Contributor

lomky commented Jun 14, 2019

  • Bat Team discussion
    • Now that we've all been through it once, what are the team's overall thoughts?
    • Tabled for a discussion post-recompete

@lowellrex
Contributor

lowellrex commented Jul 15, 2019

@pkarman
Contributor Author

pkarman commented Jul 24, 2019

  • ZenHub/GitHub practices
    • Assign yourself to your PRs
    • Attach screenshot/GIF when appropriate
    • In ZenHub, tickets in the Ready for Dev pipeline may need estimation.

@lpciferri

lpciferri commented Jul 25, 2019

Revisiting which Slack channels Sentry alerts go to. See also #10493, #10538, and #3638.

@monfresh
Contributor

monfresh commented Jul 25, 2019

  • Testing best practices
    • Things I noticed while speeding up the test suite that we could do going forward to keep tests fast.

I'm happy to do a WW instead if the huddle is not the place for this.

@anyakhvost

anyakhvost commented Jul 25, 2019

  • Code reviews - what are the reviewer's responsibilities?

@jcq
Contributor

jcq commented Jul 25, 2019

  • Obtaining dev environment DB images without needing access to official AWS (due to delays in obtaining access)

@lowellrex
Contributor

lowellrex commented Aug 9, 2019

@monfresh
Contributor

To add on to Lowell's questions:

  • How do we ensure each new commit does not decrease performance?

@jcq
Contributor

jcq commented Aug 20, 2019

  • Discuss upgrades to frontend (runtime)
  • Discuss possible upgrades to frontend (build tooling)

@pkarman
Contributor Author

pkarman commented Aug 22, 2019

  • Sentry alerts: they are not slowing down. Can we prevent them by fixing the code?

@lowellrex
Contributor

lowellrex commented Oct 28, 2019

  • Should we clean VACOLS and postgres between test runs by default?
    • Not cleaning by default regularly causes test failures, because our human-intensive approach to identifying when databases need cleaning leaks state between tests.
    • The runtime of our rspec builds in CircleCI has not decreased meaningfully as a result of these changes. The initial discussion expected a reduction in runtime from 18 minutes to ~6.5 minutes; runtimes on CircleCI continue to be around 17 minutes.
    • I propose rolling back PR #11470 (Speed up test suite). A sketch of what cleaning by default could look like follows below.

Created ticket to do this work: #12725
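
For reference, a minimal sketch (not Caseflow's actual configuration) of what always-on cleaning could look like in rails_helper.rb with the database_cleaner gem; the strategy choices here are assumptions:

```ruby
# Illustrative only -- not the project's current setup.
require "database_cleaner"

RSpec.configure do |config|
  # Truncate everything once before the suite so no state leaks in from a
  # previous run.
  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation)
  end

  # Wrap every example in a cleaning block by default, instead of relying on
  # per-spec :postgres/:all_dbs tags. A multi-database setup like ours would
  # configure DatabaseCleaner separately for each connection.
  config.around(:each) do |example|
    DatabaseCleaner.strategy = :transaction
    DatabaseCleaner.cleaning { example.run }
  end
end
```

Always-on cleaning trades some of the per-spec speedup for predictability, which is exactly the trade-off in question here.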

@monfresh
Contributor

@lowellrex I saw your comment pop up in my email. What a coincidence! I recently finished a blog post about the work I did to speed up the test suite (not yet published). I spent time measuring the time savings in CircleCI by taking the average of the longest RSpec runs for 5 builds before and after my changes, and the difference was 0.65 minutes. I also looked at the average number of builds over the past week, which was around 50. The highest number of builds I counted on a particular day was 83. Using an average of 50 builds per day and 0.65 minutes saved per build, that's 32.5 minutes saved every day, which I would consider meaningful. Over the course of a year, that's about 3 work weeks saved.
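
For concreteness, the arithmetic behind that estimate (the 250 work days per year and 40-hour work week are assumptions):

```ruby
# Back-of-the-envelope check of the savings described above.
minutes_saved_per_build = 0.65
builds_per_day          = 50
work_days_per_year      = 250  # assumed
hours_per_work_week     = 40   # assumed

minutes_per_day  = minutes_saved_per_build * builds_per_day        # => 32.5
hours_per_year   = minutes_per_day * work_days_per_year / 60.0     # => ~135
work_weeks_saved = hours_per_year / hours_per_work_week            # => ~3.4
```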

This doesn't take into account the time saved locally when running tests. Every engineer saves 5 seconds every time they run a single test that doesn't use VACOLS. In addition, every new test that doesn't use VACOLS will run faster than it would have. If this change is reverted, the test suite will get slower over time at a faster rate, as evidenced by the speed difference observed locally.

Some questions I would ask:

  • Exactly how often do these failures happen that are definitely due to missing :postgres/:all_dbs tags and/or missing require statements?
  • Are the failures due to existing specs or new ones?
  • What is the most common issue?
    • Specs are not tagged at all with either :postgres or :all_dbs
    • The wrong tag was used
    • Missing require statements
    • The wrong require statement
  • Who is not properly tagging specs?
    • Mostly new team members?
    • An even mix of team members?
    • Mostly seasoned members?
    • Mostly the same folks?
  • Is there a way to rewrite the specs that didn't clean the DB such that they don't need one or both DBs at all?

One suggestion I would make: when writing a new test or set of tests in a file, run them twice in a row. That is a reliable way I have found to identify DB cleaning issues. Looking at test.log also helps to see which DBs are used.
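
For reference, a minimal sketch of the tagging convention in question; the spec subjects are hypothetical and the exact tag semantics are as I understand them from this thread:

```ruby
# Illustrative only. A spec that needs postgres is tagged :postgres; a spec
# that also touches VACOLS is tagged :all_dbs so both databases get cleaned.
# Untagged specs take the fast path and skip database cleaning.
describe "a postgres-only workflow", :postgres do
  it "persists records to postgres" do
    # ... exercises postgres-backed models only ...
  end
end

describe "a legacy appeal workflow", :all_dbs do
  it "reads appeal data from VACOLS" do
    # ... exercises both postgres and VACOLS ...
  end
end
```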

@monfresh
Contributor

Another thing I would recommend looking into is Knapsack Pro, to see how it compares to the current parallelization implementation. I was planning on trying it out, but didn't have time before I left. I hooked it up on login.gov and had a good experience with it. It's been improved since then, and they now have a dynamic queue mode that balances work so each node finishes around the same time, as opposed to one node taking a minute longer than the rest.
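
For anyone evaluating it, a rough sketch of what wiring up Knapsack Pro usually involves (not set up in this repo; their docs are the authoritative reference):

```ruby
# Gemfile (test group): gem "knapsack_pro"

# spec_helper.rb or rails_helper.rb -- bind the RSpec adapter so Knapsack Pro
# can record example timings and distribute specs across CI nodes.
require "knapsack_pro"
KnapsackPro::Adapters::RSpecAdapter.bind
```

Each CI node then runs the gem's queue-mode rake task (e.g. `bundle exec rake knapsack_pro:queue:rspec`) with the API token exported; in queue mode, nodes pull batches of specs dynamically so they all finish at roughly the same time.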

@yoomlam
Contributor

yoomlam commented Nov 6, 2019

I notice some recurring themes. Personally, I've spent way too many hours babysitting CircleCI as I rerun 15-minute rspec tests to determine if failures are caused by me, a flakey test, or a new non-deterministically-failing test.

Some considerations:

  1. If developers are copying existing test files to create new tests, we should point them to well-written tests in order to promote performant, well-structured tests.
  2. I noticed this Testing Best Practices page. Can we automate some of these guidelines as checks or linting rules? (A sketch follows this list.)
  3. Can we identify relevant problems with current tests (including pitfalls and desired spec file changes), present solutions to those problems to all the engineers, and make a concerted, time-boxed effort to make those changes?
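
For example (a hypothetical cop, assuming we would enforce such rules with RuboCop), a guideline like "don't sleep in specs" could be automated roughly like this:

```ruby
# Hypothetical custom cop -- illustrative only, not an existing Caseflow rule.
require "rubocop"

module RuboCop
  module Cop
    module Caseflow
      class NoSleepInSpecs < Cop
        MSG = "Avoid `sleep` in specs; wait on an explicit condition instead.".freeze

        def on_send(node)
          # Flag bare calls to Kernel#sleep (no explicit receiver).
          return unless node.method_name == :sleep && node.receiver.nil?

          add_offense(node, message: MSG)
        end
      end
    end
  end
end
```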

@pkarman
Contributor Author

pkarman commented Nov 7, 2019

@yoomlam I think it's fine to revisit our approach to flakey tests as part of Huddle. Some things to be aware of:

@pkarman pkarman removed their assignment Nov 9, 2019
@alisan16
Contributor

alisan16 commented Dec 5, 2019

Closing this issue in favor of using a Google doc to track discussion

@alisan16 alisan16 closed this as completed Feb 6, 2020