How to avoid coverage drift? #118
Comments
Interesting idea. We do have a ttl option, but it defaults to forever... so something like 1 year as a high default would work. I like the idea; I had been considering namespacing by deployed git hash for a while, but this is in some ways better, as it doesn't impact files that weren't changed between deployments.
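As a rough sketch of what that high default could look like in a Coverband config (the comment above only confirms that some ttl option exists; the adapter constructor and option name used here are assumptions and may differ by version):

```ruby
require "redis"
require "coverband"

Coverband.configure do |config|
  # Assumption: the Redis adapter accepts a ttl in seconds, so per-file
  # coverage keys expire after ~1 year instead of living forever.
  config.store = Coverband::Adapters::RedisStore.new(
    Redis.new,
    ttl: 365 * 24 * 60 * 60
  )
end
```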
Updated title from "Include checksum as part of key for each file in store".
I rephrased this as I think there are a couple different things that could be helpful. One thing is that while the checksum solution solves the drift issue, it does cause a slightly different trust issue: a frequently changed file might not have a very long history showing usage, so it could look like it isn't used when it was really just part of an infrequently run cron job or something like that. I guess in a perfect world we would really have line-number-level MD5 and only throw out coverage when the actual line has moved / changed, but that is not realistic. Some ideas to help with understanding the time range that the coverage data is valid for, and to avoid the drift:
Update the UI so for any file you would have something like (a storage sketch follows below):
- first seen: <timestamp>
- coverage last updated: <timestamp>
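A minimal sketch of how that metadata could be recorded alongside the coverage data, assuming one Redis hash per file version keyed by path plus MD5 checksum (the key format and field names here are hypothetical, not Coverband's actual internals):

```ruby
require "digest"
require "json"
require "time"

# Store coverage plus timestamps in one hash per file version.
def save_coverage(redis, file, coverage)
  key = "coverband.#{file}.#{Digest::MD5.file(file).hexdigest}"
  now = Time.now.utc.iso8601
  redis.hsetnx(key, "first_seen", now)    # written once per file version
  redis.hset(key, "last_updated", now)    # refreshed on every report
  redis.hset(key, "coverage", coverage.to_json)
end
```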
In theory, one could actually let users see all versions of a file and navigate back and forth... seeing coverage change over file history... or even generate a coverage report based on a time range: show me coverage data between date X and Y. I had been considering similar ideas based on git hashes at time of deploy... so you could query coverage based on related git versions or release time... but I do think the MD5 version has advantages, as it wouldn't be tied to any code repository or deployment method. The benchmarks you shared @kbaum make it seem reasonably fast... I think when first introduced it still probably makes sense to do as an optional feature. Related: I believe one still might want to be able to clear all coverage data, as changing routes or controllers could mean downstream models no longer receive coverage... That being said, if we add the metadata about coverage time alongside the MD5 hash, this is where coverage queries over time could be very powerful... Never clear, but query the time of interest and know that MD5 shows only hits on the most recent versions of files.
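A rough sketch of what such a time-range query ("show me coverage data between date X and Y") could build on, using the hypothetical last_updated field from the sketch above:

```ruby
require "json"
require "time"
require "redis"

# Return coverage for entries whose last update falls inside [from, to].
# SCAN over the illustrative "coverband." key prefix avoids blocking Redis.
def coverage_between(redis, from, to)
  redis.scan_each(match: "coverband.*").filter_map do |key|
    stamp = redis.hget(key, "last_updated")
    next unless stamp
    updated = Time.parse(stamp)
    next unless updated.between?(from, to)
    [key, JSON.parse(redis.hget(key, "coverage"))]
  end.to_h
end
```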
Love the idea of tying the md5 change to updating the UI to display "first seen" and "coverage last updated". Showing previous versions of a file would also be a great feature, though not a must-have for the first release. Seeing coverage data over time could be useful eventually, but I'm not sure it's worth the complexity it introduces. I would imagine the size of the data stored in redis would greatly increase, since we would need to store snapshots of data. I agree that one might want to clear coverage data, but with the features discussed here, it feels like the exception.
Agreed on avoiding complexity at first. I think MD5 and metadata, but only showing the current version, would be a good first version of this. Showing old versions introduces some complexity on the code-viewing side of the report, so that might not ever be worth adding.
@danmayer guess we can close this one?
Yeah, just added the notes about putting the metadata in the views later into the future changes file...
This is more of an idea than an issue. If we include the checksum as part of the key for each file within the redis store, we would remove the need for the user to clear the store. For example:
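A minimal sketch of what such a checksum-based key could look like (the coverband. prefix and helper name are illustrative, not the gem's actual internals):

```ruby
require "digest"

# Key coverage by file path *and* the file's current MD5 checksum. When a
# file changes, its checksum changes, so old coverage is left behind under
# the stale key instead of being blended into reports for the new version.
def coverage_key(file)
  "coverband.#{file}.#{Digest::MD5.file(file).hexdigest}"
end

coverage_key("app/models/user.rb")
# => "coverband.app/models/user.rb.<32-char hex digest of the file contents>"
```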
This would also remove confusion around a file's coverage not being accurate when the file has been changed but the store has not been cleared.
In order to avoid stale files hanging around forever within redis, we could also just supply a high default ttl, and the stale entries should automatically be cleaned up by redis.
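For instance, a sketch of re-arming a one-year TTL on each write with redis-rb (the duration is just the "high default" floated above):

```ruby
require "json"
require "redis"

redis = Redis.new
ONE_YEAR_IN_SECONDS = 365 * 24 * 60 * 60

# Re-arm the TTL each time coverage is reported for a key. Keys for files
# that changed or were deleted stop being touched, so Redis evicts them
# automatically once the TTL lapses.
key = coverage_key("app/models/user.rb") # helper from the sketch above
redis.hset(key, "coverage", { 1 => 5, 2 => 0 }.to_json)
redis.expire(key, ONE_YEAR_IN_SECONDS)
```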