Normalize memory cache values #119

Closed
kbaum wants to merge 3 commits

Conversation

@kbaum (Collaborator) commented Sep 21, 2018

This PR does two things:

First, the memory cache filters out files that have no new coverage. Previously, the memory cache was not filtering out files that hit the same lines over and over, because the per-line counts kept incrementing on every request.

Second, this PR removes the difference calculation from the coverage collector. With the memory cache enabled, I'm not sure why we would need anything calculating the difference in coverage. I think we can just send the latest coverage from Ruby's ::Coverage.peek_result to the underlying store. If we go with this approach, perhaps we should always have the memory cache enabled?
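
A minimal sketch of the filtering described above (the class and method names are assumptions for illustration, not Coverband's actual API): only files whose line coverage changed since the last report are forwarded to the underlying store.

```ruby
# Hypothetical sketch, not Coverband's real implementation: a memory cache
# that forwards only files whose coverage changed since the last report.
class MemoryCacheStore
  def initialize(store)
    @store = store   # underlying store, e.g. a Redis-backed store
    @cache = {}      # file path => last line-coverage array we forwarded
  end

  # coverage is shaped like ::Coverage.peek_result output,
  # e.g. { "app/models/user.rb" => [1, 0, nil, 3] }
  def save_report(coverage)
    changed = coverage.reject { |file, lines| @cache[file] == lines }
    return if changed.empty?

    changed.each { |file, lines| @cache[file] = lines }
    @store.save_report(changed)
  end
end
```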

We only want to report to coverband if a new line is covered.

Initializing with what is in redis does not make sense. Suppose the file already has coverage in redis; on the first request the file has no coverage, and we will then pass this request to the underlying store. This was resulting in the memory cache incorrectly caching coverage.
@kbaum kbaum requested a review from danmayer September 21, 2018 01:27
@danmayer (Owner)

I guess we need to decide the goal... binary (was this line run or not) or an accurate count of how many times a line is hit. There were some requests and PRs to have Coverband support line usage counts vs just hit or not hit... If you just send the peek_result data directly every time coverage is reported, the count written would include both the new hits and all the past hits that were already recorded... This was actually an issue when I first moved to Coverage...

  • start server
  • request page
  • report coverage
    • expectation is the controller code is hit once, which is what peek_result shows
    • coverband redis will increment from 0 -> 1
  • request same page
  • report coverage
    • expectation is the controller code is hit twice, which is what peek_result shows
    • coverband redis should move from 1 -> 2, so we don't want to do 1 + 2 -> 3
    • You don't want to just write 2 directly to the redis store, as it is receiving data from multiple processes/threads. If you did that, the latest process or thread would overwrite the usage recorded by all the others.
    • If the server were all single process / thread and we only cared about coverage since deploy, we could just dump peek_result to redis (or even an S3 file) in its entirety on a given interval, but I don't think that works across processes, deploys, etc.

Hopefully, that highlights the issue I see... but perhaps I am not following you correctly
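
For illustration, a rough sketch of the bookkeeping described above (hypothetical class and key names; Coverband's real collector differs). Because ::Coverage.peek_result returns cumulative counts, the reporter subtracts its previous snapshot and increments Redis by the difference, so concurrent reports from other processes are added rather than overwritten.

```ruby
require 'coverage'
require 'redis'

# Hypothetical sketch: report only the per-line delta since the last report,
# and let Redis increments merge data from multiple processes/threads.
class DeltaReporter
  def initialize(redis)
    @redis = redis
    @previous = {}   # file => line counts from the last report
  end

  def report
    current = Coverage.peek_result   # Coverage.start must have been called earlier
    current.each do |file, lines|
      last = @previous[file] || []
      lines.each_with_index do |count, index|
        next if count.nil?
        delta = count - (last[index] || 0)
        # HINCRBY adds our delta on top of whatever other processes reported,
        # instead of overwriting their counts.
        @redis.hincrby("coverage:#{file}", index, delta) if delta > 0
      end
    end
    @previous = current
  end
end
```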

@danmayer (Owner)

let me know your thoughts @kbaum

@kbaum (Collaborator, Author) commented Sep 21, 2018

@danmayer - Excellent explanation of why the coverage collector calculates differences.

In our case, we only care whether the line was hit or not. Using this approach has the potential for major performance improvements, as we only hit redis when new code paths have been covered. I think it makes sense to offer this as an option, since my bet is others are less concerned with tracking how many times a line was hit. I wonder what the use case is for tracking the exact number of times a line was hit? Is it worth the performance cost?

Another thought is that maybe this all becomes a moot point if we move toward the coverage being reported to redis within a background thread?
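
As a sketch of that idea (purely illustrative; the reporting interval, `store` object, and error handling are assumptions, not Coverband's implementation), a background thread would keep the Redis round-trip off the request path:

```ruby
# Illustrative only: periodically report coverage from a background thread
# so web requests never wait on Redis.
reporter = Thread.new do
  loop do
    begin
      sleep 30                                  # assumed reporting interval
      store.save_report(Coverage.peek_result)   # `store` is a hypothetical store object
    rescue StandardError => e
      warn "coverage reporting failed: #{e.message}"
    end
  end
end
```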

@danmayer (Owner)

Yeah, I think as we work on the background thread and drop some other complexity, we should be able to remove most of the perf hit for it. We could make it an optional feature, but given that one of the goals is to simplify install / usage, let's hold off on making it optional unless there is a good perf or other reason.

In terms of why it is valuable, I have used it to find hot spots in the code, e.g. middleware that is hit on every request but doesn't need to be. Same for before filters and around filters. Generally, finding code that is executed on nearly every request is a good place to start for optimization. The other thing I haven't done but have always considered useful would be to compare the most heavily hit production code against test coverage. If something is on the critical path in production, it could justify additional testing.

Agree that the most valuable signal is hit or not hit, but the overall usage ratio, from very often to barely at all, can be helpful. Maintaining code that is hit once a month might at some point be more of a liability than an asset, depending on use cases.
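
As an illustration of the hot-spot use case (the helper below is hypothetical, not part of Coverband; it assumes data shaped like ::Coverage.peek_result output), per-line counts make it easy to rank the most frequently executed lines:

```ruby
# Hypothetical helper: rank the most frequently executed lines from
# Coverage-style data, e.g. { "file.rb" => [12, 0, nil, 3400] }.
def hottest_lines(coverage, limit: 20)
  coverage
    .flat_map do |file, lines|
      lines.each_with_index
           .map { |count, index| [file, index + 1, count] unless count.nil? || count.zero? }
           .compact
    end
    .max_by(limit) { |(_file, _line, count)| count }
end

# Example usage:
# hottest_lines(Coverage.peek_result).each do |file, line, count|
#   puts format('%8d  %s:%d', count, file, line)
# end
```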
