Normalize memory cache values #119

Closed
kbaum wants to merge 3 commits

Conversation

@kbaum (Collaborator) commented Sep 21, 2018

This PR does two things:

First, the memory cache filters out files that have no new coverage. Previously, the memory cache was not filtering out files that hit the same lines over and over, because the per-line counts kept incrementing on every request.

Second, this PR removes the difference calculation from the coverage collector. With the memory cache enabled, I'm not sure why we would need anything calculating the difference in coverage. I think we can just send the latest coverage from Ruby's ::Coverage.peek_result to the underlying store. If we go with this approach, perhaps we should always have the memory cache enabled?
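
A minimal sketch of the filtering described above (the class and method names are assumptions for illustration, not Coverband's actual API): only files whose line coverage changed since the last report are forwarded to the underlying store.

```ruby
# Hypothetical sketch, not Coverband's real implementation: a memory cache
# that forwards only files whose coverage changed since the last report.
class MemoryCacheStore
  def initialize(store)
    @store = store   # underlying store, e.g. a Redis-backed store
    @cache = {}      # file path => last line-coverage array we forwarded
  end

  # coverage is shaped like ::Coverage.peek_result output,
  # e.g. { "app/models/user.rb" => [1, 0, nil, 3] }
  def save_report(coverage)
    changed = coverage.reject { |file, lines| @cache[file] == lines }
    return if changed.empty?

    changed.each { |file, lines| @cache[file] = lines }
    @store.save_report(changed)
  end
end
```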

We only want to report to coverband if a new line is covered.

Initializing with what is in redis does not make sense. Suppose the file already has coverage in redis; on the first request the file has no coverage, and we will then pass this request to the underlying store. This was resulting in the memory cache incorrectly caching coverage.
@kbaum kbaum requested a review from danmayer September 21, 2018 01:27
@danmayer (Owner)

I guess we need to decide the goal... binary (was this line run or not) or an accurate count of how many times a line is hit. There were some requests and PRs to have Coverband support line usage counts vs just hit or not hit... If you just send the peek_result data directly every time coverage is reported, the count written would include both the new hits and all the past hits that were already recorded... This was actually an issue when I first moved to Coverage...

  • start server
  • request page
  • report coverage
    • expectation is the controller code is hit once, which is what peek_result shows
    • coverband redis will increment from 0 -> 1
  • request same page
  • report coverage
    • expectation is the controller code is hit twice, which is what peek_result shows
    • coverband redis should move from 1 -> 2, so we don't want to do 1 + 2 -> 3
    • You don't want to just write 2 directly to the redis store, as it is receiving data from multiple processes/threads. If you did that, the latest process or thread would overwrite the usage recorded by all the others.
    • If the server were all single process / thread and we only cared about coverage since deploy, we could just dump peek_result to redis (or even an S3 file) in its entirety on a given interval, but I don't think that works across processes, deploys, etc.

Hopefully, that highlights the issue I see... but perhaps I am not following you correctly
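
For illustration, a rough sketch of the bookkeeping described above (hypothetical class and key names; Coverband's real collector differs). Because ::Coverage.peek_result returns cumulative counts, the reporter subtracts its previous snapshot and increments Redis by the difference, so concurrent reports from other processes are added rather than overwritten.

```ruby
require 'coverage'
require 'redis'

# Hypothetical sketch: report only the per-line delta since the last report,
# and let Redis increments merge data from multiple processes/threads.
class DeltaReporter
  def initialize(redis)
    @redis = redis
    @previous = {}   # file => line counts from the last report
  end

  def report
    current = Coverage.peek_result   # Coverage.start must have been called earlier
    current.each do |file, lines|
      last = @previous[file] || []
      lines.each_with_index do |count, index|
        next if count.nil?
        delta = count - (last[index] || 0)
        # HINCRBY adds our delta on top of whatever other processes reported,
        # instead of overwriting their counts.
        @redis.hincrby("coverage:#{file}", index, delta) if delta > 0
      end
    end
    @previous = current
  end
end
```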

@danmayer (Owner)

let me know your thoughts @kbaum

@kbaum (Collaborator, Author) commented Sep 21, 2018

@danmayer - Excellent explanation of why the coverage collector calculates differences.

In our case, we only care whether the line was hit or not. Using this approach has the potential for major performance improvements, as we only hit redis when new code paths have been covered. I think it makes sense to offer this as an option, since my bet is others are less concerned with tracking how many times a line was hit. I wonder what the use case is for tracking the exact number of times a line was hit? Is it worth the performance cost?

Another thought is that maybe this all becomes a moot point if we move toward the coverage being reported to redis within a background thread?
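
As a sketch of that idea (purely illustrative; the reporting interval, `store` object, and error handling are assumptions, not Coverband's implementation), a background thread would keep the Redis round-trip off the request path:

```ruby
# Illustrative only: periodically report coverage from a background thread
# so web requests never wait on Redis.
reporter = Thread.new do
  loop do
    begin
      sleep 30                                  # assumed reporting interval
      store.save_report(Coverage.peek_result)   # `store` is a hypothetical store object
    rescue StandardError => e
      warn "coverage reporting failed: #{e.message}"
    end
  end
end
```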

@danmayer (Owner)

Yeah, I think as we work on the background thread and drop some other complexity, we should be able to remove most of the perf hit for it. We could make it an optional feature, but given that one of the goals is to simplify install / usage, let's hold off on making it optional unless there is a good perf or other reason.

In terms of why it is valuable, I have used it to find hot spots in the code, e.g. middleware that is hit on every request but doesn't need to be. Same for before filters and around filters. Generally, finding code that is executed on nearly every request is a good place to start for optimization. The other thing I haven't done but have always considered useful would be to compare the most heavily hit production code against test coverage. If something is on the critical path in production, it could justify additional testing.

Agree that the most valuable signal is hit or not hit, but the overall usage ratio, from very often to barely at all, can be helpful. Maintaining code that is hit once a month might at some point be more of a liability than an asset, depending on use cases.
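
As an illustration of the hot-spot use case (the helper below is hypothetical, not part of Coverband; it assumes data shaped like ::Coverage.peek_result output), per-line counts make it easy to rank the most frequently executed lines:

```ruby
# Hypothetical helper: rank the most frequently executed lines from
# Coverage-style data, e.g. { "file.rb" => [12, 0, nil, 3400] }.
def hottest_lines(coverage, limit: 20)
  coverage
    .flat_map do |file, lines|
      lines.each_with_index
           .map { |count, index| [file, index + 1, count] unless count.nil? || count.zero? }
           .compact
    end
    .max_by(limit) { |(_file, _line, count)| count }
end

# Example usage:
# hottest_lines(Coverage.peek_result).each do |file, line, count|
#   puts format('%8d  %s:%d', count, file, line)
# end
```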
