Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gossipsub): feature gate metrics related code #5711

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

drHuangMHT
Copy link
Contributor

@drHuangMHT drHuangMHT commented Dec 4, 2024

Description

May close #2923.

Notes & open questions

Breaking change: Contains public API changes.
I'm trying to keep the change as small as possible, which introduces a lot of duplicate code. Is there a better way?

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

Copy link
Contributor

mergify bot commented Dec 4, 2024

This pull request has merge conflicts. Could you please resolve them @drHuangMHT? 🙏

@drHuangMHT
Copy link
Contributor Author

Oops that would probably take a while to find where the side effect is.

Copy link
Contributor

@elenaf9 elenaf9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if we can avoid the duplicated API for each Behavior::new_* method.

Wdyt think of instead:

  • by default don't include metrics when creating a new behavior
  • havea single, feature-gated method
    fn with_metrics(&mut self, metrics_registry: &mut Registry, metrics_config: MetricsConfig,)
    to enable them? The method could optionally also take& return ownership of self, so that one could call Behavior::new(..).with_metrics(...).

@drHuangMHT
Copy link
Contributor Author

drHuangMHT commented Dec 5, 2024

* havea single, feature-gated method
  `fn with_metrics(&mut self, metrics_registry: &mut Registry, metrics_config: MetricsConfig,)`
  to enable them? The method could optionally also take& return  ownership of self, so that one could call `Behavior::new(..).with_metrics(...)`.

A mutable borrow would allow users to change the metrics inside at runtime, I'll make it taking the ownership then.

@drHuangMHT
Copy link
Contributor Author

drHuangMHT commented Dec 5, 2024

The scoring part can be tricky though. The behaviour need to provide a reference to PeerScore in order to log metrics.
I have an idea though:

  • Use a dummy Metrics struct to bypass the type check, it will never be constructed.
  • Keep the scoring code as-is(with gates on metrics related code).
  • Drop the metrics variable immediately in metric_score when the feature is not enabled to silence unused variable warning.

@elenaf9
Copy link
Contributor

elenaf9 commented Dec 5, 2024

The scoring part can be tricky though. The behaviour need to provide a reference to PeerScore in order to log metrics. I have an idea though:

  • Use a dummy Metrics struct to bypass the type check, it will never be constructed.
  • Keep the scoring code as-is(with gates on metrics related code).
  • Drop the metrics variable immediately in metric_score when the feature is not enabled to silence unused variable warning.

Wdyt of:

  • score doesn't do anything with metrics, but instead just returns a tuple (usize, usize) with the number of MessageDeficit and IPColocations that happened.
  • metrics_score calls score, and based on the returned counters updates the metrics?

I usually try to avoid such tuple return types, but in this specific case it may be cleaner than constructing a dummy structure?
We could also wrap them in newtypes (MessageDeficit(usize), IpColocation(usize)).

@drHuangMHT
Copy link
Contributor Author

drHuangMHT commented Dec 5, 2024

Wdyt of:

  • score doesn't do anything with metrics, but instead just returns a tuple (usize, usize) with the number of MessageDeficit and IPColocations that happened.

There are 98 places where score is called(93 in tests though), it is not very easy to make changes to its signature directly. We can define a new function instead and let score call it.

  • metrics_score calls score, and based on the returned counters updates the metrics?

No problem.
EDIT: Now I think about it, if the feature is not enabled, the counters are basically useless and you still pay the price to keep them. We can live with it no problem, but it is not necessary, that's the point.

I usually try to avoid such tuple return types, but in this specific case it may be cleaner than constructing a dummy structure? We could also wrap them in newtypes (MessageDeficit(usize), IpColocation(usize)).

Just out of curiosity: do alias work in this case?

@elenaf9
Copy link
Contributor

elenaf9 commented Dec 5, 2024

Hmm, I just noticed that score / metric_score already return an f64... then it's even less ideal to return two additional integers...
Given that in both cases where we log a metric we register a Penalty, the method could also return a Vec<Penalty>, that the score_metric function then iterates on.

There are 98 places where score is called(93 in tests though), it is not very easy to make changes to its signature directly. We can define a new function instead and let score call it.

Good idea 👍.

Now I think about it, if the feature is not enabled, the counters are basically useless and you still pay the price to keep them. We can live with it no problem, but it is not necessary, that's the point.

If we do the vec, we could only push something into the vec if metrics are enabled. I would assume that the compiler then just optimizes it away when the feature is not enabled.
But doing it like this is also a bit hacky and would require proper documentation.

@drHuangMHT
Copy link
Contributor Author

If we do the vec, we could only push something into the vec if metrics are enabled. I would assume that the compiler then just optimizes it away when the feature is not enabled. But doing it like this is also a bit hacky and would require proper documentation.

The penalties are fungible, and I don't think the ordering is of any importance, so a number will suffice. And yeah we don't have to increment the counter when the feature is not enable, the impact will be minimal.

@drHuangMHT
Copy link
Contributor Author

Well this probably also improves performance now that the increment by 1 won't be called repeatedly.

@drHuangMHT drHuangMHT requested a review from elenaf9 December 6, 2024 03:22
Copy link
Contributor

@elenaf9 elenaf9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, rest LGTM.
I'd still like to wait for @jxs's review since he's more familiar with gossipsub than I am.

protocols/gossipsub/src/peer_score.rs Outdated Show resolved Hide resolved
protocols/gossipsub/src/peer_score.rs Outdated Show resolved Hide resolved
Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi and sorry for the delay! Thanks for the inititative @drHuangMHT, and @elenaf9 for the review.

I am torn on this one, I know #2923 was opened and that gossipsub uses metrics differently than any other behaviour, but I wonder if there there is any use case for usage of gossipsub without metrics where the the dependency on prometheus-client creates significant problems? Is that your case @drHuangMHT?

Cause I think this PR makes it even more confusing to understand the scoring system, with variables declared in places for metrics like here where it's not clear why it's outside the metrics scope just below. and duplicated functions used only for metrics (remove_peer_from_mesh) where it's also not clear why the whole code needs to be duplicated.
If we really wanna make prometheus-client a optional dependency can we try to first decouple the metrics logic from the scoring system? So that when we make metrics feature gated it's only

#[cfg(feature = "metrics")]
if let Some(metrics) = self.metrics.as_mut() {
    // code
}

protocols/gossipsub/src/rpc.rs Outdated Show resolved Hide resolved
protocols/gossipsub/src/rpc.rs Outdated Show resolved Hide resolved
@drHuangMHT
Copy link
Contributor Author

I am torn on this one, I know #2923 was opened and that gossipsub uses metrics differently than any other behaviour, but I wonder if there there is any use case for usage of gossipsub without metrics where the the dependency on prometheus-client creates significant problems? Is that your case @drHuangMHT?

No, I just picked the issue up. But just as thomas said it wouldn't be a bad thing to do.

Cause I think this PR makes it even more confusing to understand the scoring system, with variables declared in places for metrics like here where it's not clear why it's outside the metrics scope just below.

Ugh because another block of gated code used the same variable. If I moved it in there will be a scoping problem. I'll look into that to see what I can do. I'm keeping them separate because I don't want to change the original code structure too much.

and duplicated functions used only for metrics (remove_peer_from_mesh) where it's also not clear why the whole code needs to be duplicated.

Oops that's probably an oversight. I'll look into it, the duplication probably isn't necessary.

If we really wanna make prometheus-client a optional dependency can we try to first decouple the metrics logic from the scoring system? So that when we make metrics feature gated it's only

I'll try.

@drHuangMHT drHuangMHT requested a review from jxs December 10, 2024 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

protocols/gossipsub: Consider feature-gating the metrics support
3 participants