feat(telemetry)_: add metrics for message reliability #5899
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #5899      +/-   ##
===========================================
+ Coverage    47.20%   47.61%   +0.40%
===========================================
  Files          840      842       +2
  Lines       138167   138390     +223
===========================================
+ Hits         65225    65889     +664
+ Misses       65406    64704     -702
- Partials      7536     7797     +261
b3b0f2e to ace5b50
ace5b50 to 73a8f56
73a8f56 to 6a50b1a
telemetry/client.go
Outdated
    PeerCountByOriginMetric     TelemetryType = "PeerCountByOrigin"
    DialFailureMetric           TelemetryType = "DialFailure"
    MissedMessageMetric         TelemetryType = "MissedMessages"
    MissedRelevantMessageMetric TelemetryType = "MissedRelevantMessages"
It would be great to add some comments (e.g. how it works) on these MissedXXX types.
Added comments for each type indicating what is being tracked
6a50b1a to dff912f
LGTM
telemetry/client.go
Outdated
    postBody["errorType"] = dialFailure.ErrorType
    postBody["errorMsg"] = dialFailure.ErrorMsg
    postBody["protocols"] = dialFailure.Protocols
    body, _ := json.Marshal(postBody)
Why ignore the error? 🤔
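A minimal sketch of what handling the error could look like, in line with the later commit note "handle error from json marshal". The `dialFailure` struct, its field types, and the `buildDialFailureBody` helper are assumptions made for illustration; they are not taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dialFailure is a hypothetical stand-in for the payload in the reviewed hunk;
// the real field types in telemetry/client.go may differ.
type dialFailure struct {
	ErrorType string
	ErrorMsg  string
	Protocols string
}

// buildDialFailureBody (hypothetical helper) marshals the payload and returns
// the marshal error to the caller instead of discarding it with `_`.
func buildDialFailureBody(df dialFailure) ([]byte, error) {
	postBody := map[string]interface{}{
		"errorType": df.ErrorType,
		"errorMsg":  df.ErrorMsg,
		"protocols": df.Protocols,
	}
	body, err := json.Marshal(postBody)
	if err != nil {
		return nil, fmt.Errorf("marshal dial failure: %w", err)
	}
	return body, nil
}

func main() {
	body, err := buildDialFailureBody(dialFailure{
		ErrorType: "I/O Timeout",
		ErrorMsg:  "dial timed out",
		Protocols: "example-protocol",
	})
	if err != nil {
		fmt.Println("skipping telemetry record:", err)
		return
	}
	fmt.Println(string(body))
}
```

Returning the error lets the caller decide whether to log it, drop the record, or retry, rather than silently sending an empty body.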
wakuv2/common/helpers.go
Outdated
    for _, attempt := range dialAttempts {
        attempt = strings.TrimSpace(strings.Trim(attempt, "* "))
        matches := reAttempt.FindStringSubmatch(attempt)
        if len(matches) == 3 {
-       if len(matches) == 3 {
+       if len(matches) != 3 {
+           continue
+       }
+       // ...
    func (det DialErrorType) String() string {
        return [...]string{
            "Unknown",
            "I/O Timeout",
            "Connection Refused",
            "Relay Circuit Failed",
            "Relay No Reservation",
            "Security Negotiation Failed",
            "Concurrent Dial Succeeded",
            "Concurrent Dial Failed",
            "Connections Per IP Limit Exceeded",
            "Stream Reset",
            "Relay Resource Limit Exceeded",
            "Error Opening Hop Stream to Relay",
            "Dial Backoff",
        }[det]
Though I like the hack, I think `map[string]string` is a more reliable solution in terms of potential changes. And the performance should remain the same.
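One way to read this suggestion is a lookup table keyed by the enum value. The sketch below assumes a `map[DialErrorType]string` rather than the literal `map[string]string`, and the constant names are hypothetical; only the display strings come from the hunk above.

```go
package main

import "fmt"

// DialErrorType mirrors the type name in the hunk; the constant names below
// are hypothetical and only three values are shown for brevity.
type DialErrorType int

const (
	ErrorUnknown DialErrorType = iota
	ErrorIOTimeout
	ErrorConnectionRefused
)

// dialErrorNames maps each value to its display string. Unlike the indexed
// array literal, reordering or removing entries cannot shift the labels.
var dialErrorNames = map[DialErrorType]string{
	ErrorUnknown:           "Unknown",
	ErrorIOTimeout:         "I/O Timeout",
	ErrorConnectionRefused: "Connection Refused",
}

// String falls back to "Unknown" for values missing from the map, where
// indexing the array literal with an out-of-range value would panic.
func (det DialErrorType) String() string {
	if name, ok := dialErrorNames[det]; ok {
		return name
	}
	return "Unknown"
}

func main() {
	fmt.Println(ErrorIOTimeout)
	fmt.Println(DialErrorType(99))
}
```

The fallback branch is the main practical difference: a new constant added without a map entry degrades to "Unknown" instead of crashing.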
dff912f to 70672a9
Add metrics for dial errors, missed messages, missed relevant messages, and confirmed delivery.
70672a9 to 06e9025
author shashankshampi <[email protected]> 1729780155 +0530 committer shashankshampi <[email protected]> 1730274350 +0530 test: Code Migration from status-cli-tests fix_: functional tests (#5979) * fix_: generate on test-functional * chore(test)_: fix functional test assertion --------- Co-authored-by: Siddarth Kumar <[email protected]> feat(accounts)_: cherry-pick Persist acceptance of Terms of Use & Privacy policy (#5766) (#5977) * feat(accounts)_: Persist acceptance of Terms of Use & Privacy policy (#5766) The original GH issue status-im/status-mobile#21113 came from a request from the Legal team. We must show to Status v1 users the new terms (Terms of Use & Privacy Policy) right after they upgrade to Status v2 from the stores. The solution we use is to create a flag in the accounts table, named hasAcceptedTerms. The flag will be set to true on the first account ever created in v2 and we provide a native call in mobile/status.go#AcceptTerms, which allows the client to persist the user's choice in case they are upgrading (from v1 -> v2, or from a v2 older than this PR). This solution is not the best because we should store the setting in a separate table, not in the accounts table. 
Related Mobile PR status-im/status-mobile#21124 * fix(test)_: Compare addresses using uppercased strings --------- Co-authored-by: Icaro Motta <[email protected]> test_: restore account (#5960) feat_: `LogOnPanic` linter (#5969) * feat_: LogOnPanic linter * fix_: add missing defer LogOnPanic * chore_: make vendor * fix_: tests, address pr comments * fix_: address pr comments fix(ci)_: remove workspace and tmp dir This ensures we do not encounter weird errors like: ``` + ln -s /home/jenkins/workspace/go_prs_linux_x86_64_main_PR-5907 /home/jenkins/workspace/go_prs_linux_x86_64_main_PR-5907@tmp/go/src/github.com/status-im/status-go ln: failed to create symbolic link '/home/jenkins/workspace/go_prs_linux_x86_64_main_PR-5907@tmp/go/src/github.com/status-im/status-go': File exists script returned exit code 1 ``` Signed-off-by: Jakub Sokołowski <[email protected]> chore_: enable windows and macos CI build (#5840) - Added support for Windows and macOS in CI pipelines - Added missing dependencies for Windows and x86-64-darwin - Resolved macOS SDK version compatibility for darwin-x86_64 The `mkShell` override was necessary to ensure compatibility with the newer macOS SDK (version 11.0) for x86_64. The default SDK (10.12) was causing build failures because of the missing libs and frameworks. OverrideSDK creates a mapping from the default SDK in all package categories to the requested SDK (11.0). 
fix(contacts)_: fix trust status not being saved to cache when changed (#5965) Fixes status-im/status-desktop#16392 cleanup added logger and cleanup review comments changes fix_: functional tests (#5979) * fix_: generate on test-functional * chore(test)_: fix functional test assertion --------- Co-authored-by: Siddarth Kumar <[email protected]> feat(accounts)_: cherry-pick Persist acceptance of Terms of Use & Privacy policy (#5766) (#5977) * feat(accounts)_: Persist acceptance of Terms of Use & Privacy policy (#5766) The original GH issue status-im/status-mobile#21113 came from a request from the Legal team. We must show to Status v1 users the new terms (Terms of Use & Privacy Policy) right after they upgrade to Status v2 from the stores. The solution we use is to create a flag in the accounts table, named hasAcceptedTerms. The flag will be set to true on the first account ever created in v2 and we provide a native call in mobile/status.go#AcceptTerms, which allows the client to persist the user's choice in case they are upgrading (from v1 -> v2, or from a v2 older than this PR). This solution is not the best because we should store the setting in a separate table, not in the accounts table. Related Mobile PR status-im/status-mobile#21124 * fix(test)_: Compare addresses using uppercased strings --------- Co-authored-by: Icaro Motta <[email protected]> test_: restore account (#5960) feat_: `LogOnPanic` linter (#5969) * feat_: LogOnPanic linter * fix_: add missing defer LogOnPanic * chore_: make vendor * fix_: tests, address pr comments * fix_: address pr comments chore_: enable windows and macos CI build (#5840) - Added support for Windows and macOS in CI pipelines - Added missing dependencies for Windows and x86-64-darwin - Resolved macOS SDK version compatibility for darwin-x86_64 The `mkShell` override was necessary to ensure compatibility with the newer macOS SDK (version 11.0) for x86_64. 
The default SDK (10.12) was causing build failures because of the missing libs and frameworks. OverrideSDK creates a mapping from the default SDK in all package categories to the requested SDK (11.0). fix(contacts)_: fix trust status not being saved to cache when changed (#5965) Fixes status-im/status-desktop#16392 test_: remove port bind chore(wallet)_: move route execution code to separate module chore_: replace geth logger with zap logger (#5962) closes: #6002 feat(telemetry)_: add metrics for message reliability (#5899) * feat(telemetry)_: track message reliability Add metrics for dial errors, missed messages, missed relevant messages, and confirmed delivery. * fix_: handle error from json marshal chore_: use zap logger as request logger iterates: status-im/status-desktop#16536 test_: unique project per run test_: use docker compose v2, more concrete project name fix(codecov)_: ignore folders without tests Otherwise Codecov reports incorrect numbers when making changes. https://docs.codecov.com/docs/ignoring-paths Signed-off-by: Jakub Sokołowski <[email protected]> test_: verify schema of signals during init; fix schema verification warnings (#5947) fix_: update defaultGorushURL (#6011) fix(tests)_: use non-standard port to avoid conflicts We have observed `nimbus-eth2` build failures reporting this port: ```json { "lvl": "NTC", "ts": "2024-10-28 13:51:32.308+00:00", "msg": "REST HTTP server could not be started", "topics": "beacnde", "address": "127.0.0.1:5432", "reason": "(98) Address already in use" } ``` https://ci.status.im/job/nimbus-eth2/job/platforms/job/linux/job/x86_64/job/main/job/PR-6683/3/ Signed-off-by: Jakub Sokołowski <[email protected]> fix_: create request logger ad-hoc in tests Fixes `TestCall` failing when run concurrently. chore_: configure codecov (#6005) * chore_: configure codecov * fix_: after_n_builds
Adds metrics for missed messages, delivery confirmation, and peer count by shard/origin.
Tracks additional events in telemetry:
Important changes:
Dogfooding PR: status-im/status-desktop#16540