Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add 4 uptime metrics in prometheus #5506

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 10 additions & 2 deletions server/model/monitor.js
Original file line number Diff line number Diff line change
Expand Up @@ -980,7 +980,15 @@ class Monitor extends BeanModel {
await R.store(bean);

log.debug("monitor", `[${this.name}] prometheus.update`);
this.prometheus?.update(bean, tlsInfo);
let uptimeMetrics = {};
let data24h = await uptimeCalculator.get24Hour();
let data30d = await uptimeCalculator.get30Day();
let data1y = await uptimeCalculator.get1Year();
Comment on lines +984 to +986
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the performance impact of this?
This makes doing a ping substancially more exspensive...
Unsure if this is an isssue.

Have you tested this?

uptimeMetrics.avgPing = data24h.avgPing ? Number(data24h.avgPing.toFixed(2)) : null;
uptimeMetrics.data24h = data24h.uptime;
uptimeMetrics.data30d = data30d.uptime;
uptimeMetrics.data1y = data1y.uptime;
this.prometheus?.update(bean, tlsInfo, uptimeMetrics);

previousBeat = bean;

Expand Down Expand Up @@ -1730,7 +1738,7 @@ class Monitor extends BeanModel {
*/
async handleTlsInfo(tlsInfo) {
await this.updateTlsInfo(tlsInfo);
this.prometheus?.update(null, tlsInfo);
this.prometheus?.update(null, tlsInfo, null);

if (!this.getIgnoreTls() && this.isEnabledExpiryNotification()) {
log.debug("monitor", `[${this.name}] call checkCertExpiryNotifications`);
Expand Down
101 changes: 100 additions & 1 deletion server/prometheus.js
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,31 @@
help: "Is the certificate still valid? (1 = Yes, 0= No)",
labelNames: commonLabels
});

const monitorUptime1y = new PrometheusClient.Gauge({
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please go through https://prometheus.io/docs/practices/naming/ and make sure that the new metrics are better "alligned" to what prometheus wants metrics to be like ^^

name: "monitor_uptime_1y",
help: "Monitor Uptime 1y (%)",
labelNames: commonLabels,
});

const monitorUptime30d = new PrometheusClient.Gauge({
name: "monitor_uptime_30d",
help: "Monitor Uptime 30d (%)",
labelNames: commonLabels,
});

const monitorUptime24h = new PrometheusClient.Gauge({
name: "monitor_uptime_24h",
help: "Monitor Uptime 24h (%)",
labelNames: commonLabels,
});

const monitorAverageResponseTime = new PrometheusClient.Gauge({
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make sure that users + developers know that this is 24h ping times by adding the 24h suffix as above.

name: "monitor_average_response_time",
help: "Monitor Average Response Time (ms)",
labelNames: commonLabels,
});

const monitorResponseTime = new PrometheusClient.Gauge({
name: "monitor_response_time",
help: "Monitor Response Time (ms)",
Expand Down Expand Up @@ -48,13 +73,13 @@
};
}

/**

Check failure on line 76 in server/prometheus.js

View workflow job for this annotation

GitHub Actions / check-linters

Missing JSDoc @param "uptime" declaration
* Update the metrics page
* @param {object} heartbeat Heartbeat details
* @param {object} tlsInfo TLS details
* @returns {void}
*/
update(heartbeat, tlsInfo) {
update(heartbeat, tlsInfo, uptime) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please adress the linting error


if (typeof tlsInfo !== "undefined") {
try {
Expand All @@ -80,6 +105,76 @@
}
}

if (uptime) {
if (typeof uptime.avgPing !== "undefined") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clue what war crimes we did commit here.
(Looking at the surrounding code, this fits in..)

I have no clue why the whole typeof verifcation exists in this class.
Seems strange.

We know what the type we pass into this function is..

try {
if (typeof uptime.avgPing === "number") {
monitorAverageResponseTime.set(
this.monitorLabelValues,
uptime.avgPing
);
} else {
// Is it good?
monitorAverageResponseTime.set(
this.monitorLabelValues,
-1
);
}
} catch (e) {
log.error("prometheus", "Caught error");
log.error("prometheus", e);
}
}
if (typeof uptime.data24h !== "undefined") {
try {
if (typeof uptime.data24h === "number") {
monitorUptime24h.set(
this.monitorLabelValues,
uptime.data24h
);
} else {
// Is it good?
monitorUptime24h.set(this.monitorLabelValues, -1);
}
} catch (e) {
log.error("prometheus", "Caught error");
log.error("prometheus", e);
}
}
if (typeof uptime.data30d !== "undefined") {
try {
if (typeof uptime.data30d === "number") {
monitorUptime30d.set(
this.monitorLabelValues,
uptime.data30d
);
} else {
// Is it good?
monitorUptime30d.set(this.monitorLabelValues, -1);
}
} catch (e) {
log.error("prometheus", "Caught error");
log.error("prometheus", e);
}
}
if (typeof uptime.data1y !== "undefined") {
try {
if (typeof uptime.data1y === "number") {
monitorUptime1y.set(
this.monitorLabelValues,
uptime.data1y
);
} else {
// Is it good?
monitorUptime1y.set(this.monitorLabelValues, -1);
}
} catch (e) {
log.error("prometheus", "Caught error");
log.error("prometheus", e);
}
}
}

if (heartbeat) {
try {
monitorStatus.set(this.monitorLabelValues, heartbeat.status);
Expand Down Expand Up @@ -110,6 +205,10 @@
try {
monitorCertDaysRemaining.remove(this.monitorLabelValues);
monitorCertIsValid.remove(this.monitorLabelValues);
monitorUptime1y.remove(this.monitorLabelValues);
monitorUptime30d.remove(this.monitorLabelValues);
monitorUptime24h.remove(this.monitorLabelValues);
monitorAverageResponseTime.remove(this.monitorLabelValues);
monitorResponseTime.remove(this.monitorLabelValues);
monitorStatus.remove(this.monitorLabelValues);
} catch (e) {
Expand Down
Loading