-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a collector for ZFS, currently focussed on ARC stats. #213
Conversation
It is tested on FreeBSD 10.2-RELEASE and Linux (ZFS on Linux 0.6.5.4). On FreeBSD, Solaris, etc. ZFS metrics are exposed through sysctls. ZFS on Linux exposes the same metrics through procfs `/proc/spl/...`. In addition to sysctl metrics, 'computed metrics' are exposed by the collector, which are based on several sysctl values. There is some conditional logic involved in computing these metrics which cannot be easily mapped to PromQL. Not all 92 ARC sysctls are exposed right now but this can be changed with one additional LOC each.
fetchedMetrics := make(map[zfsSysctl]prometheus.Gauge) | ||
|
||
insertGauge := func(sysctl, name string) { | ||
fetchedMetrics[zfsSysctl(sysctl)] = prometheus.NewGauge( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be using ConstMetrics rather than creating Gauges. Gauges are the like are for direction instrumentation, whereas a custom collector like this should use ConstMetrics.
The README will also need updating. |
What was causing |
Looks like a false positive. |
Use ConstMetric with UntypedValue. Handle nozfs build flag. Remove computed values.
collector now does not expose metrics if stats cannot be fetched.
Please review the above commits. Will squash them together as soon as you confirm all issues mentioned have been addressed. Side note: Novice users will have trouble making sense of / interpreting ZFS statistics. Should we include a link to some resources, e.g. the implementation of zfs-stats? (It is a popular tool on FreeBSD for getting a quick overview. Currently, node_exporter exposes a subset of zfs-stat's metrics, some of which are computed). |
//------------------------------------------------------------------------------ | ||
|
||
type zfsMetric struct { | ||
subsystem zfsSubsystemName // The Prometheus subsystem name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments should end in periods
Links to reference docs are good. As someone who uses ZFS on linux, the zfs-stats source doesn't seem to add anything that's not already obvious from the names. |
return zfsErrorValue, fmt.Errorf("sysctl '%s' found") | ||
} | ||
|
||
func (p *zfsMetricProvider) prepareUpdateArcstats(zfsArcstatsProcpath string) (err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would benefit from a unittest
Neat, did not realize this work was happening. @eliothedeman and I had previously started work on our own ZFS exporter, but integrating the information into We expose metrics using go-zfs, but as has been stated, many of these do require root on Linux. I run ZoL and would be very interested in making these stats accessible on Linux as well. What would be an appropriate way to go about doing this? It seems that creating some sort of "ZFS helper" binary which can independently run as root and spit out metrics for the textfile collector could work. |
func (c *zfsCollector) zfsAvailable() (err error) { | ||
file, err := c.openArcstatsFile() | ||
if err != nil { | ||
file.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File needs to be closed in the err == nil
case as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's new to me. Do you have any links on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out this blog: http://blog.golang.org/defer-panic-and-recover.
You could also just use os.Stat()
to check for the /proc
file instead of actually opening it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And actually you only need to close the file when the opening was successful. When there was an error, you won't even get a valid *File
back, but nil
. So the code above would actually crash the entire node exporter with a nil pointer dereference panic in case the file can't be read.
Has someone asked ZoL why all the stats we're interested in require root? No other filesystem requires that that I'm aware of. |
Filed an issue with ZoL: openzfs/zfs#4468. |
@@ -0,0 +1,109 @@ | |||
package collector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include the same copyright/license header here as in all the other Node Exporter files. Same for the other Go files here.
I'm definitely in favor. 👍 |
@discordianfish regarding the requirement for superuser privileges to run the Remember that the part about parsing Only the per-dataset / per-pool metrics cannot be gathered without either
About the current state of this patch: In the Linux-part, the cleanups mention by @brian-brazil probably need to be done. Should work as non-root for now. I am short on time right now and won't touch the FreeBSD side in the foreseeable future ( Feel free to strip the FreeBSD-part and try to upstream the currently existing Linux-part, however ;) |
@problame My team at HPE would like to see the Linux part of this get upstreamed. Our thoughts are to do exactly what you said: get rid of the FreeBSD stuff and do the cleanups on the Linux stuff that @brian-brazil has mentioned. Would you be okay with these changes being made to try to get this PR accepted? I can start a new branch if it would be better to preserve the FreeBSD stuff on this one, or I can commit directly to this branch. |
@cbstewart9393 It doesn't look like I'm going to have the time to implement the FreeBSD part for this any time soon. However, I would suggest to keep the implementation easily extensible to non-Linux OSes, as is now. |
Going to close this in favor of #369 |
It is tested on FreeBSD 10.2-RELEASE and Linux (ZFS on Linux 0.6.5.4).
It does not require root privileges to run on either of the above systems.
On FreeBSD, Solaris, etc. ZFS metrics are exposed through sysctls.
ZFS on Linux exposes the same metrics through procfs
/proc/spl/...
.In addition to sysctl metrics, computed metrics are exposed by
the collector, which are based on several sysctl values.
There is some conditional logic involved in computing these metrics
which cannot be easily mapped to PromQL.
Not all 92 ARC sysctls are exposed right now but this can be changed
with one additional LOC each.
Questions about behavior:
What should happen if
-collectors.enabled zfs
is set but the stats cannot be found (e.g. because the kernel module is not loaded).Currently node_exporter terminates with an error message.
If the behavior described above is not desired, should the collector determine at runtime whether data is available?
2.1. Should it expose no metrics if no data is available?
2.2. Could the
zfs
collector be enabled by default if it does nothing on a non-ZFS system?