Lost performance data in file output #273
Comments
Could you double-check whether performance data processing is still enabled when that happens? It can be disabled on the "Process Info" page. Alternatively, check the logs for external commands that would disable it.
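One way to check the logs for such external commands is to scan for the global performance data toggles. The following is a minimal sketch, assuming that external commands show up in the Naemon log as `EXTERNAL COMMAND: <NAME>` and that the global toggles are called `ENABLE_PERFORMANCE_DATA` and `DISABLE_PERFORMANCE_DATA`, as in the Nagios-style external command set. The helper name `perfdata_toggle` is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Classify one log line.
 * Returns -1 if it records a global "disable perfdata" external command,
 *         +1 if it records a global "enable perfdata" external command,
 *          0 for any other line.
 * Assumption: external commands are logged as "EXTERNAL COMMAND: <NAME>...". */
static int perfdata_toggle(const char *line)
{
    static const char prefix[] = "EXTERNAL COMMAND: ";
    static const char disable_cmd[] = "DISABLE_PERFORMANCE_DATA";
    static const char enable_cmd[] = "ENABLE_PERFORMANCE_DATA";
    const char *cmd;

    assert(line != NULL);

    cmd = strstr(line, prefix);
    if (cmd == NULL)
        return 0;
    cmd += sizeof(prefix) - 1; /* skip past the prefix */

    if (strncmp(cmd, disable_cmd, sizeof(disable_cmd) - 1) == 0)
        return -1;
    if (strncmp(cmd, enable_cmd, sizeof(enable_cmd) - 1) == 0)
        return 1;
    return 0;
}
```

Feeding every line of the log through this helper and printing the lines with a non-zero result would show whether the flag was toggled around the time the data went missing.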
Processing is enabled; other services are OK at the same time. The log does not show any change to this attribute. I have another observation: after a change in the service state, the performance data started being processed again.
From that point on, performance data processing continues normally, even after the service status changed back to OK.
From looking at the code, there is no reason why this could happen unless the process performance data flag has been disabled.
@VladimirBilik which application consumes your performance data file? And what is the target database (RRDtool, Graphite, etc.)?
Have you already tried to use mod_gearman for processing the performance data? Fortunately I don't use NPCD, process_perfdata.pl and Naemon's in-build I always used
Did you already run Naemon in the foreground to see whether any errors are reported, like so?
Maybe you see something like
I've just started Naemon in the foreground in a screen session. Regarding PNP with mod_gearman: it is a bit complicated to compile in my environment, so I will try NPCD. According to the Naemon source code, there is nothing between calling
and
I'm looking into update_service_performance_data. Assuming that process_performance_data=TRUE and svc->process_performance_data=TRUE in my static Naemon configuration, there is just one more check, for service_perfdata_process_empty_results.
What do you think?
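The three checks described in this comment can be modelled roughly as follows. This is a simplified sketch and not the actual Naemon source: `struct svc_model`, `struct perfdata_settings` and `would_write_perfdata` are hypothetical stand-ins, and only the flag names `process_performance_data` and `service_perfdata_process_empty_results` are taken from the discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for Naemon's service object. */
struct svc_model {
    int process_performance_data; /* per-service flag */
    const char *perf_data;        /* perfdata text from the plugin, may be NULL */
};

/* Hypothetical stand-in for the global settings named in the thread. */
struct perfdata_settings {
    int process_performance_data;               /* global flag */
    int service_perfdata_process_empty_results; /* process empty results too? */
};

/* Returns 1 if a perfdata record would be written for this check result,
 * 0 if one of the gates described above would skip it. */
static int would_write_perfdata(const struct perfdata_settings *cfg,
                                const struct svc_model *svc)
{
    assert(cfg != NULL && svc != NULL);

    /* Gate 1: global switch. */
    if (!cfg->process_performance_data)
        return 0;

    /* Gate 2: per-service switch. */
    if (!svc->process_performance_data)
        return 0;

    /* Gate 3: empty results are only dropped when the option is off. */
    if (!cfg->service_perfdata_process_empty_results) {
        if (svc->perf_data == NULL || *svc->perf_data == '\0')
            return 0;
    }

    return 1;
}
```

Under this model, with both flags on, a record can only be skipped when the empty-results option is off and the plugin returned no perfdata.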
I quickly updated my Naemon demo system from 1.0.3 to 1.0.8 (I'm not using mod_gearman).
naemon-core/src/naemon/checks_service.c Lines 1007 to 1022 in 1d42073
which is just a few lines above update_service_performance_data().
So it should be nearly impossible for performance data to vanish between those lines. My system has been running Naemon 1.0.8 since
My assumption about the setting of service_perfdata_process_empty_results was wrong; I see that it is set to 1 by default, so the part of the code with those conditions is never run.
Regarding testing NPCD mode: the version distributed with pnp4nagios (pnp4nagios-0.6.25-1.el7.x86_64) ships npcdmod.o only for Nagios, which doesn't work with my Naemon:
so I have to compile my own version (which complicates maintenance a bit).
As far as I know does
While that's true, it's not too complicated to get PNP running with Naemon. We build the npcdmod module in OMD for Naemon like this: https://github.com/ConSol/omd/tree/labs/packages/pnp4nagios4
@VladimirBilik were you able to find the root cause for the missing performance data?
Unfortunately no.
while I need to have the performance data in the format defined in service_perfdata_file_template:
i.e. there is missing the
Yesterday I switched to the new core 1.0.9 and a performance data template with
But I still have no idea why it happens. The strangest thing for me is that when some service stops getting data, forcing a check (just once) switches it back to normal behaviour, with full data in service_perfdata_file.
@VladimirBilik I'm on vacation and don't have access to any powerful hardware... The demo system is just a cheap VPS where I can't deploy such a load. Have you already tried stripping your system down and removing all broker modules?
Running Naemon 1.0.8 with mod_gearman 3.0.7, I occasionally have a problem with lost performance data. According to the debug output, the data is delivered from the plugin to Naemon, but after some processing it is not written to the performance data file. This happens irregularly. Once data starts getting lost for a service, it keeps getting lost until I reload Naemon or force a check of the service manually.
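For reference, forcing a check of a single service is usually done by submitting a `SCHEDULE_FORCED_SVC_CHECK` external command. The sketch below only formats such a command line. It assumes the Nagios-style syntax `[timestamp] SCHEDULE_FORCED_SVC_CHECK;<host>;<service>;<check_time>`; the function name `format_forced_svc_check` is made up for this example, and the location of the command file is site-specific and deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Format a forced service check command into buf.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if the command did not fit into the buffer.
 * Assumption: host and service names contain no ';' or newline. */
static int format_forced_svc_check(char *buf, size_t len,
                                   const char *host, const char *service,
                                   time_t now)
{
    int n;

    assert(buf != NULL && host != NULL && service != NULL);

    /* The same timestamp is used as submission time and as check time. */
    n = snprintf(buf, len, "[%lld] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%lld\n",
                 (long long)now, host, service, (long long)now);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

The resulting line would then be written to the external command file that Naemon is configured to read.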
I have configured host + services like this:
With debugging turned on the correct check of service looks like this:
the check of service with lost data looks like this:
I.e. it seems like the plugin output is not processed after service and host flapping checks.
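For context, the performance data is the part of the plugin output that follows the `|` separator. A minimal sketch of that split, assuming the common plugin convention of `text | perfdata` on the first output line and ignoring multi-line output, could look like this (`perfdata_part` is a hypothetical helper, not a Naemon function):

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the perfdata portion of the first line of plugin
 * output, i.e. the text after the first '|' with leading blanks skipped.
 * Returns NULL when there is no '|' or nothing but blanks follows it. */
static const char *perfdata_part(const char *first_line)
{
    const char *p;

    assert(first_line != NULL);

    p = strchr(first_line, '|');
    if (p == NULL)
        return NULL;

    p++; /* step over the separator */
    while (*p == ' ' || *p == '\t')
        p++;

    return (*p != '\0') ? p : NULL;
}
```

Running the raw plugin output from the debug log through such a split helps to confirm that the perfdata was present when the result reached Naemon.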
It happens with various plugins; it is not related to a particular one. The system does not report any plugin/process crash or memory leaks (32G RAM, 4G RAM is still free). Any idea what else to check?
Thanks.