-
-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ST2 Metrics Collection #4004
ST2 Metrics Collection #4004
Conversation
Just some of my quick thoughts - if I worked on something like this, I would start in the following order:
I think integration with 3rd party tools is important, but I think it's even more important to have a "generic" API endpoint which metrics which everyone can consume (this includes our WebUI, Prometheus or any other monitoring system a user might use). On a related note, it would be good to identify / list somewhere metrics we should collect. Perhaps we can just keep a list in the description of this PR. |
My original intent was to have an API endpoint where metrics were collected over a stream across our bus. When thinking through this that would require us to begin collecting that information somewhere - mongo isn't the right tool, postgres isn't the right tool. We'd need to rely on another service to begin collecting that information which is yet another dependency on st2. We can make this dependency optional but it also puts us into a position where we're opinionated on how to collect these metrics. This isn't desirable in environments where they're already collecting stats in a specific way. One thing I'm still considering is having a new st2 service that will be able to aggregate the collection of these metrics. This is primarily to accommodate both streaming style collection ( |
I'm wondering if it is at all possible to expose this metrics some way outside of normal stackstorm communication channels (something along the lines of dtrace hooks) and leave gathering this metrics to external process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anyway to implement this at the runner abstraction? I understand the need for runner specific metrics like the python wrapper execution. But for the python runner, that can probably be generic.
# Note: We should eventually refactor this code to make runner standalone and not | ||
# depend on a db connection (as it was in the past) - this param should be passed | ||
# to the runner by the action runner container | ||
with CounterWithTimer(PYTHON_RUNNER_EXECUTION): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this with a decorator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK, yes we can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm personally not a big fan of decorators, at least when they are used in this manner (wrapping large blocks of code) - it adds another level of indentation and makes code harder to read (imo, cyclomatic complexity).
I'm fine when they are used to wrap function / method definitions. One way to reduce this level of indentation would be to move this code block in a utility method, then call that utility method inside the decorator. This way original code block is indented as it was before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: It looks like we are wrapping whole run method, in this case we should just add a decorator around the whole method :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kami LOL
def send_time(self, key=None): | ||
""" Send current time from start time. | ||
""" | ||
time_delta = datetime.now() - self._start_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come this is not in UTC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
from st2common.metrics.drivers.statsd_driver import StatsdDriver | ||
return StatsdDriver() | ||
|
||
return BaseMetricsDriver() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use stevedore to load the drivers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, need to look into it.
st2common/st2common/util/loader.py
Outdated
# file but don't have oslo context. | ||
from st2common.metrics.metrics import CounterWithTimer | ||
from st2common.constants.metrics import METRICS_REGISTER_RUNNER | ||
with CounterWithTimer(METRICS_REGISTER_RUNNER): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a decorator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
METRICS_TIMER_SUFFIX = "_timer" | ||
|
||
PYTHON_RUNNER_EXECUTION = "python_runner_execution" | ||
PYTHON_WRAPPER_EXECUTION = "python_wrapper_execution" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics for other runners?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I like the idea of doing generic runner metrics at the runner caller as you mentioned. Also need to incorporate the feedback from community and internal on what to test. This is progressing currently.
@bigmstone, What happens if there's failure during metrics collections or if there's failure with the statsd server? Can you add some tests that shows errors will be handled gracefully and exception logged and won't affect execution of the action? Thanks |
@m4dcoder nothing. It's UDP so it serializes the packet and that is all. That said I should see if there's anything the package is set to raise and catch it/log it - but this wouldn't come from a network error or the like. It would be from either an internal error or similar. Possibly also from a poor configuration (non-usable port, non-usable IP, etc.) |
@@ -119,7 +123,8 @@ def pre_run(self): | |||
if self._log_level == PYTHON_RUNNER_DEFAULT_LOG_LEVEL: | |||
self._log_level = cfg.CONF.actionrunner.python_runner_log_level | |||
|
|||
def run(self, action_parameters): | |||
@CounterWithTimer(PYTHON_RUNNER_EXECUTION) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks 👍
|
||
def metrics_initialize(): | ||
"""Initialize metrics constant | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you need to reference the METRICS constant as global. See this example as reference https://github.com/StackStorm/st2/blob/39517c0fb80359c48324a5b6ba4088972fbcb9de/st2common/st2common/services/coordination.py.
5aae4ae
to
e65d268
Compare
Makefile
Outdated
@@ -275,6 +275,9 @@ requirements: virtualenv .sdist-requirements | |||
# new version of requests) which we cant resolve at this moment | |||
$(VIRTUALENV_DIR)/bin/pip install "prance==0.6.1" | |||
|
|||
# Install st2common to register metrics drivers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about s/register metrics drivers/register common plugins?
Makefile
Outdated
@@ -275,6 +275,9 @@ requirements: virtualenv .sdist-requirements | |||
# new version of requests) which we cant resolve at this moment | |||
$(VIRTUALENV_DIR)/bin/pip install "prance==0.6.1" | |||
|
|||
# Install st2common to register metrics drivers | |||
(cd ${ROOT_DIR}/st2common; ${ROOT_DIR}/$(VIRTUALENV_DIR)/bin/python setup.py install) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with this. I thought making the requirements already did this because the last time I tested this branch, I didn't have to explicitly run the setup. Can you use python setup.py develop
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@m4dcoder I'm getting all kinda nasty side effects from doing this. I've tried develop and install trying to work around it. Found some strange WebOb bug that manifests itself only when st2common
is installed and was able to resolve it by bumping the version. Now I'm investigating these config issues. Long story short just ignore my commits until I get the tests passing.
…nd keep import time small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just need to rebase and get CI to pass.
Fixes #2974