
ST2 Metrics Collection #4004

Merged
merged 62 commits into from
May 15, 2018

Conversation

bigmstone
Contributor

@bigmstone bigmstone commented Feb 16, 2018

Fixes #2974

  • Changelog Entry
  • Tests
  • Example Config
  • Documentation Update
  • Prometheus Exploration (Or abandoning)

@Kami
Member

Kami commented Feb 16, 2018

Just some of my quick thoughts - if I worked on something like this, I would start in the following order:

  1. Code instrumentation (make sure we have instrumentation in place and all the metrics we care about)
  2. Expose those metrics via an HTTP endpoint - e.g. /v1/health/metrics or similar.
  3. Third party tool integration (Prometheus or similar)

I think integration with 3rd party tools is important, but it's even more important to have a "generic" API endpoint with metrics which everyone can consume (this includes our WebUI, Prometheus, or any other monitoring system a user might use).

On a related note, it would be good to identify / list somewhere the metrics we should collect. Perhaps we can just keep a list in the description of this PR.
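To make step 1 concrete: the instrumentation primitive this PR ends up using is a counter-plus-timer wrapper around a block of code. A minimal sketch of that idea, assuming a pluggable driver interface (the class shapes here are illustrative, not the PR's final API):

```python
import time


class BaseMetricsDriver(object):
    """No-op driver; concrete drivers (e.g. statsd) override these."""

    def inc_counter(self, key, amount=1):
        pass

    def time(self, key, value):
        pass


class CounterWithTimer(object):
    """Context manager that bumps a counter on entry and records the
    elapsed time for the wrapped block under `<key>_timer` on exit."""

    def __init__(self, key, driver=None):
        self._key = key
        self._driver = driver or BaseMetricsDriver()

    def __enter__(self):
        self._start = time.time()
        self._driver.inc_counter(self._key)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._driver.time(self._key + '_timer', time.time() - self._start)
        return False  # never swallow exceptions from the wrapped block
```

A statsd or Prometheus driver would then override `inc_counter` / `time` to actually emit the samples, which keeps the instrumented code agnostic about where metrics go.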

@bigmstone
Contributor Author

My original intent was to have an API endpoint where metrics were collected over a stream across our bus. Thinking that through, it would require us to begin collecting that information somewhere - mongo isn't the right tool, postgres isn't the right tool. We'd need to rely on another service to collect that information, which is yet another dependency for st2. We can make this dependency optional, but it also puts us in a position where we're opinionated about how to collect these metrics. That isn't desirable in environments that are already collecting stats in a specific way.

One thing I'm still considering is having a new st2 service that will be able to aggregate the collection of these metrics. This is primarily to accommodate both streaming style collection (statsd) as well as scraping type collection (prometheus). We could potentially do running average/in-memory stats and provide that via a health API there, but it will be important to note to the end user that the data won't persist anywhere, so if the service goes down for whatever reason the numbers won't be reliable. Also, what's our HA story with this type of implementation? A consensus algo running between the instances to constantly keep metrics in sync?

@enykeev
Member

enykeev commented Feb 19, 2018

I'm wondering if it is at all possible to expose these metrics some way outside of normal StackStorm communication channels (something along the lines of dtrace hooks) and leave gathering these metrics to an external process.

Contributor

@m4dcoder m4dcoder left a comment

Is there any way to implement this at the runner abstraction? I understand the need for runner-specific metrics like the python wrapper execution, but for the python runner, that can probably be generic.

# Note: We should eventually refactor this code to make runner standalone and not
# depend on a db connection (as it was in the past) - this param should be passed
# to the runner by the action runner container
with CounterWithTimer(PYTHON_RUNNER_EXECUTION):
Contributor

Can we do this with a decorator?

Contributor Author

ACK, yes we can.

Member

I'm personally not a big fan of decorators, at least when they are used in this manner (wrapping large blocks of code) - it adds another level of indentation and makes the code harder to read (imo, it increases cyclomatic complexity).

I'm fine when they are used to wrap function / method definitions. One way to reduce this level of indentation would be to move this code block into a utility method, then call that utility method inside the decorator. This way the original code block is indented as it was before.

Member

Edit: It looks like we are wrapping whole run method, in this case we should just add a decorator around the whole method :)
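Supporting both styles is straightforward: a class can act as a context manager and, via `__call__`, as a decorator, which is what lets a whole `run()` method be wrapped without adding a level of indentation. A sketch of the idea, with the actual driver emission stubbed out:

```python
import functools
import time


class CounterWithTimer(object):
    """Usable either as `with CounterWithTimer(key):` around a block
    or as `@CounterWithTimer(key)` around a whole method."""

    def __init__(self, key):
        self._key = key

    def __enter__(self):
        self._start = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # A real driver would emit this under self._key + '_timer'
        self.last_elapsed = time.time() - self._start
        return False  # propagate exceptions from the wrapped code

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Fresh instance per invocation so concurrent calls
            # don't share a start timestamp
            with self.__class__(self._key):
                return func(*args, **kwargs)
        return wrapper
```

`functools.wraps` keeps the decorated method's name and docstring intact, which matters for logging and introspection.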

Contributor

@Kami LOL

def send_time(self, key=None):
""" Send current time from start time.
"""
time_delta = datetime.now() - self._start_time
Contributor

How come this is not in UTC?

Contributor Author

Good catch.
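Worth noting: the timezone only matters here because two wall-clock reads are being subtracted. On Python 2 (st2's target at the time), switching to `datetime.utcnow()` is the direct fix; on Python 3, `time.monotonic()` sidesteps both the UTC question and wall-clock jumps entirely. A hypothetical Python 3 rework, with the driver call stubbed:

```python
import time


class Timer(object):
    """Hypothetical timer using a monotonic clock for durations."""

    def __init__(self):
        # monotonic() is immune to timezone and wall-clock changes,
        # so the UTC-vs-local question never arises for a duration
        self._start_time = time.monotonic()

    def send_time(self, key=None):
        """Return elapsed seconds since the timer started."""
        time_delta = time.monotonic() - self._start_time
        # A real driver would emit time_delta under `key` here
        return time_delta
```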

from st2common.metrics.drivers.statsd_driver import StatsdDriver
return StatsdDriver()

return BaseMetricsDriver()
Contributor

Can't we use stevedore to load the drivers?

Contributor Author

Perhaps, need to look into it.
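For reference, stevedore resolves a driver name to a class through setuptools entry points registered in `setup.py`, then instantiates it. The same name-to-class lookup can be hand-rolled with `importlib`, which is roughly what the mechanism amounts to (the st2 registry entry below is hypothetical):

```python
import importlib

# Minimal stand-in for stevedore's DriverManager: map a driver name
# to a "module:Class" path, import the module, instantiate the class.
# The statsd entry here is hypothetical, for illustration only.
DRIVER_REGISTRY = {
    'statsd': 'st2common.metrics.drivers.statsd_driver:StatsdDriver',
}


def load_driver(name):
    """Look up, import, and instantiate the driver registered under name."""
    path = DRIVER_REGISTRY[name]
    module_name, class_name = path.split(':')
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()
```

With stevedore itself, the registry lives in package metadata instead of a dict, so third parties can ship drivers without patching st2common.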

# file but don't have oslo context.
from st2common.metrics.metrics import CounterWithTimer
from st2common.constants.metrics import METRICS_REGISTER_RUNNER
with CounterWithTimer(METRICS_REGISTER_RUNNER):
Contributor

Use a decorator?

Contributor Author

Ack

METRICS_TIMER_SUFFIX = "_timer"

PYTHON_RUNNER_EXECUTION = "python_runner_execution"
PYTHON_WRAPPER_EXECUTION = "python_wrapper_execution"
Contributor

Metrics for other runners?

Contributor Author

Yes, I like the idea of doing generic runner metrics at the runner caller as you mentioned. I also need to incorporate the feedback from the community and internal teams on what to track. This is progressing currently.

@m4dcoder
Contributor

@bigmstone, what happens if there's a failure during metrics collection, or a failure with the statsd server? Can you add some tests that show errors will be handled gracefully, exceptions will be logged, and execution of the action won't be affected? Thanks

@bigmstone
Contributor Author

bigmstone commented Mar 16, 2018

@m4dcoder nothing. It's UDP, so it serializes the packet and that's all. That said, I should see if there's anything the package is set to raise and catch/log it - but that wouldn't come from a network error or the like. It would come from an internal error or similar, possibly also from a poor configuration (non-usable port, non-usable IP, etc.)
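One defensive pattern for exactly this: wrap every driver call so any exception (bad config, serialization error, etc.) is logged and swallowed, guaranteeing a metrics failure can never break execution of the action. A sketch, with the failing statsd call simulated:

```python
import functools
import logging

LOG = logging.getLogger(__name__)


def ignore_and_log_exception(func):
    """Log any exception from a metrics operation and return None
    instead of propagating, so metrics failures stay non-fatal."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            LOG.exception('Metrics operation %s failed', func.__name__)
            return None
    return wrapper


@ignore_and_log_exception
def send_counter(key):
    # Stand-in for a statsd driver call blowing up on bad config
    raise ValueError('non-usable port')
```

A test for the PR could then assert that an action run completes even when the driver raises, mirroring this behavior.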

@@ -119,7 +123,8 @@ def pre_run(self):
if self._log_level == PYTHON_RUNNER_DEFAULT_LOG_LEVEL:
self._log_level = cfg.CONF.actionrunner.python_runner_log_level

def run(self, action_parameters):
@CounterWithTimer(PYTHON_RUNNER_EXECUTION)
Member

Thanks 👍


def metrics_initialize():
"""Initialize metrics constant
"""
Contributor

I believe you need to reference the METRICS constant as global. See this example as reference https://github.com/StackStorm/st2/blob/39517c0fb80359c48324a5b6ba4088972fbcb9de/st2common/st2common/services/coordination.py.

@bigmstone bigmstone force-pushed the issue-2974/metrics branch from 5aae4ae to e65d268 Compare May 7, 2018 21:03
Makefile Outdated
@@ -275,6 +275,9 @@ requirements: virtualenv .sdist-requirements
# new version of requests) which we cant resolve at this moment
$(VIRTUALENV_DIR)/bin/pip install "prance==0.6.1"

# Install st2common to register metrics drivers
Contributor

How about s/register metrics drivers/register common plugins?

Makefile Outdated
@@ -275,6 +275,9 @@ requirements: virtualenv .sdist-requirements
# new version of requests) which we cant resolve at this moment
$(VIRTUALENV_DIR)/bin/pip install "prance==0.6.1"

# Install st2common to register metrics drivers
(cd ${ROOT_DIR}/st2common; ${ROOT_DIR}/$(VIRTUALENV_DIR)/bin/python setup.py install)
Contributor

I'm ok with this. I thought `make requirements` already did this, because the last time I tested this branch I didn't have to explicitly run the setup. Can you use python setup.py develop instead?

Contributor Author

@m4dcoder I'm getting all kinds of nasty side effects from doing this. I've tried develop and install trying to work around it. Found a strange WebOb bug that manifests itself only when st2common is installed, and was able to resolve it by bumping the version. Now I'm investigating these config issues. Long story short, just ignore my commits until I get the tests passing.

Contributor

@m4dcoder m4dcoder left a comment

LGTM. Just need to rebase and get CI to pass.

@bigmstone bigmstone merged commit 49eb46d into master May 15, 2018
@bigmstone bigmstone deleted the issue-2974/metrics branch May 15, 2018 19:04