
Instrument python runner to get more metrics around delayed executions #2974

Closed
lakshmi-kannan opened this issue Oct 25, 2016 · 5 comments
lakshmi-kannan (Contributor) commented Oct 25, 2016

Simple Python actions take a few seconds to complete execution.

(virtualenv)vagrant@st2dev /m/s/s/st2 ❯❯❯ st2 run examples.isprime value=7
..
id: 580fb666d9d7ed14ab36f33f
status: succeeded
parameters:
  value: 7
(virtualenv)vagrant@st2dev /m/s/s/st2 ❯❯❯ st2 execution get 580fb666d9d7ed14ab36f33f -dj
{
    "action": {
        "ref": "examples.isprime"
    },
    "context": {
        "user": "stanley"
    },
    "end_timestamp": "2016-10-25T19:45:44.263915Z",
    "id": "580fb666d9d7ed14ab36f33f",
    "liveaction": {
        "action": "examples.isprime",
        "action_is_workflow": false,
        "callback": {},
        "id": "580fb666d9d7ed14ab36f33e",
        "parameters": {
            "value": 7
        },
        "runner_info": {
            "hostname": "st2dev.stackstorm.net",
            "pid": 5224
        }
    },
    "parameters": {
        "value": 7
    },
    "start_timestamp": "2016-10-25T19:45:42.049665Z",
    "status": "succeeded"
}

It looks like the fork call takes a couple of seconds, but when I benchmarked fork in a standalone tester it did not appear expensive (https://gist.github.com/lakshmi-kannan/eb03a3f92d72b6f391a2f18fc8211883). So we need to instrument our code to get more fine-grained metrics.
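A minimal sketch of the kind of instrumentation the issue asks for: wrap each phase of a (hypothetical, simplified) runner in a timing context manager so every phase shows up as a separate metric instead of one opaque delay. The phase names and `run_action` helper here are illustrative, not the actual runner API.

```python
# Sketch: per-phase timing for a subprocess-based action run.
# "spawn" and "communicate" are illustrative phase names, not real
# StackStorm metric names.
import contextlib
import subprocess
import sys
import time

TIMINGS = {}


@contextlib.contextmanager
def timed(phase):
    """Record wall-clock duration of the enclosed block under `phase`."""
    start = time.time()
    try:
        yield
    finally:
        TIMINGS[phase] = time.time() - start


def run_action():
    """Spawn a trivial child interpreter and collect its output, timed per phase."""
    with timed("spawn"):
        proc = subprocess.Popen([sys.executable, "-c", "print('ok')"],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    with timed("communicate"):
        stdout, stderr = proc.communicate()
    return stdout


if __name__ == "__main__":
    run_action()
    for phase, duration in sorted(TIMINGS.items()):
        print("%s: %.3fs" % (phase, duration))
```

With timings split out like this, it becomes obvious whether the delay lives in process creation or in collecting output.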

enykeev (Member) commented Oct 28, 2016

I don't see the execution log we added in #2718. Did we forget to expose it in the detailed view of st2client?

Kami (Member) commented Oct 28, 2016

@enykeev Yeah, good point - we probably need to update the client to include it when the -d flag is used :)

Kami (Member) commented Oct 28, 2016

@lakshmi-kannan I did some more measuring, and it looks like the bulk of the time (around 1.5 seconds) is indeed spent in run_process (as assumed). The rest takes around 500 ms, so the total run time of an execution is around 2 seconds.

I checked eventlet implementation to see what is going on (e.g. if it's using some long poll timeout or similar) and I couldn't find anything offending - https://github.com/eventlet/eventlet/blob/master/eventlet/green/subprocess.py

eventlet patches .communicate(), which is the suspicious part. I tried replacing communicate() with just a call to .wait(), and this dropped the run time of that function to 700 ms. So if we want to optimize it, we need to dig in and see exactly what is going on with .communicate() (it's also quite likely that we can't do much about it).
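A hedged sketch of the comparison described above: time `Popen.communicate()` against `.wait()` plus direct pipe reads for the same small child process. This uses plain `subprocess`, not the eventlet-patched green version the runner goes through, so absolute numbers will differ; it only illustrates the measurement technique.

```python
# Sketch: compare communicate() vs wait() + direct pipe reads.
# Plain subprocess, not eventlet's green subprocess - numbers are indicative.
import subprocess
import sys
import time

CHILD = [sys.executable, "-c",
         "import sys; print('out'); print('err', file=sys.stderr)"]


def run_with_communicate():
    """Collect output via communicate(), returning (duration, stdout, stderr)."""
    start = time.time()
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    return time.time() - start, stdout, stderr


def run_with_wait():
    """Collect output via wait() + direct reads. Caution: reading after wait()
    can deadlock if the child writes more than the pipe buffer holds; it is
    only safe here because the output is tiny."""
    start = time.time()
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.wait()
    stdout = proc.stdout.read()
    stderr = proc.stderr.read()
    return time.time() - start, stdout, stderr


if __name__ == "__main__":
    for fn in (run_with_communicate, run_with_wait):
        duration, _, _ = fn()
        print("%s: %.3fs" % (fn.__name__, duration))
```

The deadlock caveat in `run_with_wait` is exactly why `communicate()` exists in the first place, which is why simply swapping it out is not a safe fix on its own.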

Another thing worth noting (and something we should write in the documentation) is that no one should expect sub-1-second run times for Python runner actions. If users need such response times, they will have to write their own runner; achieving that with the Python runner, which involves forking a subprocess with a Python interpreter, is simply not possible.

Kami (Member) commented Oct 28, 2016

I did some more testing - removing communicate() and calling process.wait() followed by process.stdout.read() and process.stderr.read().

It looks like the slow part is indeed reading from the stdout and stderr pipes (which is basically what communicate() does).

Kami (Member) commented Oct 28, 2016

I tried various things, including changing the buffer size, and I can't get the run time to decrease when reading from stdout and stderr.
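A sketch of the buffer-size experiment mentioned above: time the output collection for the same child while varying `Popen`'s `bufsize` argument. Again, plain `subprocess` rather than eventlet's green version, so this only shows the shape of the experiment, not the runner's actual numbers.

```python
# Sketch: does Popen's bufsize affect how fast we drain a child's stdout?
# Plain subprocess; the runner itself goes through eventlet's green
# subprocess, so treat the numbers as indicative only.
import subprocess
import sys
import time

# Child writes 64 KiB to stdout so there is something to buffer.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 65536)"]


def time_reads(bufsize):
    """Run the child with the given bufsize; return (duration, bytes_read)."""
    start = time.time()
    proc = subprocess.Popen(CHILD, bufsize=bufsize,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, _ = proc.communicate()
    return time.time() - start, len(stdout)


if __name__ == "__main__":
    for bufsize in (0, 4096, 65536, -1):  # -1 selects the default buffering
        duration, nbytes = time_reads(bufsize)
        print("bufsize=%6d read %d bytes in %.3fs" % (bufsize, nbytes, duration))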

So in short, there is not much we can do for the Python runner. One thing we should do, though, is document the performance profile of runners - StackStorm is primarily designed for longer-running actions, so if users need sub-second or Lambda-like performance, StackStorm is not a good fit (they could get run times down with a custom runner, though). We should make this very explicit in the documentation so there won't be any surprises.
