-
Thanks for the write-up! (And apologies for the glacial response.) I am supportive of removing the extra tick from the end of `call`. It adds symmetry to the begin and end semantics: the first bit of a call currently starts in the same tick as the caller, so it seems reasonable for the last bit to occur in the same tick in which the caller is resumed. This will also make it easier to reason about a caller and its callee together: nothing can interfere with the observable simulation state between the end of the callee's execution and the resumption of the caller. Regarding how this affects existing models, and whether we should auto-generate a |
-
Actually, I've realized that Consider the following snippets (where
// 1
call(() -> spawn(() -> delay(2, SECONDS)));
log("Hello");
// 2
spawn(() -> delay(2, SECONDS));
log("Hello");
In the first example, |
-
Summary
Currently, when a task spawned by `call()` terminates, we wait one tick of the simulation engine before resuming its parent. As an illustrative edge case, performing `call(() -> {})` (i.e. delegating to and blocking on the completion of a do-nothing task) ends up behaving identically to `delay(Duration.ZERO)`, since both cause the current task to yield for a tick.
I am concerned that this behavior is actually incorrect in general, as it can cause tasks that a modeler expects to occur simultaneously to, in fact, occur sequentially. Moreover, the only remedy for a task occurring late is to add extra delays to all the other potentially-simultaneous tasks, which is exactly the kind of non-local reasoning we sought to eliminate in the Merlin modeling system (c.f. the effects of `immediately` on model design and maintenance in APGen).
I propose that we change the behavior of `call()` to resume the calling task immediately when the called task completes. Not only does this address the above problem, but it more generally aligns with existing expectations about method invocation in Java. In other words, this removes a rough edge where Merlin does not meet the bar of being "just Java with added flavor". Moreover, if the original behavior is desired in some places, it can be achieved by inserting `delay(Duration.ZERO)` after the `call` -- or, indeed, anywhere else near the call site that might be more appropriate to capture the modeler's local intent.
Proposed source-level change
This is a very small change in terms of lines of code, so I'm putting this section first so we can set the development effort question to rest quickly.
In `SimulationEngine#stepEffectModel`, instead of deferring the `AwaitingChildren` phase of a task to the next tick, begin processing children immediately. In other words, replace these lines:
With this line:
The `stepWaitingTask` logic already resumes the parent immediately on completion, so it is only the tick artificially inserted between "modeled logic completed" and "all delegated work completed" that needs to be removed.
If we also decide that the specific case of `call(new MyActivity())` should retain the extra tick -- see below -- then we will want to insert an extra `delay(Duration.ZERO)` into the generation of `call` stubs in `MissionModelGenerator#generateActivityActions`.
Historical rationale
The current behavior is something of a historical artifact due to how we ended up with the current design. Back when the system was first implemented, we didn't have `call` -- at least, not as a primitive action. Instead, we had `waitFor`, which would take a task ID and block until that task completed. The `call` action was defined as `waitFor(spawn())`, so it inherited the behavior of `waitFor`.
With `waitFor`, the "general" case is that the target task won't be completed yet, so the code following a `waitFor` will resume at a later time. (We considered this to be the general case because you wouldn't use `waitFor` if you knew you didn't have to wait!) This means that the transaction enclosing the effects prior to `waitFor` will have closed, and the transactions of any other task at that time will also have closed, meaning that the later code will observe the effects of all of those transactions.
In the edge case where the task has already completed, we don't want to change our behavior suddenly and discontinuously from the planner's normal expectation. As such, we extended the bulk behavior to the boundary, inserting an extra tick to ensure that the modeler can always assume they're in a new transaction following a `waitFor`.
However, `waitFor` is now gone, and with it goes any possibility that a task might attempt to wait on a task that has already completed. (Consider: when you use `call(act)`, the task described by `act` can't have completed because we haven't even started it yet!) This means that the reason for the extra tick has now disappeared. That doesn't on its own mean that we should remove it, but it does mean we should ask whether it's still necessary -- and whether we gain more by a different choice.
Problems with the current behavior
Keep in mind that the problems below have always existed -- but our hands were tied because `call` was built on `waitFor`, and `waitFor` had to gracefully deal with the edge case described above. Now we have the opportunity to fix these problems.
`call` cannot be used for optimization
First, `call` is little more than a simulation-aware Java method call. This is more true than ever with the recent changes to allow `call` to return the result of the called task. In fact, there are exactly two differences between `call(() -> x);` and `x` itself:
The first difference can already be achieved with the explicit use of `delay(Duration.ZERO)`, so `call()` adds no additional capabilities here.
The second actually has significant positive optimization benefits, as a modeler can wrap expensive parts of a task in a `call()` to a `ThreadedTask` where the thread overhead is insignificant relative to the computation itself, then replace the outer task with a `ReplayingTask` which affords fast context-switching at the cost of replaying over earlier parts of the task. Since the costly parts of the task are wrapped in a separate task, the replaying task remains paused until the subtask completes, and every time thereafter the replaying task can skip over that whole subtask in one step, no matter how complex the logic in that subtask was.
Unfortunately, the extra tick actively prevents `call()` from being used for optimization, because that extra tick can end up causing extremely different observable behavior! For instance, if a task added a series of rates that might alternate between positive and negative -- a situation not uncommon in data models with channels that may even overflow into other channels -- then the net sum might end up rather close to zero, but any number of intermediate sums might end up quite positive or quite negative. These unnecessarily-observable positive or negative quantities could then trip conditions, e.g. on daemon tasks designed to take automatic action when some threshold is reached. So any attempt to use `call()` to optimize simulation performance can actually lead to significant discrepancies in modeled behavior.
Aside on `call` on activities:
There is a third use of `call`: as convenient syntactic sugar for invoking activities with `call(new MyActivity())`. The tasks produced by activities are special in that their output spans are visible to planners, so modelers will often model things that should be visible to planners as activities, then invoke them with `call()`. From a simulation perspective, however, this boils down to a normal `call` -- the generated code stub for that activity type just looks up the actual task to invoke, which itself emits events causing its span to be visible in the simulation results.
The proposed change doesn't implicate activities specially; activity output spans will still be visible to planners as always, and the concrete task lookup will remain unchanged. However, if there is a desire for activities to generally force an extra tick on their callers, that can be added to the generated `call` stubs. (I would prefer not to, for the sake of a consistent mental model.)
(I've long wanted to add an ability to generate visible spans without having to model them with activity types, so I don't consider this part of the essence of `call`. In fact, modeling something as an activity type means that a planner can use that activity type in their plan, even if the modeler only wanted to provide output information, and not a new opportunity for control. IIRC, this also came up in discussions about APGen, but I don't remember the details.)
Nth-distant ancestors are delayed by N ticks
Tasks can decompose into quite complex and deep trees of subtasks; and oftentimes, parent tasks end by `call`ing a child task. Since an extra tick is inserted between every terminating activity and its caller, a sequence of five nested calls will incur a total of five ticks' worth of delay by the time the first task resumes. Together with `spawn`ing multiple children, this can lead to tasks that look like they should occur simultaneously actually occurring in sequence. For instance:
In the current system, this program will output `1`; under the proposed change, this program will output `0`.
The `call`s do nothing, and intuitively we should be able to remove them locally. If we removed them from the second `spawn`, under the current system we will now observe `0`, while under the proposed change there will be no difference.
We can make something similar happen with an unintuitive ordering of effects:
Under the current system, this program will output `1`; under the proposed change, this program will output `0`. The common theme here is that `call` ought to be transparent -- if you call a no-op, the whole call should be a no-op. The behavior of a call should be all and only the behavior of the thing being called.
This issue also has implications for my prototype proposal for exposing a relative scheduling capability from the simulation engine. That prototype draws on the existing ability of tasks to form a delegation tree, leveraging `call` to achieve end-relative invocation. However, in the course of compiling a plan into a top-level task that decomposes into its constituent directives, and to achieve a sensible modularity between the tree-structure and the individual directives, it ends up being useful to nest two `call`s together. The inclusion of an extra tick makes me nervous about whether it's possible to have two directives planned which we expect to occur at the same time, but actually occur in two distinct ticks due to the current behavior of `call`. Whether or not the problem is real, the current `call` complicates reasoning about simple program refactorings like this.
(Yes, this particular issue is what led me to think about the current behavior of `call`; but I hope I've demonstrated that this isn't particular to my pet prototype.)
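For anyone who wants to check the tick arithmetic from the "Nth-distant ancestors" section, here is a rough toy model. It is not the Merlin engine and uses none of its API: `CallTickSketch`, `Task`, `finishTick`, and `nestedCalls` are hypothetical names invented for this sketch. It assumes only what the post states: a callee starts in its caller's tick, and under the current behavior the caller resumes one tick after the callee terminates.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy tick accounting for nested call()s. Hypothetical sketch, not Merlin code. */
public final class CallTickSketch {
    /** A toy task that call()s each of its children in order and does nothing else. */
    public static final class Task {
        final List<Task> calls = new ArrayList<>();

        public Task thenCall(Task child) {
            calls.add(child);
            return this;
        }
    }

    /**
     * Returns the tick in which the task finishes, given the tick in which it starts.
     * extraTickAfterCall = true models the current behavior (the caller resumes one
     * tick after its callee terminates); false models the proposed behavior.
     */
    public static int finishTick(Task task, int startTick, boolean extraTickAfterCall) {
        int tick = startTick;
        for (Task child : task.calls) {
            // Assumption from the post: a callee starts in the same tick as its caller.
            tick = finishTick(child, tick, extraTickAfterCall);
            // Current behavior: the caller only resumes in the following tick.
            if (extraTickAfterCall) {
                tick += 1;
            }
        }
        return tick;
    }

    /** Builds a chain in which each task ends by calling the next, `depth` calls deep. */
    public static Task nestedCalls(int depth) {
        Task task = new Task(); // innermost task: a no-op
        for (int i = 0; i < depth; i++) {
            task = new Task().thenCall(task);
        }
        return task;
    }

    public static void main(String[] args) {
        Task fiveDeep = nestedCalls(5);
        System.out.println(finishTick(fiveDeep, 0, true));  // prints 5 (current behavior)
        System.out.println(finishTick(fiveDeep, 0, false)); // prints 0 (proposed behavior)
    }
}
```

In this toy model, a task that wraps its work in a single no-op `call` finishes in tick 1 under the current rule, while a sibling without the `call` finishes in tick 0. Under the proposed rule both finish in tick 0, which is the transparency property argued for above.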