Parallelism across independent resources in simulation #620
-
Hi Joel! Thanks for writing all of this up. I think speeding up simulation along these lines is a really good idea. I have some misgivings about the specifics, but I do believe that there is a way to push the core idea through. I'll focus here on my responses to your proposal itself. I may write a separate reply about my thoughts for overcoming the obstacles I identify below.
Maybe the nomenclature has shifted (again), but from what I recall, a "resource" in Merlin is a quantity that is observable and trackable by the external environment (i.e. to be recorded as part of simulation results). I think you're referring to cells, i.e. the internal state of the model.
Can I take this to mean you have a Rust prototype of this proposal? 👀
For some fun historical context, this was something we explored very early in the development of activity modeling in Merlin. At least at the time, we found it really hard to design a static dependency declaration and injection system that didn't entail a significant loss of ergonomics. That's why we ended up with the current approach, where we can infer a dynamic dependency graph based on what the activity is observed to do. We never did end up fully constructing that dependency graph, although we do perform the same kind of dependency analysis to determine when a resource needs to be re-queried due to changes to upstream cells. There may even still be a comment in the codebase to that effect.
When we first considered an approach like this, we had a lot of trouble with how to give names to those pieces of state, and how to do type-safe dependency injection of the same. Moreover, individual cells are not really meant to be accessed directly -- a cell is morally a field of some other model class, and it is that model that defines the relevant methods for manipulating that state. I think we would want to name these models and do dependency injection at that level.

The issue with binding against models instead of cells is that there's no obvious way to tell which cells can be influenced from one model. Firstly, the sim driver doesn't observe the mission model as anything more than one big black box -- we don't currently have any insight into its internal structure. We would need some new mechanism to at least identify groups of cell allocations. Secondly, there is nothing stopping a modeler from providing one model instance a reference to another model instance, so the cells a model can influence are a superset of the cells the model allocated itself. If an activity declares a dependency on a given model instance, that doesn't give us enough information to precisely infer all of the cells that are accessible from that root.

On the flip side, the issue with binding against cells is that it entirely breaks encapsulation. An activity is liable to call a method on a model, which calls another method on another model, etc., until finally doing something to a cell. Activity type authors cannot be expected to know the full breadth of cells each method influences, just so that they can declare them in the activity's annotation.

In addition (and this is perhaps a more solvable issue), cells (and models) can be allocated in e.g. a loop. If we want to name each cell (or each model) that can be depended upon by an activity, we'll need an addressing scheme that can cope with the fact that we don't necessarily know all of the instantiated cells/models until the mission model has been instantiated.
I'm further concerned about how activity decomposition factors into this. If an activity with access to state X spawns an activity with access to state Y, then before the spawn, we might be fooled into thinking that nothing can influence Y, and hence Y can move forward. However, if the child is then spawned, we suddenly have changes to Y occurring when we didn't expect them to. It seems we would need to declare which child activities can be spawned, too, so that the child dependencies can be imputed to the parent as well. All this means that I think it will be very hard to precisely identify pre-simulation which cells an activity is allowed to influence whilst also preserving the "it's just Java with mild restrictions" flavor of the current modeling style.
I like this idea very much; it's definitely at the heart of how this whole endeavor could be achieved. With our current approach, the problem is: how do we tell when we can't step a cell up? Put differently, how do we know whether stepping the cell up will miss effects that haven't been committed yet at earlier times? Our current solution is to simply never run a task until all cells are guaranteed to be available -- we never block during the task, because we already blocked on all cells before we resumed the task to begin with. Your solution to this problem can, I think fairly, be characterized as statically declaring which cells an activity can affect; this would allow us to simply look at the activities prior to our current time and ask if any of them could potentially affect us. However, as noted above, I'm concerned about (1) the amount of information we need to be given to achieve a precise analysis, (2) the ergonomics implications of requiring that information, and (3) how certain we can reasonably be that we didn't miss some loophole that renders our analysis unsound. I think there's a middle ground, where we still use runtime dependency tracking, but optimistically run tasks that might query cells that cannot be stepped up to the current time with confidence. I want to call this the "branch prediction" approach -- we step a cell up optimistically, and roll back if we find our assumption invalidated -- but I'll wait to describe this separately. (I'm not sure if this is what you referred to by "time warp", but this would leverage logic similar to how ReplayingTask already works.)
@mattdailis and I have been talking about an approach that decouples resource profiling from task simulation, so I think this doesn't need to be considered (luckily). The idea is that, as the simulation commits events to its history (in your proposal, changes are considered committed when all clocks have moved past them), a separate worker thread walks along history behind the simulation, and steps copies of the resource-relevant cells. That is, simulation and profiling have their own copies of state; profiling is just replaying the history that simulation has already traversed. In principle, this lets simulation get arbitrarily far ahead of profiling, without jeopardizing our ability to stream results.
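A minimal sketch of that decoupling, with invented names (`ProfilingStream`, `commit`, `drainSamples` are mine, not Merlin's API): the simulation thread appends committed, timestamped deltas to a shared history, and a trailing profiler replays them against its own private copy of the state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: profiling decoupled from simulation by replaying history.
class ProfilingStream {
    record Committed(long time, double delta) {}

    // Simulation writes here; the profiler reads. Simulation never waits on profiling.
    private final BlockingQueue<Committed> history = new LinkedBlockingQueue<>();
    private double profiledValue = 0.0; // the profiler's private copy of state

    // Simulation side: called once an event is guaranteed final (all clocks have passed it).
    void commit(long time, double delta) { history.add(new Committed(time, delta)); }

    // Profiler side: replay everything committed so far, producing (time, value) samples.
    List<double[]> drainSamples() {
        List<double[]> samples = new ArrayList<>();
        Committed c;
        while ((c = history.poll()) != null) {
            profiledValue += c.delta();
            samples.add(new double[] { c.time(), profiledValue });
        }
        return samples;
    }
}
```

Because the profiler only ever consumes committed history, it can lag arbitrarily far behind without blocking simulation, which is exactly the streaming property described above.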
Yes, this is something we knew and designed around early on. This is also why we didn't bake any type of cell into Merlin itself. Merlin allows modelers to define their own cell types, with their own events and own semantics for those events. The hope is that, if you know a lot about what you need your particular cell for, you can encode domain operations on that cell as events, giving them better behavior with respect to concurrency and potentially better behavior with respect to dependency tracking. As this becomes more important, I hope that we can see cell modeling become more of a valuable tool. As it stands, the existing cells may not always be the most performant, but at least modelers can start with them and refine them as they identify performance bottlenecks.
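As a toy illustration of that idea (far simpler than Merlin's real cell interface, and all names here are my own): a cell whose events are commutative deltas, so effects from concurrent branches can be merged without caring about their order.

```java
// Hypothetical modeler-defined cell: events are increments, not absolute sets,
// so they commute and concurrent effects can be merged cheaply.
final class CounterCell {
    private long value;

    CounterCell(long initial) { value = initial; }

    // The cell's event type: a delta, which carries its own merge semantics.
    record Increment(long amount) {}

    void apply(Increment e) { value += e.amount(); }

    // Events from concurrent branches combine by summing deltas -- order-free.
    static Increment merge(Increment a, Increment b) {
        return new Increment(a.amount() + b.amount());
    }

    long get() { return value; }
}
```

A `set`-style event would not merge this way (last writer wins, so order matters), which is exactly why encoding domain operations as events can buy better concurrency behavior.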
I would very much like if we could instrument simulation to get a better idea of how these dependencies look. As I've mentioned, we designed the interface between the driver and the model so that we could precisely identify all state, and reads and writes against that state, over the course of the simulation. The intent was to build a complete graph of dependencies amongst cells, tasks, and resources, laid out over time, which could be used to drive incremental resimulation. (If, when re-running a changed activity, it still produces all the same effects, then we need not resimulate anything else. If it's different in some ways but not others, we can propagate those changes only to those entities that were influenced.) This kind of dependency graph would also be useful for us, to get an idea of how interdependent -- or how decoupled -- the activities and cells in a simulation are. And if read-write dependencies become as performance-critical as we hope in this proposal, the graph will also be a major boon to modelers who are trying to optimize their system. As such, I think a really strong first step would be to implement the capability to generate a dependency graph from a simulation, so that we can get a better handle on the kinds of performance gains we'd realistically see -- and perhaps identify any other structure that could be exploited in the name of performance.
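The recording half of that instrumentation could be as simple as the following (all names hypothetical): log each task's reads and writes against cells as edges, then answer coupling queries over the resulting graph afterwards.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical dependency recorder: writes produce task -> cell edges,
// reads produce cell -> task edges, forming a bipartite dependency graph.
final class DependencyRecorder {
    record Edge(String from, String to) {}

    private final List<Edge> edges = new ArrayList<>();

    void onRead(String taskId, String cellId)  { edges.add(new Edge(cellId, taskId)); }
    void onWrite(String taskId, String cellId) { edges.add(new Edge(taskId, cellId)); }

    // Two tasks are coupled if the first wrote a cell the second read.
    boolean coupled(String writerTask, String readerTask) {
        Set<String> written = new HashSet<>();
        for (Edge e : edges) if (e.from().equals(writerTask)) written.add(e.to());
        for (Edge e : edges) if (e.to().equals(readerTask) && written.contains(e.from())) return true;
        return false;
    }
}
```

A real implementation would also timestamp edges (a write at t=10 cannot couple to a read at t=5), but even this order-free version would reveal how decoupled a mission model's activities actually are.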
-
Currently simulation has no parallelism, even though each task (such as an activity) has its own thread. Only one task is allowed to proceed at a time, even though many activities in different subsystems could theoretically operate on completely disjoint sets of resources.
Given that we aren't going to make a backtracking engine (i.e. time warp), the limiting factor of how much parallelism we could achieve is Merlin's knowledge of what resources are going to be operated on, when, how, and by what activities. Currently, we have no knowledge, which forced the current design to have no parallel tasks at all. This is one extreme. At the other extreme, we could have the mission modeller register the operations each activity could make in between each delay, which would tell Merlin exactly when each resource would be queried or changed. This would be impractical for everyone involved, would limit the modeller's ability to use variable delays, and would just generally be a whole lotta work.
I'd like to outline a system in the middle that keeps track of resource usage at the activity level; i.e. Merlin would know what resources each activity is allowed to query or mutate, but not precisely when it will happen. Unfortunately this would be a nearly ground-up redesign of Merlin, but wouldn't require much work for mission modellers. And of course I wouldn't propose this if I didn't think the performance gains would be worth it.
Design
This would be best suited for Loom virtual threads, which are currently a preview feature in Java 19. This is because there will be a lot of context switching in between CPU-bound (not IO-bound) units of work, which would be much more expensive on OS threads. I also fiddled around with async `Future`s in Rust for a while to make sure it would work with that paradigm - and it would, but it would be pretty obnoxious with Java's `CompletableFuture`s.

Mission model changes
Instead of getting the whole `MissionModel` to operate on, each activity effect model would declare in its `@EffectModel` annotation which resources it will read and write, and be provided resource handles that allow reading and/or writing accordingly. This allows Merlin to know which activities can read/write which resources, and requires the activity to abide by the resources that it declares. I don't imagine any other changes to mission model code.
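A sketch of what that extended annotation might look like. The `reads`/`writes` members, the `ReadHandle`/`WriteHandle` interfaces, and `BatteryActivity` are all invented here for illustration, not Merlin's actual API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical extended @EffectModel: declares the resources touched up front.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface EffectModel {
    String[] reads() default {};
    String[] writes() default {};
}

// Handles the engine would inject instead of the whole MissionModel.
interface ReadHandle<T>  { T get(); }          // may block on the resource's clock
interface WriteHandle<T> { void set(T value); } // enqueues an operation

class BatteryActivity {
    @EffectModel(reads = {"stateOfCharge"}, writes = {"heaterPower"})
    public void run(ReadHandle<Double> stateOfCharge, WriteHandle<Double> heaterPower) {
        if (stateOfCharge.get() < 0.2) heaterPower.set(0.0);
    }
}
```

The type system then enforces the declaration for free: an activity physically cannot write a resource it only holds a `ReadHandle` to, and the engine can read the annotation up front to schedule non-conflicting activities in parallel.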
Clocks

A key part of any parallel simulation engine is that each piece of computation might be happening at a different simulation time. In this case, since this is not a backtracking algorithm, each thread has exactly one clock that tracks where it is in simulation time. I'll be referring to a hypothetical `Clock` class that each thread owns, which allows other threads to wait on it until it advances to a certain simulation time.

Activities
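A minimal sketch of such a `Clock`, using plain monitor-based waiting (the method names `advanceTo`/`awaitAtLeast` are my own):

```java
// Hypothetical per-thread simulation clock: the owner advances it,
// other threads can block until it reaches a given simulation time.
final class Clock {
    private long now = 0; // simulation time, in whatever unit the engine uses

    // Called only by the owning thread: move forward and wake any waiters.
    synchronized void advanceTo(long t) {
        if (t < now) throw new IllegalArgumentException("clocks only move forward");
        now = t;
        notifyAll();
    }

    // Called by other threads: block until this clock reaches at least time t.
    synchronized void awaitAtLeast(long t) throws InterruptedException {
        while (now < t) wait(); // loop guards against spurious wakeups
    }

    synchronized long now() { return now; }
}
```

With Loom virtual threads, parking inside `wait()` is cheap, which is part of why the proposal leans on them.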
Instead of an activity waiting on `delay`, it would wait (if needed) on resource queries (`.get()`). `delay` would simply advance the activity clock and notify any resource threads that were waiting on it (see below). This should still be compatible with incremental simulation.

Resources
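To make the `delay`/`get` interplay concrete, here is a self-contained toy (all names are mine; the real design would use separate per-thread `Clock`s rather than one shared object): `delay` only moves the activity's time forward and wakes waiters, while `get` blocks until the resource thread has caught up to the activity's current time.

```java
import java.util.function.LongUnaryOperator;

// Toy model of one activity/resource pair sharing two clocks.
class ActivitySketch {
    private long activityTime = 0;
    private long resourceTime = 0; // advanced by the resource's worker thread

    // delay(): no blocking -- just advance our clock and notify waiters.
    synchronized void delay(long d) {
        activityTime += d;
        notifyAll();
    }

    // Called from the resource's worker thread as it churns through operations.
    synchronized void resourceAdvanceTo(long t) {
        resourceTime = t;
        notifyAll();
    }

    // A resource query: safe to read only once the resource has caught up
    // to the activity's current simulation time.
    synchronized long get(LongUnaryOperator valueAt) throws InterruptedException {
        while (resourceTime < activityTime) wait();
        return valueAt.applyAsLong(activityTime);
    }
}
```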
Ideal world
If our jobs were easy, no activities would cause dependencies between resources, meaning that `set`-like operations can be instantiated immediately in any order. So `resA.set(5)` is acceptable, and even `resA.add(5)`. But `resA.set(resB.get())` and `if (resB.get() == 5) resA.set(5)` create a dependency between `resA` and `resB`. In this ideal world where we don't have to query resources, simulation could look like this (and be pretty performant): as activities `delay`, their clocks advance. The point of all of this is that each resource can churn through its heap of operations in parallel with the others.
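That per-resource heap of operations could be sketched like so (the names `ResourceWorker`, `submit`, and `stepTo` are my own, assuming timestamped operations on a double-valued resource):

```java
import java.util.PriorityQueue;
import java.util.function.DoubleUnaryOperator;

// Hypothetical per-resource worker: activities enqueue timestamped operations
// in any order; the worker applies them in simulation-time order.
class ResourceWorker {
    record Op(long time, DoubleUnaryOperator f) {}

    private final PriorityQueue<Op> heap =
        new PriorityQueue<>((a, b) -> Long.compare(a.time(), b.time()));
    private double value;
    private long clock = 0; // how far this resource has been stepped

    ResourceWorker(double initial) { value = initial; }

    // Activity side: non-blocking; operations may arrive out of order.
    synchronized void submit(long time, DoubleUnaryOperator op) {
        heap.add(new Op(time, op));
    }

    // Resource side: apply every queued operation up to time t. In the real
    // engine this runs on the resource's own thread, advancing its Clock.
    synchronized double stepTo(long t) {
        while (!heap.isEmpty() && heap.peek().time() <= t) {
            value = heap.poll().f().applyAsDouble(value);
        }
        clock = t;
        return value;
    }
}
```

Each resource owns one such heap and worker, so disjoint resources churn through their operations fully in parallel; the hard part (below) is knowing how far `stepTo` is allowed to go.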
Reality
Turns out our jobs are not easy. Any time an activity needs to query a resource, it has to wait for the resource thread's clock to pass the activity clock before reading a value. But (a) the above algorithm doesn't store values in memory, and (b) let's be real, we want profile streaming. So instead we do this:
Limitations

Consider the difference between `res.add(5)` and `res.set(res.get() + 5)`. `add` is an operation that can be sent to the resource heap immediately; `set(get() + 5)` will block the activity thread until the operator catches up, and then block the operator until the activity has a chance to resume and send the operation.

A resource like `state of charge` is read frequently by the majority of activities. I assume that most activities' charge behavior can be expressed without `.get()` (as in `charge.subtract(5)` or `charge.addRate(0.1)`), but if it can't, then we have a problem. We would still get parallelism, because other resources could progress in small increments on other cores, but state of charge would become a bottleneck.

Benefits