Per vat priority (was: two run queues) #3465
Comments
In today's kernel meeting, we figured that a good starting point would be to have @dtribble pointed out that a lot of recent DEX rollouts have left users asking the question "what is the status of my trade?", and that this could provoke some interesting engineering goals. So one thought is:
Later, we'd give e.g. the AMM contract a way to distinguish between the ACK it sends back when the trade is complete (which runs on the user's txnID) and the price-change signal it sends to any subscribers (which, while caused by the user's action, is not really a part of it, and should be given a distinct txnID, or none at all). The txnID used for tracing causality is closely related to the handle we'd use to decide scheduling. Any messages on the run-queue with the same txnID should be using the same escalator, and drawing priority payments from the same account. It's not clear how we'd choose these scheduling parameters on the way in, but at least this hints at the datatypes we should be using to track them. We also discussed the idea that external parties should be able to pay to increase the escalator slope of existing messages, maybe just given the handle visible in the run-queue. This could be a separate type of cosmos message, which is delivered to swingset but not added to the run-queue. |
Does this design preserve order among messages successively sent on a single reference? |
This didn't make the metering phase. |
@mhofman and I came up with more of a plan yesterday:
We explored, but didn't find a satisfactory answer for, the question of how userspace should express priority preferences.
cc @erights (we'll do a proper presentation on this soon, but I figured you might like to hear about the work in progress) |
We also talked about needing some way to check whether an economic activity ends up entirely executed at higher priority, without dropping down in priority anywhere. The earlier discussion around "txnId" would be one way to accomplish this, but since we'll need a way to pause processing of lower priority messages for security reasons, we might be able to (ab)use that for sanity checks. We also mentioned that mailbox messages, since we may not know anything about them at this point, may need to be processed by an even higher priority queue. Once processed, Regarding promise subscriptions, I still have concerns that a vat may want to subscribe to a single promise with 2 different priorities, which, if not distinguished, would end up either lowering or raising the priority at which the high / low reactions execute. The only way I see to handle this is to consider a pair of [promiseId, priority] as a unique object for subscription purposes. |
Oh, huh. Yeah I'm pretty sure we only want one subscription per promise (well, per (promise, vat) pair). I think I'd be ok with |
The problem is that this is technically an elevation of privilege. The vat code might also expect that this low priority resolution would only ever run after its other high priority code is done (that is a guarantee we should make). |
After discussion with @dtribble, the suggestion was to switch to a model where a vat runs at a single priority level, to avoid the complexity of exporting objects and subscribing to promises at different priorities within a vat. It also removes the requirement that message queues be per object (which we may still want for loosening ordering expectations, but that becomes an orthogonal problem).

However, it does raise the issue that a lower priority sender may abuse its time slice to send multiple messages to a high priority vat (even if the target object in that vat may not do what would be considered high priority work). To solve that issue, we can introduce the concept of an "outbound queue", where message sends and promise resolutions are placed during a vat delivery. That outbound queue would be per vat, and be at the priority level of the vat. The messages would stay in that queue until picked by the kernel in FIFO order later.

There is some relation between the inbound queue (deliveries into a vat) and the outbound queue (pending messages from the vat): if there are pending outbound messages, those must be processed before any pending inbound messages. We can come up with the concept of an "active" queue, where the outbound queue is active if it contains any messages, and the inbound queue is active if it contains any messages and the outbound queue is empty. That makes the active states of the inbound and outbound queues mutually exclusive. The vat itself can be considered active if there are messages in either of the queues.

A single outbound queue, however, doesn't work for the "system" vattp / comms vats. Since there is a single one of each, if we placed these vats at high priority, a low priority external sender would be able to send messages to a high priority vat (zoe, amm, etc.) without hitting any lower priority queue. One approach is to have two comms vats, one for each sender priority level.
However that bundles all senders at a given priority into a single FIFO queue, which may not be desirable. An alternative is to allow the comms vat to create multiple outbound queues, one for each sender, at the priority level of the sender, and specify which outbound queue to use when performing syscalls. That makes the kernel free to pick messages from different senders in any order it wants. The onchain wallet, if implemented as a single vat, could use the same mechanism to generate messages in independent queues based on the wallet. This might also be the first step towards an activity concept. Now the problem is reduced to the following:
Which queue to service first is a matter of scheduling policy. We can start with a pseudo-random order to verify no stronger ordering is relied upon. Vattp and comms may run at an even higher priority level, to ensure external messages end up in the right outbound queues as early as possible. It's not clear at what priority level the onchain wallet should operate. |
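To make the inbound/outbound relationship above concrete, here is a minimal sketch of a per-vat queue pair with the mutually exclusive "active" states, plus a pseudo-random pick among active vats. All names (`makeVatQueues`, `nextEvent`, `pickActiveVat`) are hypothetical and are not taken from the SwingSet kernel; plain arrays stand in for whatever durable queue storage the kernel would really use.

```javascript
// Hypothetical sketch: none of these names come from the SwingSet kernel.
function makeVatQueues(vatID, priority) {
  const inbound = []; // deliveries waiting to go into the vat
  const outbound = []; // sends/resolutions emitted by the vat, not yet routed
  return {
    vatID,
    priority,
    inbound,
    outbound,
    // The outbound queue is active whenever it holds anything.
    isOutboundActive: () => outbound.length > 0,
    // The inbound queue is active only once the outbound queue has drained.
    isInboundActive: () => outbound.length === 0 && inbound.length > 0,
    // The vat is active if either queue holds messages.
    isActive: () => outbound.length > 0 || inbound.length > 0,
  };
}

// Take the next piece of work for one vat: pending outbound messages
// are always routed before any inbound delivery is made.
function nextEvent(vat) {
  if (vat.isOutboundActive()) {
    return { kind: 'route', event: vat.outbound.shift() };
  }
  if (vat.isInboundActive()) {
    return { kind: 'deliver', event: vat.inbound.shift() };
  }
  return undefined;
}

// Pick one active vat in pseudo-random order, so that nothing can come
// to rely on a stronger ordering. `random` is injectable for testing.
function pickActiveVat(vats, random = Math.random) {
  const active = vats.filter(v => v.isActive());
  if (active.length === 0) {
    return undefined;
  }
  return active[Math.floor(random() * active.length)];
}
```

A deterministic chain would need a seeded random source rather than `Math.random`; the sketch only shows where the choice is made.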
We refined this a bit during this week's kernel meeting:
|
I think we should hold
That is the plan already as described in #4638, and the reason I split the run queue before changing the promise send behavior described in #4542.
Rejection is just a notify, so promise is resolved to error right then, and regular notification logic runs queuing onto inbound queues. |
That's plausible. It delays a vat's ability to hear its answer a bit longer (ideally the sender would be subscribed to the result promise before the message gets delivered and the promise gets resolved, so the promise can be deleted as quickly as possible). I'd want to see a trace of an overloaded kernel to get a sense for how much the delay matters.
Perfect.
Hm, rejection is more like a
The total work that could be provoked depends upon how deep the chain of pipelined messages is. My hunch is that all such messages will still be sitting behind the provoking message, so it's not an unbounded expansion of work (i.e. a vat can't arrange to cause a huge amount of kernel work to occur in a single crank with only a single outbound rejected message). If so, we're good. |
I suppose it may delay if the
Yes sorry I took a shortcut. For the result of a message send, it technically can become a fanout if the result promise was forwarded (except I believe you said liveslot is not currently capable of recognizing its result promises?).
That is the current behavior and I was not planning on changing it.
Right, all work would still have to be prearranged by other vats either subscribing or sending messages to the shared promise. In all cases the reactions would be triggered by picking an event from an outbound queue. For regular promise settlement, it'd be from an explicit |
Mostly correct, userspace currently has no way to provoke liveslots into using a specific pre-existing promise reference ( Userspace does get back a Promise from |
Actually, even with just a pair of inbound and outbound queues per vat, the promise may be subscribed before the message is delivered:
So I don't think it matters what order messages make it out of a vat in this case. Anyway, I think we deviated, and the main concern was delay in notifies if subscriptions go through the outbound queue. |
I also realized an issue that might inhibit pipelining in too many cases.

```js
const p1 = E(obj).foo(); // obj is in comms vat, which enables pipelining
const p2 = E(p1).bar();
```

When If If I can think of three approaches to fix this:
The first sounds wrong, and the second sounds bad, so I'm leaning towards the third. |
Yeah I was thinking we could do second, but 3rd sounds better, and I believe is what I had in mind originally anyway about updating deciders, and making sure a decider can only be set once for now so we don't have to go through the inbound queues to pluck messages out when a decider changes. |
This moves some logic out of `deliverAndLogToVal` and into a higher-level dispatch function, where it can react to all failure modes in a single place. We pull "run-queue events" off the run-queue (currently named the "acceptance queue"). Many of these events are aimed at a specific vat, including 'notify', but the primary one is 'send' and is aimed at a *target* kref, which is either a kernel object or a kernel promise. For these we use `routeSendEvent` to either queue the event on a promise queue, reject it if it can't be delivered, or deliver it to a specific vat. This is the part that will change with #3465: in that world, each vat-input-queue exists for a specific vat, so the routing step moves earlier (into a "routing crank" that pulls events from a vat-output-queue and possibly adds them to a vat-input-queue). Some kinds of deliveries care about metering (or don't), and some have opinions about what should happen if a crank gets unwound. These are now reported as additional properties in the DeliveryStatus object that is returned by this delivery path. `processDeliveryMessage` looks at these properties, plus the delivery status (if any), to decide what to do at the end of the crank. I think most of the behavior should be the same as before. One change is that the `runPolicy` will probably get more information about non-'send' cranks than before (e.g. `create-vat` might report metering). This refactoring should make it easier to implement #1848 vat-upgrade, as well as #3465 queueing changes. closes #4687
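The three-way routing decision described above (queue on a promise, reject, or deliver to a vat) might look roughly like the sketch below. This is not the kernel's `routeSendEvent`: the function name `routeSend`, the shape of the `kernel` tables, the promise `state` strings and the `resolvedTo` field are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the real routeSendEvent. The kernel tables are
// modelled as plain Maps: objects: kref -> { owner }, promises: kref -> record.
function routeSend(event, kernel) {
  const { target } = event;

  const obj = kernel.objects.get(target);
  if (obj) {
    // Target is a kernel object: deliver to its owning vat if there is one.
    return obj.owner !== undefined
      ? { action: 'deliver', vatID: obj.owner }
      : { action: 'reject', reason: 'no owner for target' };
  }

  const promise = kernel.promises.get(target);
  if (!promise) {
    return { action: 'reject', reason: 'unknown target' };
  }

  switch (promise.state) {
    case 'unresolved':
      // Nothing to deliver to yet: park the event on the promise's queue.
      promise.queue.push(event);
      return { action: 'queue', kpid: target };
    case 'fulfilled':
      // Assumption: a promise fulfilled to another kref is re-routed there.
      return promise.resolvedTo !== undefined
        ? routeSend({ ...event, target: promise.resolvedTo }, kernel)
        : { action: 'reject', reason: 'target is not an object' };
    default:
      // Rejected (or any other state): the send cannot be delivered.
      return { action: 'reject', reason: 'promise rejected' };
  }
}
```

Under the per-vat queue design, the same decision would presumably run inside the routing crank, with the `deliver` outcome appending to a vat-input-queue instead of invoking the vat directly.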
What is the Problem Being Solved?
As a baby step towards #23, we're going to start with a simple two-level (high-priority + low-priority) scheduler. Combined with per-vat queues (#5025), this should provide a starting point to enable high priority processing of economy elements (#4318).
Description of the Design
Vats will be marked with a priority, either low or high.
The scheduler used by `controller.step()`/`controller.run()` will then process the active vats with high priority before considering vats with lower priority. To ensure that high priority processing doesn't starve low priority work, the scheduler may occasionally pick a low priority vat.

At first all vats will be created with the same priority. A later change will enable vats with different priorities.
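As an illustration of the two-level policy, here is a minimal sketch of a scheduler that prefers active high priority vats but yields to a low priority vat after a bounded run of high priority picks. The issue does not say how "occasionally" should be decided, so the `maxHighStreak` counter is purely an assumption, as are the names `makeTwoLevelScheduler` and `pickNext` and the `{ priority, queue }` vat shape.

```javascript
// Hypothetical sketch: the starvation rule (a fixed streak limit) is an
// assumption, not something specified in the design.
function makeTwoLevelScheduler(vats, maxHighStreak = 4) {
  let highStreak = 0; // consecutive high priority picks so far

  // First vat at the given priority that has pending work, if any.
  const firstActive = priority =>
    vats.find(v => v.priority === priority && v.queue.length > 0);

  return function pickNext() {
    const high = firstActive('high');
    const low = firstActive('low');
    // Prefer high priority work unless low priority work is waiting and
    // the high priority streak has reached its limit.
    if (high && !(low && highStreak >= maxHighStreak)) {
      highStreak += 1;
      return high;
    }
    highStreak = 0;
    return low; // undefined when no vat has pending work
  };
}
```

For example, with `maxHighStreak = 2` and both levels busy, the pick order is high, high, low, high, high, low. Within a level this sketch always takes the first active vat, so fairness between vats of the same priority would need a separate rule.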