Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add transcript events: init, snapshot save/load, shutdown #7484

Merged
merged 7 commits into from
Apr 27, 2023

Conversation

warner
Copy link
Member

@warner warner commented Apr 23, 2023

This introduces four new pseudo-delivery events to the transcript:

  • 'initialize-worker': a new empty worker is created
  • 'load-snapshot': a worker is loaded from heap snapshot
  • 'save-snapshot': we tell the worker to write a heap snapshot
  • 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (just not deliver() commands). The vat-warehouse records these events in the transcript to help
subsequent (manual/external) replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against
snapshotInitial/snapshotInterval .

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened.

The transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span".

With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. The current span will never include a 'save-snapshot' or 'shutdown-worker': the span is closed immediately after those events are added, so replay will never see them. But a tool which replays a historical span will see them at the end.

The types were improved to make TranscriptDelivery be a superset of VatDeliveryObject. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), except that save-snapshot includes the snapshot hash in its results. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation).

refs #7199
refs #6770

@warner warner added the SwingSet package: SwingSet label Apr 23, 2023
@warner warner self-assigned this Apr 23, 2023
@warner
Copy link
Member Author

warner commented Apr 23, 2023

w.r.t. #6770:

  • it records the schedule of heap-snapshot saves
  • it does not record the computrons included in a delivery (added in 397e401)

I think we're in agreement that computrons should be in-consensus, so adding a TranscriptDeliveryGenericResults.metering.computrons would be appropriate.

w.r.t. #7199

  • it adds a transcript entry when initializing the worker
  • it does not record the helper bundles (lockdown+supervisor) used to start a new worker (added in d9dbcbb)
  • it does not record the vat source code evaluated after the worker is started (added in 1fdb998)
  • it does not record the incarnation number anywhere
    • although you could deduce it by counting calls to initialize-worker

For these, the new initialize-worker event should include bundleIDs for all three. Note that the helper bundles are XS-specific, but all workers will have a vat source bundleID. We should divide the data appropriately, maybe matching options.workerOptions. It might be appropriate to just include workerOptions in the event directly, then add a vat-source bundleID.

I think the transcript (as recorded in swing-store) should only have bundleIDs, not the full bundles. However it might make sense for a tool like extract-transcript-from-kerneldb.js to pull any bundles the transcript needs from bundleStore and write them into the generated .sst file, making it a stand-alone transcript artifact that no longer needs a kernelDB to run.

@warner warner requested a review from mhofman April 23, 2023 09:35
@mhofman
Copy link
Member

mhofman commented Apr 23, 2023

I think the transcript (as recorded in swing-store) should only have bundleIDs, not the full bundles. However it might make sense for a tool like extract-transcript-from-kerneldb.js to pull any bundles the transcript needs from bundleStore and write them into the generated .sst file, making it a stand-alone transcript artifact that no longer needs a kernelDB to run.

Agreed, and this is roughly what #7434 does: introduce a new tool that extracts the bundles, and keep referencing bundleId in the extracted transcript file.

@warner warner force-pushed the 6770-pseudo-deliveries branch from 3ee143a to 9e67ee6 Compare April 24, 2023 01:49
@warner warner marked this pull request as ready for review April 24, 2023 01:50
@warner
Copy link
Member Author

warner commented Apr 24, 2023

@mhofman : best reviewed commit-by-commit. The commit comments have explanations of the intended changes.

@warner warner force-pushed the 6770-pseudo-deliveries branch from 9e67ee6 to 93e0a6a Compare April 24, 2023 01:52
@warner
Copy link
Member Author

warner commented Apr 24, 2023

cc @FUDCo

Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple minor nits. This is awesome!

packages/SwingSet/src/kernel/state/vatKeeper.js Outdated Show resolved Hide resolved
packages/SwingSet/src/kernel/state/vatKeeper.js Outdated Show resolved Hide resolved
packages/SwingSet/src/kernel/vat-warehouse.js Outdated Show resolved Hide resolved
packages/SwingSet/src/kernel/vat-warehouse.js Show resolved Hide resolved
packages/SwingSet/src/kernel/vat-warehouse.js Show resolved Hide resolved
packages/SwingSet/test/test-controller.js Show resolved Hide resolved
@warner warner force-pushed the 6770-pseudo-deliveries branch 3 times, most recently from 8ea57b6 to 719ee82 Compare April 26, 2023 00:37
@warner warner added automerge:rebase Automatically rebase updates, then merge and removed automerge:rebase Automatically rebase updates, then merge labels Apr 26, 2023
@warner
Copy link
Member Author

warner commented Apr 26, 2023

The deployment test failure revealed a bug: swingstore imports insist upon snapshotInfo.endPos === transcriptInfo.startPos, but now the save-snapshot event appears in the transcript, so the current span's startPos will be snapshotInfo.endPos + 1.

@warner warner force-pushed the 6770-pseudo-deliveries branch from 719ee82 to b91f3ad Compare April 26, 2023 05:35
@warner warner requested a review from FUDCo April 26, 2023 05:35
@warner
Copy link
Member Author

warner commented Apr 26, 2023

@FUDCo could you look at the last commit in particular? In the new approach, each old span looks like e.g.:

  • 4: load-snapshot
  • 5: message
  • 6: message
  • 7: save-snapshot

followed by a current span like:

  • 8: load-snapshot
  • 9: message

Which means the latest snapshot will be from .endPos = 7, but the latest span will have .startPos = 8. Previously, these two values were expected to be the same, because there was no save-snapshot event in the transcript (nor did it consume a deliveryNum, which is now better described as a transcriptNum).

This tripped a consistency check in the swingstore importer, but the only test that actually triggered it was the deployment-test (the one that runs at the very very end of a CI run, iff we're on the merge train). The swing-store test-exportImport.js has code which pretends to be the kernel (doing snapshot writes and span rollovers), but I missed that. If I'd properly modified that pretend-kernel code to increment the position counter in the same way the kernel now does, then test-exportImport.js would have caught this problem.

To reduce the chances I miss this again, I added a more direct test to swingset itself, which creates a state-sync export from a short-lived kernel, and then tries to load it back in. I tested that the new test fails if I revert the swingstore-importer consistency check to it's previous "positions must be equal" form.

Please double-check that I updated the test properly, and that you agree with my new consistency check.

@FUDCo
Copy link
Contributor

FUDCo commented Apr 26, 2023

Which means the latest snapshot will be from .endPos = 7, but the latest span will have .startPos = 8. Previously, these two values were expected to be the same, because there was no save-snapshot event in the transcript (nor did it consume a deliveryNum, which is now better described as a transcriptNum).

I haven't yet looked at the code (next in my queue) but I think this is a mistake. My take is there shouldn't be anything like that happening. The special entries you are adding to the transcript shouldn't be special from the transcriptStore's point of view; they're just entries. Remember that endPos is exclusive and startPos is inclusive, so the endPos of one (finished) span and the startPos of the subsequent span should always be equal.

Copy link
Contributor

@FUDCo FUDCo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add comments once I've read the code, but since @mhofman approved I'm preemptively clicking the "Request changes" button so this doesn't land until we sort out the startPos/endPos thing, which from your description is just broken.

@warner
Copy link
Member Author

warner commented Apr 26, 2023

take a look.. the check is comparing the snapshot endPos (which used to be the number just past the snapshot was taken, because it was the not-yet-used number at that moment, and is now the number of the snapshot, save-snapshot is now a fully-fledged member of the numberspace) against the current-transcript startPos. By asserting that they were equal, we were saying that there should be no deliveries after the snapshot was taken, which used to be true. But now, the snapshot should be the last "delivery" of the old transcript, which means the current transcript should start at one higher than the snapshot, hence the new +1 check.

Comment on lines +62 to +71
dtype === 'save-snapshot' ||
dtype === 'load-snapshot' ||
dtype === 'initialize-worker' ||
dtype === 'shutdown-worker'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it be safer under maintenance to look for and then allow the kinds of entries that are allowed under replay rather than looking for and then rejecting the ones that are prohibited? It's one less special if that we need to remember to update if we add another special entry type.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. The list of ordinary deliveries lives in the VatDeliveryObject type in liveslots:

* @typedef { [tag: 'message', target: string, msg: Message]} VatDeliveryMessage
* @typedef { [vpid: string, isReject: boolean, data: SwingSetCapData ] } VatOneResolution
* @typedef { [tag: 'notify', resolutions: VatOneResolution[] ]} VatDeliveryNotify
* @typedef { [tag: 'dropExports', vrefs: string[] ]} VatDeliveryDropExports
* @typedef { [tag: 'retireExports', vrefs: string[] ]} VatDeliveryRetireExports
* @typedef { [tag: 'retireImports', vrefs: string[] ]} VatDeliveryRetireImports
* @typedef { [tag: 'changeVatOptions', options: Record<string, unknown> ]} VatDeliveryChangeVatOptions
* @typedef { [tag: 'startVat', vatParameters: SwingSetCapData ]} VatDeliveryStartVat
* @typedef { [tag: 'stopVat', disconnectObject: SwingSetCapData ]} VatDeliveryStopVat
* @typedef { [tag: 'bringOutYourDead' ]} VatDeliveryBringOutYourDead
* @typedef { VatDeliveryMessage | VatDeliveryNotify | VatDeliveryDropExports
* | VatDeliveryRetireExports | VatDeliveryRetireImports | VatDeliveryChangeVatOptions
* | VatDeliveryStartVat | VatDeliveryStopVat | VatDeliveryBringOutYourDead
* } VatDeliveryObject

and currently contains 9 types: message, notify, dropExports, retireExports, retireImports, changeVatOptions, startVat, stopVat, bringOutYourDead. To do it right, we'd need liveslots to export a (non-type) list of normal ones. Except that different vats can be using different versions of liveslots, so if we e.g. add one in the future, then this vat-warehouse.js check would need to fetch the right list (vatKeeper.getLiveslotsSettings??).

It feels a bit more future-proof to have swingset reject the pseudo-delivery types that swingset has defined (since no version of liveslots will know about any of these), than to have it only accept the normal-delivery types that liveslots (which version?) defines.

But it would feel a bit better if swingset had a concrete list of these pseudo-delivery types, rather than vat-warehouse having a special if statement that is supposed to know about them. I'm not sure where that list ought to live, though, they're nominally defined in types-internal.js:

* @typedef { [tag: 'initialize-worker', options: TranscriptDeliveryInitializeWorkerOptions] } TranscriptDeliveryInitializeWorker
* @typedef { [tag: 'save-snapshot'] } TranscriptDeliverySaveSnapshot
* @typedef { { snapshotID: string } } TranscriptDeliverySnapshotConfig
* @typedef { [tag: 'load-snapshot', config: TranscriptDeliverySnapshotConfig] } TranscriptDeliveryLoadSnapshot
* @typedef { [tag: 'shutdown-worker'] } TranscriptDeliveryShutdownWorker
* @typedef { VatDeliveryObject
* | TranscriptDeliveryInitializeWorker
* | TranscriptDeliverySaveSnapshot
* | TranscriptDeliveryLoadSnapshot
* | TranscriptDeliveryShutdownWorker
* } TranscriptDelivery

but of course types aren't available for introspection by code. I suppose I could add a real export to types-internal.js, with a list of names? Or a Set, which might allow for faster lookup?

Anyways, I'm inclined to defer this to the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't there simply be an isDeliverable predicate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If isDeliverable would live in vat-warehouse.js, then that's basically what onlyRealDelivery is, except that it takes a transcriptEntry and returns a delivery object:

/**
 * @param {TranscriptEntry} transcriptEntry
 * @returns {VatDeliveryObject}
 */
function onlyRealDelivery(transcriptEntry) {
  const dtype = transcriptEntry.d[0];
  if (
    dtype === 'save-snapshot' ||
    dtype === 'load-snapshot' ||
    dtype === 'initialize-worker' ||
    dtype === 'shutdown-worker'
  ) {
    throw Fail`replay should not see ${dtype}`;
  }
  return transcriptEntry.d;
}

If it should live in liveslots, then we have the question of which (version of) liveslots it lives in. The kernel doesn't talk directly to liveslots, it talks to a vat which may or may not contain (some version of) liveslots. So there's no simple way for packages/liveslots to export a predicate that the kernel can query (no liveslots code is ever evaluated inside the kernel's Compartment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to be a kernel predicate, as it's about purely kernel-side stuff that the kernel wants to avoid delivering to the vat. My concern was simply that an allow-list is safer than a deny-list if we introduce another kernel-side special pseudo-deliverable thingie in the future. If somebody does that and misses the need to update the onlyRealDelivery function here, then we could end up dropping something into a vat that could break it. The list of things that can be delivered into a vat is likely to be more stable, as there's no way to introduce a new actual deliverable type without coordination with the vat anyway.

To reiterate, this is purely a hygiene thing and doesn't touch on fundamental semantics.

@@ -1073,7 +1073,7 @@ export async function importSwingStore(exporter, dirPath = null, options = {}) {
transcriptInfo.startPos === 0 ||
Fail`missing current snapshot for vat ${q(vatID)}`;
} else {
snapshotInfo.endPos === transcriptInfo.startPos ||
snapshotInfo.endPos + 1 === transcriptInfo.startPos ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On deeper thought I'm reasonably convinced this is a mistake and that the old invariant check is the correct one. The thing that's adding the snapshot entry into the transcript should be incrementing the endPos value the same as any other addition to the transcript. After looking at the code that's adding these special entries, I'm puzzled by why that's not happening, but I think something is wrong there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this some thoughts, and I believe this change is logically correct. However it highlights a potential problem with this PR
When we create a snapshot, we record the current transcript position endPos.
Now with this PR when the snapshot is created, we also insert a transcript entry with the snapshot details, then roll over the span.
It used to be that the new span started at that endPos, but now it starts at that endPos + 1 because of the extra transcript entry.

Either we accept that difference (which has the nice property that the snapshot position is the position of the corresponding snapshot-save transcript entry), or we record the snapshot position as endPos + 1 instead to keep the snapshotPos === next transcript span start.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't get this. The endPos isn't part of the snapshot, so I don't see how the timing of the snapshot enters into the discussion at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Brian and I had a conversation about this and I figured out the source of my confusion, which I'm recording here for the benefit of anybody else who might have the same confusion in the future. The confusion stemmed from the fact that the name endPos is being used in two different contexts, and Brian was referring to one context while I was interpreting it in the other one.

In context 1, the transcript span record, endPos is the (open) upper end of the range of transcript items that the span covers. In context 2, the snapshot record (note: NOT in the snapshot itself), endPos identifies the snapshot within a series of snapshots for a given vat. We chose to use the end position for this purpose because it uniquely identifies the snapshot relative to its vat, conveniently sorts in the temporal order that the snapshots were taken, and makes it easy to locate where the snapshot fits in the timeline that is the sequence of events in the transcript item table. In particular, vat replay relies on the latter because replay's model of the state of a vat is in terms of a point in time T where the snapshot captures all the state earlier than T and the transcript captures all the state changes later than T. The value endPos was, in effect, what we were storing as our representation of T, simply by noting the transcript's endPos value in the snapshot record when we make the snapshot. Consequently it made sense as a consistency check to verify that the snapshot's T (looking backwards) was the same as the T (looking forwards) that begins the following transcript span. Because this was the (prior) transcript span's end position, using the same field name, endPos also made sense. However, by recording the snapshot action itself as an event in the transcript, we end up incrementing the span's endPos value (in the transcript span record, context 1) but the value stored into the snapshot record (context 2) was the value that was captured at the time the snapshot was initiated, i.e., prior to the increment operation. This is what caused the off-by-one that motivated Brian to change the endPos consistency check in the way that I was objecting to.

In principle there are several ways to deal with the off-by-one. One would be what Brian did. Another would be to pass endPos + 1 into the function that writes the snapshot record or for that function to increment its copy of that value prior to storing it. Both of the latter things, however, would read a bit weird in the code, so I think the modified consistency check is the right choice. However, we would probably be well served to change the name of the field in the snapshot record from endPos to, say, snapshotPos to make it clear that what's being recorded is the time of the snapshot without entangling it with the transcript span record via name punning.

@warner warner force-pushed the 6770-pseudo-deliveries branch from b91f3ad to 0a33e8f Compare April 26, 2023 20:08
@FUDCo FUDCo self-requested a review April 26, 2023 20:14
Copy link
Contributor

@FUDCo FUDCo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point my concerns have been resolved to my satisfaction.

@warner warner force-pushed the 6770-pseudo-deliveries branch 6 times, most recently from 63521fb to ab644de Compare April 26, 2023 23:33
@warner warner added the automerge:rebase Automatically rebase updates, then merge label Apr 27, 2023
@warner warner force-pushed the 6770-pseudo-deliveries branch from ab644de to 6cad69c Compare April 27, 2023 00:42
warner added 7 commits April 27, 2023 01:12
These didn't need to be exposed in types-external.js
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However many of them are implemented with commands
to the worker (but not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
manual/external replay tools know what happened. Without them, we'd
need to deduce e.g. the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshot initial/interval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap
snapshots or upgrade-related shutdowns. To bring a worker up to date,
we want to start a worker (either a blank one, or from a snapshot),
and then replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. Old spans will end with `save-snapshot` or
`shutdown-worker`, but the current span will never include one of
those: the span is closed immediately after those events are
added. When the kernel replays a transcript to bring a worker up to
date, that replay will never see 'save-snapshot' or
'shutdown-worker'. But an external tool which replays a historical
span will see them at the end.

The `initialize-worker` event contains `workerOptions` (which includes
which type of worker is being used, as well as helper bundle IDs like
lockdown and supervisor), as well as the `source.bundleID` for the vat
bundle.

The `save-snapshot` event results contain the `snapshotID` hash that
was generated. The `load-snapshot` event includes the `snapshotID` in
a record that could be extended with additional details in the
future (like an xsnap version).

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record TranscriptDeliveryResult, which is
currently a stripped down subset of VatDeliveryResult (just the "ok"
status), plus the save-snapshot hash. In the future, we'll probably
record the deterministic subset of metering results (computrons, maybe
something about memory allocation).

In the slog, the `heap-snapshot-save` event details now contain
`snapshotID` instead of `hash`, to be consistent.

Previously vat-warehouse used `lastVatID` to track which vat received
a delivery most recently, and `saveSnapshot()` used that to decide
which vat requires a snapshot. This commit changes that path to be
more explicit, and removes `lastVatID`.

refs #7199
refs #6770
Any delivery that results in `metering.compute` (i.e. anything to an
xsnap worker) will record that number as `metering.computrons` in the
transcript delivery results.

Our standard replay code doesn't do anything with this:
`replayTranscript` ignores the old results. But it gives us something
to compare against during external/diagnostic replays.
`importSwingStore()` has a consistency check which, for each vat,
compares the `endPos` of the current snapshot against the `startPos`
of the current transcript span.

Now that the `save-snapshot` pseudo-entry is the last one of a
transcript, these two values are expected to differ by one.

Fixed the check, updated swingstore's test-exportImport.js to behave
more like the real swingset, and added a new swingset test to exercise
the whole run/export/import/run cycle, so we can discover this sort of
thing locally in swingset's tests. Previously this only caused a
failure of the "deployment-test", which doesn't run until all other CI
has passed and the PR is at the front of the merge queue.
This should reduce the chances of confusing it with
`transcriptSpan.endPos`. The two numbers are related: `snapPos` is the
same as `endPos` in the brief moment when we've just written out the
snapshot, but we immediately append a `save-snapshot` event, causing
`endPos` to increment. The `endPos` of the transcript span, being an
exclusive-right-edge kind of number, will be one larger than the last
item in the span, which means it will also be one larger than the
`snapPos`.
@warner warner force-pushed the 6770-pseudo-deliveries branch from 6cad69c to d59d930 Compare April 27, 2023 01:12
@mergify mergify bot merged commit e566049 into master Apr 27, 2023
@mergify mergify bot deleted the 6770-pseudo-deliveries branch April 27, 2023 01:52
warner added a commit that referenced this pull request Apr 29, 2023
When I added pseudo-deliveries to the transcript in #7484, I forgot to
update the slog entry `deliveryNum` counter to match. As result, the
`startVat` was deliveryNum 0 but transcript position 1 (since 0 is the
`initialize-worker`). The gap grew by 2 after every snapshot save/load
event pair.

We don't slog the pseudo-deliveries as `.type: 'deliver'` (they're
recorded other ways). While it will be weird to have gaps in the
slogfile's `deliveryNum` sequence, I think it's better to maintain
easy correlation between the slog and the transcript.

So this changes `vatKeeper.nextDeliveryNum()`. Previously, that
function maintained a separate counter (in `kvStore`) for each vat,
which was only used by slog entries. Now, it queries the
transcriptStore, with `getCurrentSpanBounds().endPos`. This is only
incremented when we add an entry to the transcript, so the tests of
`nextDeliveryNum()`'s auto-increment feature were updated too.

closes #7549
mergify bot pushed a commit that referenced this pull request Apr 30, 2023
When I added pseudo-deliveries to the transcript in #7484, I forgot to
update the slog entry `deliveryNum` counter to match. As result, the
`startVat` was deliveryNum 0 but transcript position 1 (since 0 is the
`initialize-worker`). The gap grew by 2 after every snapshot save/load
event pair.

We don't slog the pseudo-deliveries as `.type: 'deliver'` (they're
recorded other ways). While it will be weird to have gaps in the
slogfile's `deliveryNum` sequence, I think it's better to maintain
easy correlation between the slog and the transcript.

So this changes `vatKeeper.nextDeliveryNum()`. Previously, that
function maintained a separate counter (in `kvStore`) for each vat,
which was only used by slog entries. Now, it queries the
transcriptStore, with `getCurrentSpanBounds().endPos`. This is only
incremented when we add an entry to the transcript, so the tests of
`nextDeliveryNum()`'s auto-increment feature were updated too.

closes #7549
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants