Capture start/upgrade information in transcript #7199

mhofman · 2023-03-21T15:53:17Z (edited by warner)

What is the Problem Being Solved?

In #6775 we decided to keep historical transcript metadata so that a Manchurian-style replay (#1691) would be possible.

Such a replay may only be necessary for the latest incarnation of a vat, or possibly for the latest N incarnations.

However, with an upgradable vat, we currently lose the information about the source used to start previous vat incarnations, whether that's the lockdown and supervisor bundles (which will become upgradable) or the vat bundle itself. Since #7026, it is also not possible to discover at which position in the transcript the latest incarnation started. We have also never stored the transcript positions of previous vat upgrades.

Description of the Design

* Add a synthetic create-vat transcript entry as the first "delivery" of a span representing a new incarnation. This would reference the bundle IDs of the lockdown, supervisor, and vat source bundles.
* Add metadata to the transcript spans indicating the incarnation number corresponding to each span, so that it's possible to get the list of spans representing the transcript of a single incarnation (add incarnation to transcriptStore spans and items #7482).
* Possibly add the same incarnation number to the snapstore. This is not strictly necessary, but could help future housekeeping if we decide to drop, in consensus, the snapshot metadata associated with previous incarnations. Technically, this can be derived from the incarnation data we would add to the transcript spans (through position correlation).
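As a rough illustration of what per-span incarnation metadata enables, the sketch below filters an in-memory list of span records. The record shape (`vatID`, `startPos`, `endPos`, `incarnation`) and the helper names are hypothetical stand-ins, not the swing-store schema or API; `latestIncarnations` covers the "latest N incarnations" case mentioned earlier.

```javascript
// Hypothetical span records; the field names are assumptions, and endPos is
// treated as exclusive.
const spans = [
  { vatID: 'v1', startPos: 0, endPos: 5, incarnation: 0 },
  { vatID: 'v1', startPos: 5, endPos: 9, incarnation: 0 },
  { vatID: 'v1', startPos: 9, endPos: 12, incarnation: 1 },
  { vatID: 'v1', startPos: 12, endPos: 20, incarnation: 2 },
];

// Spans making up the transcript of one incarnation, in position order.
function spansForIncarnation(allSpans, vatID, incarnation) {
  return allSpans
    .filter(s => s.vatID === vatID && s.incarnation === incarnation)
    .sort((a, b) => a.startPos - b.startPos);
}

// The newest `n` incarnation numbers recorded for a vat, oldest first.
function latestIncarnations(allSpans, vatID, n) {
  const sorted = [
    ...new Set(allSpans.filter(s => s.vatID === vatID).map(s => s.incarnation)),
  ].sort((a, b) => a - b);
  return sorted.slice(Math.max(0, sorted.length - n));
}

console.log(spansForIncarnation(spans, 'v1', 0).map(s => s.startPos)); // [ 0, 5 ]
console.log(latestIncarnations(spans, 'v1', 2)); // [ 1, 2 ]
```

Keeping the incarnation on each span means this is a plain filter; without it, a tool would have to infer incarnation boundaries by scanning for upgrade-related shutdowns.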

Security Considerations

None that I can think of, besides the need for this information to survive export/import.

Scaling Considerations

None

Test Plan

TBD


mhofman · 2023-03-21T15:54:06Z

One of the use cases is the extract-transcript-from-kernel-db tool, and this issue is thus tangentially related to #6770.

This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (just not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent (manual/external) replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshotInitial/snapshotInterval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened.

The transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. The current span will never include a 'save-snapshot' or 'shutdown-worker': the span is closed immediately after those events are added, so replay will never see them. But a tool which replays a historical span will see them at the end.

The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), except that save-snapshot includes the snapshot hash in its results. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation).

refs #7199
refs #6770
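The replay rule described above (start from whatever the first entry of the current span says, then replay the rest) can be sketched as follows. This assumes each transcript entry is a record whose `d` field holds the delivery as an array `[type, ...args]`; the argument shapes and the `planReplay` name are illustrative assumptions, not the vat-warehouse's actual code.

```javascript
// Hypothetical helper: work out how to bring a worker up to date from the
// entries of the current span. Entry shape ({ d: [type, ...args] }) is assumed.
function planReplay(currentSpan) {
  if (currentSpan.length === 0) {
    throw new Error('current span is empty');
  }
  const [type, details] = currentSpan[0].d;
  let start;
  if (type === 'initialize-worker') {
    // Start a blank worker; details would carry worker options and bundle IDs.
    start = { kind: 'blank', details };
  } else if (type === 'load-snapshot') {
    // Start the worker from the named heap snapshot.
    start = { kind: 'snapshot', snapshotID: details.snapshotID };
  } else {
    throw new Error(`current span must begin with a start event, not ${type}`);
  }
  // Every later entry is a real delivery that must be replayed, in order.
  const deliveries = currentSpan.slice(1).map(entry => entry.d);
  for (const [t] of deliveries) {
    if (t === 'save-snapshot' || t === 'shutdown-worker') {
      // The span is closed right after these events, so they cannot appear here.
      throw new Error(`unexpected ${t} in the current span`);
    }
  }
  return { start, deliveries };
}

// Example: a span that resumes from a (made-up) snapshot ID.
const plan = planReplay([
  { d: ['load-snapshot', { snapshotID: 'snap-1' }] },
  { d: ['bringOutYourDead'] },
]);
console.log(plan.start.kind, plan.deliveries.length); // snapshot 1
```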

This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not VatDeliveryObjects. However many of them are implemented with commands to the worker (but not `deliver()` commands). The vat-warehouse records these events in the transcript to help subsequent manual/external replay tools know what happened. Without them, we'd need to deduce e.g. the heap-snapshot writing schedule by counting deliveries and comparing them against snapshot initial/interval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would do. It does not mean that the vat-warehouse actually tore down the old worker and immediately replaced it with a new one (from snapshot). It might choose to do that, or the worker itself might choose to replace its XS engine instance with a fresh one, or it might keep using the old engine. The 'save-snapshot' command has side-effects (it does a forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap snapshots or upgrade-related shutdowns. To bring a worker up to date, we want to start a worker (either a blank one, or from a snapshot), and then replay the "current span". With this change, the current span always starts either with 'initialize-worker' or with 'load-snapshot', telling us exactly what needs to be done. The span then contains all the deliveries that must be replayed. Old spans will end with `save-snapshot` or `shutdown-worker`, but the current span will never include one of those: the span is closed immediately after those events are added.

When the kernel replays a transcript to bring a worker up to date, that replay will never see 'save-snapshot' or 'shutdown-worker'. But an external tool which replays a historical span will see them at the end.

The `initialize-worker` event contains `workerOptions` (which includes which type of worker is being used, as well as helper bundle IDs like lockdown and supervisor), as well as the `source.bundleID` for the vat bundle. The `save-snapshot` event results contain the `snapshotID` hash that was generated. The `load-snapshot` event includes the `snapshotID` in a record that could be extended with additional details in the future (like an xsnap version).

The types were improved to make `TranscriptDelivery` be a superset of `VatDeliveryObject`. We also record TranscriptDeliveryResult, which is currently a stripped down subset of VatDeliveryResult (just the "ok" status), plus the save-snapshot hash. In the future, we'll probably record the deterministic subset of metering results (computrons, maybe something about memory allocation).

In the slog, the `heap-snapshot-save` event details now contain `snapshotID` instead of `hash`, to be consistent.

Previously vat-warehouse used `lastVatID` to track which vat received a delivery most recently, and `saveSnapshot()` used that to decide which vat requires a snapshot. This commit changes that path to be more explicit, and removes `lastVatID`.

refs #7199
refs #6770
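An external tool walking historical spans could check the framing rules described above (a span starts with `initialize-worker` or `load-snapshot`; old spans end with `save-snapshot` or `shutdown-worker`; the current span contains neither) and pick up the snapshot hash recorded in the `save-snapshot` results. In this sketch the entry shape `{ d, r }`, with `r.snapshotID` on a `save-snapshot` result, is an assumption based on the description, and `describeSpan` is a hypothetical name.

```javascript
const START_EVENTS = new Set(['initialize-worker', 'load-snapshot']);
const END_EVENTS = new Set(['save-snapshot', 'shutdown-worker']);

// Hypothetical checker for one span's framing. Returns whether the span is
// closed (i.e. historical) and, if it ended with a snapshot, that snapshot's ID.
function describeSpan(span) {
  const types = span.map(entry => entry.d[0]);
  if (types.length === 0 || !START_EVENTS.has(types[0])) {
    throw new Error('span must begin with initialize-worker or load-snapshot');
  }
  const lastType = types[types.length - 1];
  const closed = END_EVENTS.has(lastType);
  // Entries between the start event and the (optional) end event must be
  // ordinary deliveries, never pseudo-events.
  const inner = types.slice(1, closed ? -1 : types.length);
  for (const t of inner) {
    if (START_EVENTS.has(t) || END_EVENTS.has(t)) {
      throw new Error(`pseudo-event ${t} in the middle of a span`);
    }
  }
  const snapshotID =
    lastType === 'save-snapshot' ? span[span.length - 1].r.snapshotID : undefined;
  return { closed, snapshotID };
}

console.log(describeSpan([
  { d: ['initialize-worker', {}], r: { status: 'ok' } },
  { d: ['bringOutYourDead'], r: { status: 'ok' } },
  { d: ['save-snapshot'], r: { status: 'ok', snapshotID: 'hash-1' } },
])); // { closed: true, snapshotID: 'hash-1' }
```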

warner · 2023-04-28T18:26:03Z

In #7484 we added the `initialize-worker` pseudo-event to the transcript, which includes the `source.bundleID` (i.e. bundle hash), and the complete `workerOptions` (both `type: 'xsnap'` and the helper bundleIDs for lockdown/supervisor).

#7506 added `incarnation` to both the transcript-span records and the transcript items themselves. There's no swingstore API for using them, but it would be trivial to `SELECT * FROM transcriptItems WHERE vatID=? AND incarnation=?`.
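Without a swingstore API, the same lookup can be mimicked over rows shaped like `transcriptItems`. The sketch below filters by `vatID` and `incarnation`, orders by position, and reports the bundle information from the incarnation's `initialize-worker` event. The row fields (`position`, `item`), the JSON layout of `item`, and the `{ source: { bundleID }, workerOptions }` payload are assumptions drawn from the description in this thread, not a confirmed schema; the bundle IDs in the sample data are made up.

```javascript
// Hypothetical stand-in for
//   SELECT * FROM transcriptItems WHERE vatID=? AND incarnation=?
// Row fields (vatID, position, incarnation, item) and the JSON layout of
// `item` are assumptions made for this sketch.
function itemsForIncarnation(rows, vatID, incarnation) {
  return rows
    .filter(r => r.vatID === vatID && r.incarnation === incarnation)
    .sort((a, b) => a.position - b.position)
    .map(r => JSON.parse(r.item));
}

// Report what an incarnation was started with, using its initialize-worker
// pseudo-event (assumed payload: { source: { bundleID }, workerOptions }).
function startInfo(rows, vatID, incarnation) {
  const init = itemsForIncarnation(rows, vatID, incarnation).find(
    entry => entry.d[0] === 'initialize-worker',
  );
  if (!init) return undefined;
  const { source, workerOptions } = init.d[1];
  return { vatBundleID: source.bundleID, workerOptions };
}

// Sample data with made-up bundle IDs.
const makeItem = d => JSON.stringify({ d, sc: [], r: { status: 'ok' } });
const initItem = bundleID =>
  makeItem(['initialize-worker', { source: { bundleID }, workerOptions: { type: 'xsnap' } }]);
const rows = [
  { vatID: 'v1', position: 0, incarnation: 0, item: initItem('b1-old') },
  { vatID: 'v1', position: 1, incarnation: 0, item: makeItem(['shutdown-worker']) },
  { vatID: 'v1', position: 2, incarnation: 1, item: initItem('b1-new') },
];
console.log(startInfo(rows, 'v1', 1).vatBundleID); // b1-new
```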

We did not add the incarnation ID to the snapstore. The approach that fits with my plan for transcripts is more like:

    first = SELECT * FROM transcriptItems WHERE vatID=? AND incarnation=?
    const { d } = JSON.parse(first);
    const [type, loadConfig] = d;
    assert.equal(type, 'load-snapshot');
    const { snapshotID } = loadConfig;
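The JavaScript half of that snippet can be made runnable as a small helper that takes the `item` text of whichever transcript row the query returned. The column name `item` and the `{ d: ['load-snapshot', { snapshotID }] }` layout are assumptions based on the descriptions above, and the explicit `throw` stands in for `assert.equal` so the sketch needs no imports.

```javascript
// Hypothetical helper: extract the snapshot ID from a serialized transcript
// item that is expected to be a load-snapshot pseudo-event.
function snapshotIDFromItem(itemJSON) {
  const { d } = JSON.parse(itemJSON);
  const [type, loadConfig] = d;
  if (type !== 'load-snapshot') {
    throw new Error(`expected load-snapshot, got ${type}`);
  }
  // loadConfig is a record so it can grow later (e.g. an xsnap version).
  const { snapshotID } = loadConfig;
  return snapshotID;
}

const example = JSON.stringify({
  d: ['load-snapshot', { snapshotID: 'snap-abc' }],
  sc: [],
  r: { status: 'ok' },
});
console.log(snapshotIDFromItem(example)); // snap-abc
```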

We still record the `snapPos` with each snapshot row, so an alternative approach is to `SELECT startPos FROM transcriptSpans WHERE ..`, compute `snapPos = startPos - 1`, and then `SELECT * FROM snapshots WHERE vatID=? AND snapPos=?`.
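The alternative route through `transcriptSpans` can be sketched the same way, over in-memory rows. The row fields are assumptions, and `snapshotsForIncarnation` is a hypothetical helper; it simply applies the `snapPos = startPos - 1` rule to every span of an incarnation and keeps the positions where a snapshot row exists.

```javascript
// In-memory stand-ins for transcriptSpans and snapshots rows (assumed fields).
const spanRows = [
  { vatID: 'v1', startPos: 0, incarnation: 0 },
  { vatID: 'v1', startPos: 5, incarnation: 0 },
  { vatID: 'v1', startPos: 9, incarnation: 1 },
  { vatID: 'v1', startPos: 12, incarnation: 1 },
];
const snapshotRows = [
  { vatID: 'v1', snapPos: 4, snapshotID: 'snap-a' },
  { vatID: 'v1', snapPos: 11, snapshotID: 'snap-b' },
];

// For each span of the incarnation, look for a snapshot recorded at
// startPos - 1; spans with no matching snapshot row are skipped.
function snapshotsForIncarnation(spans, snapshots, vatID, incarnation) {
  const found = [];
  for (const span of spans) {
    if (span.vatID !== vatID || span.incarnation !== incarnation) continue;
    const snapPos = span.startPos - 1;
    const snap = snapshots.find(s => s.vatID === vatID && s.snapPos === snapPos);
    if (snap) found.push(snap.snapshotID);
  }
  return found;
}

console.log(snapshotsForIncarnation(spanRows, snapshotRows, 'v1', 1)); // [ 'snap-b' ]
```

This keeps the snapstore free of incarnation numbers, as the comment above prefers: the incarnation lives only on the spans, and snapshots are reached by position.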

Given that, I'm declaring victory on this ticket.

mhofman added the enhancement (New feature or request) and SwingSet (package: SwingSet) labels Mar 21, 2023

mhofman assigned warner Mar 21, 2023

mhofman mentioned this issue Mar 21, 2023 in "make bundleStore accept nestedEvaluate bundles too" #7190 (Closed)

warner mentioned this issue Apr 23, 2023 in "add transcript events: init, snapshot save/load, shutdown" #7484 (Merged)

ivanlei added vaults_triage DO NOT USE and removed vaults_triage DO NOT USE labels Apr 27, 2023

ivanlei added this to the Vaults EVP milestone Apr 27, 2023

warner closed this as completed Apr 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Capture start/upgrade information in transcript #7199

Capture start/upgrade information in transcript #7199

mhofman commented Mar 21, 2023 •

edited by warner

Loading

mhofman commented Mar 21, 2023

warner commented Apr 28, 2023

Capture start/upgrade information in transcript #7199

Capture start/upgrade information in transcript #7199

Comments

mhofman commented Mar 21, 2023 • edited by warner Loading

What is the Problem Being Solved?

Description of the Design

Security Considerations

Scaling Considerations

Test Plan

mhofman commented Mar 21, 2023

warner commented Apr 28, 2023

mhofman commented Mar 21, 2023 •

edited by warner

Loading