Capture start/upgrade information in transcript #7199

Closed
mhofman opened this issue Mar 21, 2023 · 2 comments
Labels: enhancement (New feature or request), SwingSet (package: SwingSet), vaults_triage (DO NOT USE)


@mhofman
Member

mhofman commented Mar 21, 2023

What is the Problem Being Solved?

In #6775 we decided to keep historical transcript metadata so that a Manchurian-style replay (#1691) would be possible.

Such a replay may only be necessary for the latest incarnation of a vat, or possibly the latest N incarnations.

However, with an upgradable vat, we currently lose track of the source used to start previous vat incarnations, whether that's the lockdown and supervisor bundles (which will become upgradable) or the vat bundle itself. Since #7026, there is also no way to discover at which transcript position the latest incarnation started. And we have never stored the transcript positions of previous vat upgrades.

Description of the Design

  • Add a synthetic create-vat transcript entry as the first "delivery" of a span representing a new incarnation. This would reference the bundle IDs of the lockdown, supervisor, and vat source bundles.
  • Add metadata to the transcript spans indicating the incarnation number corresponding to the span, so that it's possible to get a list of spans representing the transcript of a single incarnation. (add incarnation to transcriptStore spans and items #7482)
  • Possibly add the same incarnation number to the snapstore. This is not strictly necessary, but could help future housekeeping, where we might decide, in consensus, to drop the snapshot metadata associated with previous incarnations. Technically, this can also be derived from the incarnation data we would add to the transcript spans (through position correlation).
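A minimal sketch of what per-span incarnation metadata would enable, assuming each span record carries hypothetical `start_pos`, `end_pos`, and `incarnation` fields (illustrative names, not the actual transcriptStore schema). It lists the spans of a single incarnation and recovers the transcript position where the latest incarnation began.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    # Hypothetical span record covering transcript positions [start_pos, end_pos).
    start_pos: int
    end_pos: int
    incarnation: int  # the proposed new metadata field


def spans_for_incarnation(spans: List[Span], incarnation: int) -> List[Span]:
    """Return the spans belonging to one incarnation, in transcript order."""
    return sorted((s for s in spans if s.incarnation == incarnation),
                  key=lambda s: s.start_pos)


def latest_incarnation_start(spans: List[Span]) -> int:
    """Transcript position at which the most recent incarnation began."""
    latest = max(s.incarnation for s in spans)
    return spans_for_incarnation(spans, latest)[0].start_pos


# Incarnation 0 was snapshotted once (at position 5), then upgraded at position 12.
spans = [Span(0, 5, 0), Span(5, 12, 0), Span(12, 20, 1)]
print(latest_incarnation_start(spans))  # prints 12
```

With the incarnation stored explicitly, no position arithmetic or bundle comparison is needed to find upgrade boundaries.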

Security Considerations

None I can think of, besides the need for this information to survive export/import.

Scaling Considerations

None

Test Plan

TBD

mhofman added the enhancement and SwingSet labels Mar 21, 2023
@mhofman
Member Author

mhofman commented Mar 21, 2023

One of the use cases is the extract-transcript-from-kernel-db tool, so this issue is also tangentially related to #6770.

warner added a commit that referenced this issue Apr 23, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)
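The four event types listed above can be told apart from ordinary deliveries by their type tag. A minimal sketch, assuming each transcript entry is a dict whose `d` field is a `[type, ...args]` list; that serialized shape is an assumption for illustration, not taken from the commit.

```python
from typing import Any, Dict, List

# The four pseudo-delivery types added to the transcript by this commit.
PSEUDO_DELIVERY_TYPES = frozenset(
    {"initialize-worker", "load-snapshot", "save-snapshot", "shutdown-worker"}
)


def is_pseudo_delivery(entry: Dict[str, Any]) -> bool:
    """True if the entry records a worker-lifecycle event rather than a real delivery."""
    return entry["d"][0] in PSEUDO_DELIVERY_TYPES


def real_deliveries(span: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the entries that a replay would pass to the worker via deliver()."""
    return [entry for entry in span if not is_pseudo_delivery(entry)]


span = [
    {"d": ["initialize-worker", {}]},
    {"d": ["message", "o+0", {}]},  # placeholder for an ordinary delivery
    {"d": ["save-snapshot"]},
]
print(len(real_deliveries(span)))  # prints 1
```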

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However, many of them are implemented with commands
to the worker (just not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
(manual/external) replay tools know what happened. Without them, we'd
need to deduce, e.g., the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshotInitial/snapshotInterval.
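For contrast, a sketch of the bookkeeping these events make unnecessary. It assumes a hypothetical policy in which the first snapshot is written after `snapshot_initial` deliveries and each later one after a further `snapshot_interval` deliveries. The real vat-warehouse policy may differ or change between versions, and a replay tool that guessed wrong would silently diverge, which is the argument for recording the events instead.

```python
from typing import List


def predicted_snapshot_points(num_deliveries: int,
                              snapshot_initial: int,
                              snapshot_interval: int) -> List[int]:
    """Delivery counts at which a snapshot would be written, under an ASSUMED policy:
    first after `snapshot_initial` deliveries, then every `snapshot_interval` more."""
    if snapshot_interval <= 0:
        raise ValueError("snapshot_interval must be positive")
    points = []
    next_point = snapshot_initial
    while next_point <= num_deliveries:
        points.append(next_point)
        next_point += snapshot_interval
    return points


print(predicted_snapshot_points(450, snapshot_initial=2, snapshot_interval=200))
# prints [2, 202, 402]
```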

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.
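A sketch of how an external tool might walk a historical span, turning each recorded event into the corresponding worker action. `FakeWorker` and its command names are hypothetical stand-ins, not the xsnap worker protocol, and entries are assumed to look like `{"d": [type, ...]}`. The point it illustrates is that 'save-snapshot' is replayed at exactly its recorded position because of its forced-GC side effect, even if the tool keeps using the same engine afterwards.

```python
from typing import Any, Dict, List


class FakeWorker:
    """Hypothetical worker that only logs the commands a replay tool would send."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def command(self, name: str) -> None:
        self.log.append(name)


def replay_span(span: List[Dict[str, Any]]) -> FakeWorker:
    worker = FakeWorker()
    for entry in span:
        kind = entry["d"][0]
        if kind == "initialize-worker":
            worker.command("start-blank")
        elif kind == "load-snapshot":
            worker.command("load-snapshot")
        elif kind == "save-snapshot":
            # Not a no-op: writing a snapshot forces a GC, so it must happen here.
            worker.command("save-snapshot")
        elif kind == "shutdown-worker":
            worker.command("shutdown")
        else:
            worker.command("deliver")
    return worker


span = [{"d": ["load-snapshot", {"snapshotID": "h1"}]},
        {"d": ["message", "o+0", {}]},
        {"d": ["save-snapshot"]}]
print(replay_span(span).log)  # prints ['load-snapshot', 'deliver', 'save-snapshot']
```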

The transcript is broken up into "spans", delimited by heap snapshots
or upgrade-related shutdowns. To bring a worker up to date, we want to
start a worker (either a blank one, or from a snapshot), and then
replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. The current span will never include a 'save-snapshot' or
'shutdown-worker': the span is closed immediately after those events
are added, so replay will never see them. But a tool which replays a
historical span will see them at the end.
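These span rules can be checked mechanically for transcripts written after this change (older transcripts lack the new events). A minimal sketch, again assuming entries shaped like `{"d": [type, ...]}`; since a span is closed immediately after 'save-snapshot' or 'shutdown-worker', a historical span should contain exactly one such event, at its end.

```python
from typing import Any, Dict, List

STARTERS = {"initialize-worker", "load-snapshot"}
CLOSERS = {"save-snapshot", "shutdown-worker"}


def check_span(span: List[Dict[str, Any]], is_current: bool) -> None:
    """Raise ValueError if a span breaks the start/end rules for spans."""
    kinds = [entry["d"][0] for entry in span]
    if not kinds or kinds[0] not in STARTERS:
        raise ValueError("span must start with initialize-worker or load-snapshot")
    closer_positions = [i for i, kind in enumerate(kinds) if kind in CLOSERS]
    if is_current and closer_positions:
        raise ValueError("current span must not contain save-snapshot or shutdown-worker")
    if not is_current and closer_positions != [len(kinds) - 1]:
        raise ValueError("historical span must end with its only closing event")


check_span([{"d": ["initialize-worker", {}]}, {"d": ["message", "o+0", {}]}], is_current=True)
check_span([{"d": ["load-snapshot", {}]}, {"d": ["save-snapshot"]}], is_current=False)
```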

The types were improved to make `TranscriptDelivery` a superset of
`VatDeliveryObject`. We also record `TranscriptDeliveryResult`, which is
currently a stripped-down subset of `VatDeliveryResult` (just the "ok"
status), except that save-snapshot includes the snapshot hash in its
results. In the future, we'll probably record the deterministic subset
of metering results (computrons, maybe something about memory
allocation).
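A sketch of the stripping step. It assumes, for illustration only, that a `VatDeliveryResult` is a `(status, problem, meter_usage)` tuple and models a `TranscriptDeliveryResult` as a dict; the `snapshotID` key name is likewise an assumption here. Neither shape is copied from the kernel's type definitions.

```python
from typing import Any, Dict, Optional, Tuple

# ASSUMED VatDeliveryResult shape: (status, problem, meter_usage).
VatDeliveryResult = Tuple[str, Optional[str], Optional[Dict[str, Any]]]


def to_transcript_result(vat_result: VatDeliveryResult,
                         snapshot_id: Optional[str] = None) -> Dict[str, Any]:
    """Keep only the status; save-snapshot events also carry the snapshot hash."""
    status, _problem, _meter_usage = vat_result  # metering is dropped for now
    result: Dict[str, Any] = {"status": status}
    if snapshot_id is not None:
        result["snapshotID"] = snapshot_id
    return result


print(to_transcript_result(("ok", None, {"computrons": 1234})))
# prints {'status': 'ok'}
```

Dropping the metering data keeps the recorded result deterministic; a future version could add back only the deterministic subset.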

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 24, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However, many of them are implemented with commands
to the worker (but not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
manual/external replay tools know what happened. Without them, we'd
need to deduce, e.g., the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshot initial/interval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap
snapshots or upgrade-related shutdowns. To bring a worker up to date,
we want to start a worker (either a blank one, or from a snapshot),
and then replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. Old spans will end with `save-snapshot` or
`shutdown-worker`, but the current span will never include one of
those: the span is closed immediately after those events are
added. When the kernel replays a transcript to bring a worker up to
date, that replay will never see 'save-snapshot' or
'shutdown-worker'. But an external tool which replays a historical
span will see them at the end.

The `initialize-worker` event contains `workerOptions` (which records
the type of worker being used, as well as helper bundle IDs like
lockdown and supervisor) and the `source.bundleID` for the vat
bundle.
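Because `initialize-worker` now names the bundles a worker was built from, a tool can recover the start-up provenance that #7199 asks for. A sketch, assuming the entry's argument is a dict with `workerOptions` and `source.bundleID`; the `type` and `bundleIDs` keys inside `workerOptions` are hypothetical placeholders, since the text only says it includes the worker type and helper bundle IDs. The `b1-...` values are placeholder bundle IDs.

```python
from typing import Any, Dict


def describe_worker_start(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize which worker type and bundles an incarnation was started from."""
    kind = entry["d"][0]
    if kind != "initialize-worker":
        raise ValueError(f"not an initialize-worker entry: {kind}")
    info = entry["d"][1]
    options = info["workerOptions"]
    return {
        "workerType": options.get("type"),  # hypothetical key name
        "helperBundleIDs": list(options.get("bundleIDs", [])),  # hypothetical key name
        "vatBundleID": info["source"]["bundleID"],
    }


entry = {"d": ["initialize-worker", {
    "workerOptions": {"type": "xsnap", "bundleIDs": ["b1-lockdown", "b1-supervisor"]},
    "source": {"bundleID": "b1-vat"},
}]}
print(describe_worker_start(entry)["vatBundleID"])  # prints b1-vat
```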

The `save-snapshot` event results contain the `snapshotID` hash that
was generated. The `load-snapshot` event includes the `snapshotID` in
a record that could be extended with additional details in the
future (like an xsnap version).
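Recording `snapshotID` on both sides lets a tool cross-check consecutive spans: a span opened by `load-snapshot` should name the hash reported by the `save-snapshot` that closed the previous span. A sketch, assuming each entry keeps its delivery under `d` and its result under `r` (an assumed layout, for illustration).

```python
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]


def closing_snapshot_id(span: List[Entry]) -> Optional[str]:
    """snapshotID reported by the save-snapshot that closed this span, if any."""
    last = span[-1]
    if last["d"][0] == "save-snapshot":
        return last["r"]["snapshotID"]
    return None  # e.g. the span was closed by shutdown-worker


def snapshot_chain_is_consistent(spans: List[List[Entry]]) -> bool:
    """Each span opened by load-snapshot must load the hash saved just before it."""
    for prev, cur in zip(spans, spans[1:]):
        first = cur[0]
        if first["d"][0] == "load-snapshot":
            if first["d"][1]["snapshotID"] != closing_snapshot_id(prev):
                return False
    return True


spans = [
    [{"d": ["initialize-worker", {}], "r": {"status": "ok"}},
     {"d": ["save-snapshot"], "r": {"status": "ok", "snapshotID": "h1"}}],
    [{"d": ["load-snapshot", {"snapshotID": "h1"}], "r": {"status": "ok"}}],
]
print(snapshot_chain_is_consistent(spans))  # prints True
```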

The types were improved to make `TranscriptDelivery` a superset of
`VatDeliveryObject`. We also record `TranscriptDeliveryResult`, which is
currently a stripped-down subset of `VatDeliveryResult` (just the "ok"
status), plus the save-snapshot hash. In the future, we'll probably
record the deterministic subset of metering results (computrons, maybe
something about memory allocation).

In the slog, the `heap-snapshot-save` event details now contain
`snapshotID` instead of `hash`, to be consistent.
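Slog consumers that read files written before and after this rename may see either field. A small sketch of a reader that accepts both spellings; the event type `heap-snapshot-save` comes from the text, while treating the slog entry as a flat dict with a `type` key is an assumption.

```python
from typing import Any, Dict, Optional


def snapshot_id_from_slog(event: Dict[str, Any]) -> Optional[str]:
    """Snapshot hash from a heap-snapshot-save slog event, in either field spelling."""
    if event.get("type") != "heap-snapshot-save":
        return None
    # Newer kernels write `snapshotID`; older ones wrote `hash`.
    return event.get("snapshotID", event.get("hash"))


print(snapshot_id_from_slog({"type": "heap-snapshot-save", "hash": "abc"}))  # prints abc
```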

Previously vat-warehouse used `lastVatID` to track which vat received
a delivery most recently, and `saveSnapshot()` used that to decide
which vat requires a snapshot. This commit changes that path to be
more explicit, and removes `lastVatID`.
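One way to make the snapshot path explicit, sketched with hypothetical names (`Warehouse`, `deliver`, `maybe_save_snapshot`) rather than the real vat-warehouse API, and with a hypothetical interval-only snapshot policy: the caller passes the vat ID it wants checked, so no hidden "most recently delivered-to vat" state is involved.

```python
from typing import Dict, List


class Warehouse:
    """Hypothetical stand-in for vat-warehouse that counts deliveries per vat."""

    def __init__(self, snapshot_interval: int) -> None:
        self.snapshot_interval = snapshot_interval
        self.deliveries_since_snapshot: Dict[str, int] = {}
        self.snapshots_taken: List[str] = []

    def deliver(self, vat_id: str) -> None:
        count = self.deliveries_since_snapshot.get(vat_id, 0)
        self.deliveries_since_snapshot[vat_id] = count + 1

    def maybe_save_snapshot(self, vat_id: str) -> bool:
        # The vat is named explicitly; there is no `last_vat_id` field to consult.
        if self.deliveries_since_snapshot.get(vat_id, 0) < self.snapshot_interval:
            return False
        self.snapshots_taken.append(vat_id)
        self.deliveries_since_snapshot[vat_id] = 0
        return True


w = Warehouse(snapshot_interval=2)
w.deliver("v1")
w.deliver("v1")
w.deliver("v2")
print(w.maybe_save_snapshot("v1"), w.maybe_save_snapshot("v2"))  # prints True False
```

Passing the vat ID as an argument also makes the decision easy to test in isolation, since it no longer depends on delivery order.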

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 25, 2023
warner added a commit that referenced this issue Apr 26, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However many of them are implemented with commands
to the worker (but not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
manual/external replay tools know what happened. Without them, we'd
need to deduce e.g. the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshot initial/interval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap
snapshots or upgrade-related shutdowns. To bring a worker up to date,
we want to start a worker (either a blank one, or from a snapshot),
and then replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. Old spans will end with `save-snapshot` or
`shutdown-worker`, but the current span will never include one of
those: the span is closed immediately after those events are
added. When the kernel replays a transcript to bring a worker up to
date, that replay will never see 'save-snapshot' or
'shutdown-worker'. But an external tool which replays a historical
span will see them at the end.

The `initialize-worker` event contains `workerOptions` (which includes
which type of worker is being used, as well as helper bundle IDs like
lockdown and supervisor), as well as the `source.bundleID` for the vat
bundle.

The `save-snapshot` event results contain the `snapshotID` hash that
was generated. The `load-snapshot` event includes the `snapshotID` in
a record that could be extended with additional details in the
future (like an xsnap version).

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record TranscriptDeliveryResult, which is
currently a stripped down subset of VatDeliveryResult (just the "ok"
status), plus the save-snapshot hash. In the future, we'll probably
record the deterministic subset of metering results (computrons, maybe
something about memory allocation).

In the slog, the `heap-snapshot-save` event details now contain
`snapshotID` instead of `hash`, to be consistent.

Previously vat-warehouse used `lastVatID` to track which vat received
a delivery most recently, and `saveSnapshot()` used that to decide
which vat requires a snapshot. This commit changes that path to be
more explicit, and removes `lastVatID`.

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 26, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However many of them are implemented with commands
to the worker (but not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
manual/external replay tools know what happened. Without them, we'd
need to deduce e.g. the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshot initial/interval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap
snapshots or upgrade-related shutdowns. To bring a worker up to date,
we want to start a worker (either a blank one, or from a snapshot),
and then replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. Old spans will end with `save-snapshot` or
`shutdown-worker`, but the current span will never include one of
those: the span is closed immediately after those events are
added. When the kernel replays a transcript to bring a worker up to
date, that replay will never see 'save-snapshot' or
'shutdown-worker'. But an external tool which replays a historical
span will see them at the end.

The `initialize-worker` event contains `workerOptions` (which includes
which type of worker is being used, as well as helper bundle IDs like
lockdown and supervisor), as well as the `source.bundleID` for the vat
bundle.

The `save-snapshot` event results contain the `snapshotID` hash that
was generated. The `load-snapshot` event includes the `snapshotID` in
a record that could be extended with additional details in the
future (like an xsnap version).

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record TranscriptDeliveryResult, which is
currently a stripped down subset of VatDeliveryResult (just the "ok"
status), plus the save-snapshot hash. In the future, we'll probably
record the deterministic subset of metering results (computrons, maybe
something about memory allocation).

In the slog, the `heap-snapshot-save` event details now contain
`snapshotID` instead of `hash`, to be consistent.

Previously vat-warehouse used `lastVatID` to track which vat received
a delivery most recently, and `saveSnapshot()` used that to decide
which vat requires a snapshot. This commit changes that path to be
more explicit, and removes `lastVatID`.

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 26, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However many of them are implemented with commands
to the worker (but not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
manual/external replay tools know what happened. Without them, we'd
need to deduce e.g. the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshot initial/interval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap
snapshots or upgrade-related shutdowns. To bring a worker up to date,
we want to start a worker (either a blank one, or from a snapshot),
and then replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. Old spans will end with `save-snapshot` or
`shutdown-worker`, but the current span will never include one of
those: the span is closed immediately after those events are
added. When the kernel replays a transcript to bring a worker up to
date, that replay will never see 'save-snapshot' or
'shutdown-worker'. But an external tool which replays a historical
span will see them at the end.

The `initialize-worker` event contains `workerOptions` (which includes
which type of worker is being used, as well as helper bundle IDs like
lockdown and supervisor), as well as the `source.bundleID` for the vat
bundle.

The `save-snapshot` event results contain the `snapshotID` hash that
was generated. The `load-snapshot` event includes the `snapshotID` in
a record that could be extended with additional details in the
future (like an xsnap version).

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record TranscriptDeliveryResult, which is
currently a stripped down subset of VatDeliveryResult (just the "ok"
status), plus the save-snapshot hash. In the future, we'll probably
record the deterministic subset of metering results (computrons, maybe
something about memory allocation).

In the slog, the `heap-snapshot-save` event details now contain
`snapshotID` instead of `hash`, to be consistent.

Previously vat-warehouse used `lastVatID` to track which vat received
a delivery most recently, and `saveSnapshot()` used that to decide
which vat requires a snapshot. This commit changes that path to be
more explicit, and removes `lastVatID`.

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 26, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However many of them are implemented with commands
to the worker (but not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
manual/external replay tools know what happened. Without them, we'd
need to deduce e.g. the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshot initial/interval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap
snapshots or upgrade-related shutdowns. To bring a worker up to date,
we want to start a worker (either a blank one, or from a snapshot),
and then replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. Old spans will end with `save-snapshot` or
`shutdown-worker`, but the current span will never include one of
those: the span is closed immediately after those events are
added. When the kernel replays a transcript to bring a worker up to
date, that replay will never see 'save-snapshot' or
'shutdown-worker'. But an external tool which replays a historical
span will see them at the end.

The `initialize-worker` event contains `workerOptions` (which includes
which type of worker is being used, as well as helper bundle IDs like
lockdown and supervisor), as well as the `source.bundleID` for the vat
bundle.

The `save-snapshot` event results contain the `snapshotID` hash that
was generated. The `load-snapshot` event includes the `snapshotID` in
a record that could be extended with additional details in the
future (like an xsnap version).

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record TranscriptDeliveryResult, which is
currently a stripped down subset of VatDeliveryResult (just the "ok"
status), plus the save-snapshot hash. In the future, we'll probably
record the deterministic subset of metering results (computrons, maybe
something about memory allocation).

In the slog, the `heap-snapshot-save` event details now contain
`snapshotID` instead of `hash`, to be consistent.

Previously vat-warehouse used `lastVatID` to track which vat received
a delivery most recently, and `saveSnapshot()` used that to decide
which vat requires a snapshot. This commit changes that path to be
more explicit, and removes `lastVatID`.

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 26, 2023
This introduces four new pseudo-delivery events to the transcript:

* 'initialize-worker': a new empty worker is created
* 'load-snapshot': a worker is loaded from heap snapshot
* 'save-snapshot': we tell the worker to write a heap snapshot
* 'shutdown-worker': we stop the worker (e.g. during upgrade)

These events are not actually delivered to the worker: they are not
VatDeliveryObjects. However many of them are implemented with commands
to the worker (but not `deliver()` commands). The vat-warehouse
records these events in the transcript to help subsequent
manual/external replay tools know what happened. Without them, we'd
need to deduce e.g. the heap-snapshot writing schedule by counting
deliveries and comparing them against snapshot initial/interval.

The 'save-snapshot'/'load-snapshot' pair indicates what a replay would
do. It does not mean that the vat-warehouse actually tore down the old
worker and immediately replaced it with a new one (from snapshot). It
might choose to do that, or the worker itself might choose to replace
its XS engine instance with a fresh one, or it might keep using the
old engine. The 'save-snapshot' command has side-effects (it does a
forced GC), so it is important to keep track of when it happened.

As before, the transcript is broken up into "spans", delimited by heap
snapshots or upgrade-related shutdowns. To bring a worker up to date,
we want to start a worker (either a blank one, or from a snapshot),
and then replay the "current span".

With this change, the current span always starts either with
'initialize-worker' or with 'load-snapshot', telling us exactly what
needs to be done. The span then contains all the deliveries that must
be replayed. Old spans will end with `save-snapshot` or
`shutdown-worker`, but the current span will never include one of
those: the span is closed immediately after those events are
added. When the kernel replays a transcript to bring a worker up to
date, that replay will never see 'save-snapshot' or
'shutdown-worker'. But an external tool which replays a historical
span will see them at the end.

The `initialize-worker` event contains `workerOptions` (which includes
which type of worker is being used, as well as helper bundle IDs like
lockdown and supervisor), as well as the `source.bundleID` for the vat
bundle.

The `save-snapshot` event results contain the `snapshotID` hash that
was generated. The `load-snapshot` event includes the `snapshotID` in
a record that could be extended with additional details in the
future (like an xsnap version).

The types were improved to make `TranscriptDelivery` be a superset of
`VatDeliveryObject`. We also record TranscriptDeliveryResult, which is
currently a stripped down subset of VatDeliveryResult (just the "ok"
status), plus the save-snapshot hash. In the future, we'll probably
record the deterministic subset of metering results (computrons, maybe
something about memory allocation).

In the slog, the `heap-snapshot-save` event details now contain
`snapshotID` instead of `hash`, to be consistent.

Previously vat-warehouse used `lastVatID` to track which vat received
a delivery most recently, and `saveSnapshot()` used that to decide
which vat requires a snapshot. This commit changes that path to be
more explicit, and removes `lastVatID`.

refs #7199
refs #6770
warner added a commit that referenced this issue Apr 26, 2023
warner added a commit that referenced this issue Apr 26, 2023
warner added a commit that referenced this issue Apr 26, 2023
warner added a commit that referenced this issue Apr 26, 2023
warner added a commit that referenced this issue Apr 27, 2023
warner added a commit that referenced this issue Apr 27, 2023
@ivanlei ivanlei added vaults_triage DO NOT USE and removed vaults_triage DO NOT USE labels Apr 27, 2023
@ivanlei ivanlei added this to the Vaults EVP milestone Apr 27, 2023
@warner
Member

warner commented Apr 28, 2023

In #7484 we added the `initialize-worker` pseudo-event to the transcript, which includes the `source.bundleID` (i.e. the bundle hash) and the complete `workerOptions` (both `type: 'xsnap'` and the helper bundleIDs for lockdown/supervisor).

#7506 added `incarnation` to both the transcript-span records and the transcript items themselves. There's no swingstore API for using them, but it would be trivial to `SELECT * FROM transcriptItems WHERE vatID=? AND incarnation=?`.
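
For readers without a SQLite handle at hand, here is an in-memory stand-in for that query. The row shape `{ vatID, position, incarnation, item }`, with `item` holding the JSON-encoded entry, is an assumption, and `itemsForIncarnation` is a hypothetical name, not a swingstore method.

```javascript
// Hypothetical in-memory stand-in for
//   SELECT * FROM transcriptItems WHERE vatID=? AND incarnation=?
// Assumed row shape: { vatID, position, incarnation, item }.
function itemsForIncarnation(rows, vatID, incarnation) {
  return rows
    .filter(row => row.vatID === vatID && row.incarnation === incarnation)
    .sort((a, b) => a.position - b.position) // replay order
    .map(row => JSON.parse(row.item));
}

const rows = [
  { vatID: 'v1', position: 1, incarnation: 0, item: '{"d":["message","o+0",{}]}' },
  { vatID: 'v1', position: 0, incarnation: 0, item: '{"d":["initialize-worker",{}]}' },
  { vatID: 'v1', position: 3, incarnation: 1, item: '{"d":["initialize-worker",{}]}' },
  { vatID: 'v1', position: 2, incarnation: 0, item: '{"d":["shutdown-worker"]}' },
  { vatID: 'v2', position: 0, incarnation: 0, item: '{"d":["initialize-worker",{}]}' },
];
const items = itemsForIncarnation(rows, 'v1', 0);
console.log(items.map(entry => entry.d[0]).join(','));
// initialize-worker,message,shutdown-worker
```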

We did not add the incarnation ID to the snapstore. The approach that fits with my plan for transcripts is more like:

  • `first = SELECT * FROM transcriptItems WHERE vatID=? AND incarnation=?`
  • `const { d } = JSON.parse(first)`
  • `const [ type, loadConfig ] = d`
  • `assert.equal(type, 'load-snapshot')`
  • `const { snapshotID } = loadConfig`
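
A runnable sketch of the same destructuring steps, applied to every entry of an incarnation rather than only the first one (per the commit message, a span that follows a snapshot starts with `load-snapshot`, while an incarnation's first span starts with `initialize-worker`). The entry shape and the name `snapshotIDsForIncarnation` are assumptions.

```javascript
// Hypothetical sketch: collect the snapshotID of every 'load-snapshot'
// entry among one incarnation's parsed transcript entries.
// Assumed entry shape: { d: [type, config] }.
function snapshotIDsForIncarnation(entries) {
  const ids = [];
  for (const { d } of entries) {
    const [type, loadConfig] = d;
    if (type === 'load-snapshot') {
      const { snapshotID } = loadConfig;
      ids.push(snapshotID);
    }
  }
  return ids;
}

const entries = [
  { d: ['initialize-worker', {}] },
  { d: ['message', 'o+0', {}] },
  { d: ['save-snapshot'], r: { status: 'ok', snapshotID: 'abc' } },
  { d: ['load-snapshot', { snapshotID: 'abc' }] },
];
console.log(snapshotIDsForIncarnation(entries)); // [ 'abc' ]
```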

We still record the `snapPos` with each snapshot row, so an alternative approach is to `SELECT startPos FROM transcriptSpans WHERE ..`, compute `snapPos = startPos - 1`, and then `SELECT * FROM snapshots WHERE vatID=? AND snapPos=?`.
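
The alternative lookup can be sketched in memory as below. The row shapes (spans with `vatID` and `startPos`, snapshots with `vatID`, `snapPos`, and a hypothetical `snapshotID` field) are assumptions, and the span-selection condition is left to the caller. A span that begins with `initialize-worker` presumably has no matching snapshot, so the lookup returns `undefined`.

```javascript
// Hypothetical in-memory sketch of:
//   snapPos = startPos - 1
//   SELECT * FROM snapshots WHERE vatID=? AND snapPos=?
// Assumed rows: span { vatID, startPos }, snapshot { vatID, snapPos, snapshotID }.
function snapshotForSpan(snapshots, span) {
  const snapPos = span.startPos - 1;
  return snapshots.find(s => s.vatID === span.vatID && s.snapPos === snapPos);
}

const snapshots = [
  { vatID: 'v1', snapPos: 9, snapshotID: 'abc' },
  { vatID: 'v2', snapPos: 9, snapshotID: 'def' },
];
console.log(snapshotForSpan(snapshots, { vatID: 'v1', startPos: 10 })?.snapshotID); // abc
console.log(snapshotForSpan(snapshots, { vatID: 'v1', startPos: 0 })); // undefined
```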

Given that, I'm declaring victory on this ticket.

@warner warner closed this as completed Apr 28, 2023