pvf: Update docs for PVF artifacts #6551

mrcnski · 2023-01-13T14:21:02Z

Closes #3581 (no fix needed there, see comments. Just updating the docs).

burdges · 2023-01-13T15:50:03Z

node/core/pvf/src/artifacts.rs

+//!
+//!	# Lifecycle of an artifact
+//!
+//! 1. During node start-up, the artifacts cache is cleaned up.


Do we need to re-prepare all PVF artifacts on each node restart? Or does "clean up" mean the artifacts' build host version matches the current host version?

Yes, all local artifacts are cleared - we'll re-prepare the PVF the first time a new execute request comes in. I'll update the doc to make it more clear.

We should imho avoid clearing the artifacts cache when the host did not change. We could recompile only the parachain blocks when the host version did change, then lazily recompile the parathread ones.

Replying here to keep the threads focused. Yeah, thinking about it, I actually don't see why we want to clear the artifacts cache. On start-up, we could instead re-populate the Artifacts table from the compiled artifacts on disk -- the PVF hash should already be in the filename -- and re-start the 24-hour TTL timers for each artifact. Or we could even use the system's last-modified/accessed metadata for the files (with some sanity checks). Then instead of lazily re-compiling the PVFs, we would lazily delete the ones we end up not needing, which seems a lot more efficient. 🙂

@s0me0ne-unkn0wn Would this need coordination with your execution environment PR?

Ah, I didn't see the last comment yet and continued in that thread. Yes, I forgot about 24h timer, it solves the problem. Do I understand correctly that the proposal is to keep artifacts only if the node is not upgraded? In that case, it might work. But we should always purge the artifacts if the node is upgraded.

Would this need coordination with your execution environment PR?

Special coordination is not needed, I'm already used to merging master to that branch three times a week 😅

Ah, I didn't see the last comment yet and continued in that thread. Yes, I forgot about 24h timer, it solves the problem. Do I understand correctly that the proposal is to keep artifacts only if the node is not upgraded? In that case, it might work. But we should always purge the artifacts if the node is upgraded.

Yes, makes sense to me!

Raised https://github.com/paritytech/polkadot/issues/6940 to continue this discussion.

burdges · 2023-01-13T15:55:13Z

node/core/pvf/src/artifacts.rs

+//!    1. If the artifact is already being processed, we add another execution request to the
+//!       existing preparation job, without starting a new one.
+//!
+//!    2. Note that if the state is [`ArtifactState::FailedToProcess`], we usually do not retry


We do not currently have any "I'm taking a long time" messages, so if we send out approval assignments but do artifact builds lazily, then we'll cause no shows, given that builds can take more than the 12? second no show time out.

In theory, we could send messages for "building artifact" and/or "It's slow but I'm here", but @rphmeier wanted to avoid complicating the approval process with such messages, probably a wise decision. We therefore need PVF artifacts to be built in advance, or else we suck up the risk of correlated artifact builds creating de fact escalations.

Indeed, I was wondering about this a couple of times already myself. I think for the time being, preparation is usually pretty fast so there are no issues.

The problem with preparation in advance is, that this will likely result in wasted effort in case of parathreads. As all validators would need to prepare a PVF, although only 30 approval checkers will actually need it. Might be fine.

Other options:

Gather some data on actual preparation times - if they turn out to be very low, we could just enforce a reasonably low value in pre-checking.

Interpret parathread PVFs instead of compiling. Given that they are expected to be executed only rarely, the compiliation effort is likely not worth the trouble anyway. (And for parachains prepare in advance)

We should imho avoid clearing the artifacts cache when the host did not change. We could recompile only the parachain blocks when the host version did change, then lazily recompile the parathread ones.

We should've timings of course, but we'll never stop people building wasm blobs that screw up build times intentionally.

Interpreting kinda works. We have consensus upon who gets compiled vs interpreted, so interpreted then runs with different approval time parameters. We could similarly adjust approval parameters to include recompiling parathreads each block. This makes parathreads more expensive and second class though. We could've parathreads that "buy" being compiled in advance like parachains.

We do still have everyone compile the parathread when the PVF initially gets uploaded though, yes? I'd think this suggests parathreads and parachains should be all be precompiled, which just makes uploading a PVF more expensive. Implicitly then host upgrades become relatively more expensive, but this makes sense too.

We should imho avoid clearing the artifacts cache when the host did not change.

We'd need an intelligent garbage collector, then. Imagine the node is restarted and has a hundred artifacts in the cache. How do we know which ones we will use and which are stale?

We should've timings of course, but we'll never stop people building wasm blobs that screw up build times intentionally.

Why? They would not pass the pre-checking phase, assuming:

Gather some data on actual preparation times - if they turn out to be very low, we could just enforce a reasonably low value in pre-checking.

We'd need to garbage collect PVFs when parachains deregister or when the PVF gets superseded by later PVFs. We could do pre-checking after a PVF upload gets finalized, so then we avoid fork concerns and each parachan has at most two PVFs in the cache. We'd loose the ability to upgrade parachain PVFs when finality stalls though, so system parachain could require some escape hatch here.

As always we pay for optimizations with complexity. Ain't clear how far this should go right now of course. We could stick with the current proposal for now, but make an issue for smarter PVF garbage collection in future.

Why? They would not pass the pre-checking phase, assuming:

It'll be possible to pass the pre-checking but be quite slow compared with average PVF builds.

It'll occasionally be possible to pass the pre-checking on one host, but be abysmal on some host upgrade in the pipeline. We could imho ignore this risk though, so yeah maybe you're right..

As always we pay for optimizations with complexity. Ain't clear how far this should go right now of course. We could stick with the current proposal for now, but make an issue for smarter PVF garbage collection in future.

That does sound like it would introduce some complexity. I guess my question would be, is it really necessary? How much disk space does each compiled PVF actually require? How bad is it to keep old artifacts around? So far, it seems that we have not had issues with the 24-hour TTL of artifacts AFAIK, so my grug brain thinks that we shouldn't introduce unnecessary optimizations. For parathreads I could see the artifacts needing to stay around for longer - but in that case I would just have a longer TTL for those, and not worry about the extra used disk space. 😛

All this complexity comes from one scenario: An adversary can create many relay chain forks, so they can upload one artifact on each fork. We support all forks until we know which fork survives.

We've rough consensus on artifact age so we could use duration as a proxy for finality though: We retain all artifacts compatible with the current host, so long as either the artifact is active on some relay chain fork, or else the artifact is less than 24 hours old.

We add some abandon artifact call for artifacts uploaded but not activated, or else force activation at some block height, or something like that.

It's messy to create many relay chain forks without equivocation, so the attack might already result in slashing, which maybe suffices. If you've a run of blocks, then you could've some forks without equivocations, but not too many.

It's maybe just easier to wait for finality and have some override for system parachains, not sure.

Raised https://github.com/paritytech/polkadot/issues/6941 to continue this discussion.

burdges · 2023-01-13T15:56:38Z

Above my comments are not critiques of the text of course, but this is a convenient place to have a discussion that clarifies some design points, and maybe that discussion will be useful here.

mrcnski · 2023-03-23T12:32:11Z

Since the discussions here seem to have stalled, I have raised a couple of issues to unblock this PR.

This is a small PR that aims to document the current state of affairs, so discussion about future design can happen in those issues.

mrcnski · 2023-04-18T15:21:32Z

Bump. :)

mrcnski · 2023-04-19T16:07:52Z

bot merge

* master: (30 commits) update rocksdb to 0.20.1 (#7113) Reduce base proof size weight component to zero (#7081) PVF: Move PVF workers into separate crate (#7101) Companion for #13923 (#7111) update safe call filter (#7080) PVF: Don't dispute on missing artifact (#7011) XCM: Properly set the pricing for the DMP router (#6843) pvf: Update docs for PVF artifacts (#6551) Bump syn from 2.0.14 to 2.0.15 (#7093) Companion for substrate#13771 (#6983) Added Dwellir Nigeria bootnodes. (#7097) Companion for Substrate #13889 (#7063) Switch to DNS name based bootnodes for Rococo (#7040) companion for substrate#13883 (#7059) [xcm] Added `UnpaidExecution` instruction to `UnpaidRemoteExporter` (#7091) Bump serde_json from 1.0.85 to 1.0.96 (#7072) Bump hex-literal from 0.3.4 to 0.4.1 (#7071) Small simplification (#7089) [XCM - UnpaidRemoteExporter] Remove unreachable code (#7088) sync versions with current release (#7083) ...

* master: (39 commits) malus: dont panic on missing validation data (#6952) Offences Migration v1: Removes `ReportsByKindIndex` (#7114) Fix stalling dispute coordinator. (#7125) Fix rolling session window (#7126) [ci] Update buildah command and version (#7128) Bump assigned_slots params (#6991) XCM: Remote account converter (#6662) Rework `dispute-coordinator` to use `RuntimeInfo` for obtaining session information instead of `RollingSessionWindow` (#6968) Revert default proof size back to 64 KB (#7115) update rocksdb to 0.20.1 (#7113) Reduce base proof size weight component to zero (#7081) PVF: Move PVF workers into separate crate (#7101) Companion for #13923 (#7111) update safe call filter (#7080) PVF: Don't dispute on missing artifact (#7011) XCM: Properly set the pricing for the DMP router (#6843) pvf: Update docs for PVF artifacts (#6551) Bump syn from 2.0.14 to 2.0.15 (#7093) Companion for substrate#13771 (#6983) Added Dwellir Nigeria bootnodes. (#7097) ...

pvf: Update docs for PVF artifacts

6c645c6

mrcnski requested review from pepyakin, slumber and s0me0ne-unkn0wn January 13, 2023 14:21

github-actions bot added the A0-pleasereview label Jan 13, 2023

burdges reviewed Jan 13, 2023

View reviewed changes

pvf: Clarify doc re. node start-up

f8c94e8

the-right-joyce removed the A0-pleasereview label Jan 20, 2023

mrcnski mentioned this pull request Mar 3, 2023

Preparation: Do not announce an approval assignment if the code is not prepared paritytech/polkadot-sdk#910

Open

This was referenced Aug 24, 2023

PVF: Avoid clearing the artifacts cache on restart paritytech/polkadot-sdk#685

Closed

PVF: Preparation in advance paritytech/polkadot-sdk#684

Closed

Merge branch 'master' into mrcnski/pvf-update-artifact-docs

adab969

mrcnski requested review from burdges and eskimor and removed request for burdges March 23, 2023 12:32

eskimor approved these changes Apr 19, 2023

View reviewed changes

slumber approved these changes Apr 19, 2023

View reviewed changes

paritytech-processbot bot merged commit 5fd2bf8 into master Apr 19, 2023

paritytech-processbot bot deleted the mrcnski/pvf-update-artifact-docs branch April 19, 2023 16:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pvf: Update docs for PVF artifacts #6551

pvf: Update docs for PVF artifacts #6551

mrcnski commented Jan 13, 2023

burdges Jan 13, 2023

mrcnski Jan 13, 2023

mrcnski Jan 13, 2023

s0me0ne-unkn0wn Jan 13, 2023

s0me0ne-unkn0wn Jan 13, 2023 •

edited

Loading

mrcnski Jan 13, 2023

mrcnski Mar 23, 2023

burdges Jan 13, 2023

eskimor Jan 13, 2023 •

edited

Loading

burdges Jan 13, 2023 •

edited

Loading

s0me0ne-unkn0wn Jan 13, 2023

eskimor Jan 13, 2023

burdges Jan 13, 2023

mrcnski Jan 13, 2023 •

edited

Loading

burdges Jan 13, 2023

mrcnski Mar 23, 2023

burdges commented Jan 13, 2023

mrcnski commented Mar 23, 2023

mrcnski commented Apr 18, 2023

mrcnski commented Apr 19, 2023

pvf: Update docs for PVF artifacts #6551

pvf: Update docs for PVF artifacts #6551

Conversation

mrcnski commented Jan 13, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

s0me0ne-unkn0wn Jan 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eskimor Jan 13, 2023 • edited Loading

Choose a reason for hiding this comment

burdges Jan 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mrcnski Jan 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

burdges commented Jan 13, 2023

mrcnski commented Mar 23, 2023

mrcnski commented Apr 18, 2023

mrcnski commented Apr 19, 2023

s0me0ne-unkn0wn Jan 13, 2023 •

edited

Loading

eskimor Jan 13, 2023 •

edited

Loading

burdges Jan 13, 2023 •

edited

Loading

mrcnski Jan 13, 2023 •

edited

Loading