
Parathreads: Take II #828

Open
rphmeier opened this issue May 11, 2022 · 24 comments

Labels: I6-meta A specific issue for grouping tasks or bugs of a specific category.

@rphmeier (Contributor) commented May 11, 2022

Based on the original work done by @gavofyork in 2019: #341

Board: https://github.com/orgs/paritytech/projects/67
Feature branch: paritytech/polkadot#6969

Background and Motivation

Polkadot currently only supports parachains, with leases running between 6 and 24 months. Kusama leases run from 6 to 48 weeks. If a chain has a slot and loses it, it simply stops running. If a project can't afford a slot due to limited supply or intense competition, it simply doesn't get off the ground.

Parathreads are pay-as-you-go parachains: they pay for security on a block-by-block basis as opposed to leasing an entire core for a prolonged period of time. Parathreads only differ from parachains in terms of how they are scheduled, collated, and backed. Availability, approval-checking, and disputes all function in exactly the same way.

Parathreads provide an on-ramp and off-ramp for projects, as well as an option for chains that only produce blocks occasionally. They also provide another class of offering in the market for shared security, which should lead to better pricing of security for longer leases as well.

The idea of parathreads is to allocate a number of parachain availability cores specifically to parathreads; these parathread cores multiplex blocks from a backing queue of 'claims', which represent upcoming parathread blocks. The queue of claims is populated by an auction process which runs in every (or almost every) relay chain block, allowing collators to bid in the relay chain's native token in order to earn a claim. Claims are processed in order of submission, or close to it. Each claim is associated with a specific collator.

The parachain scheduler should attempt to schedule pending parathread claims onto available parathread cores. Once scheduled, parachain backers should connect to the specific collator mentioned in the claim and get the candidate and PoV from them.
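
To make the claims-queue idea concrete, here is a minimal Rust sketch of what a claim and its queue could look like. All names and types (`ParaId`, `CollatorId`, `Balance`, `ParathreadClaim`, `ClaimQueue`) are illustrative assumptions, not the actual runtime types.

```rust
use std::collections::VecDeque;

// Placeholder types; the real runtime uses its own definitions.
type ParaId = u32;
type CollatorId = [u8; 32];
type Balance = u128;

/// An upcoming parathread block, won through the auction mechanism.
#[derive(Clone, Debug)]
struct ParathreadClaim {
    para_id: ParaId,
    /// Only this collator is expected to provide the collation.
    collator: CollatorId,
    /// The bid paid in the relay chain's native token.
    bid: Balance,
}

/// Claims are processed roughly in order of submission, so a FIFO queue:
/// the scheduler pops from the front and assigns claims to free parathread cores.
#[derive(Default)]
struct ClaimQueue {
    claims: VecDeque<ParathreadClaim>,
}

impl ClaimQueue {
    fn push(&mut self, claim: ParathreadClaim) {
        self.claims.push_back(claim);
    }

    /// Pop the oldest claim for scheduling onto a free parathread core.
    fn pop(&mut self) -> Option<ParathreadClaim> {
        self.claims.pop_front()
    }
}
```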

Design Considerations

Spam-resistant Auctions: Bids from collators who don't win auctions should appear on-chain in some form so that useless bids still have some cost associated with them, meaning that spam isn't free.

Stacked Claims: It should be possible for multiple claims from a single parachain to be present, and for some kind of dependency relationship to be expressed between them. This will allow parathreads to have fast blocks during bursty periods if they so desire.

Data Unavailability: Polkadot only keeps data available for 24 hours. This means that the data proving a prior block might be lost if the last block was produced more than 24 hours ago. In the past, we've discussed forcing parathreads to author blocks at least once every 24 hours. This probably isn't necessary - parathread clients should just make sure that they're fully syncing within 24 hours of the last block and gathering data from the data availability system if necessary.

Wasm Execution Risks: related to #990; if there are underlying issues with PVF execution that we aren't aware of, it'll be far cheaper and easier to exploit them on parathreads than on parachains.

Asynchronous Backing: We should expect parathread blocks to be built 12-30s ahead of time and accordingly for validators to know that the parathread will or might be scheduled 12-30s ahead of time. The claim should probably not commit to the candidate hash, or if it does, claims will have to be retired across sessions.

Leniency and Censorship: When a claim is scheduled, it's possible that it goes unfulfilled, either because the collator didn't produce a block or because the backers didn't manage to back the block in time. Claims should stay scheduled for a few relay chain blocks, and potentially should be scheduled onto cores where other backing groups are assigned.

Runtime

Claims queue: Manages all pending parathread claims.

Auction Mechanism: for pushing onto the claims queue

Runtime API: for informing validators of scheduled or soon-to-be-scheduled parathreads (a rough sketch of its possible shape follows below)
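
A rough sketch of the shape such a runtime API could take; the trait and method names here are hypothetical, not the actual API:

```rust
// Placeholder types; the real runtime defines its own.
type ParaId = u32;
type CoreIndex = u32;

/// Hypothetical runtime API surface: given a core, report which paras are
/// scheduled (or soon to be scheduled) on it, ordered from soonest to latest,
/// so validators know which collations to accept ahead of time.
trait UpcomingSchedulingApi {
    fn upcoming_paras(&self, core: CoreIndex) -> Vec<ParaId>;
}
```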

Node-side

Collator networking: Connecting to the specific collator and/or accepting connections from specific collators
Prospective parachains: detecting scheduled parathreads as well as parachains

Collator-side and Parachain-side

Auction Bidder: logic to control a hot wallet to participate in auctions under configurable conditions. We should expect some sidecar process to run alongside the node and take into account information that's not easily generalized: for example, users might create strategies incorporating the relay-chain token price, the parachain token price, unconfirmed transaction rewards, and the block reward mechanism of the parathread. A sketch of such a strategy appears after this list.
Parathread Consensus: logic to detect when a block should be authored based on the state of the claims queue / scheduled cores
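
A minimal sketch of what such a sidecar bidding strategy could look like; the struct, fields, and thresholds are assumptions for illustration only:

```rust
type Balance = u128;

/// Inputs a sidecar might gather before deciding whether to bid.
struct BidInputs {
    /// Current price of a claim on the relay chain.
    current_claim_price: Balance,
    /// Fees accumulated in the parathread's mempool.
    pending_tx_fees: Balance,
}

/// A user-configured strategy for the hot wallet.
struct Strategy {
    /// Never bid more than this.
    max_bid: Balance,
    /// Only bid once enough fees have accumulated to make a block worthwhile.
    min_pending_fees: Balance,
}

impl Strategy {
    /// Decide whether to bid for the next claim, and at what price.
    fn bid(&self, inputs: &BidInputs) -> Option<Balance> {
        if inputs.pending_tx_fees < self.min_pending_fees {
            return None;
        }
        if inputs.current_claim_price > self.max_bid {
            return None;
        }
        Some(inputs.current_claim_price)
    }
}
```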

@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/parachain-scaling-by-parablock-splitting/341/5

@eskimor (Member) commented Nov 15, 2022

Summary of our discussion yesterday:

  • We need some optimal pricing controller: There will be a minimum price, so we have a healthy market even if we have very low demand in the beginning.
  • Some queue management
  • We will likely register the CollatorId and then only allow that collator to connect - good DoS protection for parathreads.
  • For registering the needed PVF, a deposit has to be provided. The opportunity costs pay for storage and messaging.

How do we register a claim?

There are two possibilities:

  1. On chain, via some extrinsic/transaction.
  2. The collator could claim directly via the collator protocol. Basically the collation advertisement would include a fee to pay for the block.

Option 1 has a longer delay between claiming and actually using the slot, but is potentially easier to implement.

Option 2 is actually quite intriguing at first sight: it would behave rather similarly to how normal transactions are processed: you send it to a validator, who puts it in a mempool and then picks what to validate based on price. But:

  • DoS protection becomes harder: no simple checking of the collator id would be possible unless we registered CollatorIds together with the PVF. Also, the collator protocol would need to change significantly and would differ for parathreads and parachains, while with option 1 the changes to the collator protocol would be fairly minimal.
  • Validator groups rotate, hence if your advertisement is not processed within the rotation time, you would need to resend it to the next group. Probably not a big deal though.
  • The actually needed price is not known in advance, which adds complication on the Cumulus side - see below.
  • The "included" fee would also need to go to the relay chain in some way - most likely as a separate transaction that needs to be sent, similar to option 1, or we would adjust backing statements to include it. The separate transaction is risky for the validator and racy: if the transaction is not recorded before the backing statement, the chain would have to treat the backing statements as invalid. So it kind of has to go into the backing statement, which is another protocol change.

Hence, from this high-level perspective, it seems that option 1 will be much simpler to implement.

Cumulus integration

For option 1 above, we would provide some transaction the relay chain accepts to claim a slot. We should also provide an interface for submitting said transaction from Cumulus. Then, based on that, we will need to cooperate with the SDK node team to implement actual strategies for using it, e.g. claim once per day, claim once the mempool is filled up to xx%, etc. - something ready to use for parathread developers.
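
A small sketch of what such ready-made strategies could look like on the Cumulus side; the names and exact triggers are assumptions, not an existing interface:

```rust
use std::time::{Duration, Instant};

/// Hypothetical triggers for submitting a claim transaction.
enum ClaimTrigger {
    /// Claim a core at most once per interval (e.g. once per day).
    Interval(Duration),
    /// Claim once the mempool is filled beyond this fraction (0.0..=1.0).
    MempoolFill(f32),
}

/// Decide whether the collator should submit a claim transaction now.
fn should_claim(trigger: &ClaimTrigger, last_claim: Instant, mempool_fill: f32) -> bool {
    match trigger {
        ClaimTrigger::Interval(interval) => last_claim.elapsed() >= *interval,
        ClaimTrigger::MempoolFill(threshold) => mempool_fill >= *threshold,
    }
}
```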

For option 2 above, the actual pricing would also need to be determined on the Cumulus side, e.g.: no rush in authoring a block, so start with a low fee and ramp it up slowly on each validator group rotation until the block gets in. By monitoring the relay chain, the collator might also have an idea of current pricing.

@bkchr (Member) commented Nov 15, 2022

  • For registering the needed PVF, a deposit has to be provided. The opportunity costs pay for storage and messaging.

We already require a deposit - or do you want to increase the deposit?

Option 1 has a longer delay between claiming and actually using the slot, but is potentially easier to implement.

From what you have written, I also like this solution. We could integrate some sort of "min block on when to start bidding for a slot". After winning a slot, your slot would be X blocks in the future. I don't think that we need such a tight turnaround between bidding for a slot and then actually building on it. If you have some application that needs to hit a very special slot, you could either bid a lot of dots or just run a parachain and then be fully in control of your block production.

@eskimor (Member) commented Nov 16, 2022

We already require a deposit - or do you want to increase the deposit?

No, just listing requirements.

For option 1: yep, but no matter how much you bid, you will have to wait at least a slot's worth of time. We are not really concerned about this at this point; it is just a matter of fact and something to consider when picking options. The favorite so far is clearly option 1. A parathread should be fine waiting a bit for its slot.

@eskimor (Member) commented Dec 22, 2022

Some more thoughts on asynchronous backing:

As already mentioned by @rphmeier, we will need a runtime API that allows us to see into the future which parathreads will be scheduled on a core in the next blocks ahead. This is because parathreads should also be backable asynchronously, which means they need to be able to provide a collation ahead of time.

This implies that the collator needs to know in advance that a core for its parathread is upcoming, and the validators need to know as well, so they will be willing to accept the collation.

Flow will be something like this:

  1. The collator for parathread A sees that it is scheduled on some core X blocks ahead, where X is within the constraints of the maximum allowed depth. It produces a collation with the currently available relay parent.
  2. The backing group will accept that collation, because they know that this parathread is upcoming.
  3. The collator for parathread B sees that it is scheduled on the same core Y blocks ahead (Y constrained the same as X above) and produces a collation.
  4. Same as 2, but for parathread B.
    ... possibly more parathreads.

Now we have both those parathread candidates available in prospective parachains.

  1. A relay chain block with a core scheduled for parathread B is produced. The block producer puts statements from prospective parachains on chain.
  2. Same for parathread A.

Consequences

  1. The core assignment in the relay parent is now nothing more than one of x assignments that will be accepted (the oldest one).
    -> It probably makes sense to unify this with the lookahead; nothing is really special about the "current" core assignment.
  2. Validators have to accept connections and collations from different parathreads all with the same relay parent.

Something to understand here - it applies to asynchronous backing in general, but is relevant here again: there is the relay parent as specified by a candidate, and there is the relay chain block whose child (the next relay chain block) will have the candidate backed on chain. Those are different in asynchronous backing, which means:

  1. Backing group assignment has to be based on the relay parent as referenced by the candidate.
  2. Core -> ParaId assignment is based on the information of the parent block of the relay chain block being produced.

Point 2 explained in more detail: when a block producer wants to build a new relay chain block on top of some block Z, then Z is what determines which statements of which parathread/parachain to put into the block. The relay parent of the candidates is irrelevant here.

What makes this even more intertwined is that the backing group to connect to is based on the core. So we connect to a particular backing group because, as of our relay parent, our ParaId is currently assigned to that core - even though it is possible that by the time our candidate actually gets backed on chain on that very core, some other group is already assigned to it. The reason is simple: we cannot reliably know when the candidate will actually get backed, so we have to use the core assignment based on the relay parent, as that is fixed.

So the correspondence CoreId -> ParaId and CoreId -> Backing Group is evaluated in the context of different relay chain blocks for a single candidate.

Possible Generalizations

This lookahead is mandatory for parathreads, but should be a general method even for parachains. This allows for greater flexibility for parachains as well, and given that the core -> ParaId assignment is per block, it would also be more correct.

The lookahead should likely be used in the runtime itself as well: basically any parathread in the current lookahead view will be accepted (not only one). This would allow for better utilization: if a validator wants to produce a block but has not yet received enough statements for parathread A, it can just put statements for another parathread B in. So we would basically have a lookahead queue, where we prefer candidates at the top but fall back to later candidates if available. If we fall back, we could leave the skipped candidate in the queue but keep a record that it was skipped. If it was skipped/not provided n times*), we remove it and just charge some fee for the failed attempt (see below).

So in a nutshell, instead of having a single ParaId assigned to a core as of a given relay chain state, we have a queue of ParaIds, any of which is acceptable, but validators should prefer "older" entries. This way we don't waste block space just because a parathread was a bit slow in providing a candidate. For a parachain, that lookahead queue would be all the same ParaId - so nothing changes for parachains.
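
A minimal sketch of the fallback selection described above; the types and the skip bookkeeping are illustrative assumptions only:

```rust
use std::collections::{HashSet, VecDeque};

type ParaId = u32;

struct QueuedClaim {
    para_id: ParaId,
    /// How often this entry was passed over in favor of a later one.
    skipped: u32,
}

/// Pick the first para in the lookahead queue for which a candidate is
/// available, preferring older entries; record a skip for every earlier
/// entry we pass over, and drop entries that were skipped too often.
fn pick_claim(
    queue: &mut VecDeque<QueuedClaim>,
    candidates_available: &HashSet<ParaId>,
    max_skips: u32,
) -> Option<ParaId> {
    // Entries skipped/not provided too many times are removed (charging a
    // fee for the failed attempt would happen elsewhere).
    queue.retain(|claim| claim.skipped < max_skips);

    let pos = queue
        .iter()
        .position(|claim| candidates_available.contains(&claim.para_id))?;
    for earlier in queue.iter_mut().take(pos) {
        earlier.skipped += 1;
    }
    queue.remove(pos).map(|claim| claim.para_id)
}
```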

Possible Issues with fallbacks/economic considerations

Allowing validators to fall back to other candidates would let them pick favorites. This could be disincentivized by rewarding candidates provided from the top of the queue more than ones further down. I don't see an actual security threat, as eventually an honest block producer will pick up the candidate (at least after some retrying). Also, without fallbacks, a block producer can always decide to just not put statements in, so from a censorship perspective nothing changes.

With the fallback mechanism, the success rate of parathread backing should increase, and in any case it makes sure block space is utilized - but eventually we do have to remove the ParaId from the queue.

Reasons for a candidate not making it in:

  1. Collator has problems/is malfunctioning.
  2. Backing group has problems or is actively censoring that parathread.
  3. Block producers are censoring.

Both the backing group and block producers are punished by losing out on rewards. We should also charge the parathread a fee for a missed scheduling attempt - likely less than for a successful one, but enough to disincentivize deliberate spamming.

This is not really "fair", as two parties always get punished for the misbehavior of a third, but with a low enough "punishment" this is just bad luck and will even itself out over time.

In any case if a parathread block does not make it through in time, the parathread can just issue another transaction and try again, just as for the first attempt. It will likely get assigned a different backing group and different block producers.

*) We could also just punt on the parathread: you missed it - out, try again. In the collator protocol we should be able to prioritize fetching candidates that come up earlier, so that complication is likely not necessary and we should just remove a ParaId from the queue if no candidate was provided.

@rphmeier (Contributor, Author) commented Jan 24, 2023

Yes, as you point out, groups can be "assigned" to multiple cores per relay-parent once exotic scheduling is live. We will need to handle this in a few places, but with the right runtime API the code in the collator-protocol and statement-distribution should Just Work.

Ordering of candidates in Asynchronous Backing is an open problem. There are other issues where the relay-chain block author can ignore a long prospective chain and include an alternative short chain in the relay-chain block, leading to wasted work. The cost is non-zero as disregarding work means doing more work in the future and potentially missing out on era points.

Most of the changes described here, with respect to the choice of which candidates to second, would live in the backing subsystem.

Both the backing group and block producers are punished by losing out on rewards. We should also charge the parathread a fee for a missed scheduling attempt - likely less than for a successful one, but enough to disincentivize deliberate spamming.

This is not really "fair", as two parties always get punished for the misbehavior of a third, but with a low enough "punishment" this is just bad luck and will even itself out over time.

We might approach this from the other angle: the user has paid for blockspace whether or not they utilize it, so we could simply give validators rewards when no candidate is backed as well as when a candidate is backed, although much smaller ones. I think this is equivalent, but it is also reasonable to provide a deposit which is refunded if a block actually gets backed (although, critically, not gated on the parablock becoming available).

@eskimor self-assigned this Feb 17, 2023
@eskimor (Member) commented Feb 17, 2023

Yes, as you point out, groups can be "assigned" to multiple cores per relay-parent once exotic scheduling is live. We will need to handle this in a few places, but with the right runtime API the code in the collator-protocol and statement-distribution should Just Work.

Just read it again and realized I don't really understand this paragraph: Minor thing, I think you meant async backing and not exotic scheduling and the other: groups can be assigned to multiple cores at a single point in time, but still only one per relay parent - it is just that more relay parents are considered/valid at any given point in time.

Reiterating for me and others, with an ASCII graphic: we will have a claim queue per core with ParaId entries.

Some claim queue as of relay chain block B1:

| Para1 | Para2 | Para3 |

Collators providing a collation with relay parent B1 will be allowed to do so for Para1, Para2 and Para3: the assigned backing group is expected to accept collations for all of those based on B1. It will also have other relay chain blocks in its implicit view, e.g. B0, the ancestor of B1. Assuming that B0 is also within the rotation boundaries, the backing group will also accept collations for the claim queue of B0, if those collations have B0 as their relay parent.

When authoring a block, any backing of a para in the claim queue will be accepted, but earlier entries are preferred. Assuming Para1 gets backed and included, the scheduler will update the claim queue: Para1 gets dropped and some Para4 gets pushed onto the back. The new claim queue:

| Para2 | Para3 | Para4 |

The size of the claim queue should be configured in relation to the maximum depth of a candidate's relay parent. E.g. if we only allow a depth of 2 (at most the parent of the current leaf), a claim queue size larger than 2 makes little sense, even if we assume 6s block times, as the candidate would no longer be valid by the time it could get backed on chain.

Other considerations

The para ids in a claim queue don't necessarily have to differ, e.g.:

| Para1 | Para1 | Para1 |

would also be a perfectly valid claim queue; in fact, for parachains they will look exactly like this. The result is normal asynchronous backing behaviour: a single parachain can prepare multiple candidates ahead of time.

Claim queues of adjacent relay chain blocks will normally overlap. Especially if backing groups accept collations for older (up to max depth) relay parents, they should keep track of the most current claim queue, and in general should consider information about already received candidates and chain state - e.g. candidates that are pending availability - to avoid wasting work on candidates that cannot possibly make it.

For taking chain state on backed candidates into account, the claim queue will likely also have a state for each queued item, like "scheduled", "occupied", ... similar to the availability cores we have now. The old availability cores mechanism is basically a claim queue of size 1.
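
A small sketch of a per-core claim queue whose entries carry such a state; the names are assumptions, not the eventual implementation:

```rust
use std::collections::VecDeque;

type ParaId = u32;

/// State of a queued claim, mirroring the availability-core states we have today.
enum ClaimState {
    /// A backing group may accept collations for this entry.
    Scheduled,
    /// A candidate is backed on chain and pending availability.
    Occupied,
}

struct ClaimEntry {
    para_id: ParaId,
    state: ClaimState,
}

/// One claim queue per core. For a parachain, every entry holds the same
/// ParaId; for parathread cores, entries are the result of successful orders.
struct CoreClaimQueue {
    entries: VecDeque<ClaimEntry>,
}
```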

@bkchr (Member) commented Feb 17, 2023

The size of the claim queue should be configured in relation to the maximum depth of a candidate's relay parent. E.g. if we only allow a depth of 2 (at most the parent of the current leaf), a claim queue size larger than 2 makes little sense, even if we assume 6s block times, as the candidate would no longer be valid by the time it could get backed on chain.

Even if we had a maximum depth of 1 (like we currently have), the claim queue should be bigger IMO. I could imagine collators using this time before building to fetch relevant transactions or to prepare certain computations. There is no hard connection between the relay parent and the transactions a parachain can include. They could maybe optimize by first applying all the transactions that don't need any information about the relay chain, and then at the "last second" push the relay chain block (they are building on) to the runtime, followed by the transactions that require this information.

@eskimor (Member) commented Feb 17, 2023

That would maybe justify another state: "Upcoming". Those collations would not yet be accepted, but the collator could start preparation work.

@bkchr (Member) commented Feb 17, 2023

Maybe I misunderstood you - I thought that the "claim queue" is the order in which a core is given out to certain collator/parathread combinations, based on them winning the slots in this queue?

@eskimor (Member) commented Feb 17, 2023

More or less. It is the list of upcoming core assignments (ParaIds) for a core. It is relevant to collators, so they know when they are supposed to produce a block, and relevant to validators, so they know which collations to accept. Indeed, the claim queue for the parathread cores is the result of successful orders.

"Upcoming" would then be a special state that is just a heads-up for collators that they are coming up, but validators would not yet accept such collations. But honestly, I don't think we will really need this with async backing, as then the claim queue will be larger than 1, hence you know what is upcoming anyway.

@bkchr (Member) commented Feb 17, 2023

"Upcoming" would then be a special state that is just a heads-up for collators that they are coming up

Can they not just find this out by inspecting at which point in the queue they are? Why do we require some special state for this?

@rphmeier (Contributor, Author) commented Feb 18, 2023

Minor thing, I think you meant async backing and not exotic scheduling and the other: groups can be assigned to multiple cores at a single point in time, but still only one per relay parent - it is just that more relay parents are considered/valid at any given point in time.

Looking back over it, I think you are right - although I also think I was alluding to the fact that we can have the same group assigned to multiple parachains at the same time, even on the same core.

@rphmeier (Contributor, Author) commented Feb 18, 2023

Collators providing a collation with relay parent B1 will be allowed to do so for Para1, Para2 and Para3: the assigned backing group is expected to accept collations for all of those based on B1. It will also have other relay chain blocks in its implicit view, e.g. B0, the ancestor of B1. Assuming that B0 is also within the rotation boundaries, the backing group will also accept collations for the claim queue of B0, if those collations have B0 as their relay parent.

Do you mean that backing validators at B1 should accept candidates for all parathreads in the claims queue for their core with relay-parent B1? I'll assume so going forward, as that's my best reading of the text here.

The size of the claim queue should be configured in relation to the maximum depth of a candidate's relay parent. E.g. if we only allow a depth of 2 (at most the parent of the current leaf), a claim queue size larger than 2 makes little sense, even if we assume 6s block times, as the candidate would no longer be valid by the time it could get backed on chain.

I think this is accurate w.r.t. the part of the queue that could be considered by backing groups, but I don't understand why we would constrain the size of the entire claim queue by this as opposed to just constraining the size of the prefix of the claim queue that backing validators should consider at any point.

Q: why shouldn't the queue have 100 items, if backing validators only have to deal with max-depth at most?
A: if parathread claims commit to specific candidate hashes, this is a hard requirement, as the candidate hash commits to the relay-parent. Otherwise, there is no technical reason.

Q: why should parathread claims commit to specific candidate hashes?
A: ?

The para ids in a claim queue don't necessarily have to differ, e.g.:

| Para1 | Para1 | Para1 |

would also be a perfectly valid claim queue; in fact, for parachains they will look exactly like this. The result is normal asynchronous backing behaviour: a single parachain can prepare multiple candidates ahead of time.

Is the plan to refactor parachains to use claim queues like this as well? or does "exactly like this" mean something different?

The para ids in a claim queue don't necessarily have to differ,

Agreed, however, we should clarify that a para ID may only exist within a single claim queue at a time. At least until we have sequence numbers in candidate receipts, which will then let us store (ParaId, SequenceNumber) in claim queues.

@eskimor (Member) commented Feb 20, 2023

Do you mean that backing validators at B1 should accept candidates for all parathreads in the claims queue for their core with relay-parent B1? I'll assume so going forward, as that's my best reading of the text here.

Yes, exactly.

I think this is accurate w.r.t. the part of the queue that could be considered by backing groups, but I don't understand why we would constrain the size of the entire claim queue by this as opposed to just constraining the size of the prefix of the claim queue that backing validators should consider at any point.

This is kind of equivalent to the suggested third state "Upcoming". It is possible to expose more via either mechanism, and we can if there is a need; I am not sure there is, though. On the flip side, the claim queue is a contract on what is coming up, and it is not supposed to change apart from revealing more entries at the end and popping the front on timeout/availability. The longer the exposed queue (with this guarantee), the less flexibility there is, e.g. for pushing back a ParaId when it timed out on availability.

We could relax the contract for entries after the prefix, saying "we think they will come up in this order, but this might change" ... but that seems a bit moot. On the other hand, the queue size also influences assignment providers, which might not even be able to reveal the next 100 items yet (because they are still being discovered, orders are still coming in, ...).

Q: why should parathread claims commit to specific candidate hashes?

They don't, this would severely complicate the implementation, I think - and I don't see why we would want that.

Is the plan to refactor parachains to use claim queues like this as well? or does "exactly like this" mean something different?

Yes. The core abstraction gains a dimension: no longer dimension 0, but dimension 1 - now called the claim queue. My current thinking is that after the assignment provider there is no difference between parathreads and parachains. The validator node side and even collators don't care (apart from the separate part where they send bid extrinsics): they see their assignment coming up and produce a block; they don't care whether this happens all the time (parachain) or only once in a while (parathread).

@eskimor (Member) commented Feb 20, 2023

Can they not just find this out by inspecting at which point in the queue they are? Why do we require some special state for this?

We would like to meaningfully limit the number of collations a validator has to assume valid and needs to accept at a given point in time. If a validator accepted collations that come up later in the queue, it would be wasting resources, as such a collation - if provided - has no chance of making it into a block before it becomes invalid; hence there is no point in accepting it in the first place. If exposed at all, either another state which just shows "coming up, get ready", or, as Rob suggested, specify some prefix length, with the same effect.

But again, I don't see why we would need either (once async backing landed).

@rphmeier (Contributor, Author) commented:

We seem to have a communication issue. What we agree on is

  1. There is a claim queue for every core
  2. Validators use the claim queue to determine what they should work on
  3. Early claims in the queue are definitely going to be acted on, while later claims may not be

The miscommunication is that I am only saying that the claim queue exposed via the Runtime API should encompass (3), but that this does not need to limit the general size of the claim queue. That's all; I'm not suggesting that far-in-the-future stuff be exposed to or acted on by validators, just that the claims may be stored with further lookahead on the relay chain. Maybe a moot point.

@eskimor (Member) commented Feb 22, 2023

Full agreement on points 1 and 2. For point 3: how I envision it to work as of now is that the claim queue is managed by the scheduler and includes as many elements per core as we actually need. If the first entry becomes available / the candidate gets included, we remove it from the front and pop from the assignment provider to fill the queue up again (push back onto the claim queue). The assignment provider, in the case of parathreads, will have its own order queue which it uses to provide those assignments when "popped".
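
A minimal sketch of that scheduler / assignment-provider split; the trait and type names are illustrative assumptions:

```rust
use std::collections::VecDeque;

type ParaId = u32;

/// Source of upcoming assignments: the parathread provider pops from its own
/// order queue, while a parachain provider always returns the same ParaId.
trait AssignmentProvider {
    fn pop_assignment(&mut self) -> Option<ParaId>;
}

struct Scheduler<P: AssignmentProvider> {
    claim_queue: VecDeque<ParaId>,
    provider: P,
    /// How many entries per core we want to keep filled.
    target_len: usize,
}

impl<P: AssignmentProvider> Scheduler<P> {
    /// Called when the front claim's candidate is included (or times out):
    /// drop it from the front and refill the queue from the assignment provider.
    fn on_front_resolved(&mut self) {
        let _ = self.claim_queue.pop_front();
        while self.claim_queue.len() < self.target_len {
            match self.provider.pop_assignment() {
                Some(para) => self.claim_queue.push_back(para),
                None => break,
            }
        }
    }
}
```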

Is this maybe part of the communication issue? We do have another queue - on the side of the parathread assignment provider. We might want to expose that one as well, if that is useful.

We could also have a longer claim queue within the relay chain runtime than what we actually expose to validators for backing; this might be useful or even necessary. But I am not sure this is what you meant.

@rphmeier (Contributor, Author) commented:

The assignment provider, in the case of parathreads, will have its own order queue which it uses to provide those assignments when "popped".

This is what I was missing. This all makes sense to me now. Thanks for elaborating

@Polkadot-Forum

This issue has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/on-demand-parachains/2208/1

@koute (Contributor) commented Mar 9, 2023

Wasm Execution Risks: related to #990; if there are underlying issues with PVF execution that we aren't aware of, it'll be far cheaper and easier to exploit them on parathreads than on parachains.

This is a separate issue from potential sources of non-determinism and is more of an implementation issue than a design issue, but considering today's security vulnerability in wasmtime, where a really nasty vulnerability was found (which can lead to remote code execution!), I think more security hardening (#882) should be a hard blocker before we actually enable parathreads on any value-bearing chain. I'd like to have at least a full seccomp jail for the PVF host process.

@burdges commented Aug 22, 2023

Option 2 is actually quite intriguing at first sight: it would behave rather similarly to how normal transactions are processed: you send it to a validator, who puts it in a mempool and then picks what to validate based on price. But:

I'd like to better understand the issue here.

DoS protection becomes harder: no simple checking of the collator id would be possible unless we registered CollatorIds together with the PVF.

We've parathread state on the relay chain. It'd contain collator ids, like any other parachain presumably does, no?

Also, the collator protocol would need to change significantly and would differ for parathreads and parachains, while with option 1 the changes to the collator protocol would be fairly minimal.

It's true this approach opens up the collator protocol, not merely tacks something onto the front end, yes. I guess this is the concern.

Validator groups rotate, hence if your advertisement is not processed within the rotation time, you would need to resend it to the next group. Probably not a big deal though.

Identical issue with other parachains, no?

The actually needed price is not known in advance, which adds complication on the Cumulus side - see below.

Yes, but we decided we'd have the price known in advance anyways, no?

The "included" fee would also need to go to the relay chain in some way - most likely as a separate transaction that needs to be sent, similar to option 1, or we would adjust backing statements to include it. The separate transaction is risky for the validator and racy: if the transaction is not recorded before the backing statement, the chain would have to treat the backing statements as invalid. So it kind of has to go into the backing statement, which is another protocol change.

Ain't too hard to have prepayments, ala paritytech/cumulus#2154 (comment)

@eskimor (Member) commented Aug 23, 2023

We've parathread state on the relay chain. It'd contain collator ids, like any other parachain presumably does, no?

No, we don't do that.

Identical issue with other parachains, no?

No, it is quite different. In the described model there is no core scheduled for the on-demand chain yet, hence we don't know whether there are resources available right now. Normal parachains, by contrast, have a scheduled core: they can realistically assume their collation will be processed by the backing group they advertised it to.

With the probabilistic scheduling @rphmeier is pivoting to now, things are changing though, and it might make sense at some point to re-open decisions made on how we want to do on-demand.

Yes, but we decided we'd have the price known in advance anyways, no?

Not really, no. Price is adjusting all the time, based on demand.

Ain't too hard to have prepayments, ala paritytech/cumulus#2154 (comment)

It would be pretty hard, requiring substantial changes ... but they go in a similar direction to where Rob is moving anyway. The main benefit would be that the parachain itself could order its cores, instead of having to refund collators - right? The downside is more work for the relay chain, as we have to keep stuff around for potentially a very long time; also, Gav was concerned about people buying cores/resources when they are cheap and then hoarding them to use later.

@burdges commented Aug 23, 2023

As I said elsewhere, I'm happy with a KISS approach here: We could deploy one simple scheme, but tell those parathreads they should upgrade to being full parachains if they want more flexibility. We could later build a cleanly abstracted scheme if & when we want parathreads to have more flexibility. Avoid too much investment in parathreads before we learn more about their usage, handling, etc.

@Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce moved this to In Progress in parachains team board Oct 12, 2023
@the-right-joyce added the I6-meta label Oct 12, 2023
@the-right-joyce moved this from Draft to Open in Parity Roadmap Oct 12, 2023
claravanstaden pushed a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023
@eskimor moved this from In Progress to Completed in parachains team board Mar 21, 2024
bkchr pushed a commit that referenced this issue Apr 10, 2024