-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proper handling for legacy parachain leases with gaps in coretime migration #426
Proper handling for legacy parachain leases with gaps in coretime migration #426
Conversation
This is a draft because I am not happy with |
…oper fix can be applied
As discussed offline with @eskimor we'll move the whole coretime migration to the fellowship repo so that a proper fix can be made. |
}); | ||
|
||
let reservation_content = message_content.clone().chain(reservations).collect(); | ||
let pool_content = message_content.clone().chain(pool).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pool will be empty. Do we send an empty message here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I missed that. I guess we send an empty message. I'll fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Some(mk_coretime_call::<T>(CoretimeCalls::SetLease(p.into(), time_slice))) | ||
}); | ||
|
||
let core_count: u16 = configuration::ActiveConfig::<T>::get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. on another call in a minute, but please check that this core handling is actually correct. This core_count value should read 0 on Polkadot at the time of the migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not 0. We update it on line 211-214 in the same file:
https://github.com/polkadot-fellows/runtimes/pull/426/files#diff-6a311f9b6c4cd2a09f397e5e0c90ea9193ba103bf277bb88de6bd223148af52eR211-R214
The actual value is 52:
coretime-migration TRACE: Set core count to 52. legacy paras count is 52
The reserved cores are 0 though because legacy_paras_count == core_count
. I have got an assert for this in the test:
https://github.com/tdimitrov/polkadot-coretime-launch-test/blob/a8e7aba7d6a741b10bd5330ae3119ef15a018711/coretime-migration-check/main.js#L158-L160
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reserved cores are 0 though
What you mean by reserved cores? The system chains?
Generally your code is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the system chains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, wait. These are reserved for on demand, if I am not mistaken.
We reserve cores for system chains, then for lease cores and what's left is for on demand. Right @eskimor ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-migration the system chains just have leases as if they're regular parachains. They're included in the 52, except for the coretime chain since it isn't live yet. It will be registered by the time this migration runs, so it'll detect 53 leases (maybe more depending on how the auctions go).
reservations: Asset Hub, Collectives, Bridge Hub, People Chain and Coretime Chain for a total of 5 (you'll see 4 at the minute in your tests)
leases: 48 (plus any new ones from auctions, minus any that end between now and when it runs).
legacy_paras_count
isn't the same as leases
though, I think the assertion you're expecting is system_chains.len() + leases.len() == core_count
, since core_count
is equal to legacy_paras_count
by construction (since num_cores
is zero atm)
let valid_until: u32 = match valid_until.try_into() { | ||
Ok(val) => val, | ||
Err(_) => { | ||
log::error!("Converting block number to u32 failed!"); | ||
return None | ||
}, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let valid_until: u32 = match valid_until.try_into() { | |
Ok(val) => val, | |
Err(_) => { | |
log::error!("Converting block number to u32 failed!"); | |
return None | |
}, | |
}; |
block number is u32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't compile due to expected u32
found associated type errors. Let's keep it as it is.
let round_up = if valid_until % TIME_SLICE_PERIOD > 0 { 1 } else { 0 }; | ||
let time_slice = valid_until / TIME_SLICE_PERIOD + TIME_SLICE_PERIOD * round_up; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let round_up = if valid_until % TIME_SLICE_PERIOD > 0 { 1 } else { 0 }; | |
let time_slice = valid_until / TIME_SLICE_PERIOD + TIME_SLICE_PERIOD * round_up; | |
let time_slice = (valid_until + TIME_SLICE_PERID - 1) / TIME_SLICE_PERIOD; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do a different kind of round up here. timeslice
is not number of periods but a lifetime in blocks (divided by 80). Your formula works only for the 'number of periods' case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused... why we multiply the round up with TIME_SLICE_PERIOD
?
Shouldn't it be let time_slice = valid_until / TIME_SLICE_PERIOD + round_up;
) which is equivalent to your formula?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
round_up
is just 0
or 1
. It leads to the same result as what I have written there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't. Round up is multiplied by TIME_SLICE_PERIOD. Have a look: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=a5ce70639f2ee8a169e5d618e707ed74
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basti's version is right. Integer division is just the floor, adding TIMESLICE_PERIOD
rounds up, subtracting 1 makes sure it doesn't round up when v%T == 0
.
The original code is adding 80 timeslices to the end of the lease, rather than one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Co-authored-by: Bastian Köcher <[email protected]>
let round_up = if valid_until % TIME_SLICE_PERIOD > 0 { 1 } else { 0 }; | ||
let time_slice = valid_until / TIME_SLICE_PERIOD + TIME_SLICE_PERIOD * round_up; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basti's version is right. Integer division is just the floor, adding TIMESLICE_PERIOD
rounds up, subtracting 1 makes sure it doesn't round up when v%T == 0
.
The original code is adding 80 timeslices to the end of the lease, rather than one.
Some(mk_coretime_call::<T>(CoretimeCalls::SetLease(p.into(), time_slice))) | ||
}); | ||
|
||
let core_count: u16 = configuration::ActiveConfig::<T>::get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-migration the system chains just have leases as if they're regular parachains. They're included in the 52, except for the coretime chain since it isn't live yet. It will be registered by the time this migration runs, so it'll detect 53 leases (maybe more depending on how the auctions go).
reservations: Asset Hub, Collectives, Bridge Hub, People Chain and Coretime Chain for a total of 5 (you'll see 4 at the minute in your tests)
leases: 48 (plus any new ones from auctions, minus any that end between now and when it runs).
legacy_paras_count
isn't the same as leases
though, I think the assertion you're expecting is system_chains.len() + leases.len() == core_count
, since core_count
is equal to legacy_paras_count
by construction (since num_cores
is zero atm)
let pool_content = message_content.clone().chain(pool).collect(); | ||
let leases_content_1 = message_content | ||
.clone() | ||
.chain(leases.by_ref().take(legacy_paras_count / 2)) // split in two messages to avoid overweighted XCM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.chain(leases.by_ref().take(legacy_paras_count / 2)) // split in two messages to avoid overweighted XCM | |
.chain(leases.by_ref().take(leases.len() / 2)) // split in two messages to avoid overweighted XCM |
legacy_paras_count
includes system parachains
leases
does not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but leases
is an iterator and it's a bit involved to get its length. Let's keep it that way. It's not a big deal that the messages aren't equally distributed as long as they are sent in two passes (it's just one ParaId
which causes the overweight).
|
||
let config = configuration::ActiveConfig::<T>::get(); | ||
// num_cores was on_demand_cores until now: | ||
for on_demand in 0..config.scheduler_params.num_cores { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Polkadot this is not the path of this config entry, checking the chain state it's at config.coretime_cores
. Is there a migration ahead of this one to change the configuration in storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parachains_configuration::migration::v12::MigrateToV12<Runtime>
does this change:
https://github.com/paritytech/polkadot-sdk/blob/ebf4f8d2d590f41817d5d38b2d9b5812a46f2342/polkadot/runtime/parachains/src/configuration/migration/v12.rs#L146-L158
It gets executed before the coretime migration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still the comment is actually not correct for Polkadot. It used to be coretime_cores not on_demand cores and there was no on-demand core before, thus nothing to migrate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This loop will do nothing, which is fine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Co-authored-by: Dónal Murray <[email protected]>
|
||
let config = configuration::ActiveConfig::<T>::get(); | ||
// num_cores was on_demand_cores until now: | ||
for on_demand in 0..config.scheduler_params.num_cores { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still the comment is actually not correct for Polkadot. It used to be coretime_cores not on_demand cores and there was no on-demand core before, thus nothing to migrate.
|
||
let config = configuration::ActiveConfig::<T>::get(); | ||
// num_cores was on_demand_cores until now: | ||
for on_demand in 0..config.scheduler_params.num_cores { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This loop will do nothing, which is fine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-approving to trigger a CI run with the broken runner disabled
There was a Please see this commit for details: 4675183 |
LGTM, zombienet is failing in CI but seems like it's unrelated |
/merge |
Enabled Available commands
For more information see the documentation |
ffe6a36
into
polkadot-fellows:main
While testing #401 I noticed that there are two parachains with gaps in their leases which are not migrated to coretime correctly. These are para ids 3359 and 3388.
The problem is that the migration obtains the list of parachain ids which should be migrated from
paras::Parachains
(here) and there are no entries for para ids which are not active at the moment.Paras 3359 and 3388 are not active in the current lease period (excerpt from
slots::leases()
:and:
And they have got no entries in
paras::parachains()
:As a result the migration skips them which is wrong.
A proper fix for this issue will require a new polkadot-sdk release and a bump in the runtimes repo which requires time and effort. To avoid this the PR moves the coretime migration from polkadot-sdk to the fellowship repo.