Add retry mechanics to pallet-scheduler #3060

Merged · 44 commits · Feb 16, 2024

Changes from 1 commit
Commits
328d0cd
Add option to retry task in scheduler
georgepisaltu Jan 23, 2024
6f56d9f
Add unit tests for retry scheduler
georgepisaltu Jan 23, 2024
22800c5
Add benchmarks for retry scheduler
georgepisaltu Jan 24, 2024
9315c2e
Add some docs to retry functions
georgepisaltu Jan 25, 2024
5bbd31e
Remove redundant clone
georgepisaltu Jan 25, 2024
3eb8016
Add real weights to scheduler pallet
georgepisaltu Jan 25, 2024
9ecae46
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Jan 25, 2024
fcff15c
".git/.scripts/commands/bench-all/bench-all.sh" --pallet=pallet_sched…
Jan 25, 2024
a8fd732
".git/.scripts/commands/bench-all/bench-all.sh" --pallet=pallet_sched…
Jan 25, 2024
3e6e77e
Merge branch 'master' of https://github.com/paritytech/polkadot-sdk i…
Jan 25, 2024
9120d60
".git/.scripts/commands/bench-all/bench-all.sh" --pallet=pallet_sched…
Jan 25, 2024
4167d6f
".git/.scripts/commands/bench-all/bench-all.sh" --pallet=pallet_sched…
Jan 25, 2024
816906a
Use `TaskAddress` in `set_retry`
georgepisaltu Jan 26, 2024
8a28ea5
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Jan 26, 2024
900d2d5
Add prdoc
georgepisaltu Jan 26, 2024
05cbdfc
Refactor agenda query in `set_retry`
georgepisaltu Jan 26, 2024
5fc62f4
Minor renames and fixes
georgepisaltu Jan 26, 2024
5e305e1
Refactor `schedule_retry` return type
georgepisaltu Jan 26, 2024
5943dcc
Implement `ensure_privilege`
georgepisaltu Jan 26, 2024
690f2b8
Add event for setting retry config
georgepisaltu Jan 26, 2024
4a81417
Make retry fail if insufficient weight
georgepisaltu Jan 29, 2024
438effb
Remove redundant weight parameter in `set_retry`
georgepisaltu Jan 29, 2024
d048760
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Jan 29, 2024
3c2b540
Add test for dropping insufficient weight retry
georgepisaltu Jan 29, 2024
7a39a69
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Jan 29, 2024
2e8f954
Clean up retry config on cancel
georgepisaltu Feb 1, 2024
7dfb517
Small refactor
georgepisaltu Feb 1, 2024
2e26707
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Feb 1, 2024
40b567d
Add docs to retry config map
georgepisaltu Feb 1, 2024
2b35465
Add retry count to `RetrySet` event
georgepisaltu Feb 2, 2024
a43df52
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Feb 2, 2024
9da58c6
Make retries independent of periodic runs
georgepisaltu Feb 7, 2024
2976dac
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Feb 7, 2024
8ce66f7
Small refactoring
georgepisaltu Feb 13, 2024
dc9ef2e
Add `cancel_retry` extrinsics
georgepisaltu Feb 13, 2024
945a095
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Feb 13, 2024
39eb209
Add e2e unit test for retry schedule
georgepisaltu Feb 14, 2024
2290240
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Feb 14, 2024
50a2010
Simplify `schedule_retry`
georgepisaltu Feb 15, 2024
e9cc27e
Add docs for `as_retry`
georgepisaltu Feb 15, 2024
497100c
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Feb 15, 2024
863bec7
Update doc comments for `set_retry`
georgepisaltu Feb 16, 2024
7a16648
Merge remote-tracking branch 'upstream/master' into retry-schedule
georgepisaltu Feb 16, 2024
b72da0c
Move common logic under `do_cancel_retry`
georgepisaltu Feb 16, 2024
2 changes: 1 addition & 1 deletion substrate/frame/scheduler/src/benchmarking.rs
@@ -327,7 +327,7 @@ benchmarks! {
let (mut when, index) = address;
let task = Agenda::<T>::get(when)[index as usize].clone().unwrap();
}: {
assert!(Scheduler::<T>::schedule_retry(when, when, index, task).is_ok());
Scheduler::<T>::schedule_retry(when, when, index, &task);
} verify {
when = when + BlockNumberFor::<T>::one();
assert_eq!(
140 changes: 83 additions & 57 deletions substrate/frame/scheduler/src/lib.rs
@@ -159,6 +159,24 @@ pub struct Scheduled<Name, Call, BlockNumber, PalletsOrigin, AccountId> {
_phantom: PhantomData<AccountId>,
}

impl<Name, Call, BlockNumber, PalletsOrigin, AccountId>
Scheduled<Name, Call, BlockNumber, PalletsOrigin, AccountId>
where
Call: Clone,
PalletsOrigin: Clone,
{
pub fn as_retry(&self) -> Self {
Self {
maybe_id: None,
priority: self.priority,
call: self.call.clone(),
maybe_periodic: None,
origin: self.origin.clone(),
_phantom: Default::default(),
}
}
}

use crate::{Scheduled as ScheduledV3, Scheduled as ScheduledV2};

pub type ScheduledV2Of<T> = ScheduledV2<
@@ -481,8 +499,15 @@ pub mod pallet {
/// will be removed from the schedule.
///
/// Tasks which need to be scheduled for a retry are still subject to weight metering and
/// agenda space, same as a regular task. Periodic tasks will have their periodic schedule
/// put on hold while the task is retrying.
/// agenda space, same as a regular task. If a periodic task fails, it will be scheduled
Contributor: It may be worth mentioning that we are talking about a possibly recoverable failure.

Contributor Author: My view on this is that we shouldn't mention in the pallet docs what kind of calls this applies to, because the pallet itself cannot distinguish between a recoverable and an unrecoverable failure. Only a user can do that, and they should update their retry config accordingly.

/// normally while the task is retrying.
///
/// Tasks scheduled as a result of a retry will have their own retry configuration derived
/// from the original task's configuration, but will have a lower value for `remaining` than
/// the original `total_retries`.
///
/// To cancel a retry configuration for a task, users should call this extrinsic with a
/// `retries` value equal to `0`.
#[pallet::call_index(6)]
#[pallet::weight(<T as Config>::WeightInfo::set_retry())]
pub fn set_retry(
@@ -517,8 +542,17 @@ pub mod pallet {
/// will be removed from the schedule.
///
/// Tasks which need to be scheduled for a retry are still subject to weight metering and
/// agenda space, same as a regular task. Periodic tasks will have their periodic schedule
/// put on hold while the task is retrying.
/// agenda space, same as a regular task. If a periodic task fails, it will be scheduled
/// normally while the task is retrying.
///
/// Tasks scheduled as a result of a named retry will have their own retry configuration
/// derived from the original task's configuration, but will have a lower value for
/// `remaining` than the original `total_retries`. Tasks scheduled as a result of a retry of
Contributor: Can you explain why you think it should be this way? The way I think about it: I want some call to be executed 5 times, and when any of those runs fails, I want to retry it 3 times. I want to be sure that every run is given a chance to be retried 3 times. If the first one exhausts all of its retries for some unexpected reason, I don't want to leave the remaining runs without retries. It is hard for me to model such a situation.

Contributor Author: The way you described it is the way it works.

A periodic task that fails at any point during its periodic execution will have a clone of itself scheduled as a retry. That clone will have a retry configuration attached, with `remaining` being `total_retries - 1` at first. The original task still has its own unchanged retry configuration. The retry task will keep being retried until it runs out of retries. The periodic task keeps running periodically, potentially spawning other retry clones if it fails again.

For non-periodic tasks, there is no clone; they become the retry task, because there is no point in keeping clones around for something which will never run again.

In short, the point of the doc comment is to say that a retry configuration with `remaining == total_retries` means it's the original task, while a retry configuration with `remaining < total_retries` means it's a retry clone.

I will try to make it clearer in the docs.

Contributor Author: Done.
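Below is a minimal, self-contained sketch (not part of the PR diff) of the `remaining`/`total_retries` distinction explained above. The field names mirror the pallet's `RetryConfig` as it appears later in this diff; the `is_original_task` helper and the concrete values are purely illustrative.

```rust
// Illustration only, not pallet code. Field names follow the `RetryConfig`
// shown in the diff below; `is_original_task` is a hypothetical helper.
struct RetryConfig<Period> {
    total_retries: u8, // what the user asked for via `set_retry`
    remaining: u8,     // decreases by one for each retry clone that is spawned
    period: Period,    // blocks to wait between retry attempts
}

impl<Period> RetryConfig<Period> {
    // The original task keeps `remaining == total_retries`; a retry clone
    // always carries `remaining < total_retries`.
    fn is_original_task(&self) -> bool {
        self.remaining == self.total_retries
    }
}

fn main() {
    let original = RetryConfig { total_retries: 3, remaining: 3, period: 10u32 };
    // A failure spawns a clone whose config starts with one fewer attempt left.
    let retry_clone = RetryConfig { remaining: original.remaining - 1, ..original };
    assert!(original.is_original_task());
    assert!(!retry_clone.is_original_task());
}
```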

/// a named task will not be named.
///
/// To cancel a retry configuration for a named task, users should call this extrinsic with
/// a `retries` value equal to `0`. To cancel a retry attempt for a named task, users should
/// instead call `set_retry` using that task's address.
#[pallet::call_index(7)]
#[pallet::weight(<T as Config>::WeightInfo::set_retry_named())]
pub fn set_retry_named(
Member: The way to remove a retry config is just to set it to 0, or?

Contributor Author: Yes. We could add 2 more extrinsics to remove a retry config, or we could make `period: Option<BlockNumberFor<T>>` and unset it in the case of `None`. I like how it is now, but I don't feel strongly about this. WDYT?

Member: I'm also fine with both 🤷‍♂️

Contributor: It may just be worth mentioning in the docs how it can be removed, and making sure we have a unit test for this removal.

Contributor Author: Will do.
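Later commits in this PR add dedicated `cancel_retry` extrinsics (see the commit list above), but at this point in the review the removal path is a `retries` value of 0, as the doc comment states. The following self-contained model (not pallet code) just mirrors that convention; the storage map, the address shape, and the function signature are stand-ins for illustration only.

```rust
// Toy model of the convention discussed above, not pallet code: the `Retries`
// storage map is modelled as a HashMap keyed by task address, and `set_retry`
// here only mirrors the "retries == 0 removes the config" behaviour.
use std::collections::HashMap;

type TaskAddress = (u32, u32); // (block number, agenda index), illustrative

struct RetryConfig {
    total_retries: u8,
    remaining: u8,
    period: u32,
}

fn set_retry(
    retries: &mut HashMap<TaskAddress, RetryConfig>,
    task: TaskAddress,
    count: u8,
    period: u32,
) {
    if count == 0 {
        // A zero retry count cancels any existing configuration for the task.
        retries.remove(&task);
    } else {
        retries.insert(task, RetryConfig { total_retries: count, remaining: count, period });
    }
}

fn main() {
    let mut retries = HashMap::new();
    let task = (4, 0);
    set_retry(&mut retries, task, 3, 10); // attach: up to 3 retries, one every 10 blocks
    assert!(retries.contains_key(&task));
    set_retry(&mut retries, task, 0, 10); // cancel by passing 0 retries
    assert!(!retries.contains_key(&task));
}
```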

@@ -1189,7 +1223,7 @@ impl<T: Config> Pallet<T> {
when: BlockNumberFor<T>,
agenda_index: u32,
is_first: bool,
task: ScheduledOf<T>,
mut task: ScheduledOf<T>,
) -> Result<(), (ServiceTaskError, Option<ScheduledOf<T>>)> {
if let Some(ref id) = task.maybe_id {
Lookup::<T>::remove(id);
Expand Down Expand Up @@ -1231,14 +1265,9 @@ impl<T: Config> Pallet<T> {
result,
});

let mut task = if failed {
match Self::try_schedule_retry(weight, now, when, agenda_index, task) {
Ok(()) => return Ok(()),
Err(task) => task,
}
} else {
task
};
if failed {
Self::try_schedule_retry(weight, now, when, agenda_index, &task);
}

if let &Some((period, count)) = &task.maybe_periodic {
if count > 1 {
@@ -1318,18 +1347,13 @@ impl<T: Config> Pallet<T> {
}

/// If there is enough weight, try to run the task reschedule logic.
///
/// If the task was successfully rescheduled for later, this function will return `Ok`.
/// Otherwise, if the task was not rescheduled, either because there was not enough weight left
/// over, there was no retry configuration in place, there were no more retry attempts left, or
/// the agenda was full, this function will return the task so it can be handled somewhere else.
fn try_schedule_retry(
weight: &mut WeightMeter,
now: BlockNumberFor<T>,
when: BlockNumberFor<T>,
agenda_index: u32,
task: ScheduledOf<T>,
) -> Result<(), ScheduledOf<T>> {
task: &ScheduledOf<T>,
) {
if weight
.try_consume(T::WeightInfo::schedule_retry(T::MaxScheduledPerBlock::get()))
.is_err()
@@ -1338,55 +1362,57 @@ impl<T: Config> Pallet<T> {
task: (when, agenda_index),
id: task.maybe_id,
});
return Err(task);
return;
}

Self::schedule_retry(now, when, agenda_index, task)
}

/// Check if a task has a retry configuration in place and, if so, try to reschedule it.
///
/// If the task was successfully rescheduled for later, this function will return `Ok`.
/// Otherwise, if the task was not rescheduled, either because there was no retry configuration
/// in place, there were no more retry attempts left, or the agenda was full, this function
/// will return the task so it can be handled somewhere else.
/// Possible causes for failure to schedule a retry for a task:
/// - there was no retry configuration in place
/// - there were no more retry attempts left
/// - the agenda was full.
fn schedule_retry(
now: BlockNumberFor<T>,
when: BlockNumberFor<T>,
agenda_index: u32,
task: ScheduledOf<T>,
) -> Result<(), ScheduledOf<T>> {
task: &ScheduledOf<T>,
) {
// Check if we have a retry configuration set.
match Retries::<T>::take((when, agenda_index)) {
Some(RetryConfig { total_retries, mut remaining, period }) => {
remaining = match remaining.checked_sub(1) {
Some(n) => n,
None => return Err(task),
};
let wake = now.saturating_add(period);
match Self::place_task(wake, task) {
Ok(address) => {
// Reinsert the retry config to the new address of the task after it was
// placed.
Retries::<T>::insert(
address,
RetryConfig { total_retries, remaining, period },
);
Ok(())
},
Err((_, task)) => {
// TODO: Leave task in storage somewhere for it to be
// rescheduled manually.
T::Preimages::drop(&task.call);
Self::deposit_event(Event::RetryFailed {
task: (when, agenda_index),
id: task.maybe_id,
});
Err(task)
},
}
},
None => Err(task),
if let Some(RetryConfig { total_retries, mut remaining, period }) =
Retries::<T>::take((when, agenda_index))
{
// If we're going to run this task again and it's the original one, keep its retry
// config in storage at the original location.
if task.maybe_periodic.is_some() && remaining == total_retries {
Retries::<T>::insert(
(when, agenda_index),
RetryConfig { total_retries, remaining, period },
);
}
remaining = match remaining.checked_sub(1) {
Some(n) => n,
None => return,
};
let wake = now.saturating_add(period);
match Self::place_task(wake, task.as_retry()) {
Ok(address) => {
// Reinsert the retry config to the new address of the task after it was
// placed.
Retries::<T>::insert(address, RetryConfig { total_retries, remaining, period });
},
Err((_, task)) => {
// TODO: Leave task in storage somewhere for it to be
// rescheduled manually.
T::Preimages::drop(&task.call);
Self::deposit_event(Event::RetryFailed {
task: (when, agenda_index),
id: task.maybe_id,
});
},
}
}
}
