[Feature] Implement better garbage collection of snapshots #10
How to categorize this issue?
/kind enhancement
What would you like to be added:
Steward should provide a garbage collection mechanism for uploaded snapshots, to keep the object storage container as lean as possible and so reduce costs. The garbage collection options offered to the user must be flexible, i.e., users must be able to choose from different garbage collection strategies/policies, and policy values must not be hardcoded.
Snapshot retention policies to be offered (here, a snapshot set refers to a full snapshot together with the delta snapshots taken on top of it, up until the next full snapshot; a snapshot set is what is used to restore etcd, so the policies below are defined for snapshot sets rather than for individual snapshots):
* Time units: Hour, Day, Week and Month.
  * Hour spans minute [00 - 59]
  * Day spans hour:minute [00:00 - 23:59]
  * Week spans day-of-week [Monday - Sunday]
  * Month spans day-of-month [1 - month-end]
* Each unit has 2 aspects, Max per unit and Number of previous units, written as X/Y: retain at most X snapshot sets (the latest ones) per unit, for each of the previous Y units.

Example:
* Hour (max per hour/number of previous hours): 1/5
* Day (max per day/number of previous days): 2/7
* Week (max per week/number of previous weeks): 1/3
* Month (max per month/number of previous months): 2/5
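The X/Y notation maps directly onto a small configuration type. The Go sketch below is illustrative only: RetentionRule, RetentionPolicy, ParseRetentionRule and ParseRetentionPolicy are hypothetical names, not part of Steward. It parses the example values and computes the upper bound on snapshot sets retained outside the current hour, i.e. the sum of X × Y over the four units.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// RetentionRule is a hypothetical representation of one X/Y policy value:
// retain at most MaxPerUnit snapshot sets per unit for PreviousUnits units.
type RetentionRule struct {
	MaxPerUnit    int // X
	PreviousUnits int // Y
}

// ParseRetentionRule parses a value such as "2/7" into a RetentionRule.
func ParseRetentionRule(s string) (RetentionRule, error) {
	xs, ys, ok := strings.Cut(strings.TrimSpace(s), "/")
	if !ok {
		return RetentionRule{}, fmt.Errorf("rule %q: expected X/Y", s)
	}
	x, errX := strconv.Atoi(xs)
	y, errY := strconv.Atoi(ys)
	if errX != nil || errY != nil || x < 0 || y < 0 {
		return RetentionRule{}, fmt.Errorf("rule %q: X and Y must be non-negative integers", s)
	}
	return RetentionRule{MaxPerUnit: x, PreviousUnits: y}, nil
}

// RetentionPolicy groups the rules for the four time units.
type RetentionPolicy struct {
	Hour, Day, Week, Month RetentionRule
}

// ParseRetentionPolicy builds a policy from four user-supplied X/Y values.
func ParseRetentionPolicy(hour, day, week, month string) (RetentionPolicy, error) {
	var p RetentionPolicy
	fields := []*RetentionRule{&p.Hour, &p.Day, &p.Week, &p.Month}
	for i, v := range []string{hour, day, week, month} {
		r, err := ParseRetentionRule(v)
		if err != nil {
			return RetentionPolicy{}, err
		}
		*fields[i] = r
	}
	return p, nil
}

// MaxRetained is the upper bound on snapshot sets kept outside the current hour.
func (p RetentionPolicy) MaxRetained() int {
	n := 0
	for _, r := range []RetentionRule{p.Hour, p.Day, p.Week, p.Month} {
		n += r.MaxPerUnit * r.PreviousUnits
	}
	return n
}

func main() {
	p, err := ParseRetentionPolicy("1/5", "2/7", "1/3", "2/5")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.MaxRetained()) // 5 + 14 + 3 + 10 = 32
}
```

Reading the four values from user configuration rather than from constants is one way to satisfy the requirement that policy values are not hardcoded.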
Assume a full snapshot schedule of once every 30 minutes, and that garbage collection runs at 15:10 UTC on 13-Sep-2023. The policy above is then interpreted as follows:
* All snapshot sets in the current hour (15:00-15:59) will be retained, since the current hour is not considered for garbage collection.
* For the previous 5 hours (10:00 - 14:59), at most 1 snapshot set (the latest) per hour will be retained. This will retain a total of 5 snapshot sets (1 per hour for the previous 5 hours).
* Once the hour schedule is handled, the day schedule is considered. It starts from the day before the reference point 10:00 (where the hour schedule ended), i.e., 12th Sep, and applies the day rule: retain the latest 2 snapshot sets per day for the previous 7 days, from 12th Sep back to 6th Sep. This will retain a total of 14 snapshot sets (2 per day for the previous 7 days).
* Once the day schedule is handled, the week schedule is considered. It starts from the week before the one containing 6th Sep, i.e., 28th Aug - 3rd Sep, and retains the latest 1 snapshot set per week for the previous 3 weeks, back to the week starting 14th Aug. This will retain a total of 3 snapshot sets (1 per week for the previous 3 weeks).
* Once the week schedule is handled, the month schedule is considered. It starts from the month before August (where the week schedule ended), i.e., July, and retains the latest 2 snapshot sets per month for the previous 5 months, from July back to March. This will retain a total of 10 snapshot sets (2 per month for the previous 5 months).
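The cascading windows described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not Steward's implementation: each snapshot set is identified by the UTC start time of its full snapshot, weeks run Monday to Sunday, and Rule, Policy, Retain and runExample are hypothetical names. Each schedule keeps the latest X sets in each of its Y units and hands the start of its oldest unit to the next, coarser schedule as the reference point.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Rule is a hypothetical X/Y rule: keep at most Max sets per unit for N units.
type Rule struct{ Max, N int }

// Policy holds the hour, day, week and month rules.
type Policy struct{ Hour, Day, Week, Month Rule }

func startOfDay(t time.Time) time.Time {
	t = t.UTC()
	return time.Date(t.Year(), t.Month(), t.Day(), 0, 0, 0, 0, time.UTC)
}

// startOfWeek assumes weeks run Monday to Sunday.
func startOfWeek(t time.Time) time.Time {
	d := startOfDay(t)
	return d.AddDate(0, 0, -((int(d.Weekday()) + 6) % 7))
}

func startOfMonth(t time.Time) time.Time {
	t = t.UTC()
	return time.Date(t.Year(), t.Month(), 1, 0, 0, 0, 0, time.UTC)
}

// keepLatest marks the latest `limit` sets in [from, to); sets must be sorted ascending.
func keepLatest(sets []time.Time, keep []bool, from, to time.Time, limit int) {
	hi := sort.Search(len(sets), func(i int) bool { return !sets[i].Before(to) })
	for i := hi - 1; i >= 0 && limit > 0 && !sets[i].Before(from); i-- {
		keep[i] = true
		limit--
	}
}

// applyRule walks the r.N units before start (unit k spans
// [step(start, -k), step(start, -k+1))) and returns the start of the oldest
// unit, which becomes the reference point for the next schedule.
func applyRule(sets []time.Time, keep []bool, start time.Time, r Rule,
	step func(time.Time, int) time.Time) time.Time {
	for k := 1; k <= r.N; k++ {
		keepLatest(sets, keep, step(start, -k), step(start, -k+1), r.Max)
	}
	return step(start, -r.N)
}

// Retain returns the snapshot-set start times that survive garbage collection at now.
func Retain(sets []time.Time, now time.Time, p Policy) []time.Time {
	sorted := append([]time.Time(nil), sets...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Before(sorted[j]) })
	keep := make([]bool, len(sorted))

	cur := now.UTC().Truncate(time.Hour)
	for i, t := range sorted {
		if !t.Before(cur) {
			keep[i] = true // the current hour is never garbage collected
		}
	}
	hours := func(t time.Time, k int) time.Time { return t.Add(time.Duration(k) * time.Hour) }
	days := func(t time.Time, k int) time.Time { return t.AddDate(0, 0, k) }
	weeks := func(t time.Time, k int) time.Time { return t.AddDate(0, 0, 7*k) }
	months := func(t time.Time, k int) time.Time { return t.AddDate(0, k, 0) }

	ref := applyRule(sorted, keep, cur, p.Hour, hours)
	ref = applyRule(sorted, keep, startOfDay(ref), p.Day, days)
	ref = applyRule(sorted, keep, startOfWeek(ref), p.Week, weeks)
	applyRule(sorted, keep, startOfMonth(ref), p.Month, months)

	var out []time.Time
	for i, t := range sorted {
		if keep[i] {
			out = append(out, t)
		}
	}
	return out
}

// runExample reproduces the scenario: a full snapshot every 30 minutes, GC at 15:10 UTC on 13-Sep-2023.
func runExample() []time.Time {
	now := time.Date(2023, time.September, 13, 15, 10, 0, 0, time.UTC)
	var sets []time.Time
	for t := time.Date(2023, time.January, 1, 0, 0, 0, 0, time.UTC); !t.After(now); t = t.Add(30 * time.Minute) {
		sets = append(sets, t)
	}
	return Retain(sets, now, Policy{Hour: Rule{1, 5}, Day: Rule{2, 7}, Week: Rule{1, 3}, Month: Rule{2, 5}})
}

func main() {
	kept := runExample()
	fmt.Println(len(kept)) // 1 (current hour) + 5 + 14 + 3 + 10 = 33
}
```

With snapshot sets generated from 1 Jan 2023, the sketch keeps the 15:00 set of the current hour plus 5, 2 × 7 = 14, 3 and 10 sets from the hour, day, week and month schedules. Because each window is anchored to calendar boundaries, in this sketch any set falling between two windows (for example 13 Sep 00:00 - 09:59) is not retained by any rule.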
Why is this needed:
Part of #1