Cleanup sensitive data from KEB db for all deprovisioned clusters #440

Open
szwedm opened this issue Feb 8, 2024 · 9 comments

@szwedm
Contributor

szwedm commented Feb 8, 2024

Items from operations and runtime states must be removed when the instance is fully deprovisioned.
There must be a defined retention period during which the items are kept for investigation.
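A minimal sketch of the retention check, assuming a hypothetical 30-day period (the issue does not state the actual value) and a hypothetical helper name `eligible_for_cleanup`:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value; the issue does not define the real retention period.
RETENTION_PERIOD = timedelta(days=30)

def eligible_for_cleanup(deprovisioned_at, now, retention=RETENTION_PERIOD):
    """Return True once the retention window after deprovisioning has elapsed."""
    return now - deprovisioned_at >= retention

now = datetime(2024, 3, 25, tzinfo=timezone.utc)
recent = datetime(2024, 3, 10, tzinfo=timezone.utc)  # 15 days before `now`
old = datetime(2024, 2, 8, tzinfo=timezone.utc)      # 46 days before `now`

print(eligible_for_cleanup(recent, now))  # still inside the retention window
print(eligible_for_cleanup(old, now))     # retention window has elapsed
```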

@ukff
Contributor

ukff commented Feb 15, 2024

Check the impact of the cleanup on current metrics and, if needed, adjust the implementation to handle it.
This must be done before we run the cleanup.

https://github.com/orgs/kyma-project/projects/38/views/1?pane=issue&itemId=53104649

@piotrmiskiewicz
Member

piotrmiskiewicz commented Feb 16, 2024

Solution:

A new entity in which we store all necessary non-sensitive data after deprovisioning.

Subtasks:

  • define the model and a storage layer
  • add logic which creates the archived instance at the end of the deprovisioning process and deletes operations/runtime states (behind a feature flag). PR: Archiving and cleaning in the deprovisioning process #498
  • implement fetching archived instances in the runtimes endpoint for deprovisioned instances. PR: Fetch instance archived runtimes endpoint #527
  • create a job which is run at migration time (behind a feature flag) and archives all deprovisioned instances
  • documentation
  • enable archiving and cleaning, run the migration job
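The archive-then-clean step could be sketched roughly as follows. This is not the KEB implementation: the schema, the column names, the flag names, and the rule that cleaning refuses to run without an archive row are all assumptions made for illustration, and SQLite stands in for the real database.

```python
import sqlite3

# All table and column definitions here are illustrative assumptions.
SCHEMA = """
CREATE TABLE operations (id TEXT PRIMARY KEY, instance_id TEXT, type TEXT, state TEXT);
CREATE TABLE runtime_states (id TEXT PRIMARY KEY, operation_id TEXT);
CREATE TABLE instances_archived (
    instance_id TEXT PRIMARY KEY,
    provisioning_state TEXT,
    operation_count INTEGER
);
"""

def finish_deprovisioning(conn, instance_id, archiving_enabled, cleaning_enabled):
    """Archive non-sensitive data, then delete operations and runtime states."""
    if archiving_enabled:
        row = conn.execute(
            "SELECT state FROM operations WHERE instance_id = ? AND type = 'provision'",
            (instance_id,),
        ).fetchone()
        count = conn.execute(
            "SELECT count(*) FROM operations WHERE instance_id = ?", (instance_id,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR REPLACE INTO instances_archived VALUES (?, ?, ?)",
            (instance_id, row[0] if row else None, count),
        )
    if cleaning_enabled:
        # Assumed safety rule: never delete data that has not been archived.
        archived = conn.execute(
            "SELECT 1 FROM instances_archived WHERE instance_id = ?", (instance_id,)
        ).fetchone()
        if archived is None:
            raise RuntimeError("refusing to clean an instance that was not archived")
        conn.execute(
            "DELETE FROM runtime_states WHERE operation_id IN "
            "(SELECT id FROM operations WHERE instance_id = ?)",
            (instance_id,),
        )
        conn.execute("DELETE FROM operations WHERE instance_id = ?", (instance_id,))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO operations VALUES (?, ?, ?, ?)", [
    ("op-1", "inst-a", "provision", "succeeded"),
    ("op-2", "inst-a", "deprovision", "succeeded"),
    ("op-3", "inst-b", "provision", "succeeded"),
])
conn.executemany("INSERT INTO runtime_states VALUES (?, ?)",
                 [("rs-1", "op-1"), ("rs-2", "op-3")])

finish_deprovisioning(conn, "inst-a", archiving_enabled=True, cleaning_enabled=True)
remaining_ops = conn.execute("SELECT count(*) FROM operations").fetchone()[0]
remaining_states = conn.execute("SELECT count(*) FROM runtime_states").fetchone()[0]
archived_row = conn.execute("SELECT * FROM instances_archived").fetchone()
print(remaining_ops, remaining_states, archived_row)
```

Keeping the two flags separate mirrors the subtask list, where archiving and cleaning are enabled as distinct steps.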

@ebensom
Member

ebensom commented Feb 23, 2024

The behavior of the existing {prefix}_operations_*_total count metrics should be preserved, as they are used as high-level SLIs to track the success ratio of provision/deprovision operations (and of update operations in the near future). So the new instances_archived model should, IMO, be able to reflect the provisioning state (failed or succeeded) and the count of deprovisioning and update operations per state, so that the stats collector can use them.
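One way to picture this: if each archived instance carried per-state operation counts, a collector could add them to the counts of the remaining operations so the totals do not drop after cleaning. The sketch below assumes a hypothetical `operation_counts` field and plain dictionaries; it is not the actual stats collector.

```python
from collections import Counter

def operations_total(live_operations, archived_instances):
    """Count operations per (type, state), including archived instances."""
    totals = Counter()
    for op in live_operations:
        totals[(op["type"], op["state"])] += 1
    for inst in archived_instances:
        # `operation_counts` is a hypothetical field on the archived record.
        for key, count in inst["operation_counts"].items():
            totals[key] += count
    return totals

live = [
    {"type": "provision", "state": "succeeded"},
    {"type": "provision", "state": "failed"},
]
archived = [{
    "instance_id": "inst-a",
    "operation_counts": {
        ("provision", "succeeded"): 1,
        ("deprovision", "succeeded"): 1,
    },
}]

totals = operations_total(live, archived)
print(totals[("provision", "succeeded")])
```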

@ebensom
Member

ebensom commented Feb 23, 2024

The track record of operations for a deprovisioned instance could also be useful for post-mortem troubleshooting. How about keeping all the operation records after deprovisioning and just cleaning up the sensitive part (present in the data and provisioning_parameters columns)?
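A rough sketch of this alternative, using SQLite as a stand-in database and an assumed minimal schema: the rows are kept, and only the data and provisioning_parameters columns are blanked for operations whose instance no longer exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema; the real tables have more columns.
conn.executescript("""
CREATE TABLE instances (instance_id TEXT PRIMARY KEY);
CREATE TABLE operations (
    id TEXT PRIMARY KEY, instance_id TEXT, state TEXT,
    data TEXT, provisioning_parameters TEXT
);
""")
conn.execute("INSERT INTO instances VALUES ('inst-live')")
conn.executemany("INSERT INTO operations VALUES (?, ?, ?, ?, ?)", [
    ("op-1", "inst-live", "succeeded", '{"secret": "x"}', '{"params": 1}'),
    ("op-2", "inst-gone", "succeeded", '{"secret": "y"}', '{"params": 2}'),
])

def scrub_deprovisioned(conn):
    """Blank the sensitive columns of operations without a live instance."""
    cur = conn.execute(
        "UPDATE operations SET data = '{}', provisioning_parameters = '{}' "
        "WHERE instance_id NOT IN (SELECT instance_id FROM instances)"
    )
    conn.commit()
    return cur.rowcount

scrubbed = scrub_deprovisioned(conn)
rows = conn.execute(
    "SELECT id, data, provisioning_parameters FROM operations ORDER BY id"
).fetchall()
print(scrubbed, rows)
```

Note that `NOT IN` against a subquery matches nothing if the subquery returns a NULL, so this form relies on instance_id never being NULL in instances.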

@piotrmiskiewicz
Member

We are planning to use the Prometheus increase() function. It looks like we can use it instead of absolute values.
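To illustrate why an increase over a window does not depend on absolute values, here is a simplified model of the idea. It is only a sketch: the real Prometheus function also extrapolates to the window boundaries, which is omitted here, and `simple_increase` is a hypothetical helper name.

```python
def simple_increase(samples):
    """Sum the growth between consecutive counter samples.

    A drop between two samples is treated as a counter reset to zero,
    so the whole new value counts as growth.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total

# The counter grows by 5 twice, resets, then reaches 3 and 7.
print(simple_increase([100, 105, 110, 3, 7]))
# Same growth pattern from a different starting value gives the same result.
print(simple_increase([0, 5, 10, 3, 7]))
```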

@piotrmiskiewicz
Member

piotrmiskiewicz commented Mar 25, 2024

The release scenario:

  1. Enable archiving on DEV and verify that deprovisioning works (no errors, and KCP CLI shows the deprovisioned runtime with -S deprovisioned)
  2. Enable cleaning on DEV and verify that deprovisioning works (no errors, and KCP CLI shows the deprovisioned runtime with -S deprovisioned)
  3. Run the archiver job on DEV and check that the runtimes archived by the job are visible in KCP CLI
  4. Enable archiving on STAGE
  5. Enable cleaning on STAGE
  6. Run the archiver on STAGE
  7. Wait a few days to check whether there are any problems on DEV/STAGE
  8. Enable archiving on PROD
  9. Enable cleaning on PROD
  10. Run the archiver on PROD

Archiving and cleaning in the deprovisioning process are enabled in YAML files, see: https://github.tools.sap/kyma/management-plane-config/pull/5068/files

@piotrmiskiewicz
Member

Fulfill the change requirements, see: https://pages.github.tools.sap/kyma/docusaurus-docs/kyma/ops/change_process/

@piotrmiskiewicz
Member

@piotrmiskiewicz
Member

STAGE (before running):

select count(*) from operations;
 count
-------
 94107
(1 row)

broker=> select count(*) from operations where instance_id not in (select instance_id from instances);
 count
-------
 63625
(1 row)


broker=> select count(*) from runtime_states;
 count
--------
 124967
(1 row)

after running archiver (but before enabling archiving and cleaning):

broker=> select count(*) from operations;
 count
-------
 30549
(1 row)

broker=> select count(*) from operations where instance_id not in (select instance_id from instances);
 count
-------
    59
(1 row)

broker=> select count(*) from runtime_states;
 count
-------
 27604
(1 row)
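For reference, the differences between the two sets of counts above work out to 63558 fewer operations, 63566 fewer operations without a matching instance, and 97363 fewer runtime states. The short calculation below reproduces this; the dictionary keys are labels chosen here, not names from the database.

```python
# Counts copied from the STAGE query results above.
before = {"operations": 94107, "orphaned_operations": 63625, "runtime_states": 124967}
after = {"operations": 30549, "orphaned_operations": 59, "runtime_states": 27604}

# Rows removed by the archiver run, per query.
removed = {name: before[name] - after[name] for name in before}

for name, count in removed.items():
    share = 100 * count / before[name]
    print(f"{name}: removed {count} ({share:.1f}% of the original rows)")
```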
