Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/cldsrv 514 handling of metadata storage errors #5547

Open
wants to merge 3 commits into
base: archive/8.7
Choose a base branch
from

Conversation

williamlardier
Copy link
Contributor

A first set of fixes to reduce the occurence of orphans creation, when the fix is "easy", that is, we can delete the orphan in the same API.

Note: The code to set delete markers is safe, as only metadata is updated. However, when deleting the data (usually, after the metadata), it becomes possible to create orphans in the storage, in this case, we only log it, for rnow, before a more consistent approach.

Some APIs will do the following operation, sequentially:
- Store data in the storage service
- Store the associated metadata in the DB
- If an error occurs when dealing with the DB, return the
error to the client.

In such a scenario, the data is still present on the data disks,
and is never deleted.
The change ensures that in case of an error, we properly clean the
orphans.
Some APIs will delete the metadata before the storage side:
in this case, we log a specific warning with the associated
information, as a first way to keep track of such objects.
Future work will persist this information , to be processed
by some background service.
@bert-e
Copy link
Contributor

bert-e commented Mar 6, 2024

Hello williamlardier,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@bert-e
Copy link
Contributor

bert-e commented Mar 6, 2024

Request integration branches

Waiting for integration branch creation to be requested by the user.

To request integration branches, please comment on this pull request with the following command:

/create_integration_branches

Alternatively, the /approve and /create_pull_requests commands will automatically
create the integration branches.

Comment on lines +366 to +375
return data.delete(objectMD.location, deleteLog, err => {
if (err) {
log.warn('potential orphan in storage', {
object: objectMD.location,
error: err,
});
return cb(err);
}
return cb(null, res);
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: this code was not looking for any error when deleting data from storage. In case of error, that would create both orphans and invisible error codes for the user. Let me know if that should not be changed due to some non-documented reason(s).

@@ -364,7 +364,7 @@ function getObjMetadataAndDelete(authInfo, canonicalID, request,
objMD, authInfo, canonicalID, null, request,
deleteInfo.newDeleteMarker, null, overheadField, log,
's3:ObjectRemoved:DeleteMarkerCreated', (err, result) =>
callback(err, objMD, deleteInfo, result.versionId));
callback(err, objMD, deleteInfo, result?.versionId));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: this code is not strictly related to this PR, but when the function returns an error, result is null so we crash here.

Copy link

@KazToozs KazToozs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supposing the await for the 'more consistent' approach is why we don't see tests for the cases where we can't delete?
Do you know what this approach will be?
LGTM otherwise

Copy link
Contributor

@benzekrimaha benzekrimaha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@williamlardier
Copy link
Contributor Author

williamlardier commented Mar 12, 2024

Supposing the await for the 'more consistent' approach is why we don't see tests for the cases where we can't delete?
Do you know what this approach will be?

If I understand well your question @KazToozs , you are referring to the remaining oprhans we crate, or the cases where we only log it. My suggested approach, for Zenko, is to rely on transactions to perform atomic operations on the database. This way, we can easily avoid partial metadata updates that lead to either oprhans on the storage side, or in the metadata DB.
This however requires more design work beforehand.

Another solution, to complete it, because we can still have orphans with atomic updates (because we delete data from 2 different storage backends), it to persist the list of known keys that are (maybe) orphans, and have an internal job (or manual operation) taking care of them, if needed. This also requires some design.

@@ -197,6 +197,8 @@ function createAndStoreObject(bucketName, bucketMD, objectKey, objMD, authInfo,
/* eslint-disable camelcase */
const dontSkipBackend = externalBackends;
/* eslint-enable camelcase */
let dataGetInfoArr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you make this variable global, you need to remove it from the variables which trickle through the waterfall... (e.g. infoArr).... and it should problably be renamed, since it is not really "dataGet" anymore

options.dataToDelete, requestLogger, requestMethod, next);
options.dataToDelete, requestLogger, requestMethod, (err, data) => {
if (err) {
needsToCleanStorage = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of introducing 2 variables, it seems we simply need to batchDelete here : since this is the only case where cleanup needs to happen...

@@ -331,6 +334,7 @@ function objectPutCopyPart(authInfo, request, sourceBucket,
if (err) {
log.debug('error storing new metadata',
{ error: err, method: 'storeNewPartMetadata' });
needsToCleanStorage = true;
Copy link
Contributor

@francoisferrand francoisferrand Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the only case, might as well cleanup here: avoiding the global variables and keeping the existing waterfall logic...

@@ -333,6 +336,7 @@ function objectPutPart(authInfo, request, streamingV4Params, log,
error: err,
method: 'objectPutPart::metadata.putObjectMD',
});
needsToCleanStorage = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, best to cleanup here

@jonathan-gramain
Copy link
Contributor

I haven't looked at the details of this PR but would like to mention that for S3C, it is a deliberate choice not to cleanup orphans. Indeed it's possible to have a dangling metadata entry because we are not sure if the metadata write actually failed for real, when we get an error. The dangling entry can cause serious issues to applications or suspicion of data loss because we cannot always know what is the history of this entry and if it has had an error.

Maybe a middle ground to tackle this issue better could be to defer the orphan cleanup after some time long enough to let the Metadata layer settle all its pending requests or timeout, then re-check what is the metadata state before doing the orphan deletion.

@jonathan-gramain
Copy link
Contributor

Also, when we have a good solution in mind, we should definitely consider applying it on 7.x branches (but we could do a later backport after more testing if we are concerned about the risk of regression on S3C).

@williamlardier
Copy link
Contributor Author

williamlardier commented Mar 15, 2024

@jonathan-gramain , do you mean we can have errors returned by metadata in the S3C case, and this approach for MongoDB is not safe for the 7.x branches? Or do you mean, even with MongoDB, we should not "trust" the errors returned by the driver, as it might report an error, while the metadata was actually written?
Note that here, we tackle orphans in the storage, not in metadata.

Having something running after a while seem unsafe: we can, between this timelapse, have other operations on this object's metadata and have it changed/deleted in a way that would not solve our issue here. E.g.: we really fail to write the metadata at first, but the data A is written in the storage. Then the client retries and succeed, metadata is written and data B is stored. Then the cleanup job detects that the metadata is here, and does nothing. At the end, we have an orphan.

Anyway, putting this work on hold as we will need a unified solution for both branches (IMHO, in our APIs, as we should rely on their return codes perfectly).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants