Bugfix/cldsrv 514 handling of metadata storage errors #5547
base: archive/8.7
Some APIs perform the following operations sequentially:
- Store data in the storage service
- Store the associated metadata in the DB
- If an error occurs when dealing with the DB, return the error to the client.

In such a scenario, the data is still present on the data disks and is never deleted. This change ensures that, in case of an error, we properly clean up the orphans.
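The flow described above can be sketched as follows. This is a minimal illustration, not the code of this PR: the `data` and `metadata` objects are in-memory stand-ins, and their signatures (`put`, `batchDelete`, `putObjectMD`) are simplified assumptions made for the example.

```javascript
// In-memory stand-ins for the data and metadata backends (hypothetical).
const storedData = new Map();

const data = {
    put(key, value, cb) {
        storedData.set(key, value);
        return cb(null, { key });
    },
    batchDelete(locations, cb) {
        locations.forEach(loc => storedData.delete(loc.key));
        return cb(null);
    },
};

const metadata = {
    // Always fails, to simulate a DB error after the data was written.
    putObjectMD(key, md, cb) {
        return cb(new Error('InternalError'));
    },
};

function putObject(key, value, cb) {
    return data.put(key, value, (err, location) => {
        if (err) {
            return cb(err);
        }
        return metadata.putObjectMD(key, { location }, mdErr => {
            if (!mdErr) {
                return cb(null, location);
            }
            // Metadata write failed: remove the data we just stored so it
            // does not become an orphan, then report the original error.
            return data.batchDelete([location], delErr => {
                if (delErr) {
                    console.warn('potential orphan in storage',
                        { location, error: delErr });
                }
                return cb(mdErr);
            });
        });
    });
}

let putError;
putObject('my-object', 'payload', err => { putError = err; });
```

The client still receives the metadata error, but the data written in the first step no longer stays behind in storage.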
Some APIs delete the metadata before the storage side: in this case, we log a specific warning with the associated information, as a first way to keep track of such objects. Future work will persist this information, to be processed by some background service.
return data.delete(objectMD.location, deleteLog, err => {
    if (err) {
        log.warn('potential orphan in storage', {
            object: objectMD.location,
            error: err,
        });
        return cb(err);
    }
    return cb(null, res);
});
Note for reviewers: this code previously did not check for any error when deleting data from storage. In case of error, that would create orphans and hide the error code from the user. Let me know if this should not be changed for some undocumented reason(s).
@@ -364,7 +364,7 @@ function getObjMetadataAndDelete(authInfo, canonicalID, request,
 objMD, authInfo, canonicalID, null, request,
 deleteInfo.newDeleteMarker, null, overheadField, log,
 's3:ObjectRemoved:DeleteMarkerCreated', (err, result) =>
-callback(err, objMD, deleteInfo, result.versionId));
+callback(err, objMD, deleteInfo, result?.versionId));
Note for reviewers: this code is not strictly related to this PR, but when the function returns an error, `result` is null, so we crash here.
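A small sketch of why optional chaining avoids the crash. The function `createDeleteMarker` is a hypothetical stand-in for any callback-style function that passes a null result alongside an error.

```javascript
// Hypothetical stand-in for a function that calls back with (err, result),
// where result is null whenever an error is reported.
function createDeleteMarker(shouldFail, cb) {
    if (shouldFail) {
        return cb(new Error('InternalError'), null);
    }
    return cb(null, { versionId: 'v1' });
}

const seen = [];
createDeleteMarker(true, (err, result) => {
    // result.versionId would throw a TypeError here, because result is null.
    // result?.versionId evaluates to undefined instead.
    seen.push(result?.versionId);
});
createDeleteMarker(false, (err, result) => {
    seen.push(result?.versionId);
});
// seen is now [undefined, 'v1']
```

With `?.`, the error still reaches the caller through `err`, and the version ID is simply undefined on the failure path.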
I suppose that waiting for the 'more consistent' approach is why we don't see tests for the cases where we can't delete?
Do you know what this approach will be?
LGTM otherwise
LGTM
If I understand your question correctly, @KazToozs, you are referring to the remaining orphans we create, or the cases where we only log them. My suggested approach, for Zenko, is to rely on transactions to perform atomic operations on the database. This way, we can easily avoid partial metadata updates that lead to orphans either on the storage side or in the metadata DB. Another solution, to complete it, because we can still have orphans with atomic updates (because we delete data from 2 different storage backends), is to persist the list of known keys that are (maybe) orphans, and have an internal job (or manual operation) take care of them, if needed. This also requires some design.
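As a rough sketch of the second idea (recording potential orphans for a later job), assuming an in-memory list in place of real persistence. The names `potentialOrphans`, `recordPotentialOrphan`, `deleteData`, and `processPotentialOrphans` are hypothetical, and the actual design remains open as noted above.

```javascript
// Hypothetical registry of potential orphans. A real implementation would
// persist this list; an array is used here only to keep the sketch short.
const potentialOrphans = [];

function recordPotentialOrphan(location, error) {
    potentialOrphans.push({ location, error: error.message });
}

// Delete data through the given backend, recording the location when the
// delete fails so that a later job can retry it.
function deleteData(backend, location, cb) {
    return backend.delete(location, err => {
        if (err) {
            recordPotentialOrphan(location, err);
            return cb(err);
        }
        return cb(null);
    });
}

// Background job (or manual operation): retry every recorded location and
// keep only the entries that still fail.
function processPotentialOrphans(backend) {
    const pending = potentialOrphans.splice(0, potentialOrphans.length);
    pending.forEach(entry => {
        backend.delete(entry.location, err => {
            if (err) {
                potentialOrphans.push(entry);
            }
        });
    });
}

// Hypothetical backends: one that always fails, one that always succeeds.
const failingBackend = {
    delete: (location, cb) => cb(new Error('ServiceUnavailable')),
};
const healthyBackend = { delete: (location, cb) => cb(null) };

let deleteError;
deleteData(failingBackend, 'location-A', err => { deleteError = err; });
const recordedAfterFailure = potentialOrphans.length;
processPotentialOrphans(healthyBackend);
```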
@@ -197,6 +197,8 @@ function createAndStoreObject(bucketName, bucketMD, objectKey, objMD, authInfo,
 /* eslint-disable camelcase */
 const dontSkipBackend = externalBackends;
 /* eslint-enable camelcase */
+let dataGetInfoArr;
If you make this variable global, you need to remove it from the variables which trickle through the waterfall (e.g. infoArr), and it should probably be renamed, since it is not really "dataGet" anymore.
-options.dataToDelete, requestLogger, requestMethod, next);
+options.dataToDelete, requestLogger, requestMethod, (err, data) => {
+    if (err) {
+        needsToCleanStorage = true;
Instead of introducing 2 variables, it seems we simply need to batchDelete here, since this is the only case where cleanup needs to happen.
@@ -331,6 +334,7 @@ function objectPutCopyPart(authInfo, request, sourceBucket,
 if (err) {
 log.debug('error storing new metadata',
 { error: err, method: 'storeNewPartMetadata' });
+needsToCleanStorage = true;
This is the only case, so we might as well clean up here, avoiding the global variables and keeping the existing waterfall logic.
@@ -333,6 +336,7 @@ function objectPutPart(authInfo, request, streamingV4Params, log,
 error: err,
 method: 'objectPutPart::metadata.putObjectMD',
 });
+needsToCleanStorage = true;
same, best to cleanup here
I haven't looked at the details of this PR, but I would like to mention that for S3C, it is a deliberate choice not to clean up orphans. Indeed, deleting the data could leave a dangling metadata entry, because when we get an error we cannot be sure that the metadata write actually failed. The dangling entry can cause serious issues for applications, or suspicion of data loss, because we cannot always know the history of this entry and whether it has had an error. Maybe a middle ground to tackle this issue better could be to defer the orphan cleanup for long enough to let the Metadata layer settle all its pending requests or time out, then re-check the metadata state before doing the orphan deletion.
Also, when we have a good solution in mind, we should definitely consider applying it on 7.x branches (but we could do a later backport after more testing if we are concerned about the risk of regression on S3C).
@jonathan-gramain, do you mean we can have errors returned by metadata in the S3C case, and this approach for MongoDB is not safe for the 7.x branches? Or do you mean that, even with MongoDB, we should not "trust" the errors returned by the driver, as it might report an error while the metadata was actually written?

Having something run after a delay seems unsafe: in the meantime, other operations can change or delete this object's metadata in a way that would not solve our issue here. E.g., we really fail to write the metadata at first, but data A is written to the storage. Then the client retries and succeeds: the metadata is written and data B is stored. Then the cleanup job detects that the metadata is there, and does nothing. In the end, we have an orphan.

Anyway, putting this work on hold, as we will need a unified solution for both branches (IMHO, in our APIs, as we should be able to rely on their return codes completely).
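The scenario above can be reproduced with in-memory stand-ins. This sketch assumes the deferred job only checks whether metadata exists for the key, which is one possible reading of the proposal; `storage`, `metadataDB`, `pendingChecks`, and the functions below are hypothetical names introduced for the example.

```javascript
const storage = new Map();     // data location -> payload
const metadataDB = new Map();  // object key -> { location }
const pendingChecks = [];      // deferred orphan checks

// Store data, then metadata. On a metadata failure, defer the cleanup
// instead of deleting the data right away.
function putObject(key, location, payload, metadataFails) {
    storage.set(location, payload);
    if (metadataFails) {
        pendingChecks.push({ key, location });
        return new Error('InternalError');
    }
    metadataDB.set(key, { location });
    return null;
}

// Deferred job: delete the data only if no metadata exists for the key.
function runDeferredCleanup() {
    while (pendingChecks.length > 0) {
        const { key, location } = pendingChecks.shift();
        if (!metadataDB.has(key)) {
            storage.delete(location);
        }
    }
}

// First attempt: data A is stored, the metadata write fails.
putObject('obj', 'location-A', 'data A', true);
// Client retry: data B is stored and the metadata write succeeds.
putObject('obj', 'location-B', 'data B', false);
// The job sees metadata for 'obj' and does nothing.
runDeferredCleanup();

// Locations present in storage but referenced by no metadata entry.
const referenced = new Set([...metadataDB.values()].map(md => md.location));
const orphans = [...storage.keys()].filter(loc => !referenced.has(loc));
// orphans contains 'location-A'
```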
A first set of fixes to reduce the occurrence of orphan creation when the fix is "easy", that is, when we can delete the orphan in the same API.
Note: The code to set delete markers is safe, as only metadata is updated. However, when deleting the data (usually after the metadata), it becomes possible to create orphans in the storage. In this case, we only log it for now, pending a more consistent approach.