Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address shadowing divergence in reftest, update semantics doc #1201

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

c-hagem
Copy link
Contributor

@c-hagem c-hagem commented Dec 12, 2024

This commit addresses a case where MP model and property tests diverge (#1066). The issue was caused by the reference not correctly implementing the shadowing order defined in #4f8cf0b. This commit fixes the reference model, and clarifies the semantics arising from concurrent MPUs.

This is not a breaking change, as it only impacts the reference tests.

This does not need a Changelog entry, as the change does not impact Mountpoint's behaviour.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@c-hagem c-hagem requested a deployment to PR integration tests December 12, 2024 16:05 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 12, 2024 16:05 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 12, 2024 16:05 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 12, 2024 16:05 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 12, 2024 16:05 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 12, 2024 16:05 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 12, 2024 16:05 — with GitHub Actions Waiting
@c-hagem c-hagem force-pushed the semantics-document-reftest-fix branch from b3d7045 to f405d75 Compare December 12, 2024 16:54
@c-hagem c-hagem temporarily deployed to PR integration tests December 12, 2024 16:54 — with GitHub Actions Inactive
@c-hagem c-hagem temporarily deployed to PR integration tests December 12, 2024 16:54 — with GitHub Actions Inactive
@c-hagem c-hagem temporarily deployed to PR integration tests December 12, 2024 16:54 — with GitHub Actions Inactive
@c-hagem c-hagem temporarily deployed to PR integration tests December 12, 2024 16:54 — with GitHub Actions Inactive
@c-hagem c-hagem temporarily deployed to PR integration tests December 12, 2024 16:54 — with GitHub Actions Inactive
@c-hagem c-hagem temporarily deployed to PR integration tests December 12, 2024 16:54 — with GitHub Actions Inactive
@c-hagem c-hagem temporarily deployed to PR integration tests December 12, 2024 16:54 — with GitHub Actions Inactive
@dannycjones dannycjones self-requested a review December 13, 2024 10:08
@dannycjones dannycjones changed the title Improve Semantics.md, fix error in reference causing proptest failure Address shadowing divergence in reftest, update semantics doc Dec 13, 2024
Copy link
Contributor

@dannycjones dannycjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right change and semantics, I've put some feedback on docs mainly

doc/SEMANTICS.md Outdated
Comment on lines 79 to 85
For concurrently writing to the same object from multiple instances, we do not place any guarantees on which version of the object will be written to S3.
Thus, if two clients, at least one of them using Mountpoint, write to the same file and there is some overlap, Mountpoint will never read partial or corrupt data -- however there are no guarantees, which version ultimately is written to S3.
This is mainly determined by the S3 semantics and the specific kinds of requests the clients use to write to S3.
If you use multiple instances for writing the same key (or another client modifies the Object while the file is open for writing), be aware that Mountpoint currently uses Multipart Uploads (MPU) that are created after the file is opened to upload the data to S3.
This implies, that the provisions in the [S3 Documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#distributedmpupload) regarding multiple concurrent MPUs apply.
For example, if (in a versioning-enabled bucket) you open a file for writing on the first MP instance at 11 AM, and if you open the same file on another instance at 12 AM, and finish writing on the first instance after the second,
the version of the second instance will be committed to S3, and the 11 AM version will not be accessible from any of the Mountpoint instances afterwards.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would cut this to be honest. We already recommend above not to write to the same object concurrently, and this is far too much detail for this half of the semantics doc. (We split it into an overview for most readers, then a detailed section below that goes into specific S3 and file system behaviors.)

If needed, we could mention in the detailed semantics later in the doc but delegate reading to the linked documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#distributedmpupload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cut this and added one sentence to the end of the file -- but I'm not too sure if we should keep that or just cut it completely.

doc/SEMANTICS.md Outdated
Comment on lines 160 to 164
then mounting your bucket would give a file system with a `blue` directory, containing the file `image.jpg`. The `blue` object will not be accessible. Deleting the key `blue/image.jpg` will remove the `blue` directory, and cause the `blue` file to become visible.
Additionally, remote content shadows local content. Thus Mountpoint shadows in the order: remote directories > any local state > remote files.
Thus, for example, if a user creates a local directory, and then a conflicting
object appears in the remote bucket, the directory will still be
visible instead of the new file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
then mounting your bucket would give a file system with a `blue` directory, containing the file `image.jpg`. The `blue` object will not be accessible. Deleting the key `blue/image.jpg` will remove the `blue` directory, and cause the `blue` file to become visible.
Additionally, remote content shadows local content. Thus Mountpoint shadows in the order: remote directories > any local state > remote files.
Thus, for example, if a user creates a local directory, and then a conflicting
object appears in the remote bucket, the directory will still be
visible instead of the new file.
then mounting your bucket would give a file system with a `blue` directory, containing the file `image.jpg`. The `blue` object will not be accessible. Deleting the key `blue/image.jpg` will remove the `blue` directory, and cause the `blue` file to become visible.
Additionally, remote directories will always shadow local directories and files. Thus, Mountpoint shadows directory entries in the following order where the first takes precedence: remote directories, any local state, and finally remote files.
For example, if a user creates a local directory, and then a conflicting
object appears remotely in the S3 bucket, the directory will still be
visible instead of the new file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I mainly incorporated this -- I tried to additionally keep the example similar (using colours) / not switch from second to third person.

Comment on lines 1294 to 1298
/*
Ensure that local files are shadowed by the remote directories.
*/
#[test]
fn regression_put_nested_over_open_file() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/*
Ensure that local files are shadowed by the remote directories.
*/
#[test]
fn regression_put_nested_over_open_file() {
/*
Ensure that local files are shadowed by the remote directories.
*/
#[test]
fn regression_put_shadowing_new_local_file() {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment on lines 114 to 120
// If a directory of this name exists, ignore any local file
if let Some(node) = children.get(dir) {
if node.node_type() == NodeType::Directory {
return true;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return false, no? The node was not added.

Suggested change
// If a directory of this name exists, ignore any local file
if let Some(node) = children.get(dir) {
if node.node_type() == NodeType::Directory {
return true;
}
}
// If a directory of this name exists, ignore any local file
if let Some(node) = children.get(dir) {
if node.node_type() == NodeType::Directory {
return false;
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for catching that. I'll also update the comment for the method to reflect the implemented behaviour.

@dannycjones
Copy link
Contributor

This PR closes #1066.

This commit addresses a case where MP model and property tests diverge (awslabs#1066). The issue was caused by the reference not correctly implementing the shadowing order defined in [#4f8cf0b](awslabs@4f8cf0b). Additionally, it clarifies the Semantics arising from concurrent MPUs.

Signed-off-by: Christian Hagemeier <[email protected]>
@c-hagem c-hagem force-pushed the semantics-document-reftest-fix branch from f405d75 to c5f1469 Compare December 13, 2024 14:38
@c-hagem c-hagem requested a deployment to PR integration tests December 13, 2024 14:38 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 13, 2024 14:38 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 13, 2024 14:38 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 13, 2024 14:38 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 13, 2024 14:38 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 13, 2024 14:38 — with GitHub Actions Waiting
@c-hagem c-hagem requested a deployment to PR integration tests December 13, 2024 14:38 — with GitHub Actions Waiting
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants