Skip to content

Commit

Permalink
Improve Semantics.md, fix error in reference causing proptest failure
Browse files Browse the repository at this point in the history
This commit addresses a case where MP model and property tests diverge (#1066). The issue was caused by the reference not correctly implementing the shadowing order defined in [#4f8cf0b](4f8cf0b). Additionally, it clarifies the Semantics arising from concurrent MPUs.

Signed-off-by: Christian Hagemeier <[email protected]>
  • Loading branch information
c-hagem committed Dec 12, 2024
1 parent eecf301 commit b3d7045
Show file tree
Hide file tree
Showing 3 changed files with 41 additions and 0 deletions.
12 changes: 12 additions & 0 deletions doc/SEMANTICS.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,14 @@ New files that are being written are not available for reading until the writing
If you have multiple Mountpoint mounts for the same bucket, on the same or different hosts, there is no coordination between writes to the same object.
Your application should not write to the same object from multiple instances at the same time.

For concurrently writing to the same object from multiple instances, we do not place any guarantees on which version of the object will be written to S3.
Thus, if two clients, at least one of them using Mountpoint, write to the same file and there is some overlap, Mountpoint will never read partial or corrupt data -- however there are no guarantees, which version ultimately is written to S3.
This is mainly determined by the S3 semantics and the specific kinds of requests the clients use to write to S3.
If you use multiple instances for writing the same key (or another client modifies the Object while the file is open for writing), be aware that Mountpoint currently uses Multipart Uploads (MPU) that are created after the file is opened to upload the data to S3.
This implies, that the provisions in the [S3 Documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#distributedmpupload) regarding multiple concurrent MPUs apply.
For example, if (in a versioning-enabled bucket) you open a file for writing on the first MP instance at 11 AM, and if you open the same file on another instance at 12 AM, and finish writing on the first instance after the second,
the version of the second instance will be committed to S3, and the 11 AM version will not be accessible from any of the Mountpoint instances afterwards.

By default, Mountpoint ensures that new file uploads to a single key are atomic. As soon as an upload completes, other clients are able to see the new key and the entire content of the object. If the `--incremental-upload` flag is set, however, Mountpoint may issue multiple separate uploads during file writes to append data to the object. After each upload, the appended object in your S3 bucket will be visible to other clients.

### Optional metadata and object content caching
Expand Down Expand Up @@ -150,6 +158,10 @@ S3 places fewer restrictions on [valid object keys](https://docs.aws.amazon.com/
* `blue/image.jpg`

then mounting your bucket would give a file system with a `blue` directory, containing the file `image.jpg`. The `blue` object will not be accessible. Deleting the key `blue/image.jpg` will remove the `blue` directory, and cause the `blue` file to become visible.
Additionally, remote content shadows local content. Thus Mountpoint shadows in the order: remote directories > any local state > remote files.
Thus, for example, if a user creates a local directory, and then a conflicting
object appears in the remote bucket, the directory will still be
visible instead of the new file.

We test Mountpoint against these restrictions using a [reference model](https://github.com/awslabs/mountpoint-s3/blob/main/mountpoint-s3/tests/reftests/reference.rs) that programmatically encodes the expected mapping between S3 objects and file system structure.

Expand Down
23 changes: 23 additions & 0 deletions mountpoint-s3/tests/reftests/harness.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1290,4 +1290,27 @@ mod mutations {
0,
)
}

/*
Ensure that local files are shadowed by the remote directories.
*/
#[test]
fn regression_put_nested_over_open_file() {
run_test(
TreeNode::Directory(BTreeMap::from([])),
vec![
Op::CreateFile(
ValidName("a".into()),
DirectoryIndex(0),
FileContent(0, FileSize::Small(0)),
),
Op::PutObject(
DirectoryIndex(0),
Name("a/b".into()),
FileContent(0, FileSize::Small(0)),
),
],
0,
)
}
}
6 changes: 6 additions & 0 deletions mountpoint-s3/tests/reftests/reference.rs
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,12 @@ impl MaterializedReference {
break;
}

Check warning on line 112 in mountpoint-s3/tests/reftests/reference.rs

View workflow job for this annotation

GitHub Actions / Formatting

Diff in /home/runner/work/mountpoint-s3/mountpoint-s3/mountpoint-s3/tests/reftests/reference.rs
}
// If a directory of this name exists, ignore any local file
if let Some(node) = children.get(dir) {
if node.node_type() == NodeType::Directory {
return true;
}
}
let new_node = match typ {
NodeType::Directory => Node::Directory {
children: BTreeMap::new(),
Expand Down

0 comments on commit b3d7045

Please sign in to comment.