Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify the spec of fd_readdir #3

Closed
marmistrz opened this issue Oct 8, 2019 · 7 comments
Closed

Clarify the spec of fd_readdir #3

marmistrz opened this issue Oct 8, 2019 · 7 comments

Comments

@marmistrz
Copy link
Contributor

The current specification of fd_readdir is too vague. While working on the new implementation of fd_readdir for wasmtime I have noticed several issues. All of this refers to the spec of fd_readdir as found in https://github.com/CraneStation/wasmtime/blob/master/docs/WASI-api.md#fd_readdir


On Unix, readdir(3p) explicitly states that entries for . and .. will be returned if they exist, (and I haven't ever encountered a case where they are not returned) but on Windows a similar syscall, FindNextFileW, will not return such entries.
The WASI spec is not clear if the implementation should emulate the Unix-like behavior and return . & .. even if the target platform's syscall doesn't explicitly return them.


The validity period of cookies is not covered. For instance, seekdir(3p) explicitly states that:

If the value of loc was not obtained from an earlier call to telldir(), or if a call to rewinddir() occurred between the call to telldir() and the call to seekdir(), the results of subsequent calls to readdir() are unspecified.

Moreover, readdir(3p) explicitly states that:

If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.

The WASI spec doesn't say which operations (if any) invalidate the cookies, so it's natural to assume that the cookies will be valid indefinitely and that fd_readdir should reflect the current state of the filesystem, which is very difficult to implement efficiently (if possible at all).

In particular, call to fd_readdir with cookie equal to __WASI_DIRCOOKIE_START will invalidate any other cookies returned by previous calls to fd_readdir with the natural implementation using readdir.


Finally, it would be worth noting that this WASI syscall is similar to readdir/seekdir/telldir on Unix and that cookies should be treated as opaque values as readdir(3) says. I'd also be a little more elaborate on the semantics and what kind of location does the cookie point to, because the doc was completely unclear to me at the beginning. It may also be a good idea to stress that the entry names are not null terminated, unlike in POSIX readdir.

cc @kubkon @sunfishcode

@mdempsky
Copy link

When successful, the contents of the output buffer consist of a sequence of directory entries. Each directory entry consists of a __wasi_dirent_t object, followed by __wasi_dirent_t::d_namlen bytes holding the name of the directory entry.

If a sequence of two directory entries are written into the buffer, how do you compute the second one's position? Are they densely packed without care for machine alignment? Or are they padded somehow to keep the wasi_dirent_t fields aligned?

The getdents system calls on Linux and BSD include d_reclen so the reader knows where to find the next dirent record.

@MarkMcCaskey
Copy link

@mdempsky They appear to be tightly packed: laid out as wasi_dirent_t followed by wasi_dirent_t::namelen bytes of the name, then the next dirent.

@mdempsky
Copy link

@MarkMcCaskey Thanks.

Does wasm allow unaligned memory accesses? I just tried looking at the wasm spec, and I see mention of alignment in memarg, but I couldn't immediately find any explanation of what happens if the address doesn't have the expected alignment. Does it affect correctness or is it just a performance hint?

If unaligned memory accesses are guaranteed by wasm, then densely packing seems like a reasonable choice. That should probably be explicitly stated though.

@bjorn3
Copy link

bjorn3 commented Nov 15, 2019

If unaligned memory accesses are guaranteed by wasm

Yes, but they may be slower depending on the target arch. For exmple arm requires emulation for that.

@sunfishcode sunfishcode transferred this issue from WebAssembly/WASI May 20, 2020
@dbaeumer
Copy link

dbaeumer commented Jun 9, 2022

The spec states that for the result

The number of bytes stored in the read buffer. If less than the size of the read buffer, the end of the directory has been reached.

However, I noticed in my implementation that I get another call even if bytes stored in less than the size if the read buffer. Returning 0 as the bytes stored correctly signals the end of the directory.

If this is the desired behavior the spec could be simplified in a way that we don't have to fill in a truncated directory entry name.

@sunfishcode
Copy link
Member

The readdir definition in the current wit files doesn't yet clarify or address thie issue, however it is still using the push-buffer mechanism which newer versions of wit don't support. I'm hoping to replace it with something that returns a stream of dir entries, which will eliminate the problem of truncated entries.

@sunfishcode
Copy link
Member

The readdir entry API is now updated to use a stream type: #72

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants