Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type safe path file descriptors? #478

Open
SUPERCILEX opened this issue Dec 8, 2022 · 12 comments
Open

Type safe path file descriptors? #478

SUPERCILEX opened this issue Dec 8, 2022 · 12 comments

Comments

@SUPERCILEX
Copy link
Contributor

See this discussion over in nix: nix-rust/nix#1863 (comment). I believe rustix can get the same treatment and it should be backwards compatible which is awesome. I'd like to push this through, but just wanted to double check that it'll be accepted before starting.

@SUPERCILEX SUPERCILEX changed the title Type safe path file descriptors Type safe path file descriptors? Dec 8, 2022
@sunfishcode
Copy link
Member

I appreciate the cleverness of this, but my first impression here is it's too much complexity for too little gain. Accidentally passing cwd(), aka AT_FDCWD, into write isn't unsafe in any way, and it doesn't feel like a common problem. cwd() returns a first-class BorrowedFd, so a Cwd type wouldn't fully replace cwd(), so we'd end up with two different ways to use AT_FDCWD and subtle documentation explaining the difference, in addition to a new trait and subtle documentation explaining when to use it vs. AsFd.

@sunfishcode
Copy link
Member

Also, the name IntoPathFd sounds like IntoRawFd but one one does an ownership transfer and the other doesn't.

@SUPERCILEX
Copy link
Contributor Author

Also, the name IntoPathFd sounds like IntoRawFd but one one does an ownership transfer and the other doesn't.

Good point, renamed to AsPathFd.

Accidentally passing cwd(), aka AT_FDCWD, into write isn't unsafe in any way

While true, I think Alan's point still stands about it being better to capture this at compile time rather than runtime.

it doesn't feel like a common problem

That's fair.

so a Cwd type wouldn't fully replace cwd()

I was going to deprecate cwd()... where would you still need the BorrowedFd? If it's to pass stuff around, I think you could use impl AsPathFd. Your API already had to be generic over a lifetime, so switching that to be generic over a type shouldn't be a big deal.


In general, using BorrowedFd feels kind of icky since it's not a real fd and stuff like try_clone will fail. Also, how did you deal with haiku? libc::AT_FDCWD for haiku OS seemed to return -1 which isn't allowed by BorrowedFd.

@sunfishcode
Copy link
Member

Accidentally passing cwd(), aka AT_FDCWD, into write isn't unsafe in any way

While true, I think Alan's point still stands about it being better to capture this at compile time rather than runtime.

It does make some things better. But also, we already have a surprising number of types and traits that Rust programmers must know about to work with file descriptors at this abstraction level, and this would add one more. And then there's cwd(); see below. It's not obvious that that's worth fixing a problem that we have no reports of ever having happened in the wild (and if it ever did, it'd reliably, cleanly, and immediately fail).

so a Cwd type wouldn't fully replace cwd()

I was going to deprecate cwd()... where would you still need the BorrowedFd? If it's to pass stuff around, I think you could use impl AsPathFd. Your API already had to be generic over a lifetime, so switching that to be generic over a type shouldn't be a big deal.

I can't store an impl AsPathFd in a struct field or a local variable; I can with BorrowedFd and cwd(). It's not a super common thing to do, but it'd be an arbitrary limitation if we couldn't do it. And impl AsPathFd implies non-trivial monomorphization which can negatively impact code size.

In general, using BorrowedFd feels kind of icky since it's not a real fd and stuff like try_clone will fail. Also, how did you deal with haiku? libc::AT_FDCWD for haiku OS seemed to return -1 which isn't allowed by BorrowedFd.

In BorrowedFd it's try_clone_to_owned, and if we wanted to make it work on AT_FDCWD, we could have it special-case AT_FDCWD and do open("."). I haven't heard anyone ask for that, but it's a thing we could do.

The restriction that BorrowedFd can't be -1 is only on Unix and WASI where AT_FDCWD can be guaranteed to be never -1. If someone ported BorrowedFd to Haiku, they could omit that restriction.


Another perspective on this is that the vast majority of Rust code in the world that uses filesystem APIs uses std or things build on top of std, and that's how it should be. Std doesn't yet have openat functionality, but it plausibly could some day, and a plausible design would be to have a Dir type, similar to a File, but for directory handles. A type like that could protect users against accidentally misusing not just AT_FDCWD, but directories as a category. And on top of that, it'd prevent users from passing all manner of things that aren't directories to openat by accident too.

rustix/etc. still have their uses, but they tend to be much more focused on "I want to drop down an abstraction level because std can't do what I need". Safety remains essential, but preventing hypothetical non-safety bugs at compile time is nice but less important. At the same time, allowing unusual-but-technically-valid code patterns is more important.

Because who knows; maybe someone will find a situation where they do want to pass cwd() into write. Maybe they're writing tests for their new OS and want to make sure the OS gracefully handles that situation. Or maybe they're writing code that depends on Unix's dynamic typing nature, like "write these bytes if it's a file, otherwise give me EISDIR so I can do something else". Or maybe they have threads doing I/O and the main thread can silence a thread's I/O by replacing its file descriptors with AT_FDCWD to make all subsequent I/O fail. Or maybe some clever new trick that I can't predict. I don't know. Rust programs that do things that I didn't predict are some of the most interesting programs out there. Consequently, I don't want to tell people that they can't do unusual things at this abstraction level, unless I have good reasons for it.

@SUPERCILEX
Copy link
Contributor Author

we already have a surprising number of types and traits that Rust programmers must know about to work with file descriptors at this abstraction level, and this would add one more.

True, but I'm not in favor of dumbing things down for the masses. And the same argument could apply to the existing traits, though I think your point is that the value-add of AsPathFd is significantly less than OwnedFd which is totally fair.

It's not obvious that that's worth fixing a problem that we have no reports of ever having happened in the wild (and if it ever did, it'd reliably, cleanly, and immediately fail).

Again, I agree with this, but I think it's downplaying the annoyance of having to wait for a test cycle to know that the call "immediately" failed (it could be deep in the weeds and take a while for people to even notice). I also feel like this argument veers dangerously close to the "programmers know what they're doing, duh" argument that says we should just be using C. We have a phenomenal type system at our disposal, let's use it!

I can't store an impl AsPathFd in a struct field or a local variable

False:

struct OldTest<'a> {
    fd: BorrowedFd<'a>
}

struct NewTest<Fd: AsPathFd> {
    fd: Fd,
}

fn foo() {
    // let fd: impl AsPathFd = return_trait(); // Soon hopefully: https://rust-lang.github.io/impl-trait-initiative/explainer/lbit.html
    let fd = return_trait();
    let t = NewTest { fd };
    // Do stuff
}

fn return_trait() -> impl AsPathFd {
    Cwd
}

And impl AsPathFd implies non-trivial monomorphization which can negatively impact code size.

True, but also true of all other generics over traits, so not sure why this case deserves a special callout.

In BorrowedFd it's try_clone_to_owned, and if we wanted to make it work on AT_FDCWD, we could have it special-case AT_FDCWD and do open("."). I haven't heard anyone ask for that, but it's a thing we could do.

To be clear, I wasn't saying that this case should work, but rather that it's odd for that method to exist and be available given the fact that it is known to always fail. That's what I meant by feeling icky. In general, we're letting people use a type that can be used in a bunch of places where we know statically that the operation will always fail (which is odd/a poor use of the type system). Hence the desire for a new type that doesn't muddle these concepts.

If someone ported BorrowedFd to Haiku, they could omit that restriction.

Gotya, seems reasonable.


The idea of a Dir type to model the split between files and dirs is quite compelling actually. I wish we could attain that level of beauty, but I don't think it'll ever be possible since you can't know the type of a file statically.

At the same time, allowing unusual-but-technically-valid code patterns is more important.

I generally agree with your sentiment and the use cases you presented, but I think this goes back to muddling what a BorrowedFd is for. Using a plain BorrowedFd for invalid file descriptors in your tests is just plain wrong/lazy. A good test suite would create a TestFd(RawFd) type that implements AsFd and uses unsafe to create the BorrowedFd. And then you would create a bunch of different test fds (TestFd(libc::AT_FDCWD), TestFd(9999999 /* Doesn't exist yet */), TestFd(-42069), ...). Relying on the value of cwd() to have AT_FDCWD should be treated as an implementation detail for those kinds of tests IMO.

I don't want to tell people that they can't do unusual things at this abstraction level, unless I have good reasons for it.

Since this is a repeated theme, I just want to be clear that the AsPathFd proposal doesn't impose any restrictions as far as I can tell since people can always get their hands dirty and implement that trait without unsafe or implement AsFd with unsafe to create weird file descriptors.

@sunfishcode
Copy link
Member

I'm going to take a little more time to think about this, but I did want to respond to one thing quickly:

Also, the name IntoPathFd sounds like IntoRawFd but one one does an ownership transfer and the other doesn't.

Good point, renamed to AsPathFd.

Thanks. I also suggest changing the "Path" to "Dir", because "Path" in Rust sounds like a string, and a "path file descriptor" in Linux suggests something opened with O_PATH. Calling it AsDirFd feels better, at least.

@sunfishcode
Copy link
Member

Still not a full reply, just a table I made while thinking about this:

operation directory fd AT_FDCWD
"at" argument to openat etc. yes yes
implicitly inherited by child processes yes yes
close/CLOEXEC yes no
fstat yes no, but you can get the effect with stat(".")
dup yes no, but you can get pretty close with open(".", O_RDONLY)
dup2 second argument yes no, but you can get the effect with fchdir
fopendir yes no, but you can get pretty close with opendir(".")
send over unix-domain socket yes no, though you can get pretty close with open(".", O_RDONLY) and sending that`
get the path not in a portable way, and you get a lecture from a Unix guru on why you shouldn't do this no, but you can get the effect with getcwd, because whatever

@SUPERCILEX
Copy link
Contributor Author

I'm going to take a little more time to think about this

Sounds good! I also think I was too quick to dismiss the Dir idea: if O_DIRECTORY plays nice, we could potentially get to a place where you can guarantee statically that an FD is a directory by calling special functions. That is, calling something like open_dir guarantees that it will return a DirFd, but calling open could return an fd that's either a file or a directory. If true, that's really interesting—I'll play with it over the next few days and see where it goes.

I also suggest changing the "Path" to "Dir"

Great point again, fixed. :)

@SUPERCILEX
Copy link
Contributor Author

cc @asomers

I had a bit of time to look at this today. The posix standard specifies O_DIRECTORY with the following behavior:
"If path resolves to a non-directory file, fail and set errno to [ENOTDIR]." This means we can create an open_dir function which guarantees that the returned FD will be a directory, say as an OwnedDirFd. Thus, functions can know statically that the passed in FD is a directory.

It could look something like this:

trait AsDirFd {
    fn as_dir_fd(&self) -> RawFd;
}

pub struct Cwd;
impl AsDirFd for Cwd

struct OwnedDirFd(OwnedFd);
impl OwnedDirFd {
    fn into_inner(self) -> OwnedFd
}
impl AsDirFd for OwnedDirFd
impl AsDirFd for &OwnedDirFd

fn open(_fd: impl AsDirFd, path, ...) -> Result<OwnedFd> {}
fn open_dir(_fd: impl AsDirFd, path, ...) -> Result<OwnedDirFd> {}
fn write(_fd: impl AsFd, ...) {}

fn foo() {
    let dir = open_dir(Cwd);

    open(dir);
    open(&dir);
    open(Cwd);
    open(borrowed_fd OR owned_fd); // Fails to compile
    write(dir); // Fails to compile
    write(&dir); // Fails to compile
    write(Cwd); // Fails to compile
}

If we do go this route, the key question is how do we handle someone showing up with a BorrowedFd/OwnedFd and wanting to use it for AsDirFd. To blocks dirs in write/read/etc and non-dirs in open, we need AsDirFd and AsFd to be unrelated, but that makes interoperability annoying (I'm not sure to what extent people actually care about this interoperability though). We could allow AsFd to something AsDirFd conversion though unsafe, but that seems extreme given that this isn't a safety issue. Then again, using an unopened FD isn't unsafe either and yet we require unsafe for those cases, so perhaps it's justified? Or we could simply allow the conversion, but make it manual so you still get type safety as long as there are no explicit conversions.

@SUPERCILEX
Copy link
Contributor Author

Did a bit more research and it does seem like we'd need something like BorrowedDirFd because I've seen code with the pattern of open -> loop { recurse with opened fd }. That's both a curse and a blessing. On the one hand, it really does make all the different types we have available quite complicated. On the other hand, it simplifies the proposal: we can just say all the DirFd types are exactly the same as their stdlib counterparts (resolving the unsafe question) which kind of means you only need to learn the stdlib types and you'll understand the DirFd ones.

For rustix, I think this proposal is a bit much since it's a breaking change and we don't know how it'll land with people.

For nix, we're already breaking everything so this should just be part of the migration process. I'd propose implementing this in nix and then bringing it over to rustix as a breaking change if we like it.

@sunfishcode
Copy link
Member

My instinct here is still that "Dir" and file-vs-dir type safety makes sense to introduce at the abstraction level of std, which is the abstraction level most Rust code is written at. At the abstraction level of rustix, it feels like it will introduce a lot of complexity, more than it's worth, and risk making "interesting" use cases that we can't anticipate more complex or even less safe.

That said, it sounds like you're really motivated to do this experiment, and I don't wish to stand in the way :-). I'll be interested to hear how it goes.

@SUPERCILEX
Copy link
Contributor Author

That said, it sounds like you're really motivated to do this experiment, and I don't wish to stand in the way :-). I'll be interested to hear how it goes.

Haha, yeah. :) Hopefully I'll be able to report back in a month or two once things pick back up after the holidays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants