Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for dynamic linking #6927

Open
sandtreader opened this issue Oct 22, 2024 · 12 comments
Open

Support for dynamic linking #6927

sandtreader opened this issue Oct 22, 2024 · 12 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-runtime Module: tokio/runtime

Comments

@sandtreader
Copy link

sandtreader commented Oct 22, 2024

Version

└── tokio v1.41.0
    └── tokio-macros v2.4.0 (proc-macro)

Platform

Debian 12
Linux j 6.1.0-17-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux

Description
Put simply, using Tokio multi-threaded runtime with dynamic libraries doesn't work, even if you explicitly pass it through to the library and call it directly. Single-threaded works fine.

I've created a test environment for this: https://github.com/sandtreader/rust-tokio-dylib with annotated code and a full explanation.

My guess is that the multi-threaded runtime is using some static data for its thread-pooling which is not encompassed by the runtime, and the dynamic library has a different copy of it. I've tried to remove the issue of the thread-local variables by passing the runtime directly to the library and calling it directly rather than implicitly with tokio::spawn().

I know this has been raised before (both here: #1964, and in async-ffi: oxalica/async-ffi#14) but I think it's important - a lot of larger systems use dynamic plugin mechanisms for flexibility and shorter build times, and it's a shame that Tokio (at least in multi-threaded form) can't be used in these. I'm certainly prepared to put effort into resolving it, although I know little about Tokio internals at the moment!

Just one more thing - yes I know the whole area of dynamic libraries in Rust has major issues around the unstable ABI. In my particular use case this isn't a major problem because it's mostly there for compositional flexibility and build times within a single workspace, but there are also possible solutions like abi_stable or reducing the interface to C-only. This static data problem (if that's what it is) is an orthgonal issue.

Many thanks!

Paul

@sandtreader sandtreader added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Oct 22, 2024
@Darksonn Darksonn added the M-runtime Module: tokio/runtime label Oct 22, 2024
@Darksonn Darksonn changed the title Multi-threaded runtime fails when passed to dynamic library Support for dynamic linking Oct 22, 2024
@Darksonn
Copy link
Contributor

The proposal #6780 is a possible way forward on this, but it's going to take a long time. Until then, this will be unsupported on our end. In the meantime, you may be able to make your own forked Tokio that works via rubicon.

@sandtreader
Copy link
Author

sandtreader commented Oct 22, 2024

Hi Alice,

Thanks for your quick response! That RFC (#6780) is interesting for dynamic library deployment in general but I'm not sure it's relevant to this problem, unless I'm missing something!

Just to reiterate, my test:

  • Exists all within one workspace (same compiler, same Tokio version)
  • Uses FFI * only for a single 'init' method
  • Works with the single-threaded runtime

As I mentioned, this is a specific problem with (I think) some kind of static data in the multi-threaded runtime, not one with dynamic linking in general.

Thanks again!

Paul

(* actually it's not really Foreign at all, since it's calling a dylib, not a cdylib)

@Darksonn
Copy link
Contributor

I understand that the RFC solves a broader problem than your problem. But it would solve your problem.

As for the single-threaded runtime, I'm pretty sure that you're just getting lucky. Both runtimes use a thread-local variable for the context, and the thread-local does not work if you use dylibs in a way that cause the application to contain several copies of Tokio.

@sandtreader
Copy link
Author

OK, thanks - I'll have to hold out hope that that gets adopted then! :-)

I had just found this (runtime/context.rs:77) which seems the key to it, as you say:

tokio_thread_local! {
    static CONTEXT: Context = const {
        Context {
            #[cfg(feature = "rt")]
            thread_id: Cell::new(None),

            // Tracks the current runtime handle to use when spawning,
            // accessing drivers, etc...
            #[cfg(feature = "rt")]
            current: current::HandleCell::new(),

What I don't understand, though, is why when I'm calling the runtime.spawn() directly, or use enter() - which also fails - this isn't set in the dylib's version of the thread_local? But I'll take your word for it!

This was interesting, though: "if you use dylibs in a way that cause the application to contain several copies of Tokio". Does that mean there's a way using using dylibs where Tokio is shared?

@Darksonn
Copy link
Contributor

The root cause is that you end up with several copies of the thread-local, and only one of them gets updated. I don't know if it's possible to compile the dylibs without getting multiple copies of Tokio.

@oestradiol
Copy link

oestradiol commented Nov 6, 2024

The root cause is that you end up with several copies of the thread-local, and only one of them gets updated. I don't know if it's possible to compile the dylibs without getting multiple copies of Tokio.

Hello @Darksonn and @sandtreader. Good morning/afternoon/evening. I came across the same problem (exactly the same actually, making a plugin management system) and while looking into it, I stumbled upon this issue. Just wanted to say that yes, this is possible, and it's actually a very simple solution, that shouldn't cause any breaking changes (I think). Bear with me for a bit, I'll mention this again at the end.

In rustc, as you likely already know, there are multiple crate types for linking, among them are two particularly relevant for Rust, rlib and dylib. Differently from cdylib, they both contain Rust metadata which allows them to be compatible with each other.

According to the reference:

--crate-type=dylib, #![crate_type = "dylib"] - A dynamic Rust library will be produced. This is different from the lib output type in that this forces dynamic library generation. The resulting dynamic library can be used as a dependency for other libraries and/or executables.
--crate-type=rlib, #![crate_type = "rlib"] - A “Rust library” file will be produced. This is used as an intermediate artifact and can be thought of as a “static Rust library”. These rlib files, unlike staticlib files, are interpreted by the compiler in future linkage.
(...)
Note that these outputs are stackable in the sense that if multiple are specified, then the compiler will produce each form of output without having to recompile. (...) If an executable is being produced and the -C prefer-dynamic flag is not specified, then dependencies are first attempted to be found in the rlib format.

For more details, if you haven't already, I really recommend taking a minute to read more in-depth how the compiler works this out. It's actually pretty smart about it and very cool!
That being said, the last statement means, from my understanding, that as long as the crate supports it in its Cargo.toml, an user can opt-in to using dylib instead of the default rlib, by setting the RUSTFLAGS="-C prefer-dynamic". I made a local fork of tokio and tested it out, and just like specified there, not specifying it compiled normally as rlib and caused the issue. When specifying, however, the compiler obeyed, and used dylib, unifying it amongst all the various crates that depended on it.
Adding this to the main tokio crate's Cargo.toml was all it took:

[lib]
crate-type = ["rlib", "dylib"]

You can then proceed by normally linking it in your crate as a dependency, just like any other crate, while setting the appropriate flag for compilation, as mentioned above.

Now then, for the possible drawback I hinted at when I said "I think". From what I read in the specs and tested locally, it really doesn't seem like rustc will ever prefer dylib by itself, and even if it does, it seems to be very strict about it so there shouldn't be any issues (even more because they're supposed to be compatible with each other). My only concern is, projects that, for some reason or another, already use the prefer-dynamic compilation flag, might suffer from an unexpected edge-case while updating tokio. This really seems like an edge case possibility though, so I don't know if it's worth overthinking it.

I'd like to hear what you think, and thank you @Darksonn for maintaining this, Tokio is amazing!
And I hope this helps you @sandtreader with your problem, at least temporarily.

@sandtreader
Copy link
Author

Wow, that's really interesting, thanks @oestradiol! That does sound like a get-out-of-jail-free card (although it leaves the issue of ABI, that's a different can of worms). It's a shame we have to fork tokio, though - that's not something I really want to impose on my plugin developers. But given that cargo build builds tokio anyway, I wonder if it can be forced as part of that?

Alternatively, @Darksonn what are the chances of making a dylib build standard? Pretty please :-)

@sandtreader
Copy link
Author

sandtreader commented Nov 6, 2024

Well, with a little prompting, ChatGPT suggests a solution using a build.rs:

https://chatgpt.com/share/672baa62-b284-800b-91d0-2f48aff57336

Basically do a local fork, either from Github or from cargo's source cache, and then hack it. I will play with this when I have time!

@oestradiol
Copy link

oestradiol commented Nov 7, 2024

@sandtreader

although it leaves the issue of ABI, that's a different can of worms

I don't think this is a big issue tbh, if the ABI changes (it's unstable after all), all you'll need to do is recompile. And once you recompile, all the dependencies get recompiled too which should fix whatever issues you would have otherwise.
Unless you're interfacing with C or other langs, in which case you'd be using cdylib instead (not what we're talking about here), you shouldn't have any issues using dynamically linked libs in pure Rust.

It's a shame we have to fork tokio, though - that's not something I really want to impose on my plugin developers. But given that cargo build builds tokio anyway, I wonder if it can be forced as part of that?

you don't need to force others to fork tokio, you could just fork it yourself on your Github and add it as a dependency on your project's plugin API, and re-export it. Or if you're not working like that, worse case scenario they could link it to your Git instead of crates.io. It should work fine!

Alternatively, @Darksonn what are the chances of making a dylib build standard? Pretty please :-)

I also want this. Would be great if possible!

@Darksonn
Copy link
Contributor

Darksonn commented Nov 8, 2024

To clarify, the suggestion is to add this to our Cargo.toml?

[lib]
crate-type = ["rlib", "dylib"]

I will have to investigate what the implications of that are before I can say anything.

@maurer
Copy link

maurer commented Nov 8, 2024

While the thread local variable is the specific thing that you're seeing broken, what's happening here is a broader case of pointer incompatibility. What you've got today is a linkage that looks like:

bin -static-> tokio
bin -dynload-> lib
lib -static-> other_tokio

You then call fn init(Arc<other_tokio::Runtime>) as though it were fn init(Arc<tokio::Runtime>), leading to the issues you hit. This will always happen with that linkage pattern, because even if built identically, tokio in those two static linkage positions will always be "different".

Switching to a dynamically linked tokio reduces the chances of this, because you get a pattern like this:

bin -dyn-> tokio_x
bin -dynload-> lib
lib -dyn-> tokio_y

If tokio_x and tokio_y are actually the same (e.g. have the same -C metadata), you end up loading one copy and you're fine. In your toy example, this will happen.

Examples of things that can change the identity of these libraries include:

  1. Different versions, even if they're semver compatible.
  2. Different feature flags
  3. Different compiler flags

If you're doing it with a hot-reloading system, this is going to cause spooky bugs. For example:

  1. If you cargo upgrade cargo build your plugin, it will load, but may have incorrect behavior in any number of ways
  2. If you have different optimization levels on your cargo invocation for building the base binary vs the plugin, it may have incorrect behavior.
  3. If you use fuzzing flags like -Zrandomize-layout, and you build the base binary and the plugin with different cargo invocations, your loaded library will again be subtly wrong.

Adding ["rlib", "dylib"] to tokio should be generally fine, but what I'm trying to point out is that if you are using cargo to build, a plugin system that relies on the base binary and the plugin depending on the exact same third shared object is going to be brittle. If you're not using cargo to build, you don't need the crate type modification, so I'm kind of assuming you are.

Finally - because init is not an extern "C" function, you are getting lucky that it's even in the generated .so itself. It is not guaranteed that any given Rust function in the crate actually gets codegenned - some of them may remain as MIR living in the rmeta and not actually have a codegenned copy available. This isn't a tokio issue, but just figured you should know before you rely on dlopening dylibs.

@oestradiol
Copy link

@maurer Yeah, I was indeed kind of assuming without noticing that all of those conditions would be fulfilled.

I see what you're talking about, and honestly, that's a big issue in this plugin mechanism that I'm not very sure how to solve hahah
After all, if you can't rely on shared libs, it kind of defeats the purpose. Tokio was an example, but one could also want to share things like tracing and other functionalities...

Thanks for warning me, I really appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-runtime Module: tokio/runtime
Projects
None yet
Development

No branches or pull requests

4 participants