-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Try to generate reusable jlcall wrapper #11439
base: master
Are you sure you want to change the base?
Conversation
Update1 (out-of-date):
Original message (with some useless stuff (backtrace, old update) deleted):
Following the discussion yesterday #11306 and a 3-year-old TODO from @JeffBezanson, this is my attempt toward reducing the overhead of generating jlcall wrappers.
IMHO, the reusable jlcall wrapper can serve as a tag and it can even work for arbitrary number of arguments. In it's current state, I'm just trying to make the wrapper function pull the specialized function from it's currently unused first argument (literally implementing the todo) so each function still gets it's own wrapper and there's no saving yet. This seems to work for sysimg generation. However, when I try to run the final binary, it fails because the new What I want to ask specifically are (although other comments are welcome too),
|
|
6659c5f
to
f2bbe99
Compare
Not sure about the CI (i.e. older llvm version) yet but it passes the test for me locally with llvm What I don't really understand is why it is fine to leave |
80f3bbe
to
fd39aa5
Compare
no, it isn't guaranteed. on llvm3.6, i think it happens almost always however. that part of the code definitely needs some cleanup since it is now an odd mix of a large number of llvm versions.
it takes a raw function pointer and creates an llvm::Function object around it for linking into the JIT |
I guess the version on the CI is also not so unhappy about it either.... In this case, what is the earliest point I can get that function pointer? I thought the function should be in the memory after Or is it better to define a getter function to extract the function pointer from lambda_info and generate it if necessary? This would introduce a overhead in each jlcall wrapper although maybe the fast path will be fast enough.
OT: Is there a upgrade guide for llvm (or any other document on the difference between versions)?
What I'm confused about is that why is it not necessary for |
c6df553
to
93776b0
Compare
PR text updated. (see above) |
Another impact of this is that the function object passed is has to be kept alive now. |
@JeffBezanson will correct me if I'm wrong but since the argument is unused in almost all cases, and jl_apply_generic is a very hot function, I think it makes sense to break the rooting convention here and root inside the callee at the very begining if you do actually need this function. For example I think it's safe to move the jl_gc_pop of jl_apply_generic before the return (I introduced this one I think...) to hopefully get a tailcall here. However, this is only performance bikesched without benchmark so just root it for testing and we'll figure out something later ;) |
@carnaval Actually you remind me that I only need root for the slow path (which never triggers). (The fast path only has pointer dereference) |
7ee97ff
to
6283739
Compare
Generate generic wrappers
Should be "feature complete" now. Few questions remain.
|
fwiw, we may still want something like this general-purpose code, but the limited case (jlcall with 0,1,2 jl_value_t args) can be implemented now by extending the |
"this should not be on the milestone" - @yuyichao |
Update2:
The goal of this PR is to lower the overhead of creating jlcall wrappers by generating a generic version of the wrapper that looks up the function pointer in the function object passed in and cache them in a global table. The wrapper function can be later used on other functions with the same signature.
This PR should be "feature complete" now. A few things that I'm not so sure about is listed here:
std::map
.jl_value_t
) is cached and never freed, it that always true? (and if not, how do I tell if it is cached and only cache the wrapper function in that case)As for performance.
sys.so
size goes from3.4M
to3.3M
.jl_lambda_info_t
instead of a new one) but I think I would like to get feedback on the overall design first.Original messages and other updates see first comment below.