-
Notifications
You must be signed in to change notification settings - Fork 392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tagged lightuserdata #1087
Add tagged lightuserdata #1087
Conversation
I'd also be interested in this being implemented. I'm serializing the state of Question though, should the tag affect I also don't know about keeping track of |
Good points! We also use lightuserdata for custom iterator state, so that would indeed be another use case for tags.
You're right, that is clearly an omission. I added tag checks to all the equality checks. However, I'm not sure if hashing the tags is necessary. I'm trying to think how often we would use lightuserdata with same value but different tag as a table key. Note that hash collisions only makes table operations slightly slower in the worst case (there should be no other observable effects). If these cases are rare, maybe it would not make sense to add overhead to lightuserdata hashing?
Tag names are used by the VM when printing or when converting lightuseradata to string with tostring. Normally we use metatables to configure typenames but since lightuserdata values are not objects, they share the same metatable. Alternatively we could replace lua_setlightuserdataname with more general lua_setlightuserdatametatable(tag, mt) API, especially if there's need to further customize lightuserdata subtypes? I'm also ok with dropping lightuserdata names in the this first version if this feels too messy. |
Makes sense to me, thanks! Looks like this should work correctly in JITed code as well, since that will still use
I don't feel too strongly. If it was a choice between names and metatables, I'd take names as you've implemented them here. |
Note that as implemented, this PR introduces the notion of "untagged" light user data in that not all light user data objects have tags (specifically, those produced by the VM code don't). I think we'll need to fix that, maybe we could then reserve a tag for internal iterator state. However this also caused us to look at some internal code that manages the iterator state and that code has some issues that we'd need to fix before adding tags. There's some perf cost for setting tags up so we'd need to be careful but it can probably be managed if we only initialize them when the loop is set up. The access code would need to be more careful. We shouldn't access I would recommend avoiding metatable customization at least for now; it can be useful, but it's also not free as it will impact every place where type -> metatable lookup is performed by adding more branches. Compared to that, names are basically free. Also cc @vegorov-rbx for further comments if any. |
…only sets the value of lightuserdata without setting type and tag
Thanks! Pushed new revision:
However, there's an issue which I don't understand. For some reason LOP_FORGPREP_INEXT and LOP_FORGPREP_NEXT in the VM and forgLoopNodeIter cause assertion failures if I change them to use setpvaluefast, so I use the slower setpvalue for them instead. Does this perhaps happen because iterator state is not set fully at the start of loops? I'd need some help here. Alternatively we could drop setpvaluefast or leave it for future work; I doubt it is a significant win in practice, since the same cache line is being modified anyway when setting the value. |
I would start with just using setpvalue. We will need to thoroughly analyze the performance impact here anyway, and it's easier to do this from a simpler baseline. As for "why", this is what I was referring to above wrt issues in codegen: codegen actually never sets up the lightuserdata properly in X64 because it fully inlines the loop prologue in the optimistic case, which only writes the lightuserdata value partially and doesn't write the tag at all. We are looking into a better way to structure this to avoid these issues in the future. |
Alright, done. Please let me know if there's anything else. |
I finally had time to benchmark the changes. I first ran the regular tests in bench/tests but any differences fell under the noise threshold. So next I ran a few selected micro tests which stress-test looping (test_LargeTableSum*.lua). I disabled garbage collection for the tests to further reduce noise. Conclusion: there doesn't seem to be any measurable speed difference, at least on my hardware (i7-9700K). @zeux, have you had time to think about this yet? I believe this change would make lightuserdata more useful generally. The problem is, at the moment an application can only use lightuserdata values safely for a single purpose, unless tags are encoded manually and directly into the 64-bit lightuserdata values which is problematic in many ways. (FWIW, we are very interested in this because we're heavily using lightuserdata and we'd like to have water tight type checks in our bindings.) Command line used: Note that I added Results:
|
For the record, I've been using this for the last week or so with no noticeable perf reduction, and it's heavily reduced the number of cases where I need to use GC'd |
I'll let @vegorov-rbx review this and make the call regarding the value + note if further code adjustments would be necessary; from my perspective this is a good change that increases usefulness and safety of lightuserdata without significant downsides. Native codegen will need to be adjusted in the future for correctness as it doesn't currently write tags and that would need new IR instructions, but that can probably be done later / separately. Thanks for running the micro benchmarks - based on these and on the future possible improvements (like elidining tag or even type tag writes in the iteration opcode) I am not concerned about performance impact here. |
Thanks! One thought that came up today: currently internal tags are >= LUA_LU_TAG_LIMIT. Would it be cleaner to reserve negative values for internal tags instead? That way LU_TAG_ITERATOR would be equal to -1 and we could remove |
Note that we are using a similar structure for regular user data tags, with reserved values for newproxy and inline dtor, so I would keep the numbering the same - maybe there’s an argument to rework both (later/separately) but I don’t think it matters a lot one way or the other. |
Ok check, let's keep the tags consistent with regular userdata. |
Hi @vegorov-rbx! Would you have time to review this pull request? |
Thank you for the review! Pushed the changes. |
Thank you! |
This change adds support for tagged lightuserdata and optional custom typenames for lightuserdata.
Background: Lightuserdata is an efficient representation for many kinds of unmanaged handles and resources in a game engine. However, currently the VM only supports one kind of lightuserdata, which makes it problematic in practice. For example, it's not possible to distinguish between different kinds of lightuserdata in Lua bindings, which can lead to unsafe practices and even crashes when a wrong kind of lightuserdata is passed to a binding function. Tagged lightuserdata work similarly to tagged userdata, i.e. they allow checking the tag quickly using lua_tolightuserdatatagged (or lua_lightuserdatatag).
The tag is stored in the 'extra' field of TValue so it will add no cost to the (untagged) lightuserdata type.
Alternatives would be to use full userdata values or use bitpacking to embed type information into lightuserdata on application level. Unfortunately these options are not that great in practice: full userdata have major performance implications and bitpacking fails in cases where full 64 bits are already used (e.g. pointers or 64-bit hashes).
Lightuserdata names are not strictly necessary but they are rather convenient when debugging Lua code. More precise error messages and tostring returning more specific typename are useful to have in practice (e.g. "resource" or "entity" instead of the more generic "userdata").
Impl note: I did not add support for renaming tags in lua_setlightuserdataname as I'm not sure if it's possible to free fixed strings. If it's simple enough, maybe we should allow renaming (although I can't think of a specific need for it)?