Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving transform data out of Mat4 #596

Merged
merged 8 commits into from
Oct 18, 2020
Merged

Conversation

MarekLg
Copy link
Contributor

@MarekLg MarekLg commented Sep 28, 2020

Concerning #229.

As pointed out @AThilenius, having mutable transform data stored in a matrix results in rotation overlapping with scale and therefore affecting each other over time (see also this article).

This PR aims to resolve only this issue.
Any feedback and propositions for changes are very welcome!

@@ -29,8 +29,8 @@ fn propagate_recursive(
transform_query.get::<Transform>(entity),
transform_query.get_mut::<GlobalTransform>(entity),
) {
*global_transform.value_mut() = parent * *transform.value();
*global_transform.value()
global_transform.value = *parent * transform.compute_matrix();
Copy link

@AThilenius AThilenius Sep 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires recomputing the local space transform for every decedent of a parent recursively, which is unnecessary. The original impl cached it in LocalToParent. This also can never work with on-change detection because you are forced to recompute every hierarchy in case a descendant changed. Meaning every run you must recompute local space transforms, and global space transforms for every singe entity in a hierarchy regardless of if they changed or not. Especially in development builds where static objects in a hierarchy haven't yet been backed into a GlobalTransform, that sounds like a lot performance overhead to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I totally overlooked that. Thanks for pointing that out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the matrix is only recomputed when the transform changed. This still recomputes all the GlobalTransforms every frame, regardless if any changes occured, but this is how the system worked before and should IMO be for a different PR to fix.

@Moxinilian Moxinilian added the C-Feature A new feature, making something new possible label Sep 28, 2020
translation: Vec3::zero(),
rotation: Quat::identity(),
scale: Vec3::one(),
matrix_cache: Some(Mat4::identity()),
Copy link
Contributor

@bitshifter bitshifter Sep 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching the matrix in this transform is going to result in a cache miss for any access of translation, rotation or scale. Rust automatically orders struct fields by alignment and Mat4 is 16 byte aligned due to SIMD so it will go first. It's a 64 byte struct so it will fill an entire cache line by itself. Additionally putting it inside an Option will add 15 bytes of padding to the beginning of the structure. An Option<Mat4> will start with a discriminant byte that tells Rust if the option is Some or None, this byte will be aligned to the same alignment of what the option holds, so 16, thus 15 bytes of padding.

It would be more efficient to keep these cached values in a separate structure. However if you want to keep it in the same structure you would be better off getting rid of the Option and just use a bool to flag if there is a cache value or not.

Also glam has a type which stores transforms as translation, rotation (as Quat) and scale so you could use that instead of writing it yourself. It does require enabling the transform-types feature of glam though.

Edit: see example of size differences of Option versus bool https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=2624654e172aac09955ffb796e73404a

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't think we should store the "matrix cache" here because the type is now "too big"

But I also want to point out that if we rip out the matrix into a separate component, we now have re-created the very "synchronization problem" that we were trying to solve in the first place with the transform rework. At that point, why not use individual components like @AThilenius's initial implementation used? Then we get the benefits of "better cache perf and easier access to individual parts of a transform".

I think its very hard to justify a Transform { pos/rot/scale } component w/ a matrix cache (either in another component or embedded inside Transform). Unless I'm missing something, we should either eat the "mat4 creation cost" at the call site whenever a full 4x4 matrix is required, or we should just move back to the old system (or something like it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like @termhn's idea to use Similarities and multiply those directly whenever possible. We only really need a 4x4 when writing data used by shaders, which only needs to happen once per rendered GlobalTransform.

#229 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that Transform { pos/rot/scale } (with no matrix cache) is a Similarity (in glam known as a TransformSRT). But I agree, the transformation of that to a matrix should only happen as @cart described

Copy link
Contributor Author

@MarekLg MarekLg Sep 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are some very good points.

Unless I'm missing something, we should either eat the "mat4 creation cost" at the call site whenever a full 4x4 matrix is required, or we should just move back to the old system (or something like it).

I think that is the main question we have to decide on. I'm personally against separate components, only for the fact that with a transform we can take more work away from the user (the reason being explained in the lower part of my comment).

Copy link
Member

@cart cart Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I think I would prefer a custom Transform type with the relevant methods/traits implemented instead of new-typing a math-lib type. It makes direct property access nicer and this logic isnt particularly complicated.

Lets wait a day or so for someone to chime in with counterpoints. Otherwise I'm inclined to move forward on

struct Transform {
  pub translation: Vec3,
  pub rotation: Quat,
  pub scale: Vec3,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll revert the last commit so everyone checking in is on the same page.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I think the biggest missing piece at this point is implementing Transform x Transform = Transform and using that when calculating GlobalTransform. To get the most out of this, I think we should also make GlobalTransform a similarity (ideally not a newtype to allow convenient direct field access). But that then creates the issue of "how do we convert GlobalTransform to matrix bytes when creating uniform buffers?". I think "data-driven uniforms" need a better story for "this type should be converted to this other byte-convertible type when used as a uniform". The current system assumes the type is directly byte-convertible.

In the short term I think something like this will work:

#[derive(RenderResources, RenderResource)]
#[render_resources(from_self)]
pub struct GlobalTransform {
    pub translation: Vec3,
    pub rotation: Quat,
    pub scale: Vec3,
}

impl Bytes for GlobalTransform {
    fn write_bytes(&self, buffer: &mut [u8]) {
        // convert self to matrix and write bytes to buffer
    }

    fn byte_len(&self) -> usize {
        return std::mem::size_of::<[f32; 16]>();
    }
}

I think it also makes sense to remove things like the "rotate" and "apply_scale" helper methods in favor of direct field access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully agree with the first part.

    pub translation: Vec3,
    pub rotation: Quat,
    pub scale: Vec3,

I'm not quite sold on the attributes being public though, yet. The way I'm perceiving GlobalTransform, the developer shouldn't / shouldn't need to mutate it themselves, or am I forgetting something?

I think it also makes sense to remove things like the "rotate" and "apply_scale" helper methods in favor of direct field access.

I think this comes down to the question "how much should the developer know to use the engine". Most people should find it trivial to apply one Quat to another, but I think the few added lines of code are a worthwhile price for not requiring that sort of knowledge. It could be, however, that I'm reading too much into the priorities of this project, in which case I'd be totally fine with that change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sold on the attributes being public though, yet. The way I'm perceiving GlobalTransform, the developer shouldn't / shouldn't need to mutate it themselves, or am I forgetting something?

I see no harm in making them public in this case. First, it provides parity with Transform (which I think is the biggest win). People can access "translation" the same way on both components. Different apis would increase the cognitive load. Additionally, there may be cases where people want to directly modify GlobalTransform (ex: given that the engine internally uses GlobalTransform as the "actual transform of the entity", there might eventually be a case where we have an entity that only has GlobalTransform).

I think this comes down to the question "how much should the developer know to use the engine". Most people should find it trivial to apply one Quat to another, but I think the few added lines of code are a worthwhile price for not requiring that sort of knowledge. It could be, however, that I'm reading too much into the priorities of this project, in which case I'd be totally fine with that change.

I think I'd like to keep it simple for now. Every api we add has a cost. If it turns out that this is a real usability gap, we can always layer it on top.

}

pub fn from_scale(scale: f32) -> Self {
Transform::new(Mat4::from_scale(Vec3::splat(scale)))
Transform {
scale: Vec3::one() * scale,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do Vec3::splat(scale) here so it's just assignment with no multiplication.

self
/// Returns transform with the same translation and scale, but rotation so that transform.forward() points at the origin
pub fn looking_at_origin(self) -> Self {
self.looking_at(Vec3::zero(), Vec3::unit_y())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside it might pay for bevy to have some constants for UP, FORWARD and SIDE (or RIGHT) axes, might make it clearer what the coordinate system is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added support for creating constants in glam but I need to publish a new release. It's a bit tricky because underlying intrinsics aren't const fn so macros have to be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh thats cool / much appreciated 😄

@cart
Copy link
Member

cart commented Oct 16, 2020

I'd like to include this in the next Bevy release (which should be in the next week or so). This is pretty close to being ready. @MarekLg do you want me to push the last few requested changes needed or would you prefer to do it? Don't feel pressured to do work you don't have time for. You've already done a ton of good work here ❤️

@MarekLg
Copy link
Contributor Author

MarekLg commented Oct 16, 2020

Feel free to make the final changes. I don't have much time this weekend, sadly. But thanks for the kind words, looking forward to more contributions!

@cart
Copy link
Member

cart commented Oct 16, 2020

Alrighty I made GlobalTransform a similarity, implemented math ops (using @termhn's Similarity impl as a reference), and removed some methods from Transform/GlobalTransform to match my personal taste.

I'll leave this open for a day or two in case someone wants to comment on my changes, then its merge time!

@cart cart merged commit 5acebed into bevyengine:master Oct 18, 2020
joshuajbouw pushed a commit to joshuajbouw/bevy that referenced this pull request Oct 24, 2020
Transform and GlobalTransform are now Similarities.

This resolves precision errors and simplifies the api

Co-authored-by: Carter Anderson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-Feature A new feature, making something new possible
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants