
System-order-independent ECS change tracking #68

Closed
cart opened this issue Jul 22, 2020 · 57 comments
Labels
A-ECS Entities, components, systems, and events

Comments

@cart
Member

cart commented Jul 22, 2020

The current change tracking system is very lightweight, both for event generation and consumption (which is why we can enable it by default), but it has the following problem:

schedule.run()
  System1: Change<T> query
  System2: mutates T
  world.clear_trackers()
schedule.run()
  System1: Change<T> query ... misses System2's changes

In most cases, this can be worked around via system registration order and stages, but I anticipate some cases that cannot work around this behavior, as well as general confusion from users like: "why isn't my system running ... im definitely mutating this component?"

I think it would be useful to (optionally) allow stateful "order independent" change tracking. I posted some ideas here: #54
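The failure mode can be reproduced with a toy model of the tracker (plain Rust, no Bevy types; `World`, `run_frame`, and friends are hypothetical names for illustration only):

```rust
// Toy model of per-frame change tracking: one component with a `mutated`
// flag that `clear_trackers` wipes at the end of every frame.
pub struct World {
    pub value: i32,
    pub mutated: bool, // the change-tracking flag
}

impl World {
    pub fn new() -> Self {
        World { value: 0, mutated: false }
    }

    /// System1: a `Changed<T>`-style query; reacts only if the flag is set.
    pub fn system1_detect(&self) -> bool {
        self.mutated
    }

    /// System2: mutates the component, which sets the flag.
    pub fn system2_mutate(&mut self) {
        self.value += 1;
        self.mutated = true;
    }

    /// Runs at the end of `schedule.run()`.
    pub fn clear_trackers(&mut self) {
        self.mutated = false;
    }
}

/// One frame with System1 registered *before* System2. Returns whether
/// System1 observed a change this frame.
pub fn run_frame(world: &mut World) -> bool {
    let seen = world.system1_detect();
    world.system2_mutate();
    world.clear_trackers(); // System2's fresh change is wiped here
    seen
}
```

Running two frames in a row, `run_frame` returns `false` both times: System2's mutation from frame 1 is cleared before System1 gets a chance to see it in frame 2.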

@cart cart added the A-ECS Entities, components, systems, and events label Jul 22, 2020
@cart cart changed the title System order independent ECS change tracking System-order-independent ECS change tracking Jul 22, 2020
@alice-i-cecile
Member

alice-i-cecile commented Dec 24, 2020

So, the current behaviour has a seriously broken edge case that seems likely to emerge in larger apps.

Suppose we have a system which relies on Changed<Foo> and mutates Bar as well as a system that does the converse. No matter which order we place them in, we will not be able to properly process both sets of changes: they'll silently vanish at the end of the frame. As your app grows, the number of possible interactions of this sort rises quadratically, and you can also get larger loops with the same issue.

Pulling from #54, there seem to be a few options.

  1. Current behavior, which is problematic for the reason given above.
  2. Double-buffering. Somewhat inelegant, makes handling more complex and duplicates data. The requirement to check whether the last frame's entities are still valid (from ECS Change Events #54) seems like a red herring to me: there's no guarantee that they're valid even when only looking within the same frame.
  3. Stateful queries with modified generations.
  4. Track on a per-system basis whether each change has been consumed by it. Like 6., but much heavier; it does allow dynamically changing the set of systems more safely, though. (I think this is the same as "For systems with a Changed query, identify systems with &mut T after and track changes that occur" from ECS Change Events #54.)
  5. Manual clearing of changes. This is really onerous and hard to maintain, and probably will end up with most users running into the problems above in their own implementation. Could be handy for certain types of deferrable cleanup systems I guess?
  6. Ensure that changes hang around for "exactly" one frame, by tagging the changes with the system that caused them, then cleaning them up after that system runs in the next frame. Because these changes require reading the resource / component in question, I think that we guarantee that the order WRT all other relevant systems stays consistent between passes. The new "system label" tech that Ratys is working on would work well for this, or the existing SystemID may work.

Edit: Apparently #6 is called a "ring buffer", which should help with finding prior art on efficient implementations.

@fopsdev

fopsdev commented Dec 24, 2020

I'm pretty sure I've read something about a "Changed" queue where the system gets called over and over again until the queue is empty.
But I could be completely wrong, and I sadly don't remember where I picked this up...

that should kinda solve it?

@alice-i-cecile
Member

I can see the value in a self-recursing queue of Changed objects like @fopsdev describes, but I'm not sure it's a great fit here.

The self-recursing Changed queue seems hard to fit in:

  • we would need to repeatedly run every system across all stages that takes a Changed. That would result in some very strange ordering, making it harder to reason about the logic flow, and avoiding double-counting would be hard
  • the list of Changed components here is never actually consumed under the current model, in order to ensure that it can be used by multiple orthogonal systems. This is the critical distinction :) Your model works better for a proper event stream; changing this change tracking to use an event model requires a lot more manual handling WRT re-emitting events if you're not done with them, and so on
  • possibility of accidentally writing emergent infinite loops, or just very long ones that cause lag spikes

@mockersf
Member

How about firing events at the end of a frame before clearing the marker bits?

  • for order dependent change detection within a frame, use query filter Changed
  • for change from last frame, react to events ComponentEvent::Changed<Component>(Entity) (not sure this type signature would work but it would be nice)

It could potentially be a large number of events

@alice-i-cecile
Member

How about firing events at the end of a frame before clearing the marker bits

I don't think this gets around the shortcomings of event-based change detection: namely, that you can't elegantly have several consumers of them. If you have two or more systems that come before your change-generating system, your first system will eat the event and the second one never encounters the event.

A simple double-buffer, where we keep around two copies of Changed, one for the current frame and one for the last seems like a strictly better version of this. But I'm still nervous about the complexity and fragility of actually working with it.

With a double-buffer, there's no way of knowing whether we already processed a change in the last time step, resulting in double-processing of changes that occurred in preceding systems. To get around this, we could only process events that occurred in the last time step, but then you can't handle antecedent systems, and your code breaks again if the change-generating system gets moved.

You could bypass this by attaching a unique ID to each change, but then you're left with the overhead of tracking which changes have been seen in every change-detecting system, which is a giant mess and probably very costly.

I'm increasingly convinced that solution 6., a ring buffer, is the only feasible way to handle this. All of the other solutions seem to introduce either subtle logic bugs or require very manual, fragile change detection à la events, which is prone to reintroduce those exact same logic bugs when our end users try to handle the changes on their own.

@mockersf
Member

namely, that you can't elegantly have several consumers of them

But you can: an event reader local to your system keeps its own track of which events it has read, and lets other systems read them too

I think a ring buffer has the same issue as a double buffer, you'll have to check that the entity still exists if you try to use it in the next frame

@alice-i-cecile
Member

But you can, by having an event reader local to your system it will keep its own track of which events it read and let other systems read them too

Ah cool, I was mistaken on exactly how Bevy's event handling works then. That involves keeping track of a separate list per system though, right?

I think a ring buffer has the same issue as a double buffer, you'll have to check that the entity still exists if you try to use it in the next frame

The issue I raised in my previous comment is distinct from this :) The entities-potentially-not-existing issue raised by you (and cart in his original comments) isn't unique to ring buffers or double buffers. It actually crops up in our current Changed implementation in 0.4. Entities can be deleted mid-frame via commands: they're processed at the end of each stage, so you can easily try to access an entity that no longer exists if you're just operating off the Changed list.

@mockersf
Member

Ah cool, I was mistaken on exactly how Bevy's event handling works then. That involves keeping track of a separate list per system though, right?

Each system only keeps a counter. There is a Res event list that is a double buffer, and a Local event reader that counts how many events were read by a system (example). The implementation is quite nice and easy to read, and I think there is no risk of playing the same event twice in a reader.

@alice-i-cecile
Member

alice-i-cecile commented Jan 5, 2021

Here are my thoughts as I start to do some more concrete design work on these proposals, looking at component flags as our lens.

Component flags are essentially metadata, attached to a single entity-component unit of data.

Desiderata

  1. Component flags are set whenever a specific action occurs.
  2. Component flags only exist for valid entities.
  3. There should be no (or very little) performance cost for components that aren't tracking these changes.
  4. The performance cost of using a flagged component should be reasonably low.
  5. The amount of memory used to store flags must not increase over time.
  6. A component flag should be able to be detected by any other system, independent of their relative location in our scheduling graph.
  7. A component flag should be able to be detected by the system that set it on the next tick.
  8. Each component should be detected exactly once by every other system.
  9. External users should be able to easily add more component flags of their own.

I would weight overhead on non-flagged components about 10 times more heavily than overhead on flagged components, as most components will be unflagged.

  1. and 7. are distinct criteria here, since I'm not sure that 7 should be true.

System-ordering properties

In general, the order in which systems run is neither deterministic nor linear. At first blush, this would seem to make hitting property 8 very challenging: if our systems are running in an unpredictable order, how can we be sure that a single flag always persists exactly long enough to be seen once and no more by each relevant system? If the order was fully unconstrained, for any fixed persistence window, the change may be seen 0, 1 or 2 times by another system.

We could get around this by storing a list of SystemID for each component flag (or conversely, store a matrix of entities x component flags for every system), checking to ensure that each system x component flag interaction is handled exactly once before resetting our component flag. This requires a great deal of memory and checking though, especially as our number of systems grows.

But, any valid schedule has two much stronger properties:

  1. Systems which write to a particular entity-component follow a total order. We always know whether one writing system occurs before or after another.
  2. Systems that read a particular component follow a partial order, constrained by the systems that write to that component. We always know which two component-writing systems a component-reading system occurs between.

These properties are fundamental to how games in Bevy should work, rather than a quirk of the current scheduler. Without them, your behavior is fundamentally not reproducible, as data-processing operations could happen out of order.

As I try to show in my next comment, we can use these properties to persist our component flags for "exactly one tick". If we hang on to our component flags until the next time the originating system is run, they will have been seen by every other relevant system (that reads or writes to the associated entity-component) precisely once.

This leads us to another constraint when considering property 9: any system which can write component flags must have write-access to the described entity-component. Similarly, any system which reads component-flags must have read-access to the described entity-component. This makes sense: component flags are essentially a type-agnostic extension of the data contained in every component. We need to wait for changes to them to resolve before we can be confident in using them.

@alice-i-cecile
Member

alice-i-cecile commented Jan 5, 2021

A simple "one tick" component flag proposal

  1. Store component flags as an Option<SystemId>. If this is None, the component flag is not set. If it is Some, then the component flag was most recently set by the recorded system.
  2. Whenever some action occurs on the entity-component unit of data in question, set the system flag, marking it with the current SystemId, overwriting any previous state.
  3. When entering a system (or exiting, based on your feeling on property 7 above), wipe all component flags that match the SystemId of the current system.
  4. When entity-components are deleted (including when entities are despawned), remove all associated component flags.

And that's it! We're guaranteed to see each change exactly once for all other relevant systems because of the total order given in the post above: component flags can only be set (either through overwriting or erasing) in systems that have write-access to the entity-component in question.

If we overwrite it with a newer flag, we're still processing that data when we should, and not wasting any effort (because we need to process the longer lifetime for the fresher system).

Notes

  • this costs us a little bit of memory: we need to store a SystemID, rather than just a single bit. We could probably compress this by only considering systems with write access to the component in question, defining a total order, then decoding it when needed.
  • If you have said numeric total order, you can swap to a tick-tock cycle to more quickly assess whether a flag should be reset, based on the system number being called, rather than having to traverse by SystemId
  • I'm not sure how this works with the current DerefMut magic that's currently used by Changed: there may need to be some refactoring
  • Ideally we can factor out this behavior, so then it's only occurring for the components that have an appropriate marker in at least one system's parameters
  • You'll obviously want to use this same behavior for Resources as well :) Changing Events to work using this style of queue may also be desirable
  • If not all systems are running every tick, this behavior gets a little wonky. Changes can persist for longer than they should while a system is sleeping, and if a system is asleep it can miss changes. The tick-tock variant solves this first problem nicely, and is probably better
  • You also need to account for writes made via Commands (caught by @mockersf)
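A minimal sketch of the "one tick" proposal above, with hypothetical `SystemId`/`Flag` types standing in for Bevy's internals:

```rust
/// Hypothetical id for a registered system.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct SystemId(pub u32);

/// A component flag: `None` = unset, `Some(id)` = last set by system `id`.
pub type Flag = Option<SystemId>;

/// Step 2: on mutation, stamp the flag with the current system,
/// overwriting any previous state.
pub fn set_flag(flag: &mut Flag, current: SystemId) {
    *flag = Some(current);
}

/// Step 3: on entering a system, wipe flags that this same system set
/// last frame; by now they have been visible for exactly one full cycle.
pub fn enter_system(flags: &mut [Flag], current: SystemId) {
    for flag in flags.iter_mut() {
        if *flag == Some(current) {
            *flag = None;
        }
    }
}

/// A `Changed<T>`-style check.
pub fn is_changed(flag: &Flag) -> bool {
    flag.is_some()
}
```

A flag set by system A survives every other system's run and disappears only the next time A itself runs.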

@alice-i-cecile
Member

To expand on the tick-tock system mentioned above:

  1. Generate your total order over the systems that can write to the flagged component.
  2. Assign each of these systems a system_time in sequence, stored as a small unsigned integer.
  3. Every pass of our game loop is either a tick or a tock. On a tick, the system's system_time is recorded as the base system num; on a tock, add 1/2 your integer size to the base. This gives us a nice circular measure of time.
  4. In each system, for each tracked component that we have write access to, wipe all component flags that were set between the last time the current system ran, and the current system_time.

(This idea was courtesy of @alercah).

Rather than having to traverse through these component flags one at a time like in the SystemID based proposal, store component flags that are set in their own data structure, always adding new flags to the end of the current list. When you need to reset flags, you can simply slice off the start, found by either a search or recording how many were added and how many were overwritten each time they're modified.

I'm not sure if that optimization would be net positive; I worry that it would make filtering queries slower, since the flags wouldn't be stored aligned.
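The circular `system_time` comparison can be sketched with wrapping `u8` arithmetic (all names hypothetical; the half-range offset assumes the small integer is a `u8`):

```rust
/// Half the range of our small unsigned integer: the tick/tock offset.
pub const HALF: u8 = 128;

/// The `system_time` actually recorded for a system on a given pass:
/// the base value on a tick, base plus half the range on a tock.
pub fn stamped_time(base: u8, is_tock: bool) -> u8 {
    if is_tock { base.wrapping_add(HALF) } else { base }
}

/// Was `flag_time` set in the window (last_run, now]? Wrapping
/// subtraction gives a circular "how long ago" measure, so the check
/// still works when the counter wraps between tick and tock.
pub fn should_wipe(flag_time: u8, last_run: u8, now: u8) -> bool {
    let window = now.wrapping_sub(last_run);
    let age = now.wrapping_sub(flag_time);
    age < window
}
```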

@Ratysz
Contributor

Ratysz commented Jan 5, 2021

In general, the order in which systems run is neither deterministic nor linear. [...] But, any valid schedule has two much stronger properties: [...] This property is fundamental to how games in Bevy should work, rather than a quirk in the current scheduler.

With the current executor the order is deterministic and linear within exactly the constraints you listed, it even says so on the tin in the docs. And it is, technically, a quirk of the implementation - although, I do agree that this is how games should generally work. But: it should be opt-in, not mandatory like it is now. For example, maintaining now some sort of index in a resource nails down the order between any systems that mutate it, even if the user doesn't actually care in which order those mutations happen. Which brings us to...

The new parallel executor in #1144 will replace the implicit systems ordering with explicit dependencies - you won't be able to generate any sort of "total order over the systems that can write to the flagged component", which kind of makes the rest of the proposal fall apart.

You will, however, have a different sort of order: the aforementioned explicit dependencies. Fundamentally, they allow the user to tell Bevy that "this system has to run after that system", which is a less convoluted way to say "this system must observe the side effects of that system". Making a change to a component or a resource is a side effect; how the rest of the systems run in relation to each other is irrelevant to this specific dependency relation.

An apparent "side problem": changes to a component can happen by an unrelated system, after running the dependency but before running the dependant. The solution is to do nothing, because that's not our problem - if the user cares about the dependant observing side effects of an unrelated system, they should specify their dependency tree to exclude that situation.

Regardless, all of that is largely irrelevant to the problem this issue brings up.

The problem is not "how can we improve our light-weight change tracking to also handle multi-frame changes" - that's not what the mechanism is for, it's a filter that enables fine-grained short-circuiting of query iterations, and should be left as just that.

The problem is "should we introduce stateful (beefier, less transient) change tracking, and if so, how".

@alice-i-cecile
Member

Thanks for the explanation of the new scheduler: I finally understand what the proposed code does, and definitely agree with you that my proposed design here is incompatible with it. I am pretty firmly of the opinion that we should not go ahead with that scheduling algorithm but that's a better topic for #1184.

Regardless, all of that is largely irrelevant to the problem this issue brings up.

The problem is not "how can we improve our light-weight change tracking to also handle multi-frame changes" - that's not what the mechanism is for, it's a filter that enables fine-grained short-circuiting of query iterations, and should be left as just that.

The problem is "should we introduce stateful (beefier, less transient) change tracking, and if so, how".

I don't think I agree on this point. The existing change-tracking is light-weight, but has unintuitive behavior, results in emergent impossible-to-resolve circular dependencies, is very hard to debug if anything goes wrong and imposes a performance overhead on every component, whether or not it's tracked.

I don't believe we should offer change tracking in its current form: while it will do what you want for simple demos and perform well on benchmarks, it makes code bases incredibly brittle by restricting ordering significantly, and is very tricky to catch when it breaks due to the heavily non-local and unpredictable path from changes to detection.

@Ratysz
Contributor

Ratysz commented Jan 5, 2021

I don't believe we should offer change tracking in its current form: while it will do what you want for simple demos and perform well on benchmarks, it makes code bases incredibly brittle by restricting ordering significantly, and is very tricky to catch when it breaks due to the heavily non-local and unpredictable path from changes to detection.

I still think it does have value: the aforementioned fine-grained short-circuiting of query iterations. As in, "I want to run this piece of code on all things in this query, but only if that one component has changed" - for things like synchronizing external data and what not; again, it's a filter. (EDIT: there might be a more fitting implementation if that's all we want from it, though)

What we shouldn't do is advertise it as a control flow mechanism.

@alice-i-cecile
Member

Hmm, I can see its value for cases where the cost of tracking changes at higher granularity / duration is relatively high.

In any case, we should expect most of operations that use Changed to be idempotent: synchronization tasks of various forms. In that case, we don't mind doing extra work much, but really don't want to lose changes by accident.

In that case, we can simply persist Changed component flags for two frames with a double buffer like Events uses. You double the memory costs, add a tiny overhead and may have to repeat work, but gain system order independence, solving my other usability concerns.

This is much better default behavior than before, especially if we can make change tracking only apply to the relevant components.

For cases where you really don't want to be repeating work for logic or performance reasons, we may still want to consider implementing something heavier.

@branpk

branpk commented Jan 6, 2021

I wonder if it would make sense to tweak the name of Changed to reflect this distinction, e.g. ChangedThisFrame. It's easy to forget that Changed works differently than events.

@alice-i-cecile
Member

alice-i-cecile commented Jan 6, 2021 via email

@Davier
Contributor

Davier commented Jan 28, 2021

I propose we could have both unreliable same-frame and reliable next-frame change detection (edit: here "reliable" means it triggers exactly once). Since the reliable version completely ignores same-frame changes, it is almost as lightweight as the current change detection. Besides this caveat, I think this solves most issues (system-order dependence, and two systems detecting each other's changes). It would make for a great tool when prototyping a game, since the reliable version can easily be upgraded to the same-frame version later on as needed.

More precisely, we would change the ComponentFlags from:

bitflags! {
    pub struct ComponentFlags: u8 {
        const ADDED = 1;
        const MUTATED = 2;
    }
}

to:

bitflags! {
    pub struct ComponentFlags: u8 {
        const ADDED = 1;
        const JUST_ADDED = 2;
        const MUTATED = 4;
        const JUST_MUTATED = 8;
    }
}

(Of course the naming is subject to debate).
When a component is added/mutated, only the JUST_X flags would be set. At the end of each frame, the struct would be updated with: (flags & 0b00001010) >> 1 instead of being cleared. The following query transformers would exist:

  • JustAdded<T>, JustMutated<T> and JustChanged<T> as the same-frame but unreliable versions (the current existing flags)
  • Added<T>, Mutated<T> and Changed<T> as the reliable but next-frame versions

The memory footprint would not change since ComponentFlags is a u8 already. It would be slightly more expensive to update flags at the end of the frame, and querying Changed would require an additional bit mask. Thus I expect the performance impact to be very low. I plan to implement and benchmark it.
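The end-of-frame update can be sketched directly on a `u8`, assuming the bit layout ADDED = 1, JUST_ADDED = 2, MUTATED = 4, JUST_MUTATED = 8 (the values that make the `(flags & 0b00001010) >> 1` update work out):

```rust
// Flag-doubling sketch: two "this frame" bits plus two "last frame" bits
// in the same byte.
pub const ADDED: u8 = 0b0001;
pub const JUST_ADDED: u8 = 0b0010;
pub const MUTATED: u8 = 0b0100;
pub const JUST_MUTATED: u8 = 0b1000;

/// End-of-frame update: promote each JUST_X flag to its X counterpart
/// (so it survives exactly one more frame) and drop everything else.
pub fn end_of_frame(flags: u8) -> u8 {
    (flags & (JUST_ADDED | JUST_MUTATED)) >> 1
}
```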

@inodentry
Contributor

inodentry commented Jan 28, 2021

Yes! The above suggestion is semantically the same as "double buffering", but without having any effect on memory use. These are 1-bit flags we are dealing with, and we only have 2 of them. The single byte of memory we have to use to store them has plenty of space! We could simply use 2 more bits out of that byte! No extra memory needed.

@DJMcNab
Member

DJMcNab commented Jan 28, 2021

I'm going to dump my thoughts verbatim from discord:

Could we count the number of times Changed could be read for a given T, and then decrement it when that system is run/would be run (barring run-criteria failures), then reset it to that count whenever it gets changed?

It wouldn't be that bad is my point
Just a touch more piping through of the information we already have
We just pretend each criteria passes for the purposes of this checking

We'd have to be careful around looping criteria, to make sure the 'counter has been decremented for this system' would be properly managed, although we could just not decrement it on every time after the first and document that you have to manage your own steady-state, which seems like a reasonable limitation.

In essence, this makes checking Changed<T> a touch more expensive and adds a touch of precomputation. But it seems like a cheap way to manage this, if not the most correct solution.

Also if this doesn't make any sense, that is expected.

@bjorn3
Contributor

bjorn3 commented Jan 28, 2021

What about adding a count for each entity describing when it was last changed, and then having each system store the last count at which it was run? Changed<T> can then check whether the system's count is less than that of the entity.

At 60 fps, assuming the counter is incremented once per frame, it would take 18446744073709551615.0/60/60/60/24 = 3558399705576.688 days before overflowing. I think it would need to be incremented once per stage though, but even then it would still take years to overflow.

This scheme also nicely handles systems that only run sometimes, or dynamically added systems, with either all entities marked as changed (system counter = 0) or unchanged (system counter = global counter). This would be especially important for the editor, which would probably want a way to suspend all the systems forming the game logic while still running the systems responsible for rendering.
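A toy version of that scheme (hypothetical names: a per-component `last_changed` counter plus a per-system `last_run` counter, compared against a shared clock):

```rust
/// Global change counter, incremented once per frame (or per stage).
pub struct Clock(pub u64);

pub struct Component {
    pub last_changed: u64, // clock value at the most recent mutation
}

pub struct System {
    pub last_run: u64, // clock value the last time this system ran
}

impl Component {
    /// `Changed<T>`: did the component change since this system last ran?
    pub fn changed_since(&self, system: &System) -> bool {
        self.last_changed > system.last_run
    }
}

/// Mutating a component stamps it with the current clock value.
pub fn mutate(component: &mut Component, clock: &Clock) {
    component.last_changed = clock.0;
}

/// After a system runs, advance the clock and bring the system up to date.
pub fn run_system(system: &mut System, clock: &mut Clock) {
    clock.0 += 1;
    system.last_run = clock.0;
}
```

A freshly added (or long-sleeping) system with `last_run = 0` sees every past change, which is the behavior the comment above wants for suspended game-logic systems.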

@inodentry
Contributor

inodentry commented Jan 28, 2021

This sounds like it might be reasonable.

I am trying to get a sense of the memory requirements this would have. This would just replace the flags, right? / These counters would be in exactly the same place where we currently have the flags?

If so, then that just expands it from 1 byte to 8 (or 4, or 2, could it be made to work with an overflowing counter? not sure...), which sounds like it wouldn't be a huge amount of overhead.

Also, I think the "just added" case could be implemented by special-casing the zero value (a newly added component would have a zero counter).

If implemented this way, then Mutated<T> no longer makes sense (detecting mutations, but not additions). We would just have Added<T> and Changed<T>.

The real benefit of this proposal is that it does not split change detection into two ("reliable but next frame" vs "same frame but unreliable"), which is the main disadvantage of the "double buffering" approaches discussed before.

This might be worth the extra memory overhead. This approach is still cheap on computations / does not require complicated tracking algorithms.

@bjorn3
Contributor

bjorn3 commented Jan 28, 2021

I am trying to get a sense of the memory requirements this would have. This would just replace the flags, right? / These counters would be in exactly the same place where we currently have the flags?

Correct

If so, then that just expands it from 1 byte to 8 (or 4, or 2, i think we could make it work with an overflowing counter? not sure...), which sounds like it wouldn't be a huge amount of overhead.

A u32 would give 828.5 days before overflowing, given a single increase per frame. With a single increase per stage, it is likely to overflow in a reasonable amount of time. With a u64 I don't really think it would be possible to overflow, at least not easily. The risk with a small wrapping counter is that a system may not run for more than one full wrap of the counter, but still need to see the change when it runs again. And yes, I don't think a u64 is much overhead; I expect the actual components to often be much bigger.

Also, I think the "just added" case could be implemented by special-casing the zero value (a newly added component would have a zero counter).

That would be a nice way of handling it.

@inodentry
Contributor

inodentry commented Jan 28, 2021

@cart Would love to hear your opinion w.r.t these last two proposals:

1. Flag-doubling (@Davier 's proposal)

This is a simple and straightforward extension of the existing implementation.

Semantically equivalent to double buffering, but implemented by just doubling the bit flags, to track changes across 2 frames.

This should introduce no additional overhead over the existing implementation. All the flags still fit in 1 byte, and the clear operation is replaced with a bit mask and shift (which are basically free on modern CPUs).

The disadvantage is in UX. We would need to split the change detection query filters into JustChanged<T> vs. Changed<T>, and make the user explicitly consider the tradeoff: "same-frame detection, but possibly unreliable" (same as the old behavior) vs. "next-frame detection, guaranteed reliable" (the new use case enabled by this change).

2. Counters (@bjorn3 's proposal, with my revisions)

This would allow keeping the UX simple, with only one change detection filter that always works.

However, it would have greater memory usage and some perf impact (no idea how much).

The 1-byte-per-component flags would be replaced with an 8-byte u64 counter. Each system would also gain a single u64 counter.

The logic would be something like this (in pseudocode):

static GLOBAL_COUNTER: AtomicU64 = AtomicU64::new(1);

fn on_component_mutation(component: &mut ComponentMeta) {
    // Stamp the component with the current global time.
    component.counter = GLOBAL_COUNTER.load(Ordering::Relaxed);
}

fn is_component_mutated(component: &ComponentMeta, system: &SystemMeta) -> bool {
    // Changed since this system last ran?
    component.counter > system.counter
}

fn after_run_system(system: &mut SystemMeta) {
    // Advance global time and bring the system's counter up to date.
    system.counter = GLOBAL_COUNTER.fetch_add(1, Ordering::Relaxed) + 1;
}

Every time the scheduler runs a system, it increments the global atomic counter and updates the system's counter. The per-component counter is updated on mutation. The two are compared for change detection.

Added<T> could be handled by special-casing the zero value (a newly added component would have a zero counter).

This would obsolete Mutated<T>, as we can't detect "mutation, but not if the component was just added", leaving just Changed<T> and Added<T>.

Since this implementation would have a greater perf cost, @alice-i-cecile suggested that it could be mitigated in the future by analyzing the app to collect information about which components and systems need change detection, to disable it where it isn't needed.

BTW, we don't need to worry about the u64 counter overflowing. A quick calculation, assuming an app that runs 1000 systems per frame at 60 fps, shows that it would take almost 10 million years. 10000 systems at 240 fps = 243060 years.


So, we have a proposal that is simple in its implementation, but more complex to use, and a proposal that is complex in its implementation, but simple to use.

I personally favor the first proposal (due to its elegant simplicity and low overhead), but the second may be worth considering, too.

The explicit tradeoff with choosing which type of change detection you want (Changed<T> = "next frame, reliable" vs. JustChanged<T> = "this frame, unreliable") is unfortunate, but it is something that could be taught to users via documentation.

@alice-i-cecile
Member

As a small comment on the second proposal: I don't think that losing access to Mutated is a serious concern at all. I can't see any practical uses for it, and if you really need that information, you can extract it back out by combining information from Added and Changed.

@inodentry
Contributor

inodentry commented Jan 29, 2021

BTW, another idea for Added: what if we reserve / mask off the top bit of the counters, to use as an "added" flag?

Now that we are gonna have per-system counters anyway, that gives us the ability to have a per-system flag, too, for free. Not just on the components.

It would halve the range of the counters, but as we established before, we have no reason to worry about overflow.
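A sketch of what masking off the top bit could look like (hypothetical layout, shown here on a u32):

```rust
// The top bit of the counter doubles as an "added" flag;
// the remaining bits hold the actual counter value.
const ADDED_BIT: u32 = 1 << 31;
const COUNTER_MASK: u32 = !ADDED_BIT;

fn mark_added(counter: u32) -> u32 { counter | ADDED_BIT }
fn clear_added(counter: u32) -> u32 { counter & COUNTER_MASK }
fn is_added(counter: u32) -> bool { counter & ADDED_BIT != 0 }
fn counter_value(counter: u32) -> u32 { counter & COUNTER_MASK }
```

Comparisons for mutation detection would then operate on `counter_value` only, which is what halves the usable range.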

@alice-i-cecile
Member

Events seem quite heavyweight for this? Also, users have the option of using events if they want to (instead of Added), anyway. I don't think we would win anything in this case by implementing one bevy feature/mechanism using another. It would just make it redundant.

The idea here was to replace Added entirely if we couldn't get a working solution using bitflags.

Your proposal below works well though, and I think resolved the concern there.

@Davier
Contributor

Davier commented Feb 6, 2021

Having reliable change detection would make the transition to the new scheduler way less error-prone; it should probably be done before the 0.5 release.

I believe we have reached consensus (otherwise please correct me) on u32 mutation counters, one in each component + one in each system, incremented on each system execution, with overflow detection.
With 800 systems at 60 fps, a system that is not run may miss a mutation after a day. This seems acceptable to me.
I suppose the implementation should wait for the two large ECS PRs to land. (Is someone working or planning to work on it, or should I?)
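With u32 counters that wrap around, the overflow detection mentioned above can be done with wrapping subtraction instead of a plain `>` (a sketch with hypothetical names; correct as long as the two compared counters never drift more than half the u32 range apart):

```rust
fn change_is_newer(change_tick: u32, system_tick: u32) -> bool {
    // Wrapping difference: values in the "future half" of the range
    // count as newer, even across the wrap boundary.
    let age = change_tick.wrapping_sub(system_tick);
    age != 0 && age < u32::MAX / 2
}
```

This is why a system that skips frames for too long can miss a mutation: once the difference exceeds half the range, "newer" and "older" become indistinguishable.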

Here are a few comments about addition detection:

Usually, you'd use Added to do some nice post-processing before an entity is used.

I expect the main use case of Added in bevy itself would be for optimization of systems that do any kind of bookkeeping. Currently its only use is to detect new cameras in camera_system. I also use it in #1211 to incrementally update the UI. It's very important to not miss additions in these kinds of core systems, and I think we will use Added in a lot of places as we optimize them.

[U]nder this new counter-based implementation, Added specifically means "freshly added, never mutated before".

I think you would want to flag all added components as mutated at the time of their creation.

Agreed. Moreover, clearing the Added value/flag on first mutation would break the use case I just described (in addition to the multiple-post-processing case @alice-i-cecile mentioned).

Events seem quite heavyweight for this? Also, users have the option of using events if they want to (instead of Added), anyway.

We can't rely on users to manually send the appropriate event when they create some component, that would be too brittle. However, events are currently not reliable for systems that skip frames, so they would have little advantage over a flag.
One advantage they'd have is that we wouldn't need to update all the component counters at the end of every frame, which could be a measurable perf improvement.

BTW, another idea for Added: what if we reserve / mask off the top bit of the counters, to use as an "added" flag?

Now that we are gonna have per-system counters anyway, that gives us the ability to have a per-system flag, too, for free. Not just on the components.

Could you expand on this? I don't see how the per-system flag would work since systems typically handle a lot of components.

What I can see is using one bit of the component counters as the current Added flag. Keeping Added in the same u32 as Mutated makes the Changed implementation easy (while I don't know how it would work if Added was an event). But of course it would force a few systems to run at the end of every frame.

Interestingly, I started writing this comment in favour of events, but changed my mind along the way. I now agree with the use of an Added bit flag (but we should revisit that question when we have some kind of reliable events, to make it work when frames are skipped).

PS: in order to get some perf improvement for the Added flags at the cost of little memory, we could store them separately from the Vec of counters in each archetype, packed into an array of bits, so that clearing them all becomes very cheap. Maybe having a sparse storage for them would be even better, I'm not sure.
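The packed-bits idea in the PS could look roughly like this (hypothetical structure; one bit per component slot in an archetype, with a cheap bulk clear):

```rust
struct AddedFlags {
    words: Vec<u64>, // one bit per component slot, 64 slots per word
}

impl AddedFlags {
    fn new(len: usize) -> Self {
        Self { words: vec![0; (len + 63) / 64] }
    }
    fn set(&mut self, i: usize) {
        self.words[i / 64] |= 1 << (i % 64);
    }
    fn get(&self, i: usize) -> bool {
        (self.words[i / 64] >> (i % 64)) & 1 != 0
    }
    fn clear_all(&mut self) {
        // Clearing 64 flags per word is close to a memset in cost.
        self.words.iter_mut().for_each(|w| *w = 0);
    }
}
```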

@rmsc
Contributor

rmsc commented Feb 16, 2021

Can we simply store the Added count together with the Mutated count?

That would make Added as reliable as Mutated, with the added bonus that we could measure component lifetimes:

static GLOBAL_COUNTER: AtomicU16 = AtomicU16::new(1);

fn on_component_addition() {
    component.added_counter = GLOBAL_COUNTER;
}

fn on_component_mutation() {
    component.mutated_counter = GLOBAL_COUNTER;
}

fn is_component_added() -> bool {
   (component.added_counter - system.counter) & 0x7FFF > 0
}

fn is_component_mutated() -> bool {
   (component.mutated_counter - system.counter) & 0x7FFF > 0
}

fn get_component_lifetime() -> (u16, bool) {
    GLOBAL_COUNTER.overflowing_sub(component.added_counter)
}

fn after_run_system() {
    // integer overflow is OK: fetch_add wraps around
    GLOBAL_COUNTER.fetch_add(1);
    system.counter = GLOBAL_COUNTER;
}

@Davier
Contributor

Davier commented Feb 17, 2021

Can we simply store the Added count together with the Mutated count?

It would solve the problem, but at a cost of 4 more bytes per component. I don't know if it's worth it. Unlike mutation it only happens once per component, so we should be able to find a more efficient way to do it (and I still believe reliable events will be a good solution when we have them).

with the added bonus that we could measure component lifetimes

The counter is increased after each system runs, and the number of systems that run in a frame is variable. Would that measure be useful? Also, for a large app at high FPS, the lifetime could overflow in hours.

With all that said, I would be in favour of doing that as a short-term reliable solution, and improve it later.

@rmsc
Contributor

rmsc commented Feb 17, 2021

Unlike mutation it only happens once per component, so we should be able to find a more efficient way to do it (and I still believe reliable events will be a good solution when we have them).

I agree, reliable events do make more sense in that case.

The counter is increased after each system runs, and the number of systems that run in a frame is variable. Would that measure be useful? Also, for a large app at high FPS, the lifetime could overflow in hours.

I believe the counter should be incremented only for systems that mutate the component (it's a mutation timestamp after all). That should slow down the counter a lot, to the point where overflows occur very infrequently (if at all).

That said, I see how measuring lifetimes this way is likely non-deterministic (and therefore useless).

@Braymatter
Contributor

My only reservation with the generation counter solution is that it's not exact: if we have the same set of systems and components in a given frame, it should act as a pure function and produce the exact same end state regardless of how we get there.

If we start having indeterminate behavior because of scheduling / counter overflow / missed additions etc., we potentially lose the ability to reliably compute the same frame / entities across a network, or on long-running tasks.

Any solution we come up with probably needs these constraints on it:

  1. For a given system it processes an entity exactly once per frame
  2. A frame needs to act as a pure function. Given the same set of entities / components it needs to produce the same end state without side-effects.

Not sure what that solution ends up looking like, but I would definitely die on the hill of reliable/predictable computation

@Davier
Contributor

Davier commented Feb 19, 2021

The planned solution (proposal 5 in #1471) should be reliable. To answer your concerns:

  • scheduling inconsistency should not happen if the systems are correctly ordered (otherwise it's a bug in the app, and the ambiguity detection tool should report it)
  • counter overflow will be corrected automatically
  • missed addition also will not be an issue if we use a counter like for mutations, or reliable events
  • any change will be detected exactly once

My only reservation with the generation counter solution is that it's not exact

I don't see why, but it would indeed be an issue. Could you expand on this?

@fenduru

fenduru commented Feb 20, 2021

💯 I was literally just starting to write up essentially the same proposal and came across this thread so glad to see this is already in the works.

Events have the same problem here, where the buffer gets cleared before it is guaranteed that all readers have had a chance to consume the events (i.e. in the case of a fixed timestep system). Would the implementation for Flags be generalized so that cases like Events could be implemented in a similar way (e.g. so the events can be kept in the buffer until all consuming systems have had a chance to read them)?

@inodentry
Contributor

@fenduru

(e.g. so the events can be kept in buffer until all consuming systems have had a chance to read them)

Events being non-persistent was one of the original selling points of bevy; it's an intentional design decision. It allows events to be performant, without concern for allocations or potential runaway memory usage.

I think they should be kept as they are.

@fenduru

fenduru commented Feb 21, 2021

@jamadazi I believe you that it was an intentional decision, but the current event architecture is incompatible with RunCriteria, and I'm not seeing how this is a trade-off for performance/memory efficiency.

  1. If events were stored using a ring buffer instead of a double buffer you wouldn't need any additional allocation (compared to now)
  2. The memory overhead would be O(1) since the number of consumers (i.e. systems) is ~fixed and each consumer basically just needs to track a single index.
  3. The runtime cost of updating that index is also trivial.
  4. There wouldn't be any allocation concerns that don't already exist (actually looking at the code the memory performance for events is suboptimal right now because a new Vec gets created every frame rather than using Vec::clear)
  5. The only risk of "runaway memory usage" would be if you have a system that consumes Event<T> with a RunCondition of never.
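A minimal sketch of the per-consumer-index idea in points 1–3 (hypothetical types; a real implementation would compact or ring-buffer the storage once all readers have passed an event):

```rust
struct Events<T> {
    buffer: Vec<T>,
    start: usize, // global index of buffer[0] in the event sequence
}

struct EventReader {
    cursor: usize, // global index of the next unread event
}

impl<T> Events<T> {
    fn send(&mut self, event: T) {
        self.buffer.push(event);
    }

    // Yield the events this reader has not seen yet, then advance its cursor.
    fn read<'a>(&'a self, reader: &mut EventReader) -> &'a [T] {
        let from = reader.cursor.saturating_sub(self.start).min(self.buffer.len());
        reader.cursor = self.start + self.buffer.len();
        &self.buffer[from..]
    }
}
```

Each reader only stores a single `usize`, which is the O(1)-per-consumer overhead point 2 refers to.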

@cart
Member Author

cart commented Feb 21, 2021

The only risk of "runaway memory usage" would be if you have a system that consumes Event with a RunCondition of never.

I think this is more of a problem than you're making it out to be / it creates new problems. For example, consider an app that listens to some high volume event (collisions), both in the systems that run during the Menu state and the InGame state. Without additional logic to "unsubscribe", the Menu event listener will cause the collision event queue to grow unbounded. "Runaway memory usage" will also happen when the RunCondition is infrequent (it doesn't need to be never). Ex: "every 20 minutes check for collision events".

There wouldn't be any allocation concerns that don't already exist (actually looking at the code the memory performance for events is suboptimal right now because a new Vec gets created every frame rather than using Vec::clear)

This isn't clear cut imo.

  1. When many events are produced in a given update, Vec::clear() would result in "memory garbage" that is never cleaned up. So this is suboptimal for "burst-ey" events.
  2. Vec::new() is allocation free, so when events don't occur on a given update, no memory is allocated

The current impl is better for events that are either burst-ey or events that happen in small volumes. Your suggested impl is better for events that occur in the same volume each update.
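Both behaviors are easy to verify with std alone (a standalone demonstration, not Bevy code):

```rust
// Vec::new() performs no allocation until the first push.
fn capacity_of_new() -> usize {
    Vec::<u32>::new().capacity()
}

// clear() keeps the old capacity alive: the "memory garbage" left
// behind after a burst of events.
fn capacity_after_clear(n: u32) -> usize {
    let mut buffer: Vec<u32> = (0..n).collect();
    buffer.clear();
    buffer.capacity()
}
```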

@fenduru

fenduru commented Feb 22, 2021

Without additional logic to "unsubscribe", the Menu event listener will cause the collision event queue to grow unbounded

This is a good point. I was thinking of this more from the perspective of two systems that are both "consistently running" but not necessarily every frame (i.e. fixed timestep systems). The case you describe for disjoint states as run conditions definitely runs into runaway memory problems.

This isn't clear cut imo.

I agree that it isn't clear cut; however, for burst-ey events you're going to end up with multiple allocations on the frames where you have a burst of events (if you have 64 events you'll have 4 allocations). I think it's arguably better to keep the buffer allocated even for the frames where you're getting 0 events, for two reasons.

  1. If you have a burst of 64 events once, there's at least some probability that it will happen again. Obviously there could be a case where you get 1000 events and then never again and you'll keep the memory around, but that seems like an edge case rather than a common case.
  2. It avoids allocating in the middle of a frame as much as possible.

I'll have to think on this some more, but it seems like there's two use cases - consuming "new events" and consuming "events that happened since I ran last". But even in the latter case you might have nesting based on the disjoint states (menu vs ingame).

@alice-i-cecile
Member

A draft implementation of Proposal 5 (Generation-Counter Change Detection) from #1471 is now available for review, courtesy of @Davier.

bors bot pushed a commit that referenced this issue Mar 19, 2021
# Problem Definition

The current change tracking (via flags for both components and resources) fails to detect changes made by systems that are scheduled to run after the detecting system within the same frame.

This issue is discussed at length in [#68](#68) and [#54](#54).

This is very much a draft PR, and contributions are welcome and needed.

# Criteria
1. Each change is detected at least once, no matter the ordering.
2. Each change is detected at most once, no matter the ordering.
3. Changes should be detected the same frame that they are made.
4. Competitive ergonomics. Ideally does not require opting-in.
5. Low CPU overhead of computation.
6. Memory efficient. This must not increase over time, except where the number of entities / resources does.
7. Changes should not be lost for systems that don't run.
8. A frame needs to act as a pure function. Given the same set of entities / components it needs to produce the same end state without side-effects.

**Exact** change-tracking proposals satisfy criteria 1 and 2.
**Conservative** change-tracking proposals satisfy criteria 1 but not 2.
**Flaky** change tracking proposals satisfy criteria 2 but not 1.

# Code Base Navigation

There are three types of flags: 
- `Added`: A piece of data was added to an entity / `Resources`.
- `Mutated`: A piece of data was able to be modified, because its `DerefMut` was accessed
- `Changed`: The bitwise OR of `Added` and `Mutated`

The special behavior of `ChangedRes`, with respect to the scheduler is being removed in [#1313](#1313) and does not need to be reproduced.

`ChangedRes` and friends can be found in "bevy_ecs/core/resources/resource_query.rs".

The `Flags` trait for Components can be found in "bevy_ecs/core/query.rs".

`ComponentFlags` are stored in "bevy_ecs/core/archetypes.rs", defined on line 446.

# Proposals

**Proposal 5 was selected for implementation.**

## Proposal 0: No Change Detection

The baseline, where computations are performed on everything regardless of whether it changed.

**Type:** Conservative

**Pros:**
- already implemented
- will never miss events
- no overhead

**Cons:**
- tons of repeated work
- doesn't allow users to avoid repeating work (or monitoring for other changes)

## Proposal 1: Earlier-This-Tick Change Detection

The current approach as of Bevy 0.4. Flags are set, and then flushed at the end of each frame.

**Type:** Flaky

**Pros:**
- already implemented
- simple to understand
- low memory overhead (2 bits per component)
- low time overhead (clear every flag once per frame)

**Cons:**
- misses systems based on ordering
- systems that don't run every frame miss changes
- duplicates detection when looping
- can lead to unresolvable circular dependencies

## Proposal 2: Two-Tick Change Detection

Flags persist for two frames, using a double-buffer system identical to that used in events.

A change is observed if it is found in either the current frame's list of changes or the previous frame's.

**Type:** Conservative

**Pros:**
- easy to understand
- easy to implement
- low memory overhead (4 bits per component)
- low time overhead (bit mask and shift every flag once per frame)

**Cons:**
- can result in a great deal of duplicated work
- systems that don't run every frame miss changes
- duplicates detection when looping

## Proposal 3: Last-Tick Change Detection

Flags persist for two frames, using a double-buffer system identical to that used in events.

A change is observed if it is found in the previous frame's list of changes.

**Type:** Exact

**Pros:**
- exact
- easy to understand
- easy to implement
- low memory overhead (4 bits per component)
- low time overhead (bit mask and shift every flag once per frame)

**Cons:**
- change detection is always delayed, possibly causing painful chained delays
- systems that don't run every frame miss changes
- duplicates detection when looping

## Proposal 4: Flag-Doubling Change Detection

Combine Proposal 2 and Proposal 3. Differentiate between `JustChanged` (current behavior) and `Changed` (Proposal 3).

Pack this data into the flags according to [this implementation proposal](#68 (comment)).

**Type:** Flaky + Exact

**Pros:**
- allows users to access both immediate (flaky) and delayed (exact) change detection
- easy to implement
- low memory overhead (4 bits per component)
- low time overhead (bit mask and shift every flag once per frame)

**Cons:**
- users must specify the type of change detection required
- still quite fragile to system ordering effects when using the flaky `JustChanged` form
- cannot get immediate + exact results
- systems that don't run every frame miss changes
- duplicates detection when looping

## [SELECTED] Proposal 5: Generation-Counter Change Detection

A global counter is increased after each system is run. Each component saves the time of last mutation, and each system saves the time of last execution. Mutation is detected when the component's counter is greater than the system's counter. Discussed [here](#68 (comment)). How to handle addition detection is unsolved; the current proposal is to use the highest bit of the counter as in proposal 1.

**Type:** Exact (for mutations), flaky (for additions)

**Pros:**
- low time overhead (set component counter on access, set system counter after execution)
- robust to systems that don't run every frame
- robust to systems that loop

**Cons:**
- moderately complex implementation
- must be modified as systems are inserted dynamically
- medium memory overhead (4 bytes per component + system)
- unsolved addition detection

## Proposal 6: System-Data Change Detection

For each system, track which system's changes it has seen. This approach is only worth fully designing and implementing if Proposal 5 fails in some way.  

**Type:** Exact

**Pros:**
- exact
- conceptually simple

**Cons:**
- requires storing data on each system
- implementation is complex
- must be modified as systems are inserted dynamically

## Proposal 7: Total-Order Change Detection

Discussed [here](#68 (comment)). This proposal is somewhat complicated by the new scheduler, but I believe it should still be conceptually feasible. This approach is only worth fully designing and implementing if Proposal 5 fails in some way.  

**Type:** Exact

**Pros:**
- exact
- efficient data storage relative to other exact proposals

**Cons:**
- requires access to the scheduler
- complex implementation and difficulty grokking
- must be modified as systems are inserted dynamically

# Tests

- We will need to verify properties 1, 2, 3, 7 and 8. Priority: 1 > 2 = 3 > 8 > 7
- Ideally we can use identical user-facing syntax for all proposals, allowing us to re-use the same tests for each.
- When writing tests, we need to carefully specify order using explicit dependencies.
- These tests will need to be duplicated for both components and resources.
- We need to be sure to handle cases where ambiguous system orders exist.

`changing_system` is always the system that makes the changes, and `detecting_system` always detects the changes.

The component / resource changed will be simple boolean wrapper structs.

## Basic Added / Mutated / Changed

2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs before `detecting_system`
- verify at the end of tick 2

## At Least Once

2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs after `detecting_system`
- verify at the end of tick 2

## At Most Once

2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs once before `detecting_system`
- increment a counter based on the number of changes detected
- verify at the end of tick 2

## Fast Detection
2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs before `detecting_system`
- verify at the end of tick 1

## Ambiguous System Ordering Robustness
2 x 3 x 2 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs [before/after] `detecting_system` in tick 1
- `changing_system` runs [after/before] `detecting_system` in tick 2

## System Pausing
2 x 3 design:
- Resources vs. Components
- Added vs. Changed vs. Mutated
- `changing_system` runs in tick 1, then is disabled by run criteria
- `detecting_system` is disabled by run criteria until it is run once during tick 3
- verify at the end of tick 3

## Addition Causes Mutation

2 design:
- Resources vs. Components
- `adding_system_1` adds a component / resource
- `adding_system_2` adds the same component / resource
- verify the `Mutated` flag at the end of the tick
- verify the `Added` flag at the end of the tick

First check tests for: #333
Second check tests for: #1443

## Changes Made By Commands

- `adding_system` runs in Update in tick 1, and sends a command to add a component 
- `detecting_system` runs in Update in tick 1 and 2, after `adding_system`
- We can't detect the changes in tick 1, since they haven't been processed yet
- If we were to track these changes as being emitted by `adding_system`, we can't detect the changes in tick 2 either, since `detecting_system` has already run once after `adding_system` :( 

# Benchmarks

See: [general advice](https://github.com/bevyengine/bevy/blob/master/docs/profiling.md), [Criterion crate](https://github.com/bheisler/criterion.rs)

There are several critical parameters to vary: 
1. entity count (1 to 10^9)
2. fraction of entities that are changed (0% to 100%)
3. cost to perform work on changed entities, i.e. workload (1 ns to 1s)

1 and 2 should be varied between benchmark runs. 3 can be added on computationally.

We want to measure:
- memory cost
- run time

We should collect these measurements across several frames (100?) to reduce bootup effects and accurately measure the mean, variance and drift.

Entity-component change detection is much more important to benchmark than resource change detection, due to the orders of magnitude higher number of pieces of data.

No change detection at all should be included in benchmarks as a second control for cases where missing changes is unacceptable.

## Graphs
1. y: performance, x: log_10(entity count), color: proposal, facet: performance metric. Set cost to perform work to 0. 
2. y: run time, x: cost to perform work, color: proposal, facet: fraction changed. Set number of entities to 10^6
3. y: memory, x: frames, color: proposal

# Conclusions
1. Is the theoretical categorization of the proposals correct according to our tests?
2. How does the performance of the proposals compare without any load?
3. How does the performance of the proposals compare with realistic loads?
4. At what workload does more exact change tracking become worth the (presumably) higher overhead?
5. When does adding change-detection to save on work become worthwhile?
6. Is there enough divergence in performance between the best solutions in each class to ship more than one change-tracking solution?

# Implementation Plan

1. Write a test suite.
2. Verify that tests fail for existing approach.
3. Write a benchmark suite.
4. Get performance numbers for existing approach.
5. Implement, test and benchmark various solutions using a Git branch per proposal.
6. Create a draft PR with all solutions and present results to team.
7. Select a solution and replace existing change detection.

Co-authored-by: Brice DAVIER <[email protected]>
Co-authored-by: Carter Anderson <[email protected]>
@alice-i-cecile
Member

Closed by #1471!!
