Use change lists instead of ticks for detecting when meshes need to be re-specialized and/or re-queued.#22966
Conversation
re-specialized and/or re-queued. Right now, every frame, all specialization and queuing systems iterate over all entities visible from a view and check to see whether they need to be updated by consulting a set of change ticks and comparing them to the current change ticks. To handle cases in which a mesh needs to be removed from the bins, a separate final *sweep* pass then finds entities that no longer exist and removes them manually from the bins. This process is complex, error-prone, and slow, as it involves visiting all visible entities multiple times every frame. This PR changes the setup so that, instead of examining change ticks, the visibility logic pushes the set of added and removed entities to each view explicitly. The visibility system determines which meshes need to be added and removed by first sorting the list of visible entities, then performing an O(n) diff process on the last frame's visible entities and this frame's visible entity list. The end result is that the specialization and queuing systems only process the entities that they need to every frame. If a mesh was visible last frame, remained visible this frame, and didn't change its mesh or material, then it's generally not examined at all. Not only is this significantly faster for virtually all realistic scenes, but it's also much simpler. In order to achieve the benefits of not examining every visible mesh every frame, I made sorted render passes retained via an `IndexMap`. This allows entities to be removed and added via random access while still allowing the list to be sorted by distance. Note that I had to remove the radix sort because `IndexMap` doesn't currently support that; I believe the enormous speed benefits of this patch outweigh any minor sorting regressions from this. I tested this PR by running `scene_viewer` on a test scene with many meshes and materials and implementing a material shuffler that randomly switches the materials around. I tested the following cases: * Moving the camera so that meshes become visible and invisible. * Switching opaque materials on meshes. * Moving meshes from opaque to alpha masked and vice versa. * Moving meshes from binned render passes to sorted render passes (i.e. transparent). * All of the above while the meshes were off screen, then moving them on screen to ensure that the changes took effect. This PR brings the `specialize_shadows` time on the `bevy_city` demo from 12.87 ms per frame to 0.1261 ms per frame, a 102x speedup. It brings the `queue_shadows` time on the same demo from 12.34 ms per frame to 0.1102 ms, a 111x speedup. Mean frame time goes from 50.16 ms to 23.26 ms, a 2.16x speedup.
|
It looks like your PR is a breaking change, but you didn't provide a migration guide. Please review the instructions for writing migration guides, then expand or revise the content in the migration guides directory to reflect your changes. |
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
2 similar comments
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
tychedelia
left a comment
There was a problem hiding this comment.
First pass looks great. This is substantially more understandable and I think a major improvement. LMK if you need any help getting CI green.
|
OK, I had to substantially redo this in order to fix regressions: The biggest problem is that we weren't properly handling the case in which a mesh was added to The second issue is that UI elements use render entities to differentiate between different items in the same phase with the same main entity. This was causing problems because the Finally, immediate mode gizmos weren't associated with a main entity at all, so the sorted render phase The reworked patch still fixes the performance problems on |
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
1 similar comment
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
There was a problem hiding this comment.
LGTM! (once the comments are resolved)
I can confirm that bevy_city is running much better now. I went from 31 fps to 75fps with cars moving. And I can also now toggle shadow maps without issues other than an expected lag spike when it toggles.
There's definitely a lot of logic being duplicated but I'm not sure how to fully avoid it. At least the 2d/3d unifcation should cut that down a lot.
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
|
Now that this is reviewed, I added a migration guide. Should be good to go now. |
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
1 similar comment
|
Your PR caused a change in the graphical output of an example or rendering test. This might be intentional, but it could also mean that something broke! If it's expected, please add the M-Deliberate-Rendering-Change label. If this change seems unrelated to your PR, you can consider updating your PR to target the latest main branch, either by rebasing or merging main into it. |
|
As the example run just before the merge show, this broke some examples
I tested a few manually, the error seems to be |
|
Some of those are fixed by #23070 |
Some fixes for 2d post #22966. This is complicated because, unlike 3d, 2d material systems are generic over their material and not type erased. Which means we have to be defensive against all the weird scenarios that can happen when different material systems race against each other. I am not confident this is correct. The ultimate solution is just to get rid of all these bespoke 2d systems. The main fixes are: - `material_bind_group_id` is no longer reset to default every frame. `extract_mesh2d` was setting `material_bind_group_id` on every extraction. Previously this was overwritten later during queuing every frame, but with dirty tracking, entities aren't re-queued every frame so the default sticks. - `prepare_for_new_frame` moved to a non-generic system. - More careful about not adding wrong material type entities to the pending queue where they'll churn forever without getting specialized. - Dequeue no longer cross-type stomps. When a view is dirty, we were returning all visible entities, which was problematic in the context of multiple materials.
…memory. (#22988) Extracting meshes to the render world is done in two phases: first, Bevy does *extraction*, which pulls information from the main world ECS to thread-local buffers in the render world; second, Bevy does *collection*, which processes those buffers in parallel to update the GPU buffers and other information. Unfortunately, the `RenderMeshInstances` buffer that contains information that the CPU and GPU need to know to draw the meshes properly currently isn't thread-safe. Therefore, the parallel worker threads send completed data through a channel to a single consumer thread, which updates the `RenderMeshInstances` tables. This is a sequential bottleneck on large scenes (especially since `RenderMeshInstance` is bigger than it ought to be). This PR tackles the problem directly by making `RenderMeshInstances` partially thread-safe and allowing the worker threads to update mesh instance data directly, via shared memory. Note the use of "partially": the `RenderMeshInstances` buffer can still only grow to accommodate *new* meshes on a single thread with this patch. However, *existing* meshes can, with one exception (changing render layers), can be updated directly. The thread safety is accomplished via a new trait, `AtomicPod`, and a new buffer type, `AtomicBufferVec`. `AtomicPod` is a trait that describes a type that can be bitcast onto an array of `AtomicU32`s, known as the *blob* type. With only a single exception, this is implemented entirely in safe code, via `bytemuck`. This patch introduces a new helper macro, `impl_atomic_pod!`, that automates the implementation of `AtomicPod` types and provides accessors and mutators so that the blob type feels like the original POD type as much as possible. The single use of unsafe code is to upload the blob of `AtomicU32`s to the GPU, as no safe method can currently cast an `AtomicU32` array to a `u8` array to pass to `write_buffer`. The actual *conversion* between the POD type and the blob type is entirely safe code that's automatically generated. Note that on x86-64 and AArch64, a relaxed load and store to an `AtomicU32` location produces the exact same machine code as a regular load and store to a memory location. The downside of this PR, besides the minor inconvenience of accessing `RenderMeshInstances` through helper methods, is that *new* mesh instances can't be added to the `RenderMeshInstances` table while the workers are running anymore, only afterward. This could regress the performance in cases in which many new objects are queued. I believe this is an easily worthwhile tradeoff. However, we could improve the situation via heuristics (e.g. detecting the number of meshes that became visible all at once) in the future if we wanted to. On `bevy_city`, this PR increases the performance of `collect_meshes_for_gpu_building` from 7.74 ms to 4.03 ms, a 1.92x speedup. Most importantly, the sequential part of `collect_meshes_for_gpu_building` is entirely eliminated. Performance of a `bevy_city` frame with #22966 applied: <img width="2756" height="1800" alt="Screenshot 2026-02-16 202420" src="https://github.com/user-attachments/assets/a61b19c8-df98-4d8d-8cfc-4647ccf9990f" /> Notice the sequential mesh collection bottleneck. Before this PR and after: <img width="2756" height="1800" alt="Screenshot 2026-02-16 195459" src="https://github.com/user-attachments/assets/91595254-3506-4e6e-9fb8-af1827b3c970" /> --------- Co-authored-by: charlotte <charlotte.c.mcelwain@gmail.com>
# Objective In bevyengine#22966 the semantics of the `add` method on `SortedRenderPhase` was changed from the items being cleared at the end of the frame, to items being retained until they are removed. A new `add_transient` method was added with the old functionality, but this is a big migration hazard because it's a major semantic change that doesn't give any compiler errors or warnings. I discovered this after updating `bevy_vector_shapes` to the RC and seeing drawings not being cleared properly. ## Solution Rename `add` to `add_retained` both for clarity and to make old uses not compile, so the affected users will know to look in the migration guide. A sentence about this should be added to the migration guide if/when this is backported to the 0.19 release branch. ## Testing It compiles (hopefully).
# Objective In #22966 the semantics of the `add` method on `SortedRenderPhase` was changed from the items being cleared at the end of the frame, to items being retained until they are removed. A new `add_transient` method was added with the old functionality, but this is a big migration hazard because it's a major semantic change that doesn't give any compiler errors or warnings. I discovered this after updating `bevy_vector_shapes` to the RC and seeing drawings not being cleared properly. ## Solution Rename `add` to `add_retained` both for clarity and to make old uses not compile, so the affected users will know to look in the migration guide. A sentence about this should be added to the migration guide if/when this is backported to the 0.19 release branch. ## Testing It compiles (hopefully).
Right now, every frame, all specialization and queuing systems iterate over all entities visible from a view and check to see whether they need to be updated by consulting a set of change ticks and comparing them to the current change ticks. To handle cases in which a mesh needs to be removed from the bins, a separate final sweep pass then finds entities that no longer exist and removes them manually from the bins. This process is complex, error-prone, and slow, as it involves visiting all visible entities multiple times every frame.
This PR changes the setup so that, instead of examining change ticks, the visibility logic pushes the set of added and removed entities to each view explicitly. The visibility system determines which meshes need to be added and removed by first sorting the list of visible entities, then performing an O(n) diff process on the last frame's visible entities and this frame's visible entity list. The end result is that the specialization and queuing systems only process the entities that they need to every frame. If a mesh was visible last frame, remained visible this frame, and didn't change its mesh or material, then it's generally not examined at all. Not only is this significantly faster for virtually all realistic scenes, but it's also much simpler.
In order to achieve the benefits of not examining every visible mesh every frame, I made sorted render passes retained via an
IndexMap. This allows entities to be removed and added via random access while still allowing the list to be sorted by distance. Note that I had to remove the radix sort becauseIndexMapdoesn't currently support that; I believe the enormous speed benefits of this patch outweigh any minor sorting regressions from this.I tested this PR by running
scene_vieweron a test scene with many meshes and materials and implementing a material shuffler that randomly switches the materials around. I tested the following cases:Moving the camera so that meshes become visible and invisible.
Switching opaque materials on meshes.
Moving meshes from opaque to alpha masked and vice versa.
Moving meshes from binned render passes to sorted render passes (i.e. transparent).
All of the above while the meshes were off screen, then moving them on screen to ensure that the changes took effect.
This PR brings the
specialize_shadowstime on thebevy_citydemo from 12.87 ms per frame to 0.1261 ms per frame, a 102x speedup. It brings thequeue_shadowstime on the same demo from 12.34 ms per frame to 0.1102 ms, a 111x speedup. Mean frame time goes from 50.16 ms to 23.26 ms, a 2.16x speedup.specialize_shadowsinbevy_citybefore and after:queue_shadowsinbevy_citybefore and after:Frame graph of

bevy_citybefore:Frame graph of

bevy_cityafter: