While building larger and larger factories during my tests, I noticed that rendering became rather slow when there are many visible Structures.
The factory I was playing with initially took around 50ms to render. After all the optimizations described below I got it down to about 10ms, so that’s a 5x improvement in render performance.
The NodeRenderer
Initially every render Node implemented a render() function that used OpenGL to render the content of the Node, while a BranchNode would call render() on each of its child Nodes.
This made it impossible to alter the render order without changing the Node structure, since BranchNodes would just render their children in the order they were added. The result was a lot of state changes for the textures and shaders in use.
This approach also made it impossible to perform back-to-front rendering to support transparency. I had to be very careful about the order in which I pushed child nodes to not break anything there (and in some cases transparency still remained broken).
I therefore dropped the render() function from Nodes and introduced a dedicated NodeRenderer. It can render any Node passed to it. Those Nodes are usually ‘root’ Nodes.
impl NodeRenderer {
    ...
    pub fn render(
        &self,
        screen_size: Size<Screen<f64>>,
        view: &[f32; 16],
        perspective: &[f32; 16],
        node: &Node,
    ) {
        ...
    }
    ...
}
Its core is a generic traverse function that accepts a callback for every Node type:
fn traverse<FTS, FTW, FC, FT, FMT>(
    ...
    functions: &mut TraverseFunctions<FTS, FTW, FC, FT, FMT>,
    node: &Node,
) where
    FTS: FnMut(&State, &TextNodeScreen),
    FTW: FnMut(&State, &TextNodeWorld),
    FC: FnMut(&State, &ColorNode),
    FT: FnMut(&State, &TextureNode),
    FMT: FnMut(&State, &MultiTextureNode),
{
    match node {
        Node::TextScreen(x) => (functions.text_screen)(state, x),
        Node::TextWorld(x) => (functions.text_world)(state, x),
        Node::Color(x) => (functions.color)(state, x),
        Node::Texture(x) => (functions.texture)(state, x),
        Node::MultiTexture(x) => (functions.multi_texture)(state, x),
        Node::Branch(x) => {
            ...
            for child in x.children.iter() {
                Self::traverse(&child_state, functions, child.as_ref())
            }
        }
    }
}
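To make the traversal idea easier to try out in isolation, here is a stripped-down, self-contained sketch. The two leaf variants, the State struct with a single accumulated depth, the reduced TraverseFunctions and the texture_depths helper are all simplifications assumed for illustration, not the actual Factor Y types.

```rust
// Simplified stand-ins for the real types (assumptions for illustration).
#[derive(Clone, Copy)]
struct State {
    depth: i32, // accumulated depth of all parent branches
}

enum Node {
    Color { depth: i32, rgba: [f32; 4] },
    Texture { depth: i32, texture_id: u32 },
    Branch { depth: i32, children: Vec<Node> },
}

// One callback per leaf type, reduced to two leaf types here.
struct TraverseFunctions<FC, FT> {
    color: FC,
    texture: FT,
}

// Walks the tree once and hands every leaf to the matching callback.
fn traverse<FC, FT>(state: &State, functions: &mut TraverseFunctions<FC, FT>, node: &Node)
where
    FC: FnMut(&State, i32, [f32; 4]),
    FT: FnMut(&State, i32, u32),
{
    match node {
        Node::Color { depth, rgba } => (functions.color)(state, *depth, *rgba),
        Node::Texture { depth, texture_id } => (functions.texture)(state, *depth, *texture_id),
        Node::Branch { depth, children } => {
            // Children see the accumulated state of all their parents.
            let child_state = State { depth: state.depth + *depth };
            for child in children {
                traverse(&child_state, functions, child);
            }
        }
    }
}

// Example pass: collect the absolute depth of every texture leaf
// while ignoring color leaves entirely.
fn texture_depths(root: &Node) -> Vec<i32> {
    let mut depths = Vec::new();
    let mut functions = TraverseFunctions {
        color: |_s: &State, _d: i32, _c: [f32; 4]| (),
        texture: |s: &State, d: i32, _id: u32| depths.push(s.depth + d),
    };
    traverse(&State { depth: 0 }, &mut functions, root);
    depths
}
```

The point of the design is that a pass is just a set of closures: a pass that only cares about one node type supplies no-op closures for the others, and the tree itself never needs to know how or in which order it gets rendered.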
It’s now possible to both analyze nodes and render only some of them. For example, one can render only the nodes that display text, or only the nodes with a specific Depth for back-to-front rendering.
This fixes the transparency issues mentioned above and can also be used to minimize state changes for shaders and textures.
Some examples:
// Somewhere in NodeRenderer::render()
let mut render_text = TraverseFunctions {
    text_screen: |s, n| self.render_text_screen(s, n),
    text_world: |s, n| self.render_text_world(s, n),
    color: |_s, _n| (),
    texture: |_s, _n| (),
    multi_texture: |_s, _n| (),
};
Self::traverse(&state, &mut render_text, node);
...
let mut render_colors_opaque = TraverseFunctions {
    text_screen: |_s, _n| (),
    text_world: |_s, _n| (),
    color: |s, n| {
        if n.is_opaque(s) {
            self.render_color(s, n)
        }
    },
    texture: |_s, _n| (),
    multi_texture: |_s, _n| (),
};
Self::traverse(&state, &mut render_colors_opaque, node);
...
for (depth, count) in self.depth_counts.borrow().iter().rev() {
    if count.n_color > 0 {
        let mut render_colors_transparent_w_depth = TraverseFunctions {
            text_screen: |_s, _n| (),
            text_world: |_s, _n| (),
            color: |s, n| {
                if n.is_transparent(s) && f32::from(n.abs_depth(s)) == depth.0 {
                    self.render_color(s, n)
                }
            },
            texture: |_s, _n| (),
            multi_texture: |_s, _n| (),
        };
        Self::traverse(&state, &mut render_colors_transparent_w_depth, node);
        ...
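The loop above relies on a per-depth count of nodes. A minimal sketch of how such a count could be built and used for depth-ordered passes follows. It assumes integer depths as BTreeMap keys, a flat slice of leaves instead of a real tree traversal, and an alpha value below 1.0 as the definition of "transparent"; the Leaf, DepthCount, count_depths and transparent_draw_order names are hypothetical.

```rust
use std::collections::BTreeMap;

// Per-depth statistics, modelled loosely on the n_color / n_texture counters above.
#[derive(Default, Debug, PartialEq)]
struct DepthCount {
    n_color: usize,
    n_texture: usize,
}

// Leaf nodes reduced to what the counting and ordering passes need.
enum Leaf {
    Color { abs_depth: i32, alpha: f32 },
    Texture { abs_depth: i32 },
}

// Counting pass: how many leaves of each kind exist at each depth.
fn count_depths(leaves: &[Leaf]) -> BTreeMap<i32, DepthCount> {
    let mut counts: BTreeMap<i32, DepthCount> = BTreeMap::new();
    for leaf in leaves {
        match leaf {
            Leaf::Color { abs_depth, .. } => counts.entry(*abs_depth).or_default().n_color += 1,
            Leaf::Texture { abs_depth } => counts.entry(*abs_depth).or_default().n_texture += 1,
        }
    }
    counts
}

// Returns the indices of transparent color leaves in the order they would be
// drawn: one pass per depth, largest depth key first (mirroring `.iter().rev()`).
fn transparent_draw_order(leaves: &[Leaf]) -> Vec<usize> {
    let counts = count_depths(leaves);
    let mut order = Vec::new();
    for (depth, count) in counts.iter().rev() {
        if count.n_color == 0 {
            continue; // nothing to draw at this depth, skip the pass entirely
        }
        for (index, leaf) in leaves.iter().enumerate() {
            if let Leaf::Color { abs_depth, alpha } = leaf {
                if *alpha < 1.0 && abs_depth == depth {
                    order.push(index);
                }
            }
        }
    }
    order
}
```

Keeping the counts in an ordered map means depths without any matching node cost nothing, since their pass is skipped before the tree is walked.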
MultiTextureNode creation
For rendering the Tiles of a Planet I introduced the MultiTextureNode. It holds many (Texture, Matrix) pairs and can render them efficiently.
Thanks to the NodeRenderer it’s now possible to create a MultiTextureNode from TextureNodes when it makes sense, for example when all of those TextureNodes have the same Depth and there are at least a certain number of them.
for (depth, count) in self.depth_counts.borrow().iter().rev() {
    ...
    if count.n_texture > 0 {
        if count.n_texture < CONVERT_TO_MULTI_TEXTURE_COUNT.into() {
            ...
        } else {
            ...
            let mut collect_texture_tiles_of_depth = TraverseFunctions {
                text_screen: |_s, _n| (),
                text_world: |_s, _n| (),
                color: |_s, _n| (),
                texture: |s, n| {
                    if f32::from(n.abs_depth(s)) == depth.0 {
                        let model = n
                            .transformation
                            .as_ref()
                            .map(|x| x.clone() * s.model.clone())
                            .unwrap_or(s.model.clone());
                        self.multi_texture
                            .borrow_mut()
                            .textures
                            .push((n.texture, model.transposed())) // @todo should not need transposed here
                    }
                },
                multi_texture: |_s, _n| (),
            };
            Self::traverse(&state, &mut collect_texture_tiles_of_depth, node);
            let node = self.multi_texture.borrow();
            self.render_multi_texture(&state, &node);
        }
    }
}
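The batching decision itself can be sketched independently of OpenGL. The example below groups texture leaves by depth and emits either individual draw calls or one batched call per depth. The threshold value, the DrawCall enum, the TextureLeaf struct and plan_draw_calls are assumptions made for illustration; the post does not state the real value of CONVERT_TO_MULTI_TEXTURE_COUNT.

```rust
use std::collections::BTreeMap;

// Hypothetical threshold; the real value is not given in the post.
const CONVERT_TO_MULTI_TEXTURE_COUNT: usize = 3;

type Matrix = [f32; 16];

// What the renderer would issue, reduced to plain data.
#[derive(Debug, PartialEq)]
enum DrawCall {
    Single { texture: u32, model: Matrix },
    Batch { instances: Vec<(u32, Matrix)> },
}

struct TextureLeaf {
    abs_depth: i32,
    texture: u32,
    model: Matrix,
}

// Groups texture leaves by depth and batches a depth only when it holds
// at least CONVERT_TO_MULTI_TEXTURE_COUNT leaves.
fn plan_draw_calls(leaves: &[TextureLeaf]) -> Vec<DrawCall> {
    let mut by_depth: BTreeMap<i32, Vec<(u32, Matrix)>> = BTreeMap::new();
    for leaf in leaves {
        by_depth
            .entry(leaf.abs_depth)
            .or_default()
            .push((leaf.texture, leaf.model));
    }

    let mut calls = Vec::new();
    // Largest depth key first, mirroring the `.iter().rev()` loop above.
    for (_depth, instances) in by_depth.into_iter().rev() {
        if instances.len() < CONVERT_TO_MULTI_TEXTURE_COUNT {
            // Too few nodes: batching would not pay off, draw them one by one.
            calls.extend(
                instances
                    .into_iter()
                    .map(|(texture, model)| DrawCall::Single { texture, model }),
            );
        } else {
            calls.push(DrawCall::Batch { instances });
        }
    }
    calls
}
```

With a threshold of 3, a depth holding a single texture produces one Single call, while a depth holding three or more textures collapses into one Batch call.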
Texture Atlas
In the past I introduced a texture atlas for more efficient rendering.
It is now used in all cases: Factor Y now uses a single texture for all ‘texture’ render operations.
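For readers unfamiliar with the technique: with an atlas, each sprite is addressed by a sub-rectangle of one big texture instead of by its own texture object, so the bound texture never has to change between draws. A minimal sketch, assuming a uniform grid layout (the post does not say how the Factor Y atlas is laid out; TextureAtlas, UvRect and uv_rect are hypothetical names):

```rust
// Normalized texture coordinates of one tile inside the atlas.
#[derive(Debug, Clone, Copy, PartialEq)]
struct UvRect {
    u_min: f32,
    v_min: f32,
    u_max: f32,
    v_max: f32,
}

// Assumed layout: a grid of equally sized tiles, indexed row by row.
struct TextureAtlas {
    columns: u32,
    rows: u32,
}

impl TextureAtlas {
    // Maps a tile index to its UV rectangle, or None if the index is out of range.
    fn uv_rect(&self, index: u32) -> Option<UvRect> {
        if index >= self.columns * self.rows {
            return None;
        }
        let column = index % self.columns;
        let row = index / self.columns;
        let tile_width = 1.0 / self.columns as f32;
        let tile_height = 1.0 / self.rows as f32;
        Some(UvRect {
            u_min: column as f32 * tile_width,
            v_min: row as f32 * tile_height,
            u_max: (column + 1) as f32 * tile_width,
            v_max: (row + 1) as f32 * tile_height,
        })
    }
}
```

Each quad then carries its own UvRect, and every ‘texture’ draw can share the same bound texture.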
Reduced BranchNode count
Since every BranchNode
holds a Vec
of its child nodes, allocations are required for creating it.
If there’s a lot of BranchNode
nesting the data isn’t tightly packed and becomes inefficient.
I tried to reduce the number of BranchNode
s where possible, further improving the performance.
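The post does not say how the BranchNodes were reduced, so the following is only one possible approach: splicing the children of a nested branch directly into its parent. It assumes a simplified Node type whose branches carry no transformation or depth of their own, because only then is flattening safe without touching the leaves. Node, flatten and count_branches are hypothetical names.

```rust
// Simplified tree: branches hold nothing but children (an assumption).
#[derive(Debug, PartialEq)]
enum Node {
    Leaf(u32),
    Branch(Vec<Node>),
}

// Recursively splices the children of nested branches into their parent,
// preserving the original left-to-right order of the leaves.
fn flatten(node: Node) -> Node {
    match node {
        Node::Leaf(id) => Node::Leaf(id),
        Node::Branch(children) => {
            let mut flat = Vec::with_capacity(children.len());
            for child in children {
                match flatten(child) {
                    Node::Branch(grandchildren) => flat.extend(grandchildren),
                    leaf => flat.push(leaf),
                }
            }
            Node::Branch(flat)
        }
    }
}

// Counts branches, i.e. the number of child Vec allocations in the tree.
fn count_branches(node: &Node) -> usize {
    match node {
        Node::Leaf(_) => 0,
        Node::Branch(children) => 1 + children.iter().map(count_branches).sum::<usize>(),
    }
}
```

After flattening, a tree of any nesting depth ends up with a single branch and one contiguous Vec of leaves, which is the kind of tightly packed layout the paragraph above is after.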
Hide Items if zoomed out
I added a new render mode in which a Structure’s Items aren’t rendered, so e.g. Belts are shown as empty.
This mode is only enabled when zoomed out quite far, where the difference is barely noticeable.
It reduces the number of render Nodes and therefore improves render performance.
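A minimal sketch of such a zoom-dependent mode switch. The threshold value, the convention that a smaller zoom factor means "further zoomed out", and the RenderMode, render_mode_for_zoom and belt_node_count names are all assumptions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum RenderMode {
    Full,
    WithoutItems,
}

// Hypothetical threshold: below this zoom factor Items are too small to matter.
const HIDE_ITEMS_BELOW_ZOOM: f32 = 0.25;

// Picks the render mode from the current zoom factor
// (assumed convention: smaller value = further zoomed out).
fn render_mode_for_zoom(zoom: f32) -> RenderMode {
    if zoom < HIDE_ITEMS_BELOW_ZOOM {
        RenderMode::WithoutItems
    } else {
        RenderMode::Full
    }
}

// Number of render nodes a belt contributes: one for the belt itself,
// plus one per carried item unless items are hidden.
fn belt_node_count(item_count: usize, mode: RenderMode) -> usize {
    match mode {
        RenderMode::Full => 1 + item_count,
        RenderMode::WithoutItems => 1,
    }
}
```

In this toy model a belt carrying eight items drops from nine render nodes to one once the camera is zoomed out past the threshold.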