Improved render performance

April 9, 2022

While building larger and larger factories during my tests, I noticed that rendering became rather slow when many Structures were visible.
The factory I was testing with initially took around 50ms to render. After all the optimizations described below I got it down to about 10ms, which is roughly a 5x improvement in render performance.

The NodeRenderer

Initially all render Nodes implemented a render() function that used OpenGL to render the content of the Node, while a BranchNode would call render() on all of its child Nodes.
This made it impossible to alter the render order without changing the Node structure, since BranchNodes would just render their children in the order they were added. It caused a lot of texture and shader state changes.
This approach also made it impossible to perform back-to-front rendering to support transparency. I had to be very careful about the order in which I pushed child nodes so as not to break anything there (and in some cases transparency still remained broken).

I therefore dropped the render() function from Nodes and introduced a dedicated NodeRenderer. It can render any Node passed to it; usually these are ‘root’ Nodes.

impl NodeRenderer {
    ...
    pub fn render(
        &self,
        screen_size: Size<Screen<f64>>,
        view: &[f32; 16],
        perspective: &[f32; 16],
        node: &Node,
    ) {
        ...
    }
    ...

It is built around a generic traverse function that accepts a callback for every Node type:

    fn traverse<FTS, FTW, FC, FT, FMT>(
        ...
        functions: &mut TraverseFunctions<FTS, FTW, FC, FT, FMT>,
        node: &Node,
    ) where
        FTS: FnMut(&State, &TextNodeScreen),
        FTW: FnMut(&State, &TextNodeWorld),
        FC: FnMut(&State, &ColorNode),
        FT: FnMut(&State, &TextureNode),
        FMT: FnMut(&State, &MultiTextureNode),
    {
        match node {
            Node::TextScreen(x) => (functions.text_screen)(state, x),
            Node::TextWorld(x) => (functions.text_world)(state, x),
            Node::Color(x) => (functions.color)(state, x),
            Node::Texture(x) => (functions.texture)(state, x),
            Node::MultiTexture(x) => (functions.multi_texture)(state, x),
            Node::Branch(x) => {
                ...
                for child in x.children.iter() {
                    Self::traverse(&child_state, functions, child.as_ref())
                }
            }
        }
    }
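
To make the callback idea easier to try out in isolation, here is a reduced, self-contained sketch. It is not the Factor Y code: the `Node` enum only has two leaf kinds, `State` only carries a depth, the `depth_offset` field and the `collect_text` helper are hypothetical, and children are stored directly in a `Vec` instead of behind a pointer.

```rust
// Simplified stand-ins for the post's types; only two leaf kinds are modeled.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct State {
    pub depth: f32,
}

pub struct ColorNode {
    pub rgba: [f32; 4],
}

pub struct TextNode {
    pub text: String,
}

pub struct BranchNode {
    pub depth_offset: f32, // hypothetical: how a branch changes the state of its children
    pub children: Vec<Node>,
}

pub enum Node {
    Color(ColorNode),
    Text(TextNode),
    Branch(BranchNode),
}

/// One callback per leaf type, mirroring the idea of `TraverseFunctions`.
pub struct TraverseFunctions<FC, FT>
where
    FC: FnMut(&State, &ColorNode),
    FT: FnMut(&State, &TextNode),
{
    pub color: FC,
    pub text: FT,
}

pub fn traverse<FC, FT>(state: &State, functions: &mut TraverseFunctions<FC, FT>, node: &Node)
where
    FC: FnMut(&State, &ColorNode),
    FT: FnMut(&State, &TextNode),
{
    match node {
        Node::Color(x) => (functions.color)(state, x),
        Node::Text(x) => (functions.text)(state, x),
        Node::Branch(x) => {
            // A branch renders nothing itself; it only derives the state for its children.
            let child_state = State { depth: state.depth + x.depth_offset };
            for child in &x.children {
                traverse(&child_state, functions, child);
            }
        }
    }
}

/// Example pass: collect the text of every text node and ignore color nodes.
pub fn collect_text(root: &Node) -> Vec<String> {
    let mut found = Vec::new();
    let mut functions = TraverseFunctions {
        color: |_s: &State, _n: &ColorNode| (),
        text: |_s: &State, n: &TextNode| found.push(n.text.clone()),
    };
    traverse(&State { depth: 0.0 }, &mut functions, root);
    found
}
```

Because the callbacks are `FnMut`, a pass can either draw something or just gather information (as `collect_text` does), and the same tree can be walked several times with different callback sets.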

Traversal can now be used both to analyze nodes and to render a chosen subset of them. For example, one pass can render only the nodes that display text, and another can render only the nodes with a specific Depth for back-to-front rendering.
This fixes the transparency issues mentioned above and can also be used to minimize state changes for shaders and textures.
Some examples:

// Somewhere in NodeRenderer::render()
let mut render_text = TraverseFunctions {
    text_screen: |s, n| self.render_text_screen(s, n),
    text_world: |s, n| self.render_text_world(s, n),
    color: |_s, _n| (),
    texture: |_s, _n| (),
    multi_texture: |_s, _n| (),
};

Self::traverse(&state, &mut render_text, node);
...
let mut render_colors_opaque = TraverseFunctions {
    text_screen: |_s, _n| (),
    text_world: |_s, _n| (),
    color: |s, n| {
        if n.is_opaque(s) {
            self.render_color(s, n)
        }
    },
    texture: |_s, _n| (),
    multi_texture: |_s, _n| (),
};

Self::traverse(&state, &mut render_colors_opaque, node);
...
for (depth, count) in self.depth_counts.borrow().iter().rev() {
    if count.n_color > 0 {
        let mut render_colors_transparent_w_depth = TraverseFunctions {
            text_screen: |_s, _n| (),
            text_world: |_s, _n| (),
            color: |s, n| {
                if n.is_transparent(s) && f32::from(n.abs_depth(s)) == depth.0 {
                    self.render_color(s, n)
                }
            },
            texture: |_s, _n| (),
            multi_texture: |_s, _n| (),
        };

        Self::traverse(&state, &mut render_colors_transparent_w_depth, node);
...
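
The loop above relies on per-depth counts that can be iterated in order. How `depth_counts` is actually built is not shown here, so the following is only a sketch under assumptions: `DepthKey`, `Count`, `Kind`, `count_by_depth` and `color_depths_back_to_front` are hypothetical names, the counts live in a `BTreeMap`, and larger depth values are treated as farther away (which is what the reversed iteration suggests).

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Hypothetical key type: wraps an f32 so it can be used in an ordered map.
#[derive(Debug, Clone, Copy)]
pub struct DepthKey(pub f32);

impl Ord for DepthKey {
    fn cmp(&self, other: &Self) -> Ordering {
        // f32::total_cmp gives floats a total order, so they can serve as map keys.
        self.0.total_cmp(&other.0)
    }
}

impl PartialOrd for DepthKey {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for DepthKey {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for DepthKey {}

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Count {
    pub n_color: usize,
    pub n_texture: usize,
}

pub enum Kind {
    Color,
    Texture,
}

/// Analysis pass: count how many nodes of each kind exist per depth.
pub fn count_by_depth(nodes: &[(f32, Kind)]) -> BTreeMap<DepthKey, Count> {
    let mut counts: BTreeMap<DepthKey, Count> = BTreeMap::new();
    for (depth, kind) in nodes {
        let entry = counts.entry(DepthKey(*depth)).or_default();
        match kind {
            Kind::Color => entry.n_color += 1,
            Kind::Texture => entry.n_texture += 1,
        }
    }
    counts
}

/// Depths that contain color nodes, largest depth first (assumed to mean back to front).
pub fn color_depths_back_to_front(counts: &BTreeMap<DepthKey, Count>) -> Vec<f32> {
    counts
        .iter()
        .rev()
        .filter(|(_, count)| count.n_color > 0)
        .map(|(depth, _)| depth.0)
        .collect()
}
```

Depths without any color node are skipped entirely, so no traversal is spent on layers that would draw nothing.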

MultiTextureNode creation

For rendering the Tiles of a Planet I introduced the MultiTextureNode. It holds many (Texture, Matrix) pairs and can render them efficiently.
Thanks to the NodeRenderer it’s now possible to create a MultiTextureNode from TextureNodes on the fly when it makes sense,
for example when all of the TextureNodes have the same Depth and there are at least a certain number of them.

for (depth, count) in self.depth_counts.borrow().iter().rev() {
    ...
    if count.n_texture > 0 {
        if count.n_texture < CONVERT_TO_MULTI_TEXTURE_COUNT.into() {
            ...
        } else {
            ...
            let mut collect_texture_tiles_of_depth = TraverseFunctions {
                text_screen: |_s, _n| (),
                text_world: |_s, _n| (),
                color: |_s, _n| (),
                texture: |s, n| {
                    if f32::from(n.abs_depth(s)) == depth.0 {
                        let model = n
                            .transformation
                            .as_ref()
                            .map(|x| x.clone() * s.model.clone())
                            .unwrap_or(s.model.clone());
                        self.multi_texture
                            .borrow_mut()
                            .textures
                            .push((n.texture, model.transposed())) //@todo should not need transposed here
                    }
                },
                multi_texture: |_s, _n| (),
            };

            Self::traverse(&state, &mut collect_texture_tiles_of_depth, node);

            let node = self.multi_texture.borrow();
            self.render_multi_texture(&state, &node);
        }
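
Stripped of the renderer details, the decision is a threshold check per depth. The sketch below illustrates only that decision and is not the actual implementation: `DrawPlan` and `plan_depth` are hypothetical, the threshold is passed in as a parameter because the value of CONVERT_TO_MULTI_TEXTURE_COUNT is not given here, and a 2D offset stands in for the model matrix.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TextureNode {
    pub texture: u32,     // placeholder for a texture handle
    pub depth: f32,
    pub offset: [f32; 2], // stands in for the model matrix
}

#[derive(Debug, PartialEq)]
pub enum DrawPlan {
    /// Few nodes at this depth: draw each one on its own.
    Individual(Vec<TextureNode>),
    /// Many nodes at this depth: draw all (texture, offset) pairs as one batch.
    Batched(Vec<(u32, [f32; 2])>),
}

/// Decide how the texture nodes of one depth should be drawn.
/// Returns None when no node has that depth.
pub fn plan_depth(nodes: &[TextureNode], depth: f32, threshold: usize) -> Option<DrawPlan> {
    let at_depth: Vec<TextureNode> = nodes
        .iter()
        .copied()
        .filter(|n| n.depth == depth)
        .collect();

    if at_depth.is_empty() {
        None
    } else if at_depth.len() < threshold {
        Some(DrawPlan::Individual(at_depth))
    } else {
        let pairs = at_depth.iter().map(|n| (n.texture, n.offset)).collect();
        Some(DrawPlan::Batched(pairs))
    }
}
```

Keeping the individual path for small groups avoids paying the cost of collecting a batch when only a handful of textures share a depth.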

Texture Atlas

In the past I introduced a texture atlas for more efficient rendering.
It is now used in all cases. Factor Y now uses a single texture for all ‘texture’ render operations.
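
With a single atlas, each sprite is selected by texture coordinates instead of by binding a different texture. As a rough illustration only, assuming the atlas is a regular grid of equally sized cells (the real layout may differ), the coordinates of one cell could be computed like this. `Atlas`, `UvRect` and `uv` are hypothetical names.

```rust
/// Hypothetical atlas laid out as a grid of equally sized cells.
pub struct Atlas {
    pub columns: u32,
    pub rows: u32,
}

/// Normalized texture coordinates of one cell, in the range 0.0..=1.0.
#[derive(Debug, PartialEq)]
pub struct UvRect {
    pub u0: f32,
    pub v0: f32,
    pub u1: f32,
    pub v1: f32,
}

impl Atlas {
    /// Returns the UV rectangle of the cell with the given row-major index,
    /// or None if the index lies outside the atlas.
    pub fn uv(&self, index: u32) -> Option<UvRect> {
        if self.columns == 0 || self.rows == 0 || index >= self.columns * self.rows {
            return None;
        }
        let col = index % self.columns;
        let row = index / self.columns;
        let cell_w = 1.0 / self.columns as f32;
        let cell_h = 1.0 / self.rows as f32;
        Some(UvRect {
            u0: col as f32 * cell_w,
            v0: row as f32 * cell_h,
            u1: (col + 1) as f32 * cell_w,
            v1: (row + 1) as f32 * cell_h,
        })
    }
}
```

Since every draw then samples from the same texture, consecutive draws no longer force a texture switch.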

Reduced BranchNode count

Every BranchNode holds a Vec of its child nodes, so creating one requires a heap allocation.
With a lot of nested BranchNodes the data is no longer tightly packed in memory, which makes traversal less efficient.
I reduced the number of BranchNodes where possible, which further improved performance.
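
How the reduction was done is not described above, so the following is just one possible way to think about it, under a strong assumption: the nested branches carry no state of their own (no transformation or depth change), so their children can be moved into the parent. The `Node` type and `flatten` function are hypothetical and much simpler than the real ones.

```rust
#[derive(Debug, PartialEq)]
pub enum Node {
    Leaf(u32),
    Branch(Vec<Node>),
}

/// Inline the children of nested branches into their parent, so the result
/// has at most one Branch (at the root) and therefore a single Vec.
/// Only valid if the inlined branches do not modify the state of their children.
pub fn flatten(node: Node) -> Node {
    match node {
        Node::Leaf(id) => Node::Leaf(id),
        Node::Branch(children) => {
            let mut flat = Vec::with_capacity(children.len());
            for child in children {
                match flatten(child) {
                    // A flattened child branch contains only leaves; adopt them directly.
                    Node::Branch(grandchildren) => flat.extend(grandchildren),
                    leaf => flat.push(leaf),
                }
            }
            Node::Branch(flat)
        }
    }
}
```

The child order is preserved, so a renderer that depends on insertion order would see the same sequence of leaves as before.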

Hide Items if zoomed out

I added a new render mode in which a Structure’s Items aren’t rendered, so e.g. Belts are shown as empty.
This mode is only enabled when zoomed out quite far, where the missing Items are barely noticeable.
It reduces the number of render Nodes and therefore improves render performance.
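
A minimal sketch of such a switch, with made-up names and numbers: `RenderMode`, `render_mode`, `belt_node_count` and the zoom threshold are assumptions, and a smaller zoom value is taken to mean "zoomed out further".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RenderMode {
    Full,
    WithoutItems,
}

/// Pick the render mode from the current zoom level.
/// Below `min_zoom_for_items` the Items are dropped from rendering.
pub fn render_mode(zoom: f32, min_zoom_for_items: f32) -> RenderMode {
    if zoom < min_zoom_for_items {
        RenderMode::WithoutItems
    } else {
        RenderMode::Full
    }
}

/// Number of render nodes a single belt would produce in this sketch:
/// one for the belt itself plus one per item when items are shown.
pub fn belt_node_count(item_count: usize, mode: RenderMode) -> usize {
    match mode {
        RenderMode::Full => 1 + item_count,
        RenderMode::WithoutItems => 1,
    }
}
```

In this toy model a belt carrying eight items drops from nine nodes to one once the camera is far enough away, which is where the saving comes from.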