Improved performance

<
>
June 26, 2022

Since due to the addition of Generators I now could easily create very large factories and noticed some performance issues, which I spent some time on to resolve them. I used both the Generator described in the previous post and cargo bench with criterion and a simple Belt ‘chain’ simulation similar to the Generator output.

Removal of StructureMutHandle

The simulator requires mutable access to Structures to both call tick() and to perform interactions between two Structures such as an Arm placing Items on a Belt. But this mutable access also means that it would be possible for the simulator to e.g. replace an Arm with a Furnace. Doing so would invalidate other data such as the occupied positions or the visibility tree.
To make this impossible I previously introduced StructureMutHandle implementing both AsRef<Structure> and AsMut<Structure>, which panics on destruction in case the size or type of the wrapped Structure changes. Containers would then only return StructureMutHandles instead of &mut Structure.
This makes it impossible to accidentally introduce such bugs.
Sadly this was also slow, which is why I removed this as part of the optimization. This change reduced the simulation time by around 43%:

simulate tick           time:   [11.628 ms 11.644 ms 11.661 ms]
                        change: [-43.566% -43.213% -42.950%] (p = 0.00 < 0.05)

Interaction pairs in Vec

For the Simulation I’m keeping track of ‘Interaction pairs’, which is a pair of StructureIndex of Structures that might interact with each other.
If an Arm of index 3 points into a Furnace of index 17, the pair (3, 17) is created.
To keep those unique I was using a BTreeSet. Since these pairs are visited on every iteration tick, it’s very important that this is done quickly.
I therefore switched to a Vec to keep track of those pairs.
To ensure they still remain unique, I’m first filling the BTreeSet as before, but then copy the data over to the Vec for later read access.

pub struct StructContainer {
    ...
-    interaction_pairs: BTreeSet<(StructureIndex, StructureIndex)>,
+    interaction_pairs: Vec<(StructureIndex, StructureIndex)>,
+    interaction_pairs_set: BTreeSet<(StructureIndex, StructureIndex)>,
     interaction_pairs_dirty: bool,
     ...
}

This change reduced the time by roughly 8%:

simulate tick           time:   [10.681 ms 10.715 ms 10.755 ms]
                        change: [-8.2976% -7.9761% -7.6246%] (p = 0.00 < 0.05)

Directional interaction pairs

Interaction pairs were stored without a ‘direction’. They were stored as (LOWER_INDEX, HIGHER_INDEX), so both 3 -> 17 and 17 -> 3 would create the entry (3, 17).
On simulation step the entry (3, 17) would cause interactions in both directions. This was very wasteful especially for uni-directional interaction partners.
I therefore changed the code to potentially store both directions of an interaction and to assume completeness during the simulation step.
This reduced the time by yet another ~34%:

simulate tick           time:   [7.0449 ms 7.0765 ms 7.1145 ms]
                        change: [-34.347% -33.960% -33.576%] (p = 0.00 < 0.05)

BitVec ‘in influence range’

Since Structures have to be within Influence range to function, I’m keeping track of that in a BitVec.
I assumed that there’s some overhead to convert from single bits to bools and changed this to a simple Vec<bool>. I still was very surprised by just how much this affected the speed:

simulate tick           time:   [4.2800 ms 4.2973 ms 4.3182 ms]
                        change: [-39.679% -39.273% -38.888%] (p = 0.00 < 0.05)

Yet another >39% reduction in time, just by switching to Vec<bool>.

In total

Comparing the baseline commit with latest:

simulate tick           time:   [4.1768 ms 4.1903 ms 4.2055 ms]
                        change: [-79.615% -79.500% -79.395%] (p = 0.00 < 0.05)

Nearly a 5x speedup just from the changes described above.

Calculation of ‘in influence range’

Due to the introduction of Generators I noticed that the addition of Structures becomes very slow once there are many.
I still had the following code running to recalculate the ‘in influence’ state:

pub fn pos_in_influence_range(&self, ms: &MS, pos: &Pos<i64>) -> bool {
    //@todo inefficient
    self.elements().iter().any(|element| match &element.sm {
        ...

To check if a position is ‘in influence range’, all Structures were iterated. This is especially bad when re-calculating the flags: For all occupied positions of every single Structure, all Structures were visited. This has quadratic complexity and explains the slowdown I noticed.
To fix this I added yet another search tree for Influence:

...
pub struct InfluenceElement {
    pub pos: Pos<f64>,
    pub range: u8,
    pub index: StructureIndex,
}

impl InfluenceElement {
    pub fn new(ms: &MS, index: StructureIndex, e: &ContainerElement) -> Option<Self> {
        let range = e.sm.influence_range();
        if range > 0 {
            Some(Self {
                pos: e.center(ms),
                range,
                index,
            })
        } else {
            None
        }
    }
}

impl HasBoundingBox2DMaybe for InfluenceElement {
    fn bounding_box_maybe(&self) -> Option<BoundingBox2D> {
        ...
    }
}
impl HasBoundingBox2D for InfluenceElement {
    fn bounding_box(&self) -> BoundingBox2D {
        ...
    }
}
...
pub struct StructContainer {
    ...
    influence_tree: Cell<AABBTree2D<InfluenceElement>>,
    influence_tree_dirty: Cell<bool>,

    in_influence_range: Cell<Vec<bool>>,
    in_influence_range_dirty: Cell<bool>,
    ...
}

I also changed the checks to work with areas instead of checking positions individually:

pub fn fully_in_influence_range(&self, ms: &MS, area: Area) -> bool {
    self.update_influence_tree(ms);
    let influence_tree = self.influence_tree.take();

    let bb = BoundingBox2D::new(
        &Point2D::new(area.min_x, area.min_y),
        &Point2D::new(area.max_x, area.max_y),
    )
    .unwrap_condition(); //we assume area never 0 sized

    let result = influence_tree.any(&|element| {
        let bb_element = element.bounding_box();
        bb == bb_element || bb_element.has_inside(&bb)
    });

    self.influence_tree.set(influence_tree);

    result
}
...
fn update_influence_tree(&self, ms: &MS) {
    if !self.influence_tree_dirty.get() {
        return;
    }

    ...

    self.influence_tree_dirty.set(false);
}

Add or Remove of Structures now performs way better than before.
Several dirty flags are still set on Add / Remove causing full rebuilts. In the future I’d like to instead incrementally update the trees, which should greatly improve the performance.