Since due to the addition of Generator
s I now could easily create very large factories and noticed some performance issues,
which I spent some time on to resolve them. I used both the Generator
described in the previous post and cargo bench
with criterion and a simple Belt
‘chain’ simulation similar to the Generator
output.
Removal of StructureMutHandle
The simulator requires mutable access to Structure
s to both call tick()
and to perform interactions between two Structures
such as an Arm
placing Item
s on a Belt
.
But this mutable access also means that it would be possible for the simulator to e.g. replace an Arm
with a Furnace
. Doing so would invalidate other data such as the occupied positions or the visibility tree.
To make this impossible I previously introduced StructureMutHandle
implementing both AsRef<Structure>
and AsMut<Structure>
, which panics on destruction in case the size or type of the wrapped Structure
changes. Containers would then only return StructureMutHandle
s instead of &mut Structure
.
This makes it impossible to accidentally introduce such bugs.
Sadly this was also slow, which is why I removed this as part of the optimization. This change reduced the simulation time by around 43%
:
simulate tick time: [11.628 ms 11.644 ms 11.661 ms]
change: [-43.566% -43.213% -42.950%] (p = 0.00 < 0.05)
Interaction pairs in Vec
For the Simulation I’m keeping track of ‘Interaction pairs’, which is a pair of StructureIndex
of Structures that might interact with each other.
If an Arm
of index 3
points into a Furnace
of index 17
, the pair (3, 17)
is created.
To keep those unique I was using a BTreeSet
. Since these pairs are visited on every iteration tick, it’s very important that this is done quickly.
I therefore switched to a Vec
to keep track of those pairs.
To ensure they still remain unique, I’m first filling the BTreeSet
as before, but then copy the data over to the Vec
for later read access.
pub struct StructContainer {
...
- interaction_pairs: BTreeSet<(StructureIndex, StructureIndex)>,
+ interaction_pairs: Vec<(StructureIndex, StructureIndex)>,
+ interaction_pairs_set: BTreeSet<(StructureIndex, StructureIndex)>,
interaction_pairs_dirty: bool,
...
}
This change reduced the time by roughly 8%
:
simulate tick time: [10.681 ms 10.715 ms 10.755 ms]
change: [-8.2976% -7.9761% -7.6246%] (p = 0.00 < 0.05)
Directional interaction pairs
Interaction pairs were stored without a ‘direction’. They were stored as (LOWER_INDEX, HIGHER_INDEX)
, so both 3 -> 17
and 17 -> 3
would create the entry (3, 17)
.
On simulation step the entry (3, 17)
would cause interactions in both directions. This was very wasteful especially for uni-directional interaction partners.
I therefore changed the code to potentially store both directions of an interaction and to assume completeness during the simulation step.
This reduced the time by yet another ~34%
:
simulate tick time: [7.0449 ms 7.0765 ms 7.1145 ms]
change: [-34.347% -33.960% -33.576%] (p = 0.00 < 0.05)
BitVec
‘in influence range’
Since Structure
s have to be within Influence
range to function, I’m keeping track of that in a BitVec
.
I assumed that there’s some overhead to convert from single bits to bools and changed this to a simple Vec<bool>
. I still was very surprised by just how much this affected the speed:
simulate tick time: [4.2800 ms 4.2973 ms 4.3182 ms]
change: [-39.679% -39.273% -38.888%] (p = 0.00 < 0.05)
Yet another >39%
reduction in time, just by switching to Vec<bool>
.
In total
Comparing the baseline commit with latest:
simulate tick time: [4.1768 ms 4.1903 ms 4.2055 ms]
change: [-79.615% -79.500% -79.395%] (p = 0.00 < 0.05)
Nearly a 5x
speedup just from the changes described above.
Calculation of ‘in influence range’
Due to the introduction of Generator
s I noticed that the addition of Structure
s becomes very slow once there are many.
I still had the following code running to recalculate the ‘in influence’ state:
pub fn pos_in_influence_range(&self, ms: &MS, pos: &Pos<i64>) -> bool {
//@todo inefficient
self.elements().iter().any(|element| match &element.sm {
...
To check if a position is ‘in influence range’, all Structure
s were iterated. This is especially bad when re-calculating the flags: For all occupied positions of every single Structure
, all Structure
s were visited. This has quadratic complexity and explains the slowdown I noticed.
To fix this I added yet another search tree for Influence
:
...
pub struct InfluenceElement {
pub pos: Pos<f64>,
pub range: u8,
pub index: StructureIndex,
}
impl InfluenceElement {
pub fn new(ms: &MS, index: StructureIndex, e: &ContainerElement) -> Option<Self> {
let range = e.sm.influence_range();
if range > 0 {
Some(Self {
pos: e.center(ms),
range,
index,
})
} else {
None
}
}
}
impl HasBoundingBox2DMaybe for InfluenceElement {
fn bounding_box_maybe(&self) -> Option<BoundingBox2D> {
...
}
}
impl HasBoundingBox2D for InfluenceElement {
fn bounding_box(&self) -> BoundingBox2D {
...
}
}
...
pub struct StructContainer {
...
influence_tree: Cell<AABBTree2D<InfluenceElement>>,
influence_tree_dirty: Cell<bool>,
in_influence_range: Cell<Vec<bool>>,
in_influence_range_dirty: Cell<bool>,
...
}
I also changed the checks to work with areas instead of checking positions individually:
pub fn fully_in_influence_range(&self, ms: &MS, area: Area) -> bool {
self.update_influence_tree(ms);
let influence_tree = self.influence_tree.take();
let bb = BoundingBox2D::new(
&Point2D::new(area.min_x, area.min_y),
&Point2D::new(area.max_x, area.max_y),
)
.unwrap_condition(); //we assume area never 0 sized
let result = influence_tree.any(&|element| {
let bb_element = element.bounding_box();
bb == bb_element || bb_element.has_inside(&bb)
});
self.influence_tree.set(influence_tree);
result
}
...
fn update_influence_tree(&self, ms: &MS) {
if !self.influence_tree_dirty.get() {
return;
}
...
self.influence_tree_dirty.set(false);
}
Add
or Remove
of Structure
s now performs way better than before.
Several dirty flags are still set on Add
/ Remove
causing full rebuilts. In the future I’d like to instead incrementally update the trees, which should greatly improve the performance.