LightIntervalTree now performs a linear scan on small subtrees
README updated with new performance numbers
jamarino committed Jul 18, 2023
1 parent 44d2b41 commit 3b78812
Showing 2 changed files with 49 additions and 30 deletions.
24 changes: 14 additions & 10 deletions README.md
@@ -25,6 +25,8 @@ tree.Query(150); // result is {1, 2, 3}

See performance section further down for more details.

The following graphs are based on benchmarks of trees with 250000 dense intervals.

### Query performance

```mermaid
@@ -36,7 +38,7 @@ gantt
section Quick
13.30 mil : 0, 13304949
section Light
8.275 mil : 0, 8275405
9.653 mil : 0, 9653441
section Reference
1.692 mil : 0, 1692734
```
@@ -145,17 +147,17 @@ Loading data into `LightIntervalTree` and `QuickIntervalTree` is not only quicke

| Method | TreeType | DataType | Mean | Allocated |
|--------|-----------|----------|-----------:|----------:|
| Query | light | dense | 120.84 ns | 107 B |
| Query | light | medium | 90.18 ns | 50 B |
| Query | light | sparse | 72.14 ns | 14 B |
| Query | light | dense | 103.59 ns | 107 B |
| Query | light | medium | 80.23 ns | 50 B |
| Query | light | sparse | 66.03 ns | 14 B |
| Query | quick | dense | 75.16 ns | 107 B |
| Query | quick | medium | 62.57 ns | 50 B |
| Query | quick | sparse | 52.13 ns | 14 B |
| Query | reference | dense | 590.76 ns | 1,256 B |
| Query | reference | medium | 454.76 ns | 996 B |
| Query | reference | sparse | 321.63 ns | 704 B |

`LightIntervalTree` is about 4-5 times quicker to query. `QuickIntervalTree` manages 6-8 times faster queries, and pulls ahead in dense datasets.
`LightIntervalTree` is about 4-6 times quicker to query. `QuickIntervalTree` manages 6-8 times faster queries, and pulls ahead in dense datasets.

## Thread Safety

@@ -177,13 +179,15 @@ When using trees in a concurrent environment, please be sure to initialise the t
A few key design decisions were made to reduce the memory usage.

1. Avoid keeping duplicate data
* `RangeTree` keeps a full copy of intervals, in case the tree needs to be rebuilt following the addition or removal of an interval. `LightIntervalTree` only stores intervals as part of the underlying tree structure.
* `RangeTree` keeps a full, unused copy of intervals, in case the tree needs to be rebuilt following the addition or removal of an interval.
* `LightIntervalTree` only stores intervals once, embedding tree information directly into the stored intervals.
* `QuickIntervalTree` directly uses the stored intervals, but also duplicates part of the intervals in order to store a reverse-order, needed to optimize searching.
1. Model tree nodes as value types (`struct`) rather than objects (`class`)
* Objects suffer memory overhead in the form of type and method information
* Since `struct`s cannot reference themselves, an index (`int`) is used to reference other nodes
1. Store nodes and intervals in indexable arrays, use indexes rather than references as pointers
* Pointers in 64-bit systems take up 8 bytes of storage, `int`s only take 4 bytes
* Storing value types in Lists/Arrays may improve CPU caching since elements are co-located
1. Nodes store their intervals in linked lists
* Nodes use indexes to point to the first interval in their list. Each interval stores an additional index pointing to the next interval (if present) to form a "linked list".
* For sparse trees this means that the majority of nodes will be storing two ints (one in the node and one in the single interval for that node) as opposed to allocating a 1-length array and storing an 8 byte pointer to said array.
* Storing value types in Lists/Arrays improves CPU caching since elements are co-located in memory
1. Nodes reference their intervals by index and length
* Rather than allocating an array object for each node to store intervals in, all intervals are stored in a single array. All related intervals are grouped, and each node keeps an index and count to point to the related intervals.
* For sparse trees this means that the majority of nodes will be storing two ints as opposed to allocating a 1-length array and storing an 8 byte pointer to said array.
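The index-and-count layout described in the list above can be sketched roughly as follows. This is a minimal illustration, not the library's code: the `Interval` and `Node` structs, their field names, and the `FlatLayoutSketch.MatchesAtNode` helper are all hypothetical, and keys are fixed to `int` and values to `string` to keep the example short.

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration only; the library's real fields may differ.
public readonly struct Interval
{
    public readonly int From;
    public readonly int To;
    public readonly string Value;

    public Interval(int from, int to, string value)
    {
        From = from;
        To = to;
        Value = value;
    }
}

public readonly struct Node
{
    public readonly int IntervalIndex; // position of the node's first interval in the shared array
    public readonly int IntervalCount; // how many consecutive intervals belong to the node
    public readonly int Left;          // index of the left child in the node array, -1 if none
    public readonly int Right;         // index of the right child in the node array, -1 if none

    public Node(int intervalIndex, int intervalCount, int left, int right)
    {
        IntervalIndex = intervalIndex;
        IntervalCount = intervalCount;
        Left = left;
        Right = right;
    }
}

public static class FlatLayoutSketch
{
    // Returns the values of the given node's intervals that contain target.
    // No per-node array is allocated: the node only holds two ints into the shared array.
    public static List<string> MatchesAtNode(Node[] nodes, Interval[] intervals, int nodeIndex, int target)
    {
        var node = nodes[nodeIndex];
        var results = new List<string>();
        var end = node.IntervalIndex + node.IntervalCount;
        for (var i = node.IntervalIndex; i < end; i++)
        {
            var interval = intervals[i];
            if (interval.From <= target && target <= interval.To)
                results.Add(interval.Value);
        }
        return results;
    }
}
```

Because both arrays hold value types, the nodes and intervals sit contiguously in memory, and child links are 4-byte `int` indexes rather than 8-byte references.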
55 changes: 35 additions & 20 deletions Source/IntervalTree/LightIntervalTree.cs
@@ -1,4 +1,3 @@
using System;
using System.Collections;

namespace Jamarino.IntervalTree;
@@ -46,36 +45,52 @@ public IEnumerable<TValue> Query(TKey target)
var max = stack[stackIndex--];
var min = stack[stackIndex--];

var center = (min + max + 1) / 2;
var interval = _intervals[center];
var span = max - min;
if (span < 6) // At small subtree sizes a linear scan is faster
{
for (var i = min; i <= max; i++)
{
var interval = _intervals[i];

var compareMax = target.CompareTo(interval.Max);
if (compareMax > 0) continue; // target larger than Max, bail
var compareFrom = target.CompareTo(interval.From);
if (compareFrom < 0)
break;

if (center - min > 0)
var compareTo = target.CompareTo(interval.To);
if (compareTo > 0)
continue;

results ??= new List<TValue>();
results.Add(interval.Value);
}
}
else
{
var center = (min + max + 1) / 2;
var interval = _intervals[center];

var compareMax = target.CompareTo(interval.Max);
if (compareMax > 0) continue; // target larger than Max, bail

// search left
stack[++stackIndex] = min;
stack[++stackIndex] = center - 1;
}
stack[++stackIndex] = center - 1;

// check current node
var compareFrom = target.CompareTo(interval.From);
var compareTo = target.CompareTo(interval.To);
// check current node
var compareFrom = target.CompareTo(interval.From);
var compareTo = target.CompareTo(interval.To);

if (compareFrom >= 0 && compareTo <= 0)
{
results ??= new List<TValue>();
results.Add(interval.Value);
}
if (compareFrom >= 0 && compareTo <= 0)
{
results ??= new List<TValue>();
results.Add(interval.Value);
}

if (compareFrom < 0) continue; // target smaller than From, bail
if (compareFrom < 0) continue; // target smaller than From, bail

if (max - center > 0)
{
// search right
stack[++stackIndex] = center + 1;
stack[++stackIndex] = max;
stack[++stackIndex] = max;
}
}
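
Since the hunk above has lost its added/removed markers, here is a standalone sketch of the linear-scan branch on its own. It assumes, as the `break` on `compareFrom < 0` suggests, that the intervals are sorted by `From`. The `LinearScanSketch` class, the `ScanRange` name, and the tuple element type are hypothetical; the commit's code works on generic `TKey`/`TValue` and allocates the result list lazily.

```csharp
using System.Collections.Generic;

public static class LinearScanSketch
{
    // Hypothetical helper: scans indexes min..max (inclusive) of an array that is
    // assumed to be sorted by From, collecting values whose interval contains target.
    public static List<string> ScanRange(
        (int From, int To, string Value)[] sortedByFrom, int min, int max, int target)
    {
        var results = new List<string>();
        for (var i = min; i <= max; i++)
        {
            var interval = sortedByFrom[i];

            // Sorted by From: once From exceeds target, no later interval can match.
            if (target < interval.From)
                break;

            // This interval ends before target; later intervals may still match.
            if (target > interval.To)
                continue;

            results.Add(interval.Value);
        }
        return results;
    }
}
```

In the commit this scan is taken only when `max - min` is below 6. Larger ranges still go through the center-based descent, which can skip a whole range when the target exceeds the `Max` stored at its center.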
