AVL Trees – 4/4: Additional Features and Benchmarking
It is important to benchmark our code, especially general-purpose algorithms and data structures. But first, some remarks about a couple of quality-of-life (QoL) features.
Reverse Iterator
Our reverse iterator is identical to the forwards iterator, except that its increment and decrement operators are swapped relative to the forwards iterator's. The RBegin
function is also slightly different from Begin; here is what it looks like.
template <typename T>
ReverseIterator<T> RBegin(Node<T> *root)
{
    ReverseIterator<T> it {root};
    move_it_rightmost(&it); // reverse iteration starts at the largest element, i.e. the rightmost node
    return it;
}
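For reference, the swapped operators might look something like the sketch below. Note that move_it_forward and move_it_backward are hypothetical names; substitute whatever in-order successor/predecessor helpers your forwards iterator actually uses.
// Sketch only: move_it_forward / move_it_backward stand in for the
// in-order successor / predecessor helpers used by the forwards iterator.
template <typename T>
ReverseIterator<T>& ReverseIterator<T>::operator++()
{
    move_it_backward(this); // advancing a reverse iterator steps to the in-order predecessor
    return *this;
}

template <typename T>
ReverseIterator<T>& ReverseIterator<T>::operator--()
{
    move_it_forward(this); // decrementing a reverse iterator steps to the in-order successor
    return *this;
}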
Fancy typedefs
It’s useful to provide some easily accessible type names scoped inside the Node struct.
template <typename T>
struct Node {
    typedef T value_type;
    typedef T& reference;
    typedef T&& rv_reference;
    typedef T const& const_reference;
    typedef Iterator<T> iterator;
    typedef Iterator<T const> const_iterator;
    typedef ReverseIterator<T> reverse_iterator;
    typedef ReverseIterator<T const> const_reverse_iterator;
    // stuff...
};
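One payoff is that generic code can name the tree’s associated types without repeating the template argument. A small illustrative sketch (SumTree is made up for this post, and Begin/End are assumed to be the free functions from the iterator part):
// Illustrative only: works with any node type that exposes value_type and iterator.
template <typename NodeT>
typename NodeT::value_type SumTree(NodeT *root)
{
    typename NodeT::value_type total {};
    for (typename NodeT::iterator it = Begin(root); it != End(root); ++it)
        total += *it;
    return total;
}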
Benchmarks
We will benchmark four methods:
- Find
- Insertion
- Deletion
- Iteration
There are many tools and frameworks you could use for benchmarking; we will simply time the code with the STL’s high_resolution_clock
. We can compare our AVL tree to std::set
, which is typically (but not always) implemented as a Red-Black tree. I am testing this on Ubuntu 16.04 with g++ 6.3, with no compiler optimization flags.
If you don’t know what a Red-Black tree is: it supports the same operations with the same time complexities as an AVL tree, but because AVL trees are more strictly balanced, they are usually faster at finding nodes and slower at inserting/deleting them. That is roughly how we should expect the performance to compare. That being said, it is almost certain that std::set
makes specific optimizations and tradeoffs of its own, so our expectations will probably be wrong.
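Concretely, each benchmark is just a timed loop around the operation under test. Here is a minimal, self-contained sketch of the timing pattern for the std::set side of the Find benchmark; this is not the code in the repo, only an illustration of how high_resolution_clock is used.
#include <chrono>
#include <cstddef>
#include <iostream>
#include <set>
#include <vector>

// Time how long it takes to look up every key in a std::set.
double TimeSetFind(const std::set<int> &s, const std::vector<int> &keys)
{
    auto start = std::chrono::high_resolution_clock::now();
    std::size_t found = 0;
    for (int k : keys)
        found += s.count(k); // use the result so the lookups can't be discarded
    auto end = std::chrono::high_resolution_clock::now();
    std::cout << "found " << found << " keys\n";
    return std::chrono::duration<double>(end - start).count();
}

int main()
{
    std::vector<int> keys(100000);
    for (std::size_t i = 0; i < keys.size(); ++i)
        keys[i] = static_cast<int>(i);
    std::set<int> s(keys.begin(), keys.end());
    std::cout << TimeSetFind(s, keys) << "s\n";
}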
If you want to run the benchmarking code, cd into the root directory (/AVL) of the sample code and run:
make benchmark
Find
Here are the results for Find
. This is roughly what we expected if we really are comparing an AVL tree to a Red-Black tree: our code runs about 3-3.5 times faster.
------------------------------- Find ------------------------------
Number of Nodes -- AVL -- STL
100000 -- 0.0124s -- 0.0497s
1000000 -- 0.1525s -- 0.5591s
10000000 -- 2.1993s -- 6.8327s
Insertion
So this is somewhat unexpected, because our tree is actually faster. It is entirely possible that std::set
adds overhead to optimize for iteration, which for most purposes is the most frequent use of the container. If that is the case, our code will run faster for deletion as well.
----------------------------- Insertion ---------------------------
Number of Nodes -- AVL -- STL
100000 -- 0.0488s -- 0.0845s
1000000 -- 0.6065s -- 0.9933s
10000000 -- 7.6535s -- 11.719s
Deletion
Yep, faster for deletion as well. Now the only question is how much faster std::set
will be at iteration.
----------------------------- Deletion ----------------------------
Number of Nodes -- AVL -- STL
100000 -- 0.0348s -- 0.0833s
1000000 -- 0.3994s -- 0.9957s
10000000 -- 4.8205s -- 11.396s
Iteration
Uhhhhhhhhhhhhhhhh nothing to see here.
----------------------------- Iteration ---------------------------
Number of Nodes -- AVL -- STL
100000 -- 0.0036s -- 3.36e-07s
1000000 -- 0.0358s -- 3.25e-07s
10000000 -- 0.3752s -- 3.42e-07s
…
Wait, the STL times aren’t changing. Is the compiler optimizing the loop away? When I change the benchmark function to the code below, the results don’t change much (a slight increase in runtime, but still no correlation between the number of nodes and the runtime).
I don’t think the compiler should be able to optimize this loop away:
// in the STL benchmark function; `it` is the std::set iterator declared earlier
volatile unsigned long sum = 0;
auto start = chrono::high_resolution_clock::now();
for (; it != stl_set.end(); ++it) { sum += 1; }
auto end = chrono::high_resolution_clock::now();
In fact, STL iteration is so fast that the measured values are probably smaller than the uncertainty of high_resolution_clock
itself. What happens if I time nothing?
auto start = chrono::high_resolution_clock::now();
auto end = chrono::high_resolution_clock::now();
cout << chrono::duration<double>(end - start).count() << "s\n";
Result: 3.95e-07s
Hmm, I’m not sure what to make of this. It could be that std::set
is genuinely faster than ~1e-7s here (certainly possible), or the compiler may still be optimizing the loop away. I’d have to (learn x86 and) go have a look at the assembly.
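One cheap check before reaching for the assembly (just a suggestion, not part of the original benchmark) would be to count the elements actually visited inside the timed loop:
// Diagnostic sketch: if the printed count matches the number of nodes, the
// timed loop really traversed the set; if not, the loop isn't doing what we think.
volatile unsigned long visited = 0;
auto start = chrono::high_resolution_clock::now();
for (auto it = stl_set.begin(); it != stl_set.end(); ++it) { visited += 1; }
auto end = chrono::high_resolution_clock::now();
cout << "visited " << visited << " elements in "
     << chrono::duration<double>(end - start).count() << "s\n";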
Conclusion
That’s basically it for AVL trees. If you want to challenge yourself, see if you can implement the optimizations hinted at throughout the tutorial, or modify the design of the tree to make iteration faster. If you found this interesting, or if I made a mistake somewhere, leave a comment (or don’t, because I haven’t set that up yet), and I’ll continue implementing random data structures and algorithms. Thanks for reading.