What are Segment Trees?
Binary Indexed Trees were cool, but it only did sum queries :(
Segment Trees are meant to be able to do minimum, maximum, and sum queries which is pretty awesome
It also works in O(log n) time for queries and updating the array in between these queries just like Binary Indexed Trees
The downside though is it’ll require more memory, and it’s a little bit harder to implement
For me personally, while it is harder to implement, I found it much easier to understand than Binary Indexed Trees
How a Segment Tree works
A Segment Tree at it’s core is just a binary tree
Each parent, will have 2 child nodes or less, but for our implementation we’ll make sure they all have exactly 2 child nodes
In order to achieve this we have to make sure the length of the array is a power of 2, so it can be 1, 2, 4, 8, etc.
If it isn’t, and it has something like 6 elements, then we can just add 2 elements to the end with values of 0
Let’s use this array as an example: [5, 8, 6, 3, 2, 7, 2, 6]
Here’s how we’ll represent it as a tree:
Keep in mind that this tree is only meant for sum queries it won’t work for minimum or maximum queries, but we’ll see how to do those later
In the tree you’ll also see that the parent node is simply the sum of the two children nodes
Let’s say we want to get the sum from index 2 to index 7 (inclusive, and it’s 0 indexed):
Since we’ve computed the sums for both of the child nodes we can basically split up the array into ranges for which we’ve computed the values
In this example we’ split it up into the first range with 6 and 3 who’s sum is 9, and then the other range 2, 7, 2, and 6 who’s sum is 17
Since we know both of these sums we just add them together and get 26
Updating values is also quite simple
Let’s say we want to update the value at index 5 with value 7 to be a 10
We can first change that value:
After that we can move up a level to update the parent node:
And so on until we reach the root node:
Boom, we’ve updated the tree in O(log n) time!
Awesome stuff
Now, it’s time to implement it
Implementing a Segment Tree
Sum Queries
So far we’ve visually been looking at a tree, but from the code, it’ll all be represented as an array
Here’s what it would look like (let’s call this array tree):
Notice that it’s one indexed, but in the code you can just add an empty element at index 0
The way the tree is represented here is that tree[1] is the root node, it’s child nodes are tree[2] and tree[3], then the child nodes of tree[2] are tree[4] and tree[5], and so on
Here are some important things for traversing the tree:
- Getting to the two child nodes — You can use tree[2n] and tree[2n + 1] where n is the index of the current element. Let’s say n is 2 with value 22, it’s children are then tree[4] and tree[5] which have values 13 and 9, which is the correct answer
- Going up levels — To go from the current node to it’s parent node we can use tree[n / 2] where n is the index of the current element. If n is 8 with value 5 then tree[n / 2] is 4 with value 13 which is the parent
Using this we can use the following code to get the sum within the range of a to b:
int sum(int a, int b) {
a += n; b += n;
int s = 0;
while (a <= b) {
if (a % 2 == 1) s += tree[a++];
if (b % 2 == 0) s += tree[b--];
a /= 2; b /= 2;
}
return s;
} Let’s go through it
We start off by doing a +=n and b +=n this is simply making this range start at the very last level
Let’s say we want to find the range from index 3 to 7, in that case a would become 11 and b would become 15 since the length of the array, n, is 8:
We also have a variable s which contains the sum
Then we continue in the loop while the variable a is less than b because if the pointer a goes to the right of b it means we’ve reached the root node, and we’re done
Now, here’s the confusing part
The two if statements, we’re checking if a % 2 ==1 and if b % 2 == 0, and only then do we add the current node to the sum
Why these specific conditions?
Let’s go through the first iteration to see how it works
We’re checking if a % 2 == 1, and since a is 11, this condition is true
This means we select the current element 3
What if we didn’t select it though?
Well, which element would you select then?
The parent node, 9, is the sum of it’s two child nodes, but those aren’t a part of the range
If it’s still confusing then let’s look at b
The condition b % 2 == 0 will be false since b is 15, and that’s good!
If we did select the current element 6, then that’s wrong, because after that what would we have done?
Selected it’s parent node 8?
That would give us the wrong answer because if you do 8 + 6 you’re basically saying you want the range from 14 to 15, and also the element at 15
This means you’re basically adding a duplicate element, and that’s wrong
I hope this explains why we’re doing that modulo, we just want to check whether we should select the current node, or wait, and select some other parent node which contains the sum for the range we want
If we do go into the if statement then we also increment a and decrement b
This is because if we don’t then a might continue going to the left of the tree and that will be out of the range of our query, and therefore would have no meaning
This is why we’re basically “nudging” a to stay inside the range
The same logic applies to the variable b because it would also continue drifting off to the right, but if we decrement it then it’ll stay within the range
That’s quite a bit, but try debugging the code, that’s what helped me
The next lines of code are pretty simple, doing a /= 2 and b /=2 simply moves them up a level, and then we just return the sum s
Minimum and maximum queries
To do a minimum or maximum query we use the exact same code as before
The only thing we change is how the tree is constructed
For minimum queries here’s what the tree would look like using the original array:
Now we simply just query it the same as before starting at the bottom, and updating the minimum value as we go up
Updating values
Here’s the code for updating a value k by x:
void add(int k, int x) {
k += n;
tree[k] += x;
for (k /= 2; k >= 1; k /= 2) {
tree[k] = tree[2*k]+tree[2*k+1];
}
} This code is much simpler
We again start at the bottom of the tree by doing k += n
Then update the current value there by x
Now we just want to move up the tree while updating the values
To do that we first move up a level with k /= 2
Then update the value at tree[k] by adding it’s two children using the expressions we had seen before
We keep on doing this until we reach the top of the tree, and we’ve updated the tree in O(log n) time!
Conclusion
The code behind Segment Trees might be a little daunting at first, I personally found it very confusing to understand, but try debugging it line by line and creating diagrams, it’ll really help
If you have any questions, comments, or feedback then feel free to reach out to me on LinkedIn!
I also hope this entire series on Range Queries helped you, I’m also going to be releasing an article on Bit Manipulation soon!
Bye ✌️