Getting the gist of segment trees

Segment Trees

Wow they seem overly complex to learn about. They’re a very easy concept, but there is a paucity of resources for explaining them.

Have a gander at the resources at the bottom for the ones I came across. Took me ages to figure out the differences between them.

Here, I assume you know about Binary Trees and recursion on Binary Trees.

Any code is in python which is about as close to pseudo code as you can get.

tl;dr A segmented tree in one diagram

Structuring an array (length 8) in a segmented tree. The indexes of the array are the leaf nodes.

The parent nodes here are sums of the children (but could be another function).

This can be coded as a normal binary tree.

       28
     /   \
   6       22
  / \     / \
 1   5   9   13
/ \ / \ / \ / \
0 1 2 3 4 5 6 7

Step 1: Create a tree with specific indexing strategy

We are going to use an example for storing 2^3 = 8 values indexed by [0,7] from an array in a segment tree.

We’re using 8 to ensure the segment tree we will create is a “perfect” binary tree to make things easier, or at least prettier. All segmented trees are ‘full‘ – all nodes have 0 or 2 children.

The values will get added as leaves of the tree.

So we want a tree that has 8 leaves.

To learn about how we construct them we’ll first number the nodes of a tree in depth wise (across the levels).

Some (Geeks for Geeks) start this at 0. League of Programmers start this at 1.

A perfect binary tree with 8 leaves has 4 levels and 2^4 – 1 = 15 nodes. The eight leaves are the values so we will use the 15 – 8 = 7 nodes for the segment nodes.

A binary tree with 7 nodes indexed by level, starting at 1 and with the 8 values of our array as the leaves:

       1 <= parent
     /   \
   2       3 <= i
  / \     / \
 4   5   6   7 <= children
/ \ / \ / \ / \
0 1 2 3 4 5 6 7

For each node i that we have indexed, e.g node i = 3, then

     parent = i/2 = 1 (integer division)
 left child = 2*i = 6
right child = 2*i+1 = 7

Alternatively, If you index the nodes from 0:

       0 <= parent
     /   \
   1       2 <= i
  / \     / \
 3   4   5   6 <= children
/ \ / \ / \ / \
0 1 2 3 4 5 6 7

Here for i = 2:

parent = (i-1)/2 (integer division)
  left = 2*i + 1
 right = 2*i + 2

The difference between starting from 0 or 1 confused me a lot when I was looking at the various tutorials. I’m going to stick with indexing from 1.

Step 2: Define what the segments are for each of the nodes.

Reminder about ranges:
() = ‘open’ boundary, [] = ‘closed’ boundary. We’re only using closed boundaries here.

x in (a,b] => a < x <= b
x in [a,b) => a <= x < b
N.B. [a,a] = a

The root node i = 1 refers to the whole array range [0,7]. Then node i = 2 refers to the lower half [0,3] and node i = 3 refers to the upper half [4,7]

N.B. [0,7] is the range 0-7 but it can also refer to a python list with 0 and 7 as values. Hopefully the context should make it obvious.

So, so far we have:

       1
     [0,7]
     /   \
   2       3
 [0,3]   [4,7]

Then replacing all indexes of the tree with their corresponding segments:

         [0,7]
        /     \
   [0,3]       [4,7]
   /   \       /   \
[0,1] [2,3]  [4,5] [6,7]
 / \   / \    / \   / \
 0 1   2 3    4 5   6 7

Step 3: Create a function to store the segment of values

We want to set the value of the parent nodes to indicate the range of values that are in the leaves of the current sub-tree. This is quite well explained on Slide 16 of the League of Programmers’ slides:

“Each internal node represents some merging of the leaf nodes”

There are four common ones: sum, min, max, GCD (Greatest Common Divisor).

In more general terms, for node i = 3, node value = f(a[4], a[5], a[6], a[7]) for the array a.

However this function f can be anything, e.g. other possibilities are count or finding the mid point of the range ((max - min) / 2)

Min is used for a the the Range Minimum Query, Geeks for Geeks have a Coded Example.

Example using sum:

       28 (= 6+22)
     /   \
   6       22 (= 9+13)
  / \     / \
 1   5   9   13 (= 6+7)
/ \ / \ / \ / \
0 1 2 3 4 5 6 7

This corresponds to the similar tree in the Geeks for Geeks tutorial.

Step 4: Worked example to ‘fill in the blanks’

We’re going to use count as the function. We want to fill in the ith blank position (starting from 1).

We are given an array of [3,2,1,1] for which blank to fill for the following sorted values [1,2,3,4].

    blanks = [3,2,1,1]
    #         |
    #         ^
    values = [1,2,3,4]

So 1 wants to go into the 3rd blank space. Then 2 wants to go into the 2nd blank space of the remaining 3 spaces.

Step 1: [_,_,1,_] <= 1 goes in 3rd blank space
Step 2: [_,2,1,_] <= 2 goes in 2nd blank space remaining
Step 3: [3,2,1,_] <= 3 goes in 1st blank space remaining
Step 4: [3,2,1,4] <= 4 goes in 1st blank space remaining

       0 <= count of values in segment [1,4]
     /   \
   0       0
  / \     / \
 1   2   3   4
 _   _   _   _

Fill in 1:

       1* <= count increased by 1
     /   \
   0       1*  <= count increased by 1
  / \     / \
 1   2   3   4
 _   _   1*  _ <= 1 placed in 3rd blank

Fill in 2:

       2*
     /   \
   1*      1
  / \     / \
 1   2   3   4
 _   2*  1   _

Fill in 3:

       3*
     /   \
   2*      1
  / \     / \
 1   2   3   4
 3*  2   1   _

Fill in 4:

       4*
     /   \
   2      2*
  / \     / \
 1   2   3   4
 3   2   1   4*

Step 5: Code for ‘fill in the blanks’

We’re assuming that the tree has been created for us. The code for this is at the end.

To programmatically decide which blank to fill, we need to know if the ith value is in the left or right segment. Here’s the starting point again.

Initially, the counts are all zero. So for the root node we want to know if 3 is greater than or less than 4/2 (half the length of the array). If it’s equal, i.e. 2, then we want to put it in the left segment. Also if it’s less than i.e. 1 put it in the left half. If its greater than i.e. [3,4] then it wants to go in the right segment.

    N = len(segment)
    if N/2 < i:
        # go right
    else: # i <= N/2
        # go left
       0 <= looking for 3rd leaf node, i = 3 > N/2 so look right
     /   \
   0       0
  / \     / \
 1   2   3   4
 _   _   _   _

At the second level, the count is still zero. But now we are only considering half the values. so N = 2, but now we want to know if the initial i = 3 goes in the 1st or 2nd segment. We’ve discarded half the values so we can subtract half the original N = 4 values from i. So at the first level of recursion, we alter i for the next level to be, i = i - N/2. Going left requires no change.

The length of the segments, if even, will divide in two, so N = N/2. If odd, then N/2 (using integer division) go left and the rest (N - N/2) go right so we split range [1,N] into [1,N/2] and [N/2+1,N]. So [1,3] => 1 and [2,3]

       0
     /   \
   0       0 <= i = 1, N = 2, i <= N/2 so look left
  / \     / \
 1   2   3   4
 _   _   _   _
    def fillBlank(self, segment, i, N):
        # N = len(segment) # we pass this in
        if i <= N/2:
            # go left
            N = N/2
            self.fillBlank(segment.left, i, N)
        else: # N/2 < i
            # go right
            i = i - N/2
            N = N - N/2
            self.fillBlank(segment.right, i, N)

Then when we hit the leaf then we can set the value.

    # N.B. the new params blanks and value
    # blanks is passed in by reference
    # value is going to fill the blank
    def fillBlank(self, segment, i, N, blanks, value):
        if not segment.left and not segment.right:
            # N == 1
            # we're dealing with array indexes, so minus 1
            blanks[segment.value - 1] = value
            return
        # ...
       0
     /   \
   0       0 <= i = 1, N = 2, i <= N/2 so look left
  / \     / \
 1   2   3   4
 _   _   1   _ <= blanks[2] = 1

Now, to complete our first pass through the tree, we should have updated the counts (the values of each of the nodes) by one as we passed through it. So the full method becomes:

    def fillBlank(self, segment, i, N, blanks, value):
        if not segment.left and not segment.right:
            # N == 1
            # we're dealing with array indexes, so minus 1
            blanks[segment.value - 1] = value
            return
        # increment the count of this segment
        segment.value += 1
        # N = len(segment) # we pass this in
        if i <= N/2:
            # go left
            N = N/2
            self.fillBlank(segment.left, i, N, blanks, value)
        else: # N/2 < i
            # go right
            i = i - N/2
            N = N - N/2
            self.fillBlank(segment.right, i, N, blanks, value)

However this doesn’t quite work for all the following passes through the tree, e.g.searching for the 1st blank space for the very final step – which wants to go into the blank space which is at the end of the array.

To do this we take into account the count of the child segments. If the left segment has been filled up but the right segment is empty, then we automatically want to look in the right segment.

So the condition i <= N/2 becomes i <= N/2 - X where X = segment.left.value and when going right, we alter the ith element similarly i = i - (N/2 - X).

Finally we don’t care about this conditional check when the segment only has two values in it as those two values are just the leaves (which don’t contain counts). This works since all segmented trees are ‘full’ (every node has 0 or 2 children).

    def fillBlank(self, segment, i, N, blanks, value):
        if not segment.left and not segment.right:
            # N == 1
            # we're dealing with array indexes, so minus 1
            blanks[segment.value - 1] = value
            return
        # increment the count of this segment
        segment.value += 1
        # N = len(segment) # we pass this in
        # this section gets more complicated from edge cases
        if N > 3:
            X = segment.left.val
        elif N == 3:
            # left node is leaf
            X = segment.val - segment.right.val
        elif N == 2 and blanks[segment.left.val - 1]:
            # need to check if left has been filled
            X = 1
        else:
            X = 0

        if i <= N/2 - X: # X subtracted
            # go left
            N = N/2
            self.fillBlank(segment.left, i, N, blanks, value)
        else: # N/2 < i
            # go right
            i = i - (N/2 - X)
            N = N - N/2
            self.fillBlank(segment.right, i, N, blanks, value)

Here’s the first step of the second pass:

       1 <= looking for 2nd leaf node, X=0, N=4, i=2 <= N/2 so look left
     /   \
   0       1
  / \     / \
 1   2   3   4
 _   _   1   _

We then follow the recursion through, which should populate the blanks list.

Resources

view raw
segment-trees.md
hosted with ❤ by GitHub

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.