written 2.7 years ago by | • modified 2.7 years ago |
OPTIMAL BINARY SEARCH TREES
A Binary Search Tree is one of the most important data structures which contain a set of elements with the operations of searching, insertion, and deletion.
An Optimal Binary Search Tree is the one for which the average number of comparisons in a search is the smallest possible, if the probability of searching each elements is given.
Consider four keys A, B, C, and D to be searched for with probabilities $0.1,0.2,0.4$, and $0.3$, respectively.
The below figure depicts two out of 14 possible binary search trees containing these keys.(A) (B)
- The average number of comparisons in a successful search in the first of these trees is
$0.1 \cdot 1+0.2 \cdot 2+0.4 \cdot 3+0.3 \cdot 4=2.9$,
and for the second one it is
$0.1 .2+0.2 .1+0.4 .2+0.3 .3=2.1$.
Neither of these two trees is, in fact, optimal.
The total number of binary search trees with $\mathrm{n}$ keys is equal to the nth Catalan number,
$$ c(n)=\frac{1}{n+1}\left(\begin{array}{c} 2 n \\ n \end{array}\right) \quad \text { for } n\gt0, \quad c(0)=1, $$
Let $a_{1}, \ldots, a_{n}$ be distinct keys ordered from the smallest to the largest and let $p_{1}, \ldots, p_{n}$ be the probabilities of searching for them.
Let $\mathrm{C}(\mathrm{i}, \mathrm{j})$ be the smallest average number of comparisons made in a successful search in a binary search tree $T_{i}^{j}$.- Suppose the root contains key $a_{k}$, the left subtree $T_{i}^{k-1}$ contains keys $a_{i}, \ldots, a_{k-1}$ optimally arranged, and the right subtree $T_{j}^{k+1}$ contains keys $a_{k+1}, \ldots, a_{j}$ also optimally arranged.
If we count tree levels starting with 1 to make the comparison numbers equal the keys' levels, the following recurrence relation is obtained: $$ \begin{aligned} C(i, j)=& \min _{i \leq k \leq j}\left\{p_{k} \cdot 1+\sum_{s=i}^{k-1} p_{s} \cdot\left(\text { level of } a_{s} \text { in } T_{i}^{k-1}+1\right)\right.\\ &\left.+\sum_{s=k+1}^{j} p_{s} \cdot\left(\text { level of } a_{s} \text { in } T_{k+1}^{j}+1\right)\right\} \\ =& \min _{i \leq k \leq j}\left\{\sum_{s=i}^{k-1} p_{s} \cdot \text { level of } a_{s} \text { in } T_{i}^{k-1}+\sum_{s=k+1}^{j} p_{s} \cdot \text { level of } a_{s} \text { in } T_{k+1}^{j}+\sum_{s=i}^{j} p_{s}\right\} \\ =& \min _{i \leq k \leq j}\{C(i, k-1)+C(k+1, j)\}+\sum_{s=i}^{j} p_{s} . \end{aligned} $$
The recurrence relation is $$ C(i, j)=\min _{i \leq k \leq j}\{C(i, k-1)+C(k+1, j)\}+\sum_{s=i}^{j} p_{s} \quad \text { for } 1 \leq i \leq j \leq n . $$
Here $\mathrm{C}(\mathrm{i}, \mathrm{i}-1)=0$ for $1 \leq \mathrm{i} \leq \mathrm{n}+1$ [represents empty tree] and $\mathrm{C}(\mathrm{i}, \mathrm{i})=\mathrm{p}_{\mathrm{i}}$, for $1 \leq \mathrm{i} \leq \mathrm{n}$ [represents an one node tree],
The algorithm computes $C(1, n)$, the average number of comparisons for successful searches in the optimal binary tree.
To get the optimal tree, another two-dimensional table to record the value of $\mathrm{k}$ for which it is minimum.
$\underline{\text { EXAMPLE }}$
Construct an optimal binary search tree for the given set of keys
Key | A | B | C | D |
---|---|---|---|---|
Probability | 0.1 | 0.2 | 0.4 | 0.3 |
Initial tables will be: here $\mathrm{C}(\mathrm{i}, \mathrm{i}-1)=0$
for $1 \leq \mathrm{i} \leq \mathrm{n}+1 \& \mathrm{C}(\mathrm{i}, \mathrm{i})=\mathrm{p}_{\mathrm{i}}$
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
1 | 0 | 0.1 | |||
2 | 0 | 0.2 | |||
3 | 0 | 0.4 | |||
4 | 0 | 0.3 | |||
5 | 0 | ||||
0 | 1 | 2 | 3 | 4 |
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
1 | 1 | ||||
2 | 2 | ||||
3 | 3 | ||||
4 | 4 | ||||
5 |
$$\begin{aligned} \mathrm{C}(1,2) &=\min \left\{\begin{array}{l}\text { for } \mathrm{k}=1: \mathrm{C}[1,0]+\mathrm{C}[2,2]+\sum_{s=1}^{2} p_{s} \\ \left.\text { for } \mathrm{k}=2: \mathbb{C}_{2}, 1,1\right]+\mathrm{C}[3,2]+\sum_{s=1}^{2} p_{s}\end{array}\right.\\ &=\min [0+0.2+0.3,0.1+0+0.3] \\ &=\min [0.5,0.4] \\ &=0.4 \\ \mathrm{C}(2,3) &=\min \left\{\begin{array}{l}\text { for } \mathrm{k}=2: \mathrm{C}[2,1]+\mathrm{C}[3,3]+\sum_{s=2}^{3} p_{s} \\ \text { for } \mathrm{k}=3: \mathrm{C}[2,2]+\mathrm{C}[4,3]+\sum_{s=2}^{3} p_{s}\end{array}\right.\\ &=\min [0+0.4+0.6,0.2+0+0.6] \\ &=\min [1.0,0.8] \\ &=0.8 \end{aligned}$$
$$ \begin{aligned} C(1,4) &=\min \left\{\begin{array}{l} \text { for } \mathrm{k}=1: \mathrm{C}[1,0]+\mathrm{C}[2,4]+\sum_{s=1}^{4} p_{s} \\ \text { for } \mathrm{k}=2: \mathrm{C}[1,1]+\mathrm{C}[3,4]+\sum_{s=1}^{4} p_{s} \\ \text { for } \mathrm{k}=3: \mathrm{C}[1,2]+\mathrm{C}[4,4]+\sum_{s=1}^{4} p_{s} \\ \text { for } \mathrm{k}=4: \mathrm{C}[1,3]+\mathrm{C}[5,4]+\sum_{s=1}^{4} p_{s} \end{array}\right.\\ &=\min [0+1.4+1.0,0.1+1.0+1.0,0.4+0.3+1.0,1.1+0+1.0] \\ &=\min [2.4,2.1,1.3 .2 .1] \\ &=1.7 \end{aligned} $$
Now the tables becomes
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
1 | 0 | 0.1 | 0.4 | ||
2 | 0 | 0.2 | 0.8 | ||
3 | 0 | 0.4 | 1.0 | ||
4 | 0 | 0.3 | |||
5 | 0 |
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
1 | 1 | 2 | |||
2 | 2 | 3 | |||
3 | 3 | 3 | |||
4 | 4 | ||||
5 |
$\begin{aligned} C(1,4) &=\min \left\{\begin{array}{l}\text { for } \mathrm{k}=1: \mathrm{C}[1,0]+\mathrm{C}[2,4]+\sum_{s=1}^{4} p_{s} \\ \text { for } \mathrm{k}=2: \mathrm{C}[1,1]+\mathrm{C}[3,4]+\sum_{s=1}^{4} p_{s} \\ \text { for } \mathrm{k}=3: \mathrm{C}[1,2]+\mathrm{C}[4,4]+\sum_{s=1}^{4} p_{s} \\ \text { for } \mathrm{k}=4: \mathrm{C}[1,3]+\mathrm{C}[5,4]+\sum_{s=1}^{4} p_{s}\end{array}\right.\\ &=\min [0+1.4+1.0,0.1+1.0+1.0,0.4+0.3+1.0,1.1+0+1.0] \\ &=\min [2.4,2.1,1.3 .2 .1] \\ &=1.7 \end{aligned}$
$\begin{aligned} C(1,4) &=\min \left\{\begin{array}{l}\text { for } \mathrm{k}=1: \mathrm{C}[1,0]+\mathrm{C}[2,4]+\sum_{s=1}^{4} p_{s} \\ \text { for } \mathrm{k}=2: \mathrm{C}[1,1]+\mathrm{C}[3,4]+\sum_{s=1}^{4} p_{s} \\ \text { for } \mathrm{k}=3: \mathrm{C}[1,2]+\mathrm{C}[4,4]+\sum_{s=1}^{4} p_{s} \\ \text { for } \mathrm{k}=4: \mathrm{C}[1,3]+\mathrm{C}[5,4]+\sum_{s=1}^{4} p_{s}\end{array}\right.\\ &=\min [0+1.4+1.0,0.1+1.0+1.0,0.4+0.3+1.0,1.1+0+1.0] \\ &=\min [2.4,2.1,1.3 .2 .1] \\ &=1.7 \end{aligned}$
Now the tables becomes
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
1 | 1 | 2 | |||
2 | 2 | 3 | |||
3 | 3 | 3 | |||
4 | 4 | ||||
5 |
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
1 | 1 | 2 | 3 | 3 | |
2 | 2 | 3 | 3 | ||
3 | 3 | 3 | |||
4 | 4 | ||||
5 |
Thus, the average number of key comparisons in the optimal tree is equal to $1.7$.
Since $R(1,4)=3$, the root of the optimal tree contains the third key, i.e., $C$. - Since its a binary search tree, Its left subtree is made up of keys A and B, and its right subtree contains just the key D - To find the specific structure of these subtrees, - In the root table since $R(1,2)=2$, the root of the optimal tree containing $A$ and $B$ is $B$,- Thus, the average number of key comparisons in the optimal tree is equal to $1.7$. - Since $R(1,4)=3$, the root of the optimal tree contains the third key, i.e., $C$. - Since its a binary search tree, Its left subtree is made up of keys A and B, and its right subtree contains just the key D - To find the specific structure of these subtrees, - In the root table since $R(1,2)=2$, the root of the optimal tree containing $A$ and $B$ is $B$, with $\mathrm{A}$ being its left child. - Since R $(4,4)=4$, the root of this one-node optimal tree is its only key D.
The below figure represents the optimal tree