Firstly, what is a binary tree?
A binary tree is a simple data structure where every node points to two more nodes, culminating in some type of final data type (usually null or nil).
4 5 6 7
A badly unbalanced binary tree might look more like this:
4 5 null
null null null 6 7
Both of these examples are not really sorted, and thus are not very useful as binary search trees. They look sorted in this format, and there is an order to them, but it’s not what I’ll be talking about when I’m trying to get to grips with the data structure in this post. A properly-sorted balanced binary tree would look like this:
1 3 5 7
This is much more useful, as it means that we can find our way around the tree by checking the value held by the current node against the target value. For instance, in the case of 5, we check to see if it’s larger than (4). It is, so we go right to the node with the value (6). 5 is smaller than 6 so we go left, and we’re there!
In a sorted list, we would need to make 5 checks to find our number, while for the BST we only need to make 3 (2 to traverse the tree, one to make sure we’ve ended on the correct value).
In an unsorted or unbalanced tree this would take a lot longer, and if the value didn’t exist in an unsorted tree the operation would take O(n) time to return #f. In a balanced, sorted binary search tree it takes O(log n) time to search for the value, even if the value doesn’t exist.
So, trees are most useful when they’re already sorted and balanced – or ideally automatically self-sorting and balancing.
Binary trees are closely related to a clever data structure called a skip list. The concept behind a skip list (crudely) is that a list is created:
(list 1 2 3 4 5 6 7 8 9)
Followed by a second list with only every other number, to speed up traversal times:
(list 1 3 5 7 9)
If we reach a number in the skip list which is larger than the target number, we know that we’ve moved one place past the target number. If the number matches, we’ve found the number. At a stroke, we’ve halved the search time for this list!
If you build your skip list to its intended height…
(list 1 5)
(list 1 5 9)
(list 1 3 5 7 9)
(list 1 2 3 4 5 6 7 8 9)
Looks a lot like a balanced binary tree!
So how do we make a binary tree in Racket?
(struct node (x left right)
Obviously there’s more to a binary tree than this, but that’s the basics of it. Once you have a node with a place for values, a left node and a right node, you’ve basically got a binary tree.
You can also extend the trees by hand:
(define tree (node 1 (node 2 null null) (node 3 null null)))
This is pretty rubbish though. If we treated linked lists like this in Racket, we’d have to make everything using (cons x (cons y null)), when what people actually do in the real world is write (list x y).
For this next part, I first created a leaf struct. Leaves should typically be very populous, so I wanted to identify them to save a large number of function calls.
(define-struct/contract leaf ([x (not/c null?)])
Just the value, and it’s not allowed to be null.
Next, we create a function which takes a list and spits out a sorted, unbalanced tree.
;Modified quicksort to create ordered (but not balanced) binary search tree (define (unsorted-list->binary-tree xs) (if (null? xs) null (if (null? (cdr xs)) (leaf (car xs)) (let* ([hd (car xs)] [tail (cdr xs)] [left (filter (lambda (x) (< x hd)) tail)] [right (filter (lambda (x) (>= x hd)) tail)]) (node hd (unsorted-list->binary-tree left) (unsorted-list->binary-tree right))))))
A simple modified racket quicksort function, similar to the quicksort I’ve discussed here before.
> (unsorted-list->binary-tree (list 1 2 3 4 5 6))
(node 1 '() (node 2 '() (node 3 '() (node 4 '() (node 5 '() (leaf 6))))))
>(unsorted-list->binary-tree (list 4 5 7 12 1 2))
(node 4 (node 1 '() (leaf 2)) (node 5 '() (node 7 '() (leaf 12))))
Meanwhile, turning a sorted list into a balanced tree is just as easy:
(define (sorted-list->balanced-tree xs) (if (null? xs) null (if (null? (cdr xs)) (leaf (car xs)) (let* ([n (floor (/ (length xs) 2))] [mid (list-ref xs n)] [left (take xs n)] [right (drop xs (+ n 1))]) (node mid (sorted-list->balanced-tree left) (sorted-list->balanced-tree right))))))
We don’t know what the median of a list is until it’s sorted, meaning I can’t balance a list that isn’t sorted without seeking out a different algorithm or data structure. What we could do is use a slightly different data structure, the Red-Black tree, but I’m still getting to grips with how and why exactly that structure works, so I’d rather not get into that now.
In other words, we might as well take an already-sorted list, then balance it.
If we don’t want to just feed lists into our tree, we make our tree-making function variadic. A variadic function is a function able to take a variable number of arguments.
In Racket, it’s simple:
;To create a binary tree from a variadic argument (define binary-tree (lambda xs (letrec ([f (lambda (xs) (if (null? xs) null (let* ([hd (car xs)] [tail (cdr xs)] [left (filter (lambda (x) (< x hd)) tail)] [right (filter (lambda (x) (>= x hd)) tail)]) (if (null? tail) (leaf hd) (node hd (f left) (f right))))))]) (f xs))))
If you’re wondering why I’ve bothered with the local recursive function (f xs) here, it’s because the variadic argument is a list as soon as it is passed to the function. Thus, if we tried recursing on it our function would get confused and think there was only one argument and stop recursing – unless we put a cond statement in to differentiate between lists and other types of input and…yes. This could get counter-productive.
This function works in the exact same way as the one we saw earlier, only it takes different numbers of arguments rather than different lengths of lists.
Finally, here is how I create a balanced tree with variadic arguments and of different types, and also a kth-select algorithm for any ordered tree in Racket:
;Requires a sign (e.g. <, >, <=, string<?) as the first argument to sort the rest of the arguments (define balanced-tree (lambda xs (letrec ([f (lambda (xs) (if (null? xs) null (let* ([n (floor (/ (length xs) 2))] [mid (list-ref xs n)] [left (take xs n)] [right (drop xs (+ n 1))]) (if (null? (cdr xs)) (leaf mid) (node mid (f left) (f right))))))]) (f (sort (cdr xs) (car xs)))))) ;count-tree-members - useful for kth selection algorithm (define (count-tree-members ts) (cond [(null? ts) 0] [(leaf? ts) 1] [(node? ts) (+ 1 (count-tree-members (node-left ts)) (count-tree-members (node-right ts)))])) ;selection algorithm on tree (define (exn:fail "Error! Value k is too large or small for the search tree." (current-continuation-marks))) (define (kth-select ts k) (letrec ([f (lambda (ts k) (cond [(null? ts) (raise )] [(leaf? ts) (if (= k 1) (leaf-x ts) (raise ))] [(node? ts) (let ([left (add1 (count-tree-members (node-left ts)))]) (cond [(= k left) (node-x ts)] [(> k left) (f (node-right ts) (- k left))] [(< k left) (f (node-left ts) k)]))]))]) (f ts k)))
In the mean-time, I’ve been fiddling around with small improvements to the Nearest Neighbour algorithm for the Travelling Salesman Problem (for fun, I’m not a crank I promise!), and continuing to work on my game in Java.