5.2 Sorting Algorithms Suppose that we defined a predicate Orderwhich introduces a relation of (complete) order on some set of Refal terms; thus <Order t1 t2> is T , if the pair t1 t2 is in good order, and F otherwise. A relation of order is reflexive, antisymmetric and transitive. Given such a relation, we may wish to order a given list of terms so that whenever t1 precedes t2 in the list, <Order t1 t2> is true. In programming, this ordering procedure is usually referred to as sorting. The terms to be ordered may be, e.g., parenthesized strings with the lexicographic relation Order , or just numbers with Order being the relation less or equal. The most straightforward way to order a sequence of terms is to transpose neighboring terms as long as they are not in order:
This function is good as a definition of the problem, but taken as its solution it is grossly inefficient. Indeed, suppose we have made a step of the Refal machine that transposed tX and tY . Now all pairs of neighbors in e1 tY , with the possible exception of the last one (made by the last term in e1 and tY ), must be in good order. Nevertheless, at the next step the Refal machine will check all these pairs before it comes to the next inverted pair. An improvement suggests itself: to enclose the ready part in parentheses and compare the last term of the ready (ordered) part with the first term from the still unordered part:
This algorithm is much better but it can be further improved. After we transpose sX and sY , the ready part e1 shrinks by one term and will continue to do so while sY makes disorder with the last term of e1 . When sY finds its place, however, the whole original e1 will be ordered; therefore, we can improve the algorithm by restoring the original position of parentheses. Of course, this cannot be done if we do not somehow fix their position in the beginning. Thus we come to the algorithm known as insertion sorting: maintaining the initial part of the list in the right order, and inserting each next term in its place by moving it from right to left:
With this algorithm the average number of steps required to sort n terms is roughly n2 / 2. There are algorithms, such as the merge-sort, which do it in a number of steps proportional to n log n. The idea of the merge-sort is to divide and conquer. In order to sort a list, divide it into two approximately equal parts. Then sort the parts separately and merge them together into one sorted list. When we merge two ordered lists we need not go repeatedly through any of them. We simply compare the first elements in both lists and pick up the least one (more precisely, the one which is not greater than the other). Thus the total number of operations in a merge will be proportional to n . To sort the halves of the original list we divide them into quarters, etc. Since the total number of terms on all levels of divisions is always n, the time required for all mergings at each level is proportional to n. The number of levels of division necessary to come to single-element lists is log n. Therefore, the total time is n log n. In an implementation of this idea it is better to go in the opposite direction: we first form two-term ordered lists, then merge them into four-term lists, etc.:
There are more algorithms for sorting. Let us consider the quick-sort. We take the first term in the list t1 and try to put it in its final place. For this purpose we partition the remaining list into those terms which must precede t1 -- call this part e.Left -- and those which must follow it -- e.Right . Then the place of t1 is in between these two lists. Now apply this procedure recursively to the two sublists and concatenate:
Partitioning takes an amount of time that is proportional to the length of the list. The sum of times for partitioning at one recursive level is roughly n. Since e.Left and e.Right are, on the average, equal in length it will take roughly log n levels of recursion. Therefore, the full time is proportional to n log n. |