1.3 Pattern Matching

Patternexpressions (or just patterns) of Refal differ from object expressions in that they may include freevariables . Here are a few examples of free variables:

  e1  e.Data1  e.data1  sX  s.Free-var  t1  t.25

A free variable is formed by a type sign, a dot, and an index. The index of a variable is an identifier or a whole number. If the index is a one-letter identifier or a one-digit number, the dot can be skipped. An identifier used as an index after the dot may start with a lower case letter as well as a capital letter. There are three type signs. They are represented by the lower-case letters s , t , and e ; the corresponding variables are referred to as s-, t-, and e-variables. The difference between them is in the sets of their admissible values. An s-variable can take only a single symbol as its value. A t-variable takes any term as its value (recall that a term is either a symbol or an expression in structure brackets). An e-variable can take any expression as its value.

A pattern expression can be viewed as a set of object expressions which are obtained by substituting admissible values for all variables. Thus the pattern A e1represents the set of all object expressions starting with the symbol A . All entries of the same variable must be replaced by the same value. The pattern s1 e2 s1 can be seen as the set of all object expressions which start and end with the same symbol and have at least two symbols.

But pattern expressions also have another function: they describe a decomposition of an object expression, during which the variables in the pattern are identified with some parts of the object expression, i.e., take certain values. Viewed in this way, A e1 is a procedure which checks that the first symbol of a given object expression is A , and assigns the remaining part of it to e1 as its value. This procedure is known as patternmatching.

Given an object expression E and a pattern expression P , we denote the operation of matching E to P  as E : P  where the colon is one more special sign of Refal; the matching sign. We also say that E is being recognized as (a particular case of) P . In the pair E : P  we call E the argument and P the patternof the matching operation.

If there exists a substitution S for the variables in P such that applying S to P yields E , then we say that the matching, or recognition, is successful. The variables in P take on the values prescribed by S . Otherwise, the matching fails. If there is more than one way of assigning values to the free variables in the pattern in order to achieve matching, then the one is chosen in which the leftmost e-variable takes the shortest value. If this does not resolve the ambiguity, the same selection is done on the next (from the left) e-variable, etc. In other words, when we perform matching, we scan the argument E  from left to right and pick the first successful match of E to P .

Here are a few examples. The matching


  A B C : A e1
succeeds with e1 taking the value B C . The matching


  ('ABC') '++' : sX e1
fails, because the left parenthesis is not a symbol, so it cannot be matched to sX . Neither can it be ignored. However,


  ('ABC') '++' : (sX e1) e.Out
succeeds, with sX becoming 'A' , e1 becoming 'BC', and e.Out becoming '++' .

The recognition of '++' as s1 e2 s1 succeeds -- with s1 taking the value '+' and e2 the empty expression. But


  '+' : s1 e2 s1
fails.

The following matching:


  A B '+' C '+' D E F  :  e1 '+' e2
is an example of a situation where there is more than one way to assign values for matching. According to the definition above, the symbol '+' in the pattern will be identified with the first symbol '+' in the argument (the left-to-right matching). Thus e1 will take A Bas its value and e2 will become C '+' D E F.

In the following example:


  A B '-' (C '+' D E F)  :  e1 '+' e2
the matching will fail because the only symbol '+' of the argument is inside parentheses. To match it to the '+' in the pattern, we would have to assign to e1 and e2 some values that were unbalanced with respect to parentheses, and this is impossible. Refal takes structure brackets very seriously. Parts of expressions that are not expressions themselves cannot be treated separately in any context.

Exercise 1.3 Write patterns that can be described in words as follows: (a) an expression ending with two identical symbols; (b) an expression which contains at least two identical terms on the top level of structure; (c) a non-empty expression.

Exercise 1.4 Find the results of the following matchings:


(a) 'abbab' : e1 tX tX e2
(b) 'ab(b)ab' : e1 tX tX e2
(c) A(B) C D (A (B)) : e6 e4(e6)
(d) '16 0' : 16 eZ