Comparing Text Objects

If you have two atoms, two character codes, or two lists of character codes to compare, the following built-in predicates can be used:

@<          lexicographically less than
@>=         not less than
@>          lexicographically greater than
@=<         not greater than
==          identical to
\==         not identical to
compare/3   three-way comparison

For example,

     | ?- a @< b.
     yes
     | ?- a @> b.
     no
     | ?- compare(R, "fred", "jim").
     R = <
     | ?- 0'a @< 0'z.
     yes

There are several points to note about these built-in comparison predicates:

  1. They are sensitive to alphabetic case.
  2. If the two terms being compared are of different types, it is the types that are compared (integer < atom < list).
  3. They behave as though the two operands were converted to lists of character codes and the shorter operand were padded on the right with -1.
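The three points can be checked directly against compare/3. The following is a minimal sketch; expected_order/3 and check_order/0 are hypothetical names used only for illustration, and the facts assume a system whose standard order places integers before atoms and atoms before lists, as described above.

```prolog
% expected_order(A, B, Relation): compare(Relation, A, B) should hold.
expected_order('Zebra', apple, (<)).  % case: 0'Z (90) precedes 0'a (97)
expected_order(99, a, (<)).           % types: integer before atom
expected_order(zzz, [0'a], (<)).      % types: atom before list
expected_order(ab, abc, (<)).         % prefix: as if ab were padded with -1

% check_order: succeeds if no fact disagrees with compare/3.
check_order :-
    \+ ( expected_order(A, B, R),
         \+ compare(R, A, B)
       ).
```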

Routines that ignore alphabetic case would also be useful. The predicates to_lower/2 and to_upper/2 in library(ctypes) (see lib-txp-ctypes) may help you write your own.
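As one possibility, here is a minimal sketch of a case-insensitive comparison. compare_ignoring_case/3 and its helpers are hypothetical, not part of library(strings) or library(ctypes). To keep the example self-contained, it folds only the ASCII letters A to Z itself rather than calling to_lower/2, and it assumes the ISO built-in atom_codes/2.

```prolog
% compare_ignoring_case(-Relation, +Atom1, +Atom2)
% Hypothetical helper: compares two atoms as compare/3 would after
% folding ASCII upper-case letters to lower case.  Other codes
% (digits, punctuation, non-ASCII letters) are left unchanged.
compare_ignoring_case(Relation, Atom1, Atom2) :-
    atom_codes(Atom1, Codes1),
    atom_codes(Atom2, Codes2),
    fold_codes(Codes1, Lower1),
    fold_codes(Codes2, Lower2),
    compare(Relation, Lower1, Lower2).

% fold_codes(+Codes, -Folded): lower-case each ASCII letter in Codes.
fold_codes([], []).
fold_codes([C|Cs], [L|Ls]) :-
    fold_code(C, L),
    fold_codes(Cs, Ls).

% fold_code(+Code, -Folded): map A-Z onto a-z, keep anything else.
fold_code(C, L) :-
    C >= 0'A, C =< 0'Z, !,
    L is C + (0'a - 0'A).
fold_code(C, C).
```

With this definition 'Fred' and fred compare equal, and 'Zebra' follows apple, whereas compare/3 places 'Zebra' first because upper-case codes precede lower-case ones.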

library(strings) provides two string comparison predicates that address the other two points.

compare_strings(-Relation, +Text1, +Text2)
takes two text objects and compares them, binding Relation to

<   if Text1 lexicographically precedes Text2
=   if Text1 and Text2 are identical (but for type)
>   if Text1 lexicographically follows Text2

compare_strings(-Relation, +Text1, +Text2, +Pad)
is the same as compare_strings/3, except that it takes an additional argument, Pad, which is a character code (an integer). In effect, it pads the shorter of Text1 and Text2 on the right with Pad until both have the same length, and then compares the results as compare_strings/3 does.

We could have defined compare_strings/[3,4] this way:

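     % Check that each argument is atomic, convert it to a list of
     % character codes, and compare the lists in the standard order.
     compare_strings(Relation, Text1, Text2) :-
             atomic(Text1), name(Text1, Name1),
             atomic(Text2), name(Text2, Name2),
             compare(Relation, Name1, Name2).
     % As above, but first pad the shorter code list with Pad.
     compare_strings(Relation, Text1, Text2, Pad) :-
             atomic(Text1), name(Text1, Name1),
             atomic(Text2), name(Text2, Name2),
             pad(Name1, Name2, Pad, Full1, Full2),
             compare(Relation, Full1, Full2).
     % pad(+Codes1, +Codes2, +Pad, -Full1, -Full2): Full1 and Full2 are
     % Codes1 and Codes2, with the shorter one extended by Pad codes.
     pad(Name1, [], Pad, Name1, Full2) :- !,
             pad(Name1, Pad, Full2).
     pad([], Name2, Pad, Full1, Name2) :-
             pad(Name2, Pad, Full1).
     pad([Char1|Name1], [Char2|Name2], Pad,
         [Char1|Full1], [Char2|Full2]) :-
             pad(Name1, Name2, Pad, Full1, Full2).
     % pad(+List, +Pad, -Pads): Pads is a list of Pad as long as List.
     pad([], _, []).
     pad([_|X], Pad, [Pad|Y]) :-
             pad(X, Pad, Y).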

The point of compare_strings/4 is that programming languages differ in how they compare strings of unequal length. Some pad the shorter string with blanks (Pad = 32), others pad it with NUL characters (Pad = 0), and yet others use plain lexicographic comparison (Pad = -1), as compare/3 and compare_strings/3 do. compare_strings/4 lets you choose whichever convention is most useful for your application.
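The pad/5 definition given earlier builds padded copies of both code lists before calling compare/3. An alternative sketch, assuming the inputs are already lists of character codes, walks the two lists together and substitutes Pad for a missing code, so no copies are built. padded_compare/4 and next_code/4 are hypothetical names, not library predicates.

```prolog
% padded_compare(-Relation, +Codes1, +Codes2, +Pad)
% Hypothetical helper: compares two code lists as if the shorter one
% were padded on the right with Pad, without building the padded copy.
padded_compare(Relation, Codes1, Codes2, Pad) :-
    (   Codes1 == [], Codes2 == [] ->
        Relation = (=)
    ;   next_code(Codes1, Pad, C1, Rest1),
        next_code(Codes2, Pad, C2, Rest2),
        compare(Order, C1, C2),
        (   Order == (=) ->
            padded_compare(Relation, Rest1, Rest2, Pad)
        ;   Relation = Order
        )
    ).

% next_code(+Codes, +Pad, -Code, -Rest): the first code of Codes, or
% Pad (with Rest = []) once Codes is exhausted.
next_code([], Pad, Pad, []).
next_code([C|Cs], _, C, Cs).
```

For example, with Pad = 32 the code lists of ab and 'ab ' compare equal, while with Pad = 0 or Pad = -1 the shorter one compares less. A trailing tab (code 9) separates the blank and NUL conventions: against [97,98,9], the list [97,98] compares greater with Pad = 32 but less with Pad = 0.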

The host language function used to implement these operations is considerably more general. You may want to experiment with it.

Here are some examples:

     | ?- % illustrating the difference between compare/3
     |    % and compare_strings/3
     |    compare(R1, fred, jim),
     |    compare(R2, "fred", "jim").
     R1 = <,
     R2 = <
     | ?- compare_strings(R1, fred, jim).
     R1 = <
     | ?- compare_strings(R2, "fred", "jim").
     ! Type error in argument 2 of compare_strings/3
     ! atom expected, but [102,114,101,100] found
     ! Goal:  compare_strings(_286,[102,114,101,100],
     | ?- % illustrating compare_strings/4
     |    Space is " ",
     |    compare_strings(R, ' ', ''),
     |    compare_strings(S, ' ', '', Space).
     Space = 32,
     R = >,
     S = =

Another convention is sometimes used, in which the lengths of the atoms are compared first, and the text is examined only for atoms of the same length. You could program it thus:

     xpl_compare(Relation, Text1, Text2) :-
             /* this is not in library(strings) */
             string_length(Text1, Length1),
             string_length(Text2, Length2),
             (   Length1 =:= Length2 ->
                 compare_strings(Relation, Text1, Text2)
             ;   compare(Relation, Length1, Length2)
             ).
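If you want to try the length-first convention without library(strings), the following sketch restricts it to atoms. length_first_compare/3 is a hypothetical name; it assumes the ISO built-in atom_length/2 in place of string_length/2, and uses compare/3, which should order atoms of equal length the same way compare_strings/3 would.

```prolog
% length_first_compare(-Relation, +Atom1, +Atom2)
% Hypothetical, atoms-only variant of xpl_compare/3: a shorter atom
% always comes first; atoms of equal length are compared by text.
length_first_compare(Relation, Atom1, Atom2) :-
    atom_length(Atom1, Length1),
    atom_length(Atom2, Length2),
    (   Length1 =:= Length2 ->
        compare(Relation, Atom1, Atom2)
    ;   compare(Relation, Length1, Length2)
    ).
```

Under this convention jim precedes fred because it is shorter, whereas compare/3 puts fred first.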