Quintus Prolog

The "span" family

The midstring, substring, and subchars families give you a way of taking strings apart when you know the lengths of the substrings you want, or when you know a particular substring.

The "span" family gives you another way of taking strings apart. The family contains three sub-families: span_left/[3,4,5], which scan from the left; span_right/[3,4,5], which scan from the right; and span_trim/[2,3,5], which scan from both ends towards the middle.

span_left(+Text, +Set, ?LenA, ?LenB, ?LenC)

span_left(+Text, +Set, ?LenA, ?LenB)

span_left(+Text, +Set, ?LenA)

are true when

Text is a text object,
Set specifies a set of characters (see below), and
Text can be broken into three pieces A, B, C, such that
- LenA is the length of A,
- LenB is the length of B,
- LenC is the length of C,
- no character in A belongs to the Set,
- every character in B belongs to the Set,
- B is not empty (so some character of Text must belong to the Set),
- C contains the rest of Text (it may contain characters from Set), and
- A and B are as long as possible.
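
The conditions above can be read as a single left-to-right scan. The following is a minimal sketch of that reading in portable Prolog, not the library(strings) implementation. The names model_span_left/5, run/6, side/3, and code_in/2 are hypothetical, Text is assumed to be an atom, Set is restricted to an atom whose characters form the set, and an ISO-style atom_codes/2 is assumed.

```prolog
% model_span_left(+Text, +SetAtom, -LenA, -LenB, -LenC)
% Hypothetical reference model of span_left/5; Text and SetAtom are atoms.
model_span_left(Text, SetAtom, LenA, LenB, LenC) :-
    atom_codes(Text, Codes),
    atom_codes(SetAtom, Set),
    run(out, Codes, Set, 0, LenA, Rest1),   % A: longest prefix outside Set
    run(in, Rest1, Set, 0, LenB, Rest2),    % B: longest run inside Set
    LenB > 0,                               % B must not be empty
    length(Rest2, LenC).                    % C: whatever is left

% run(+Side, +Codes, +Set, +N0, -N, -Rest)
% Consumes the longest prefix of Codes whose characters are all on the
% given Side of Set (in = members, out = non-members), counting from N0.
run(Side, [C|Cs], Set, N0, N, Rest) :-
    side(C, Set, Side), !,
    N1 is N0 + 1,
    run(Side, Cs, Set, N1, N, Rest).
run(_, Rest, _, N, N, Rest).

side(C, Set, in)  :- code_in(C, Set).
side(C, Set, out) :- \+ code_in(C, Set).

% code_in(+Code, +Codes): Code occurs in the list Codes.
code_in(C, [C|_]) :- !.
code_in(C, [_|Cs]) :- code_in(C, Cs).
```

Under these assumptions, model_span_left('aaaBBBBccBc', 'B', LenA, LenB, LenC) gives LenA = 3, LenB = 4, LenC = 4: the later B falls inside C because only the first run of Set characters counts. The goal fails for a Text such as aaa, which contains no character of the Set.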

The Set is one of the following:

an atom A. A character belongs to such a Set if and only if it occurs in the name of A. The atom '' represents an empty Set.
a non-empty list of character codes [C1,...,Cn]. A character belongs to such a Set if and only if it occurs among the character codes C1,...,Cn.
not(X), where X is an atom or non-empty list of character codes. A character belongs to such a Set if and only if it does not belong to the set X.
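
The three forms of Set can be summarised as a membership test. The sketch below is an illustration only and is not part of library(strings): set_member/2 and code_in/2 are hypothetical names, characters are represented by their character codes, and an ISO-style atom_codes/2 is assumed.

```prolog
% set_member(+Code, +Set)
% Hypothetical membership test for the three documented forms of Set.
set_member(C, not(X)) :- !,          % complement: C must not belong to X
    \+ set_member(C, X).
set_member(C, Atom) :-               % atom: C occurs in the name of Atom
    atom(Atom), !,
    atom_codes(Atom, Codes),
    code_in(C, Codes).
set_member(C, Codes) :-              % non-empty list of character codes
    Codes = [_|_],
    code_in(C, Codes).

% code_in(+Code, +Codes): Code occurs in the list Codes.
code_in(C, [C|_]) :- !.
code_in(C, [_|Cs]) :- code_in(C, Cs).
```

With this sketch, set_member(0'b, abc) and set_member(0'z, not(abc)) succeed, while set_member(0'a, '') fails because the atom '' names no characters.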

The first two arguments must be instantiated. Given them, the remaining three arguments are uniquely determined. The last three arguments give you a picture of how the text is divided:

                 |   LenA    |    LenB     |  LenC   |
         Text=    a a a a a a B B B B B B B c c c c c
                              \____Set____/

where Set embraces the characters in the B substring. By design, the Set argument occupies the same position in the argument list of this predicate that B does in the argument list of substring/[4,5] or midstring/[3,4,5,6]. The fact that the last three arguments of span_left/5 follow this convention means that you can use midstring/[3,4,5,6], substring/[4,5], or subchars/[4,5] to extract whichever substring interests you.

For example, to skip leading spaces in String, yielding Trimmed, you would write

     | ?- span_left(String, not(" "), Before),
     |    substring(String, Trimmed, Before, _, 0).

Note that this fails if there are no non-blank characters in String, because B (here the run of non-blank characters) must not be empty. To extract the first blank-delimited Token from String, yielding the Token and the Rest of the string, you would write

     | ?- span_left(String, not(" "), Before, Length, After),
     |    substring(String, Token, Before, Length, After),
     |    substring(String, Rest, _, After, 0).
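
To make the Before, Length, After bookkeeping concrete, here is a stand-alone sketch of the same token-extraction idiom. It does not call library(strings): first_token/3, blanks/4, and non_blanks/4 are hypothetical names, the text is assumed to be an atom, a blank is taken to be character code 32, and the ISO built-in sub_atom/5 stands in for substring/5 (it takes the same Before, Length, After triple, with the sub-atom as its last argument).

```prolog
% first_token(+Atom, -Token, -Rest)
% Hypothetical stand-alone model of the idiom above; blank means code 32.
first_token(Atom, Token, Rest) :-
    atom_codes(Atom, Codes),
    blanks(Codes, 0, Before, Codes1),         % A: leading blanks
    non_blanks(Codes1, 0, Length, Codes2),    % B: the token itself
    Length > 0,                               % fail if there is no token
    length(Codes2, After),                    % C: everything after it
    sub_atom(Atom, Before, Length, After, Token),
    sub_atom(Atom, _, After, 0, Rest).        % the last After characters

% blanks(+Codes, +N0, -N, -Rest): count and skip leading blanks.
blanks([32|Cs], N0, N, Rest) :- !,
    N1 is N0 + 1,
    blanks(Cs, N1, N, Rest).
blanks(Rest, N, N, Rest).

% non_blanks(+Codes, +N0, -N, -Rest): count and skip leading non-blanks.
non_blanks([C|Cs], N0, N, Rest) :-
    C =\= 32, !,
    N1 is N0 + 1,
    non_blanks(Cs, N1, N, Rest).
non_blanks(Rest, N, N, Rest).
```

For the atom '  foo bar' the three lengths are 2, 3, and 4, so Token is foo and Rest is ' bar'. Calling first_token/3 again on Rest yields the next token, and the goal fails once only blanks remain.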

span_right(+Text, +Set, ?LenA, ?LenB, ?LenC)

span_right(+Text, +Set, ?LenB, ?LenC)

span_right(+Text, +Set, ?LenC)

are true when

Text is a text object,
Set specifies a set of characters, and
Text can be broken into three pieces A, B, C, such that
- LenA is the length of A,
- LenB is the length of B,
- LenC is the length of C,
- no character in C belongs to the Set,
- every character in B belongs to the Set,
- B is not empty (so some character of Text must belong to the Set),
- A contains the rest of Text (it may contain characters from Set), and
- C and B are as long as possible.

These three predicates are exactly like span_left/[3,4,5] except that they work from right to left instead of from left to right. In particular, the picture

                 |   LenA    |    LenB     |  LenC   |
         Text=    a a a a a a B B B B B B B c c c c c
                              \____Set____/

applies.
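
One way to see the right-to-left behaviour is to scan the reversed text from the left, so that the first run found is C and the second is B. The following is a minimal sketch along those lines, not the library(strings) implementation. All predicate names in it are hypothetical, Text is assumed to be an atom, Set is restricted to an atom whose characters form the set, and an ISO-style atom_codes/2 is assumed.

```prolog
% model_span_right(+Text, +SetAtom, -LenA, -LenB, -LenC)
% Hypothetical reference model of span_right/5 on atoms.
model_span_right(Text, SetAtom, LenA, LenB, LenC) :-
    atom_codes(Text, Codes),
    atom_codes(SetAtom, Set),
    rev(Codes, [], Reversed),
    run(out, Reversed, Set, 0, LenC, Rest1),  % C: longest suffix outside Set
    run(in, Rest1, Set, 0, LenB, Rest2),      % B: longest run inside Set
    LenB > 0,                                 % B must not be empty
    length(Rest2, LenA).                      % A: the rest of Text

% rev(+List, +Acc, -Reversed): accumulator-based list reversal.
rev([], Acc, Acc).
rev([X|Xs], Acc, Reversed) :- rev(Xs, [X|Acc], Reversed).

% run(+Side, +Codes, +Set, +N0, -N, -Rest)
% Consumes the longest prefix of Codes whose characters are all on the
% given Side of Set (in = members, out = non-members), counting from N0.
run(Side, [C|Cs], Set, N0, N, Rest) :-
    side(C, Set, Side), !,
    N1 is N0 + 1,
    run(Side, Cs, Set, N1, N, Rest).
run(_, Rest, _, N, N, Rest).

side(C, Set, in)  :- code_in(C, Set).
side(C, Set, out) :- \+ code_in(C, Set).

% code_in(+Code, +Codes): Code occurs in the list Codes.
code_in(C, [C|_]) :- !.
code_in(C, [_|Cs]) :- code_in(C, Cs).
```

Under these assumptions, model_span_right('aaaBBBccBcc', 'B', LenA, LenB, LenC) gives LenA = 8, LenB = 1, LenC = 2: only the rightmost run of Set characters becomes B, and the earlier run BBB is left inside A.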

Finally, there are predicates that scan from both ends:

span_trim(+Text, +Set, ?LenA, ?LenB, ?LenC)

is true when

Text is a text object,
Set specifies a set of characters, and
Text can be broken into three pieces A, B, C, such that
- LenA is the length of A,
- LenB is the length of B,
- LenC is the length of C,
- every character in A belongs to the Set,
- every character in C belongs to the Set,
- A and C are as long as possible, and
- B is not empty.

The Set argument of span_trim/5 has the same form as the Set argument of span_left/[3,4,5] or span_right/[3,4,5], but there is an important difference in how it is used: in span_trim/5 the Set specifies the characters that are to be trimmed away. The picture is

                 |   LenA    |    LenB     |  LenC   |
         Text=    a a a a a a B B B B B B B c c c c c
                  \___Set___/               \__Set__/
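
The trimming behaviour can be sketched as two scans, one from each end. The code below is an illustrative model in portable Prolog, not the library(strings) implementation. The names model_span_trim/5, strip/5, rev/3, and code_in/2 are hypothetical, Text is assumed to be an atom, Set is restricted to an atom whose characters form the set, and an ISO-style atom_codes/2 is assumed.

```prolog
% model_span_trim(+Text, +SetAtom, -LenA, -LenB, -LenC)
% Hypothetical reference model of span_trim/5 on atoms.
model_span_trim(Text, SetAtom, LenA, LenB, LenC) :-
    atom_codes(Text, Codes),
    atom_codes(SetAtom, Set),
    strip(Codes, Set, 0, LenA, Rest),         % A: leading Set characters
    rev(Rest, [], Reversed),
    strip(Reversed, Set, 0, LenC, Middle),    % C: trailing Set characters
    length(Middle, LenB),
    LenB > 0.                                 % B must not be empty

% strip(+Codes, +Set, +N0, -N, -Rest)
% Counts and removes the longest prefix of Codes made of Set characters.
strip([C|Cs], Set, N0, N, Rest) :-
    code_in(C, Set), !,
    N1 is N0 + 1,
    strip(Cs, Set, N1, N, Rest).
strip(Rest, _, N, N, Rest).

% rev(+List, +Acc, -Reversed): accumulator-based list reversal.
rev([], Acc, Acc).
rev([X|Xs], Acc, Reversed) :- rev(Xs, [X|Acc], Reversed).

% code_in(+Code, +Codes): Code occurs in the list Codes.
code_in(C, [C|_]) :- !.
code_in(C, [_|Cs]) :- code_in(C, Cs).
```

In this model a Text made up entirely of Set characters, such as '    ' with the Set ' ', is consumed by the first scan, leaving B empty, so the goal fails.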

span_trim/3 is a special case of span_trim/5 that strips particular characters from both ends of a string. The unwanted characters are designated by Set:

     span_trim(String, Set, Trimmed) :-
             span_trim(String, Set, Before, Length, After),
             substring(String, Trimmed, Before, Length, After).

A further specialization, span_trim/2, is intended for trimming blanks from fixed-length records:

     span_trim(String, Trimmed) :-
             span_trim(String, " ", Before, Length, After),
             substring(String, Trimmed, Before, Length, After).

For example,

     | ?- span_trim('  abc    ', " ", B, L, A).
     B = 2
     L = 3
     A = 4
     
     | ?- substring('  abc    ', Trimmed, 2, 3, 4).
     Trimmed = abc
     
     | ?- span_trim(' an   example ', Trimmed).
     Trimmed = 'an   example'

Note that the last example leaves the group of three internal blanks intact. There are no predicates in library(strings) for compressing such blanks.

In manipulating text objects, do not neglect the possibility of combining the "span" family with subchars/[4,5] or midstring/[3,4,5,6].