Classifying Characters -- library(ctypes)

One of the problems facing anyone who uses Prolog on more than one system is that different operating systems use different characters to signal the end of a line or the end of a file. We have

     Dialect         DEC-10 Prolog   SICStus Prolog  Quintus Prolog
     OS              (TOPS-10)       (UNIX,Windows)  (UNIX,Windows)
     
     end-of-line     31 (^_)         10 (LF, ^J)     10 (LF, ^J)
     end-of-file     26 (^Z)         -1              -1
     

Windows note: From an application program's point of view, each line in the file is terminated with a single <LFD>. However, what's actually stored in the file is the sequence <RET><LFD>.

A prudent Prolog programmer will try to avoid writing these constants into his program. Indeed, a prudent Prolog programmer will try to avoid relying too much on the fact that Prolog uses the ASCII character set.

Quintus Prolog addresses these problems by imitating the programming language C. The package library(ctypes) defines predicates that recognize or enumerate certain types of characters. Where possible, the names and the character sets have been borrowed from C.

Except as indicated, all of the predicates in library(ctypes) check the type of a given character, or backtrack over all the characters of the appropriate type if given a variable.
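
The two modes can be sketched as follows. This is a minimal illustration, not part of the library: the helper names letter_a_is_alpha/0 and all_digit_codes/1 are hypothetical, and the order in which codes are enumerated is not assumed.

```prolog
:- use_module(library(ctypes)).

% Checking mode: the argument is bound, so is_alpha/1 simply tests it.
% 97 is the ASCII code for the letter a.
letter_a_is_alpha :-
    is_alpha(97).

% Enumerating mode: the argument is a variable, so is_digit/1 binds it
% to each decimal digit code in turn; findall/3 collects the solutions.
all_digit_codes(Codes) :-
    findall(C, is_digit(C), Codes).
```

Calling all_digit_codes(Codes) should bind Codes to a list of the ten decimal digit codes.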


is_endfile(-Char)
Char is the end-of-file character. There is only one such character. If get0/1 returns it, the end of the input file has been reached, and the file should not be read further. No special significance is attached to this character on output; it might not be a valid output character at all (as in Quintus Prolog) or it might simply be written out along with other text.

The need for this predicate is largely obviated by the built-in predicate at_end_of_file/[0,1] in Release 3.

is_newline(-Char)
Char is the end-of-line character. There is only one such character. You can rely on it not being space, tab, or any printing character. It is returned by get0/1 at the end of an input line. The end-of-line character is a valid output character, and when written to a file ends the current output line. It should not be used to start lines, only to end them.

The need for this predicate is largely obviated by the built-in predicate skip_line/[0,1] in Release 3.

is_newpage(-Char)
Char is the end-of-page character. There is at most one such character, and when it is defined at all it is the ASCII "formfeed" character. On some systems there may be no end-of-page character. This character is returned by get0/1 at the end of an input page. It is a valid output character, and when written to a file ends the current output page. It should not be used to start pages, only to end them.
is_endline(+Char)
Some systems permit more than one end-of-line character for terminal input; one of them is always C's "newline" character, another is the end-of-file character (^D or ^Z) if typed anywhere but as the first character of a line, and the last is the "eol" character, which the user can set with the stty(1) command.

is_endline/1 accepts most ASCII control characters, but not space, tab, or delete, which covers all the line terminators likely to arise in practice. It should only be used to recognize line terminators; if passed a variable, it will raise an error exception.

The need for this predicate is largely obviated by the built-in predicate at_end_of_line/[0,1] in Release 3.
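
As an illustration of how the input-side predicates fit together, here is a minimal sketch of reading one line from the current input with get0/1. The predicate read_line_codes/2 and its status atoms are hypothetical names chosen for this example. Because is_endline/1 accepts most control characters, this sketch treats any of them as a line terminator.

```prolog
:- use_module(library(ctypes)).

% read_line_codes(-Codes, -Status)
% Codes is the list of character codes up to, but not including, the
% terminator. Status is end_of_file or end_of_line, depending on which
% kind of terminator stopped the reading.
read_line_codes(Codes, Status) :-
    get0(C),
    read_line_codes(C, Codes, Status).

read_line_codes(C, Codes, Status) :-
    (   is_endfile(C) ->            % test end of file first: do not read on
        Codes = [], Status = end_of_file
    ;   is_endline(C) ->            % C is bound here, as is_endline/1 requires
        Codes = [], Status = end_of_line
    ;   Codes = [C|Cs],
        get0(Next),
        read_line_codes(Next, Cs, Status)
    ).
```

The end-of-file test comes first so that the file is never read past its end, and is_endline/1 is only ever called with a bound argument.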

is_alnum(?Char)
is true when Char is the ASCII code of a letter or digit. It may be used to recognize alphanumerics or to enumerate them. Underscore _ is not an alphanumeric character. (See is_csym/1 below.)
is_alpha(?Char)
is true when Char is the ASCII code of a letter. It may be used to recognize letters or to enumerate them. Underscore _ is not a letter. (See is_csymf/1 below.)
is_ascii(?Char)
is true when Char is in the range 0..127. If Char is a variable, is_ascii/1 (like most of the predicates in this section) will try to bind it to each of the acceptable values in turn (that is, it will enumerate them). Whether the end-of-file character satisfies is_ascii/1 or not is system-dependent.
is_char(?Char)
is true when Char is a character code in whatever range the system supports. (In this version: ISO 8859/1.)
is_cntrl(?Char)
is true when Char is an ASCII control character; that is, when Char is the code for DEL (127) or else is in the range 0..31. Space is not a control character.
is_csym(?Char)
is true when Char is the code for a character that can appear in an identifier. C identifiers are identical to Prolog identifiers that start with a letter. Put another way, Char is a letter, digit, or underscore. There are C compilers that allow other characters in identifiers, such as $. In such a system, C's version of iscsym() will accept those additional characters, but Prolog's is_csym/1 will not.
is_csymf(?Char)
is true when Char is the code for a character that can appear as the first character of a C or Prolog identifier. Put another way, Char is a letter or an underscore.
is_digit(?Char)
is true when Char is the code for a decimal digit; that is, one of the characters 0..9.
is_digit(?Char, ?Weight)
is true when Char is the character code of a decimal digit, and Weight is its decimal value.
is_digit(?Char, ?Base, ?Weight)
is true when Char is the code for a digit in the given Base. Base should be an integer in the range 2..36. The digits (that is, the possible values of Char) are 0..9, A..Z, and a..z, where the case of a letter is ignored. Weight is the value of Char considered as a digit in that base, given as a decimal number. For example,
          is_digit(97 /* a */, 16, 10)
          is_digit(52 /* 4 */, 10,  4)
          is_digit(70 /* F */, 16, 15)
          

This is a genuine relation; it may be used in all possible ways. You can even use it to enumerate all the triples that satisfy the relation. Each argument must be either a variable or an integer.
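
One common use of is_digit/3 is converting a list of digit codes to a number. The following is a minimal sketch for hexadecimal input. The predicate hex_value/2 is a hypothetical helper, not part of library(ctypes), and it makes no attempt to reject an empty list (for which it yields 0).

```prolog
:- use_module(library(ctypes)).

% hex_value(+Codes, -Value)
% Value is the integer denoted by the hexadecimal digit codes in Codes.
% Fails if any code is not a base-16 digit.
hex_value(Codes, Value) :-
    hex_value(Codes, 0, Value).

hex_value([], Value, Value).
hex_value([C|Cs], Value0, Value) :-
    is_digit(C, 16, Weight),        % Weight is 0..15; letter case is ignored
    Value1 is Value0*16 + Weight,
    hex_value(Cs, Value1, Value).
```

For example, hex_value([49,70], V) takes the codes for 1 and F and binds V to 31.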

is_graph(?Char)
is true when Char is the code for a "graphic" character, that is, for any printing character other than space. The graphic characters are the letters and digits, plus
          !  "  #  $  %  &  '  (  )  *  ;  <  =  >  ?  @
          [  \  ]  ^  _  `  {  |  }  ~  +  ,  -  .  /  :
          

is_lower(?Char)
is true when Char is the code for a lowercase letter, a..z.
is_paren(?Left, ?Right)
is true when Left and Right together form one of the delimiter pairs ( and ), [ and ], or { and }.
is_period(?Char)
is_period/1 recognizes each of the three punctuation marks that can end an English sentence. That is, is_period(Char) is true when Char is an exclamation point (!), a question mark (?), or a period (.). Note that if you want to test specifically for a period character, you should use the goal
          Char is "."
          

is_print(?Char)
is true when Char is any of the ASCII "printing" characters, that is, anything except a control character. All the "graphic" characters are "printing" characters, and so is the space character. When written to ordinary terminals, each printing character takes exactly one column, and Prolog code for lining up output in nice columns is entitled to rely on this. The width of a tab, and the depiction of other control characters than tab or newline, is not defined.
is_punct(?Char)
is true when Char is the code for a non-alphanumeric printing character; that is, Char is a space or one of the characters listed explicitly under is_graph/1. Note that underscore is a "punct" and so is the space character. The reason for this is that C defines it that way, and this package eschews innovation for innovation's sake.
is_quote(?Char)
is true when Char is one of the quotation marks ` (back-quote), ' (single-quote), or " (double-quote).
is_space(?Char)
is true when Char is the code for a white space character. This includes tab (9, ^I), linefeed (10, ^J), vertical tab (11, ^K), formfeed (12, ^L), carriage return (13, ^M), and space (32). These constitute the C definition of white space. For compatibility with DEC-10 Prolog, is_space/1 also accepts the DEC-10 end-of-line character (31, ^_).
is_upper(?Char)
is true when Char is the code for an uppercase letter, A..Z.
is_white(?Char)
is true when Char is a space or a tab. The reason for distinguishing between this and is_space/1 is that if you skip over characters satisfying is_space/1 you will also be skipping over the ends of lines and pages (though at least you will not run off the end of the file), while if you skip over characters satisfying is_white/1 you will stop at the end of the current line.
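
The difference shows up when skipping blanks in a list of character codes. The following minimal sketch uses a hypothetical helper, skip_white/2, which drops leading spaces and tabs but stops at anything else, including a line terminator.

```prolog
:- use_module(library(ctypes)).

% skip_white(+Codes, -Rest)
% Rest is Codes with any leading spaces and tabs removed.
skip_white([C|Cs], Rest) :-
    is_white(C),
    !,
    skip_white(Cs, Rest).
skip_white(Codes, Codes).
```

For example, skip_white([32,9,10,97], Rest) binds Rest to [10,97]: the space and tab are skipped, but the linefeed is kept. Replacing is_white/1 with is_space/1 would skip the linefeed as well.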
to_lower(?Char, ?Lower)
is true when Char is any ASCII character code, and Lower is the lowercase equivalent of Char. The lowercase equivalent of an uppercase letter is the corresponding lowercase letter. The lowercase equivalent of any other character is the same character. If you have a string (list of character codes) X, you can obtain a version of X with uppercase letters mapped to lowercase letters and other characters left alone by calling the library routine
          maplist(to_lower, X, LowerCasedX)
          

In normal use of to_lower/2, Char is bound. If Char is uninstantiated, to_lower/2 will still work correctly, but will be less efficient. If you want to convert a lowercase letter Kl to its uppercase version Ku, do not use to_lower/2; to_lower(Ku, 97) has two solutions: 65 (A) and 97 (a). Use to_upper/2 instead.

to_upper(?Char, ?Upper)
is true when Char is any ASCII character code, and Upper is the uppercase equivalent of Char. The uppercase equivalent of a lowercase letter is the corresponding uppercase letter. The uppercase equivalent of any other character is the same character. If you have a string (list of character codes) X, you can obtain a version of X with lowercase letters mapped to uppercase and other characters left alone by calling the library routine
          maplist(to_upper, X, UpperCasedX)
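
The case-mapping predicates can also be applied one character at a time. As a minimal sketch, the hypothetical predicate same_ignoring_case/2 below compares two lists of character codes without regard to letter case. It is an illustration only and not part of library(ctypes).

```prolog
:- use_module(library(ctypes)).

% same_ignoring_case(+Codes1, +Codes2)
% True when the two code lists have the same length and corresponding
% characters have the same lowercase equivalent.
same_ignoring_case([], []).
same_ignoring_case([C1|Cs1], [C2|Cs2]) :-
    to_lower(C1, Lower),
    to_lower(C2, Lower),
    same_ignoring_case(Cs1, Cs2).
```

Both calls to to_lower/2 are made with the first argument bound, which is the normal and efficient mode of use.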
          

The System V macro isxdigit() is not represented in this package because is_digit/3 subsumes it. The System V macros _tolower() and _toupper() are not represented because to_lower/2 and to_upper/2 subsume them.

The predicates needed for portability between operating systems are is_endfile/1, is_endline/1, is_newline/1, and is_newpage/1.

Remember: is_endfile/1 and is_endline/1 are for recognizing the end of an input file or the end of an input line, while is_newline/1 and is_newpage/1 return the character that you should give to put/1 to end a line or page of output.
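
On the output side, the rule above can be sketched as follows. The predicate end_output_line/0 is a hypothetical helper that asks is_newline/1 for the end-of-line character rather than writing a constant such as 10 into the program.

```prolog
:- use_module(library(ctypes)).

% end_output_line
% Ends the current output line by writing whatever character
% is_newline/1 reports for this system.
end_output_line :-
    is_newline(Char),
    put(Char).
```

Because is_newline/1 has exactly one solution, the goal is_newline(Char) binds Char once and leaves nothing to backtrack into.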