Prolog's grammar rules provide a convenient notation for expressing definite clause grammars, which are useful for the analysis of both artificial and natural languages.
The usual way one attempts to make precise the definition of a language, whether it is a natural language or a programming lanaguage, is through a collection of rules called a "grammar". The rules of a grammar define which strings of words or symbols are valid sentences of the language. In addition, the grammar generally analyzes the sentence into a structure that makes its meaning more explicit.
A fundamental class of grammar is the context-free grammar (CFG), familiar to the computing community in the notation of "BNF" (Backus-Naur form). In CFGs, the words, or basic symbols, of the language are identified by "terminal symbols", while categories of phrases of the language are identified by non-terminal symbols. Each rule of a CFG expresses a possible form for a non-terminal, as a sequence of terminals and non-terminals. The analysis of a string according to a CFG is a parse tree, showing the constitutent phrases of the string and their hierarchical relationships.
Context-free grammars (CFGs) consist of a series of rules of the form:
nt --> body.
where nt is a non-terminal symbol and body is a sequence of one or more items separated by commas. Each item is either a non-terminal symbol or a sequence of terminal symbols. The meaning of the rule is that body is a possible form for a phrase of type nt. A non-terminal symbol is written as a Prolog atom, while a sequence of terminals is written as a Prolog list, whereas a terminal may be any Prolog term.
Definite clause grammars (DCGs) are a generalization of context-free grammars and rules corresponding to DCGs are referred to as "Grammar Rules". A grammar rule in Prolog takes the general form
head --> body.
meaning "a possible form for head is body".
Both body and head are
sequences of one or more items linked by the standard Prolog
conjunction operator ,
(comma).
Definite clause grammars extend context-free grammars in the following ways:
[]
. If the terminal symbols
are ASCII character codes, such lists can be written (as elsewhere) as
strings. An empty sequence is written as the empty list ([]
or
""
).
{
and }
).
;
(semicolon)
as in Prolog. (The disjunction operator can also be written as |
(vertical-bar).)
!
(exclamation point) may be included in the
right-hand side of a grammar
rule, as in a Prolog clause. The cut symbol does not need to be
enclosed in curly brackets. The conditional arrow ->
can also be used
in grammar rules, without the curly brackets. However, all other control
predicates, repeat/0
for example, can only be used within curly
brackets. If you use the goal repeat/0
without the brackets it will be
taken to be a non-terminal symbol.