Augmented Syntax
Diagram Grammars
James A. Mason
Professor Emeritus of
Computer
Science
York University
I have coined the term "Augmented Syntax Diagram" (abbreviated ASD) to
describe a cross between Augmented Transition Networks (ATNs)
[Woods,
1970; Bates, 1978], which have been used to represent grammars of
natural
languages such as English, and syntax diagrams which have been
used
to represent grammars of programming languages [Goldberg & Robson,
1983; Jensen & Wirth, 1975; Wikipedia article "Syntax Diagram"].
While ASD grammars are equally as powerful
as ATNs, they are conceptually simpler. They involve fewer primitives,
and they permit a grammar to be combined with the lexicon of a
language,
rather than being represented as a structure separate from the lexicon.
They can represent any context-free grammar, and in addition they can
be
augmented like ATNs with arbitrary computational tests and actions
which
are to be performed as utterances are parsed.
Figure 1 shows part of an ATN grammar and Figure 2 shows its ASD
equivalent.
In both cases only the networks are shown, not the augmentations. In
both
kinds of network, initial nodes are shown with bold outlines. S stands
for "Sentence", NP for "Noun Phrase", VP for "Verb Phrase", PP for
"Prepositional
Phrase", DET for "Determiner", and ADJ for "Adjective".
Figure 1
Figure 2
Both of the figures are incomplete in that they involve non-terminal
vocabulary
elements PP and VP which represent phrase types, but they do not show
the
grammar networks for recognizing those phrase types. Those parts of the
grammar networks have been omitted here for brevity.
The following are some of the main differences between ASD grammars
and ATN grammars:
-
ASD networks have node labels but no edge labels;
ATN networks require both node (state) labels and edge (arc) labels.
The node labels of an ASD grammar correspond to the edge labels of
an ATN.
- ASD networks are designed for bottom-up parsing;
ATN networks are more suitable for top-down parsing.
So the label for the phrase that is recognized by a part of the network
is placed after a final node in an ASD network, but in an initial
node in an ATN.
- ASD networks have null or dummy nodes (labeled with $$ symbols
inside --
shown as $ in the diagrams above), which operate like JUMP arcs in ATN
grammars. In both cases they match empty strings in the utterance being
parsed.
-
ASD networks require only three kinds of node labels: (1) terminal
vocabulary
elements like "by" in the example, (2) non-terminal vocabulary elements
which represent phrase types, and (3) $$ (shown as $ in the diagrams
above)
to represent null nodes. ASD networks use numeric subscripts, or
instance
numbers, in node labels to distinguish between different nodes that are
labeled with the same vocabulary element.
ATN networks require five kinds of arc labels: (1) PUSH arcs, (2) CAT
arcs, (3) WORD arcs, (4) JUMP arcs, and (5) POP arcs, in addition to
node
labels.
- ASD networks can be represented in non-graphical form by a single
lexicon
containing both terminal and non-terminal vocabulary elements;
ATN networks require a representation for the grammar that is separate
from the lexicon.
I have implemented efficient ASD parsers in several programming
languages,
including Java, LISP and Smalltalk. I have also implemented graphical
editors
for ASD grammars which run in Java, and in Smalltalk/V or Visual
Smalltalk
under Microsoft Windows. I also have examples of ASD grammars for parts
of the English language, as well as examples of small applications
involving
English-language understanding which I have implemented with ASD
grammars.
References
-
Bates, M., The theory and practice of augmented transition network
grammars.
In Natural Language Communication with Computers, L. Bolc, Ed.,
Springer-Verlag, 1978, pp. 191-259.
-
Goldberg, A. and Robson, D., Smalltalk-80: The Language and its
Implementation,
Addison-Wesley, 1983.
-
Jensen, K. and Wirth, N., Pascal User Manual and Report (2nd
Edition),
Springer-Verlag, 1975.
-
Mason, J.A., Software to Understand English, York University,
1995,
282pp.
- Syntax
Diagram, Widipedia: http://en.wikipedia.org/wiki/Syntax_diagram
-
Woods, W.A., Transition network grammars for natural language analysis,
Communications
of the ACM 13:10 (Oct. 1970), pp. 591-606.
created 1996 Aug 30
last revised 2011 Jan 1