ASDParser algorithm overview
Copyright 1995, 1999, 2008
James A. Mason
last modified 2010 Aug 21 (retargeting
a link at the bottom)
Given an ASD grammar, a phrase to parse and a list of phrase types, one of which is expected to correspond to the
phrase as a whole, the ASDParser performs a depth-first
search through the grammar, starting with a grammar node
that matches the first item (e.g., word) in the phrase, trying to reach
a final node in the grammar
whose end label matches one of the expected
phrase types at the same time that the end of the
entire phrase has been reached. If the search is
successful, the parser has found a first
parse for the given phrase. It can then be made to backtrack and try to find additional
successful parsings of the given phrase. (If desired, the parsing
algorithm can be modified easily to proceed in a breadth-first manner, as will
be indicated later in these notes.)
A parse of a given phrase proceeds in a
left-to-right, bottom-up manner, with the initial phrase structure consisting of a
list of the words in the given phrase. The parsing algorithm
consists of a basic loop, on
each iteration of which the parse is advanced, if possible, in one of
four different ways:
- to an initial node in
the grammar which corresponds to the next item in the phrase structure,
thus beginning a subphrase;
- to a non-initial node in
the grammar which matches the next item in the phrase structure;
- to a dummy node in the
grammar which follows the current node, and results in insertion of a
dummy item into the phrase structure; or
- from a final node in the
grammar, which results in replacement of the current subphrase by its
syntactic type, at the top level of the phrase structure, with the
subphrase hanging below (i.e. dependent on) the replacement item.
For example (using the grammar for English
cardinal numbers), if the phrase structure before the advance is.
CARDINAL(1) MULTIPLIER(1) and(1) DECADE(1) UNIT(2)
|
|
| |
UNIT(1)
hundred(1)
twenty(1) one(1)
}
five(1)
and the advance is from node UNIT(2) in the grammar,
corresponding
to the item at
the end of the top level of the phrase structure, then the
new phrase
structure becomes
CARDINAL(1) MULTIPLIER(1) and(1) CARDINAL
|
|
|
UNIT(1)
hundred(1) -----------------
}
DECADE(1) UNIT(2)
five(1)
| |
twenty(1) one(1)
At
this
point,
the new item, CARDINAL, at the top level in the
phrase structure
does not yet
have an associated instance number, and the parse continues
from node and(1) in the grammar.
If none of these four possibilities for advancing the parse is
available, the parse has reached a
dead end. Then it must backtrack
to an earlier point of local ambiguity, if there have been any, and try
an alternative there which hasn't been tried yet.
The semantic action
algorithms for nodes in the grammar are applied in cases (1), (2), and
(3), before the parse completes its advance to the kind of node in
question. Evaluating the semantic action augmentation may
indicate that the current direction in which the parser is searching
through the grammar should not be
continued,
and that the parser should backtrack to the most recent point of local
ambiguity, if any, and try searching in a different direction from
there. Semantic action augmentations can even indicate that the
entire parse should be terminated,
if
the
grammar writer decides that is appropriate in certain
cases.
The semantic value
computations for final nodes in the grammar are applied in case (4),
when a subphrase is replaced at the top level of the phrase structure
by its syntactic type. The value computed becomes the semantic
value for the new top-level item that represents the completed
subphrase.
To represent semantic values or syntactic features
(e.g., singular vs. plural of nouns or verbs, or the case of nouns)
during a parse, the parser provides a mechanism for the semantic action and semantic value augmentations of a
grammar to set and access pairs of "features" and "values". A feature can be any string, and its
corresponding value can be any
Java object.. ASDParser provides the following instance functions
to set and access feature values and to access semantic values of nodes
in a phrase structure. At each stage in the parsing process, the
parser maintains a set of feature-value pairs for the current subphrase
being parsed at the top level of the phrase structure. That
subphrase corresponds to the current search path through a component of
the grammar.
void set(java.lang.String featureName,
java.lang.Object value)
//
sets contents of a
'feature' with the
specified name to the specified value
java.lang.Object get(java.lang.String featureName)
//
returns the value of
'feature' with specified name
java.lang.Object valueOf(java.lang.String featureName)
//
returns the value of
'feature' with specified name
java.util.HashMap features() //
returns the feature-value pairs for
the current top level of the phrase structure
- ASDPraseNode currentNode() // returns the current
node at the top level of the phrase structure
The functions get and valueOf are identical; they just
provide alternative names for getting the current value of a
feature. In addition to the foregoing instance functions of
ASDParser, the class ASDPhraseNode provides a method
- java.lang.Object value() // returns the semantic
value for this node; null if none
to return the semantic value of a node in the phrase structure.
Illustrations of uses of these various instance functions are provided
in the example Numerals.java .
The algorithm to parse a given utterance (i.e., phrase) to one of a
set of expectedTypes (i.e.,
phrase type goals) has the
following basic overall structure. The semantic action NOADVANCE
indicates that the parser should stop pursuing the current parse
direction. QUIT indicates that the entire parse should be
terminated as unsuccessful. choices
is a list of (remaining) choices for advancing the parse from its
current state.
Segment the utterance into a list of words.
This is
the initial phraseStructure.
Begin at the front of the phraseStructure
word list.
Set the choices list to
empty initially.
loop until there is just one item in the phraseStructure list (at
the top level), and that item is a member of expectedTypes
if the list of choices is empty then
compute
a
list of possible choices
for advancement
from the current state of
the parse, by considering
the successors of the
current node (if any) in the grammar,
as
well
as all initial nodes corresponding to the next item
at the top level list in the phraseStructure
endif
if the list of choices is empty then
if there are points of local ambiguity
remaining
from earlier in the parse then
backtrack
to
a remaining point of local ambiguity
in the
parse, and retrieve the
list of untried
choices for advancement from that point
endif
if the list of choices is (still) empty then
terminate
the
parse as unsuccessful
endif
endif
Remove the first possibility from the list of choices
and save the rest (if any) in case of later
backtracking.
if the current possibility for
advancing the parse
corresponds to a final node in the grammar then
Evaluate
the
semantic
value expression for the
current
grammar node, to compute the semantic value of the
completed (sub)phrase.
if the semantic value computation yields
QUIT then
terminate
the
parse as unsuccessful
else if the semantic value computation yields
NOADVANCE then
backtrack
to
the most recent point of local ambiguity
in the parse and retrieve the list of untried
choices for advancement from that point
else
Perform a
replacement advance (type 4 above), and install
the phrase
type of the completed (sub)phrase and the
computed
semantic value in the new top-level node of the
phraseStructure,
with
the (sub)phrase
linked below
the new
node.
if the replacement was at the beginning
of the
overall
phrase
then
prepare
to
continue the parse from the beginning
of
the
new top-level word list in the phraseStructure
else
prepare
to
continue the parse from the item in the
top-level word list of the
phraseStructure just
before the replacement
endif
Set
the
list of choices for
advancement of the parse
to empty.
endif
else
Apply
the
current possibility for advancing the parse,
according to its type (1), (2), or (3) above,
and evaluate the semantic action for the node to which
the parser has just moved in
the grammar.
if the semantic action yields QUIT then
terminate
the
parse as unsuccessful
else if the semantic action yields NOADVANCE then
backtrack
to
a remaining point of local ambiguity
in the parse (if any) and retrieve the list
of untried
choices for advancement from that point. If
there are
no such remaining points of
local ambiguity,
terminate the parse as
unsuccessful
else
set
the
list of choices for advancement of the parse
to empty
endif
endif
endloop
This version of the algorithm finds a first successful parse, if one
exists. It does not check for other successful parses that might
be found subsequently, by backtracking to points of local ambiguity
recorded during the first parse. So in ASDParser the algorithm
has been modified to search for additional successful parses if
desired. This has been done by making the body of the main loop
into a callable procedure by itself, so it can be called again, after
backtracking, to try to find additional parses. The algorithm for
that procedure is as follows:
advance:
if the list of choices for advancing the parse is empty then
compute
a list of possible choices for
advancement
from the current state of the parse, by considering
the successors of the current node (if any) in
the grammar,
as well
as all initial nodes corresponding to the next item
at the top level in the phraseStructure list
endif
if the list of choices is empty then
return
NOADVANCE
endif
Remove the first possibility from the list of choices. If any
other possibilities remain in the list, the parsing process is
at a point of (remaining) local ambiguity; save the remaining
possibilities in case of later backtracking.
[Note:
The data structure that is used for recording points
of
local ambiguity determines whether the parse proceeds
in
a depth-first or a breadth-first manner. ASDParser
uses a stack (last-in-first-out list), which makes
it search
depth-first. If a queue (first-in-first-out list) were used
instead, the search would proceed breadth-first.]
if the current possibility for
advancing the parse
corresponds to a final node in the grammar then
evaluate
the semantic
value expression for the
current node to
compute
the semantic value of the completed phrase
if the semantic value computed is
NOADVANCE or QUIT then
return that value as the value of this
procedure
else
perform
a
replacement advance (type 4 above), and install
the phrase type of the
completed phrase and the computed
semantic value in the new
top-level node of the
phraseStructure,
with the subphrase linked below the
new node.
if the replacement was at the beginning
of the
overall
phrase then
prepare
to
continue the parse from the beginning
of the new
top level list of the phraseStructure
else
prepare
to
continue the parse from the item
just before the new top-level node of the
phraseStructure
endif
endif
return
SUCCEED
else
apply the current possibility for advancing the parse,
according to its type (1), (2), or (3) above,
and evaluate the semantic action for the node to which
the parser has just moved in the grammar.
if the semantic action yields QUIT or
NOADVANCE then
return
the
result of the semantic action
(QUIT or NOADVANCE)
endif
return SUCCEED
endif
end of advance
Then initialize
and backup are defined as
shown below, and a general parse
procedure is defined to find either the first or the next successful
parse of a given phrase:
initialize:
Segment the utterance into a list of words.
This is
the initial phraseStructure.
Begin at the front
of the word list.
Set the list of choices to empty.
end
of
initialize
backup:
Backtrack to the
most recent point of local ambiguity
(if any) in the parse, and retrieve the list of untried
choices
for advancement from that point.
Return true if successful, false if unsuccessful.
[Note: This
procedure uses the data structure from the
advance
procedure that is used to
keep track of remaining
points of local ambiguity. It takes the top item when
the data structure is a stack. It should take the front
item if the data structure is a queue.]
end
of
backup
parse(expectedTypes):
loop
result := advance()
if result = QUIT then
return
false to indicate an unsuccessful parse
elseif result = SUCCEED then
if there is just one item in the list
at
the top level of the phraseStructure,
and that item is a member of expectedTypes then
return
true to indicate a successful parse
endif
elseif result = NOADVANCE then
if not backup() then
return
false to indicate an unsuccessful parse
endif
endif
endloop
end of parse
Of course, the initialize
procedure is called only once, before the attempt to find a first successful parse.
Attempts to find additional successful parses are made by just calling parse(expectedTypes) again after it
returns true.
Note: The parsing algorithm imposes a few restrictions on how ASD grammars should be written that
are to be used with ASDParser:
- Dummy nodes in the grammar should never be initial nodes. The parser will
only use initial nodes in the grammar that correspond to actual words
in the given utterance.
- For each initial node in
the grammar, there should be at least one final node reachable from it.
Otherwise, the parses starting at that initial node can't possibly
succeed.
- Since dummy (null) nodes in the grammar don't match any words in
the
utterance being parsed, there must not be cycles of dummy nodes in the
grammar. Otherwise the parser can get into an unending loop in
such a cycle.
ASD notes and documentation
ASD home page