ASDParser algorithm details - Part 3: Global state and operation of
the parsing process
Copyright 2008
James A. Mason
The global state of the
parsing process is represented by the following instance variables of ASDParser.
(Its
other global variables are just names for
various String constants.)
private String stringToBeParsed;
// the utterance currently being parsed
private Stack
backstack;
//
for saving ASDParseState instances for backtracking
private ArrayList expectedTypes;
//
a list of strings that are phrase type names that
//
can be goals of the current parse
private ASDGrammar ASDLexicon;
//
the grammar/lexicon to be used for parsing
private int currentParseStepNumber;
//
the number of the current step in a parse
private boolean saveUniquelyParsedSubphrases;
// a switch that indicates whether or
not the parser is to save
// uniquely-parsed subphrases when it
backtracks to an earlier
// state of the parse, so that those
subphrases need not be
// re-parsed when they are encountered
again
private
ASDParseState state;
//
the (local) rest of the state of the current parse
private ASDSemantics
semantics;
//
the object which is to evaluate semantic action
//
and semantic value strings from the grammar/lexicon;
//
may be the ASDParser instance itself.
private Object application;
//
the target for application-specific messages,
//
if semantics is the ASDParser itself
Globally, the parsing
algorithm performs the following actions:
- load an ASDGrammar from a specified file,
- initialize a phrase structure for a given string that is to be
parsed and record the list of expectedTypes for the phrase,
- from each local state of the parse, determine the choices for
advancing the parse to the next local state,
- invoke the appropriate subalgorithm (See Part 2) to advance the parse at
each step,
- determine when the parse has ended successfully, has ended in
failure, or has reached a dead end and needs to backtrack,
- keep track (on the backstack) of local states of
the parse for which there remain alternative choices for advancing the
parse that have not yet been tried,
- send semantic action and semantic value strings to the semantics
object to be evaluated, and
- possibly invoke functions of the application
object before, during, or after a parse.
Action 3, determining the choices
for advancing the parse from each local state, is particularly tricky
because ASD grammars can contain dummy
nodes that do not correspond to actual words (or names of phrase types)
in the phrase structure, and because a new list of choices must be
computed, after an advance of type final
(See Part 2.) at a
node in the phrase structure for which a list of choices has already
been computed when it was the current
node earlier in the parsing process. For example, consider the
grammar in
Figure 1, which has been contrived to illustrate the problem
.
Figure
1
Suppose the string "a b d" is
to be parsed with this grammar to one of the expected types X, Y,
or
Z. The following
trace shows when the choice for advancing to the dummy node [$$ 1] is made by
ASDParser. The trace shows output from the ASDTester utility program (See
the software page.) which has been
annotated with the choices for advancement of the parse at each
step. For compactness of presentation, the phrase structure trees
in the trace are shown turned on their sides. In each phrase
structure the header node is at the top, the top-level phrase structure
nodes are listed in a column below the header, and lower levels in the
phrase structure are indicated by indenting. The characters *-> indicate the current node at the top level of
the phrase structure. The symbol nil beside a vocabulary element
(i.e., a word or phrase type) indicates that no corresponding grammar
node has yet been associated with that vocabulary element; otherwise, a
number indicates the node in the grammar with which that vocabulary
element is currently associated.
Initialized phrase structure:
(a b d)
*->nil nil
a nil
b nil
d nil
choices: (INITIAL,[a 1])
1 - parse advanced to:
(a b d)
nil nil
*->a 1
b nil
d nil
choices: (INITIAL,[b 1],), (DUMMY,[$$ 1],)
2 - parse advanced to:
(a b d)
nil nil
a 1
*->b 1
d nil
choices: (FINAL,,R)
3 - parse advanced to:
(a b d)
nil nil
*->a 1
R nil
b 1
d nil
choices: (NONDUMMY,[R 1],), but not (DUMMY,[$$
1],) because the latter choice has already been included after step 1
4 - parse advanced to:
(a b d)
nil nil
a 1
*->R 1
b 1
d nil
choices: (FINAL,,X)
5 - parse advanced to:
((a b) d)
*->nil nil
X nil
a 1
R 1
b 1
d nil
choices: none
parse backed up to step 1:
(a b d)
nil nil
*->a 1
b 1
d nil
remaining choices: (DUMMY,[$$ 1],)
6 - parse advanced to:
(a b d)
nil nil
a 1
*->$$ 1
b 1
d nil
choices: (NONDUMMY,[b 2],), (INITIAL,[b 1],}
7 - parse advanced to:
(a b d)
nil nil
a 1
$$ 1
*->b 2
d nil
choices: none
parse backed up to step 6:
(a b d)
nil nil
a 1
*->$$ 1
b 2
d nil
remaining choices: (INITIAL,[b 1],}
8 - parse advanced to:
(a b d)
nil nil
a 1
$$ 1
*->b 1
d nil
choices: (FINAL,,X)
9 - parse advanced to:
(a b d)
nil nil
a 1
*->$$ 1
R nil
b 1
d nil
choices: (NONDUMMY,[R 2],)
10 - parse advanced to:
(a b d)
nil nil
a 1
$$ 1
*->R 2
b 1
d nil
choices: (NONDUMMY,[d 1],)
11 - parse advanced to:
(a b d)
nil nil
a 1
$$ 1
R 2
b 1
*->d 1
choices: (FINAL,,Z)
12 - parse advanced to:
(a b d)
*->nil nil
Z nil
a 1
$$ 1
R 2
b 1
d 1
Successful parse after 12 new and 12 total advance steps,
and 2 new and 2 total backup steps.
[After an attempt to advance the parse another step,]
Parse failed after 0 new and 12 total advance steps,
and 0 new and 2 total backup steps, leaving structure:
(a b d)
*->nil nil
Z nil
a 1
$$ 1
R 2
b 1
d 1
Notice that if the choice (DUMMY,[$$
1],) were to be
included after step 3, then the backup after step 5 would go to step 3
and a later backup after
step 12 (assuming that the parser were called again to find other
possible successful parses) would go to step 1, with the result that
the dummy node [$$ 1]
would be inserted again into the top level of the
phrase structure, and exactly the same successful parse
(a b d)
*->nil nil
Z nil
a 1
$$ 1
R 2
b 1
d 1
that was found after step 12 would be arrived at a second time,
redundantly
Based on the foregoing discussion, the source code for the advance
function of ASDParser was written as shown below. It calls the choices function, also shown below,
to compute the
choices for advancing the parse another step and then calls the
appropriate sub-algorithm to perform one of the four types of advance
that was discussed in Part 2.
Notice
that if there is more than one choice for advancing the parse,
then an entry is put onto the top of the backstack
before the advance is performed, so that the other choice(s) can be
tried if the parse later backtracks to the current local state of the
parse. Notice also that, in the case of advances of types NONDUMMY, DUMMY, and INITIAL, the advance
function causes semantic action
(if any) of the grammar node that has just been entered to be
executed. Finally, notice that the advance
function sets the value of state.nextNodeSubphrase
to point to where the subphrase
pointer of the next node in the phrase structure (if any) is
pointing.. That is done before
the local parse state is pushed onto the backstack,
and before the actual advance of the local parse state is
performed. It will enable the backup function,
described in Part 4, to
determine whether or not advances of type FINAL have been made permanently, as discussed in
connection with the advanceFinal function in Part 2, or only temporarily.
The choices function itself
has two parameters, the first of which indicates whether or not choices
to advance to dummy nodes in
the grammar are to be included. The second parameter, which is
provided with a null argument value here, will be
discussed later on, in connection with the backup
function of ASDParser (See Part 4.)
/**
Attempts to advance the parse state one step.
@return SUCCEED if successful, NOADVANCE if unsuccessful but parse
should continue after backup, QUIT if parse should quit.
*/
public String advance()
{ if
(state.currentChoices == null) // choices not yet computed
state.currentChoices
= choices(true, null);
if
(state.currentChoices.size() == 0)
return
NOADVANCE; // no choices for advancing from this state
//
Remove the next advance choice from the queue of
//
current choices:
ASDParseChoice tryChoice
=
(ASDParseChoice) state.currentChoices.remove(0);
state.advanceCase = tryChoice.advanceType;
//
the type of advance choice: INITIAL, FINAL, DUMMY, or NONDUMMY
if
(state.currentChoices.size() > 0) // other choices remain
{ state.unique = false; // subphrase cannot be parsed uniquely
if
(state.currentNode.nextNode() != null)
//
current node in the phrase structure is not the last
//
one; prepare for a possible permanent final advance
//
that may occur right after the current node:
state.nextNodeSubphrase
=
state.currentNode.nextNode().subphrase();
else
state.nextNodeSubphrase
= null;
backstack.push((ASDParseState)state.clone());
}
if
(state.advanceCase == FINAL) // a subphrase has ended
{ String val = advanceFinal(tryChoice.completedType);
//
returns SUCCEED, NOADVANCE, or QUIT
if
(val == NOADVANCE || val == QUIT) return val;
}
else
{ if (state.advanceCase == NONDUMMY)
advanceNonDummy(tryChoice.nextNode);
else
if (state.advanceCase == DUMMY)
advanceDummy(tryChoice.nextNode);
else
if (state.advanceCase == INITIAL)
advanceInitial(tryChoice.nextNode);
String
action
=
tryChoice.nextNode.semanticAction();
if
(semantics != null && action != null &&
action.length() > 0)
{
String resultOfAction = semantics.semanticAction(action);
if
(resultOfAction == NOADVANCE || resultOfAction == QUIT)
return
resultOfAction;
}
}
++currentParseStepNumber;
return SUCCEED;
} // end advance.
/**
Computes queue of choices for advancing from current parse state.
Includes advances to dummy nodes if includeDummies is true.
If
dummies is a non-null ArrayList, it includes only advances to
dummy nodes in that ArrayList.
*/
ArrayList choices(boolean includeDummies,
ArrayList dummies)
{ boolean
initialsIn = false;
ArrayList result;
if
(state.currentNode == state.phraseStructure)
//
at dummy header node
{ result = initialsForTypes(state.currentNode.nextNode().word(),
expectedTypes);
ArrayList
more = initialsForTypes(ANYTHING, expectedTypes);
for
(int j = 0; j < more.size(); ++j)
result.add(more.get(j));
return
result;
}
result = new ArrayList(10);
ASDGrammarNode grammarNode = state.currentNode.instance();
if
(grammarNode == null) // shouldn't happen
{ System.out.println(
"***
grammarNode unexpectedly null in ASDParser choices");
System.exit(0);
}
if
(grammarNode.isFinal())
{ ASDParseChoice choice = new ASDParseChoice();
choice.advanceType
= FINAL;
choice.completedType
= grammarNode.phraseType();
result.add(choice);
return
result;
}
ASDPhraseNode next = state.currentNode.nextNode();
ArrayList types = grammarNode.successorTypes();
ArrayList successors = grammarNode.successors();
if
(successors == null) // shouldn't happen
{ System.out.println(
"***
successors unexpectedly null for ASD grammar entry "
+
"for word\n'" + grammarNode.word() + "' and instance "
+
grammarNode.instance() );
System.exit(0);
}
ASDGrammarSuccessor successor;
ASDGrammarNode successorState;
for (int j = 0; j < successors.size(); ++j)
{ successor
=
(ASDGrammarSuccessor) successors.get(j);
if
(successor.getWord().equals(DUMMYWORD))
//
dummy successor
{
if (includeDummies)
{
successorState = ASDLexicon.lookupInstance(successor);
boolean
includeState = false;
if
(dummies == null) // include all dummy successors
includeState
= true;
else
// include dummy successors in the dummy vector
for
(int k = 0;
!includeState
&& k < dummies.size(); ++k)
if
(successorState ==
(ASDGrammarNode)(dummies.get(k))
)
includeState
= true;
if
(includeState)
{
ASDParseChoice choice = new ASDParseChoice();
choice.advanceType
= DUMMY;
choice.nextNode
= successorState;
result.add(choice);
}
}
}
else
// next node in grammar is not a dummy
{
if (next != null)
//
there is a next node in the phrase structure.
//
Include non-dummy successor notes in the grammar
//
that match the next word in the phrase structure
//
or ANYTHING:
{
if (next.word().equals(successor.getWord()) ||
successor.getWord().equals(ANYTHING)
)
{
successorState
=
ASDLexicon.lookupInstance(successor);
ASDParseChoice
choice = new ASDParseChoice();
choice.advanceType
= NONDUMMY;
choice.nextNode
= successorState;
result.add(choice);
}
//
Also include initial instances of the next word
//
in the phrase structure when appropriate:
if
(!initialsIn)
{
ArrayList initialsToAdd = null;
ArrayList
moreInitialsToAdd = null;
if
(types == null) // current node in grammar has
//
successors of unspecified phrase types
//
(this handles an unoptimized grammar)
{
initialsIn = true;
initialsToAdd
= initialsForTypes(next.word(), null);
moreInitialsToAdd
=
initialsForTypes(ANYTHING, null);
}
else
// current node in grammar has successors of
//
specified phrase types; include initial instances
//
of next word in phrase structure which can begin
//
subphrases of those types:
{
boolean typeMatch = false;
for
(int k = 0; !typeMatch && k < types.size(); ++k)
if
(successor.getWord()
.equals((String)(types.get(k)))
)
typeMatch
= true;
if
(typeMatch)
{
initialsToAdd
=
initialsForTypes(next.word(), types);
moreInitialsToAdd
=
initialsForTypes(ANYTHING, types);
initialsIn
= true;
}
}
if
(initialsToAdd != null)
for
(int k = 0; k < initialsToAdd.size(); ++k)
result.add(initialsToAdd.get(k));
if
(moreInitialsToAdd != null)
for
(int k = 0; k < moreInitialsToAdd.size(); ++k)
result.add(moreInitialsToAdd.get(k));
}
}
}
}
// end loop through successors
return result;
} // end choices
If you examine the choices
algorithm carefully, you will see that choices for advancing the parse
are listed in an order that is determined primarily by the sequence in
which the successors of the current grammar node are listed in the
grammar. Specifically, the sequence of choices is as follows:
- If the current node in the phrase structure is the header node,
then make the list of choices be all initial
grammar nodes for the next word (or phrase-type name) in the phrase
structure which can begin a phrase of one of the phrase types that is expected for the parse;
- or, if the current node in the phrase structure corresponds to a final node in the grammar, then
make the list of choices contain the only choice for advancing the
parse: an advance of type final
that ends a phrase whose phrase type is given beside that final node in
the grammar. Otherwise consider in turn each successor node of
the current grammar node, and
- if the successor grammar node is a dummy node, and if dummy nodes are
to be included in the list of choices, include in the list of choices a
dummy advance to the
successor grammar node, provided that is appropriate according to the
value of the dummies parameter.
Otherwise,
- if there is a next node in the phrase structure, and if its word
(or phrase type) matches the word (or phrase type) in the successor
grammar node -- or if the word in the successor grammar node is the
special keyword ANYTHING -- then include
in the list of choices an advance of type nondummy to the successor node in
the grammar. Also, if initial
advances have not yet been included in the list of choices, then add to
the list of choices advances of type initial
to all appropriate initial grammar nodes for the next word in the
phrase structure.
Note: The choices
algorithm is written carefully to prevent the same initial advance being added to the
list of choices more than once. For example when the phrase "c a q" is being parsed with the
grammar shown in Figure 2, the choices
algorithm must not include in the list of choices an initial advance to node [ a 1 ] more than once after node
[ c 1 ] -- when
considering successor node [ X 1
] and when considering successor node [ Y 1 ]. Otherwise the same
successful parse of "c a q"
would
be obtained more than once.
Figure
2
ASD
Parser algorithm details -
Contents
ASD
notes and documentation
ASD
home page
last modified 2010 Aug 21