ASDParser algorithm details

ASDParser algorithm details - Part 3: Global state and operation of the parsing process

Copyright 2008 James A. Mason

The global state of the parsing process is represented by the following instance variables of ASDParser. (Its other global variables are just names for various String constants.)

   private String stringToBeParsed;
      // the utterance currently being parsed
   private Stack backstack;
      // for saving ASDParseState instances for backtracking
   private ArrayList expectedTypes;
      // a list of strings that are phrase type names that
      // can be goals of the current parse
   private ASDGrammar ASDLexicon;
      // the grammar/lexicon to be used for parsing
   private int currentParseStepNumber;
      // the number of the current step in a parse
   private boolean saveUniquelyParsedSubphrases;
      // a switch that indicates whether or not the parser is to save
      // uniquely-parsed subphrases when it backtracks to an earlier
      // state of the parse, so that those subphrases need not be
      // re-parsed when they are encountered again
   private ASDParseState state;
      // the (local) rest of the state of the current parse
   private ASDSemantics semantics;
      // the object which is to evaluate semantic action
      // and semantic value strings from the grammar/lexicon;
      // may be the ASDParser instance itself.
   private Object application;
      // the target for application-specific messages,
      // if semantics is the ASDParser itself

Globally, the parsing algorithm performs the following actions:

load an ASDGrammar from a specified file,
initialize a phrase structure for a given string that is to be parsed and record the list of expectedTypes for the phrase,
from each local state of the parse, determine the choices for advancing the parse to the next local state,
invoke the appropriate subalgorithm (See Part 2) to advance the parse at each step,
determine when the parse has ended successfully, has ended in failure, or has reached a dead end and needs to backtrack,
keep track (on the backstack) of local states of the parse for which there remain alternative choices for advancing the parse that have not yet been tried,
send semantic action and semantic value strings to the semantics object to be evaluated, and
possibly invoke functions of the application object before, during, or after a parse.

Action 3, determining the choices for advancing the parse from each local state, is particularly tricky because ASD grammars can contain dummy nodes that do not correspond to actual words (or names of phrase types) in the phrase structure, and because a new list of choices must be computed, after an advance of type final (See Part 2.) at a node in the phrase structure for which a list of choices has already been computed when it was the current node earlier in the parsing process. For example, consider the grammar in Figure 1, which has been contrived to illustrate the problem .

[ example grammar illustrating problem of dummy choices for advancing parse ]

                                      Figure 1

Suppose the string "a b d" is to be parsed with this grammar to one of the expected types X, Y, or Z. The following trace shows when the choice for advancing to the dummy node [$$ 1] is made by ASDParser. The trace shows output from the ASDTester utility program (See the software page.) which has been annotated with the choices for advancement of the parse at each step. For compactness of presentation, the phrase structure trees in the trace are shown turned on their sides. In each phrase structure the header node is at the top, the top-level phrase structure nodes are listed in a column below the header, and lower levels in the phrase structure are indicated by indenting. The characters *-> indicate the current node at the top level of the phrase structure. The symbol nil beside a vocabulary element (i.e., a word or phrase type) indicates that no corresponding grammar node has yet been associated with that vocabulary element; otherwise, a number indicates the node in the grammar with which that vocabulary element is currently associated.

Initialized phrase structure:

(a b d)
*->nil nil
   a nil
   b nil
   d nil

choices: (INITIAL,[a 1])

1 - parse advanced to:

(a b d)
   nil nil
*->a 1
   b nil
   d nil

choices: (INITIAL,[b 1],), (DUMMY,[$$ 1],)

2 - parse advanced to:

(a b d)
   nil nil
   a 1
*->b 1
   d nil

choices: (FINAL,,R)

3 - parse advanced to:

(a b d)
   nil nil
*->a 1
   R nil
      b 1
   d nil

choices: (NONDUMMY,[R 1],), but not (DUMMY,[$$ 1],) because the latter choice has already been included after step 1

4 - parse advanced to:

(a b d)
   nil nil
   a 1
*->R 1
      b 1
   d nil

choices: (FINAL,,X)

5 - parse advanced to:

((a b) d)
*->nil nil
   X nil
      a 1
      R 1
         b 1
   d nil

choices: none

parse backed up to step 1:

(a b d)
   nil nil
*->a 1
   b 1
   d nil

remaining choices: (DUMMY,[$$ 1],)

6 - parse advanced to:

(a b d)
   nil nil
   a 1
*->$$ 1
   b 1
   d nil

choices: (NONDUMMY,[b 2],), (INITIAL,[b 1],}

7 - parse advanced to:

(a b d)
   nil nil
   a 1
   $$ 1
*->b 2
   d nil

choices: none

parse backed up to step 6:

(a b d)
   nil nil
   a 1
*->$$ 1
   b 2
   d nil

remaining choices: (INITIAL,[b 1],}

8 - parse advanced to:

(a b d)
   nil nil
   a 1
   $$ 1
*->b 1
   d nil

choices: (FINAL,,X)

9 - parse advanced to:

(a b d)
   nil nil
   a 1
*->$$ 1
   R nil
      b 1
   d nil

choices: (NONDUMMY,[R 2],)

10 - parse advanced to:

(a b d)
   nil nil
   a 1
   $$ 1
*->R 2
      b 1
   d nil

choices: (NONDUMMY,[d 1],)

11 - parse advanced to:

(a b d)
   nil nil
   a 1
   $$ 1
   R 2
      b 1
*->d 1

choices: (FINAL,,Z)

12 - parse advanced to:

(a b d)
*->nil nil
   Z nil
      a 1
      $$ 1
      R 2
         b 1
      d 1

Successful parse after 12 new and 12 total advance steps,
and 2 new and 2 total backup steps.

[After an attempt to advance the parse another step,]
Parse failed after 0 new and 12 total advance steps,
and 0 new and 2 total backup steps, leaving structure:

(a b d)
*->nil nil
   Z nil
      a 1
      $$ 1
      R 2
         b 1
      d 1

Notice that if the choice (DUMMY,[$$ 1],) were to be included after step 3, then the backup after step 5 would go to step 3 and a later backup after step 12 (assuming that the parser were called again to find other possible successful parses) would go to step 1, with the result that the dummy node [$$ 1] would be inserted again into the top level of the phrase structure, and exactly the same successful parse

(a b d)
*->nil nil
   Z nil
      a 1
      $$ 1
      R 2
         b 1
      d 1

that was found after step 12 would be arrived at a second time, redundantly

Based on the foregoing discussion, the source code for the advance function of ASDParser was written as shown below. It calls the choices function, also shown below, to compute the choices for advancing the parse another step and then calls the appropriate sub-algorithm to perform one of the four types of advance that was discussed in Part 2. Notice that if there is more than one choice for advancing the parse, then an entry is put onto the top of the backstack before the advance is performed, so that the other choice(s) can be tried if the parse later backtracks to the current local state of the parse. Notice also that, in the case of advances of types NONDUMMY, DUMMY, and INITIAL, the advance function causes semantic action (if any) of the grammar node that has just been entered to be executed. Finally, notice that the advance function sets the value of state.nextNodeSubphrase to point to where the subphrase pointer of the next node in the phrase structure (if any) is pointing.. That is done before the local parse state is pushed onto the backstack, and before the actual advance of the local parse state is performed. It will enable the backup function, described in Part 4, to determine whether or not advances of type FINAL have been made permanently, as discussed in connection with the advanceFinal function in Part 2, or only temporarily.

The choices function itself has two parameters, the first of which indicates whether or not choices to advance to dummy nodes in the grammar are to be included. The second parameter, which is provided with a null argument value here, will be discussed later on, in connection with the backup function of ASDParser (See Part 4.)

   /**
      Attempts to advance the parse state one step.
      @return SUCCEED if successful, NOADVANCE if unsuccessful but parse
      should continue after backup, QUIT if parse should quit.
    */
   public String advance()
   { if (state.currentChoices == null) // choices not yet computed
         state.currentChoices = choices(true, null);
      if (state.currentChoices.size() == 0)
         return NOADVANCE; // no choices for advancing from this state
      // Remove the next advance choice from the queue of
      // current choices:
      ASDParseChoice tryChoice
         = (ASDParseChoice) state.currentChoices.remove(0);
      state.advanceCase = tryChoice.advanceType;
         // the type of advance choice: INITIAL, FINAL, DUMMY, or NONDUMMY
      if (state.currentChoices.size() > 0) // other choices remain
      { state.unique = false; // subphrase cannot be parsed uniquely
         if (state.currentNode.nextNode() != null)
            // current node in the phrase structure is not the last
            // one; prepare for a possible permanent final advance
            // that may occur right after the current node:
            state.nextNodeSubphrase
               = state.currentNode.nextNode().subphrase();
         else
            state.nextNodeSubphrase = null;
         backstack.push((ASDParseState)state.clone());
      }
      if (state.advanceCase == FINAL) // a subphrase has ended
      { String val = advanceFinal(tryChoice.completedType);
            // returns SUCCEED, NOADVANCE, or QUIT
         if (val == NOADVANCE || val == QUIT) return val;
      }
      else
      { if (state.advanceCase == NONDUMMY)
            advanceNonDummy(tryChoice.nextNode);
         else if (state.advanceCase == DUMMY)
            advanceDummy(tryChoice.nextNode);
         else if (state.advanceCase == INITIAL)
            advanceInitial(tryChoice.nextNode);
         String action
            = tryChoice.nextNode.semanticAction();
         if (semantics != null && action != null && action.length() > 0)
         { String resultOfAction = semantics.semanticAction(action);
            if (resultOfAction == NOADVANCE || resultOfAction == QUIT)
               return resultOfAction;
         }
      }
      ++currentParseStepNumber;
      return SUCCEED;
   } // end advance.

    /**
      Computes queue of choices for advancing from current parse state.
      Includes advances to dummy nodes if includeDummies is true.
      If dummies is a non-null ArrayList, it includes only advances to
      dummy nodes in that ArrayList.
    */
   ArrayList choices(boolean includeDummies, ArrayList dummies)
   { boolean initialsIn = false;
      ArrayList result;
      if (state.currentNode == state.phraseStructure)
         // at dummy header node
      { result = initialsForTypes(state.currentNode.nextNode().word(),
                     expectedTypes);
         ArrayList more = initialsForTypes(ANYTHING, expectedTypes);
         for (int j = 0; j < more.size(); ++j)
            result.add(more.get(j));
         return result;
      }

      result = new ArrayList(10);
      ASDGrammarNode grammarNode = state.currentNode.instance();
      if (grammarNode == null) // shouldn't happen
      { System.out.println(
            "*** grammarNode unexpectedly null in ASDParser choices");
         System.exit(0);
      }
      if (grammarNode.isFinal())
      { ASDParseChoice choice = new ASDParseChoice();
         choice.advanceType = FINAL;
         choice.completedType = grammarNode.phraseType();
         result.add(choice);
         return result;
      }

      ASDPhraseNode next = state.currentNode.nextNode();
      ArrayList types = grammarNode.successorTypes();
      ArrayList successors = grammarNode.successors();
      if (successors == null) // shouldn't happen
      { System.out.println(
         "*** successors unexpectedly null for ASD grammar entry "
         + "for word\n'" + grammarNode.word() + "' and instance "
         + grammarNode.instance() );
         System.exit(0);
      }
      ASDGrammarSuccessor successor;
      ASDGrammarNode successorState;
      for (int j = 0; j < successors.size(); ++j)
      { successor
            = (ASDGrammarSuccessor) successors.get(j);
         if (successor.getWord().equals(DUMMYWORD))
            // dummy successor
         { if (includeDummies)
            { successorState = ASDLexicon.lookupInstance(successor);
               boolean includeState = false;
               if (dummies == null) // include all dummy successors
                  includeState = true;
               else // include dummy successors in the dummy vector
                  for (int k = 0;
                       !includeState && k < dummies.size(); ++k)
                     if (successorState ==
                           (ASDGrammarNode)(dummies.get(k)) )
                        includeState = true;
               if (includeState)
               { ASDParseChoice choice = new ASDParseChoice();
                  choice.advanceType = DUMMY;
                  choice.nextNode = successorState;
                  result.add(choice);
               }
            }
         }
         else // next node in grammar is not a dummy
         { if (next != null)
              // there is a next node in the phrase structure.
              // Include non-dummy successor notes in the grammar
              // that match the next word in the phrase structure
              // or ANYTHING:
            { if (next.word().equals(successor.getWord()) ||
                      successor.getWord().equals(ANYTHING) )
              { successorState
                     = ASDLexicon.lookupInstance(successor);
                 ASDParseChoice choice = new ASDParseChoice();
                 choice.advanceType = NONDUMMY;
                 choice.nextNode = successorState;
                 result.add(choice);
              }
              // Also include initial instances of the next word
              // in the phrase structure when appropriate:
              if (!initialsIn)
              { ArrayList initialsToAdd = null;
                 ArrayList moreInitialsToAdd = null;
                 if (types == null) // current node in grammar has
                        // successors of unspecified phrase types
                        // (this handles an unoptimized grammar)
                 { initialsIn = true;
                    initialsToAdd = initialsForTypes(next.word(), null);
                    moreInitialsToAdd
                        = initialsForTypes(ANYTHING, null);
                 }
                 else // current node in grammar has successors of
                     // specified phrase types; include initial instances
                     // of next word in phrase structure which can begin
                     // subphrases of those types:
                 { boolean typeMatch = false;
                    for (int k = 0; !typeMatch && k < types.size(); ++k)
                       if (successor.getWord()
                                       .equals((String)(types.get(k))) )
                          typeMatch = true;
                    if (typeMatch)
                    { initialsToAdd
                           = initialsForTypes(next.word(), types);
                       moreInitialsToAdd
                           = initialsForTypes(ANYTHING, types);
                       initialsIn = true;
                    }
                 }
                 if (initialsToAdd != null)
                    for (int k = 0; k < initialsToAdd.size(); ++k)
                       result.add(initialsToAdd.get(k));
                 if (moreInitialsToAdd != null)
                    for (int k = 0; k < moreInitialsToAdd.size(); ++k)
                       result.add(moreInitialsToAdd.get(k));
               }
            }
         }
      } // end loop through successors

      return result;
   } // end choices

If you examine the choices algorithm carefully, you will see that choices for advancing the parse are listed in an order that is determined primarily by the sequence in which the successors of the current grammar node are listed in the grammar. Specifically, the sequence of choices is as follows:

If the current node in the phrase structure is the header node, then make the list of choices be all initial grammar nodes for the next word (or phrase-type name) in the phrase structure which can begin a phrase of one of the phrase types that is expected for the parse;
or, if the current node in the phrase structure corresponds to a final node in the grammar, then make the list of choices contain the only choice for advancing the parse: an advance of type final that ends a phrase whose phrase type is given beside that final node in the grammar. Otherwise consider in turn each successor node of the current grammar node, and
if the successor grammar node is a dummy node, and if dummy nodes are to be included in the list of choices, include in the list of choices a dummy advance to the successor grammar node, provided that is appropriate according to the value of the dummies parameter. Otherwise,
if there is a next node in the phrase structure, and if its word (or phrase type) matches the word (or phrase type) in the successor grammar node -- or if the word in the successor grammar node is the special keyword ANYTHING -- then include in the list of choices an advance of type nondummy to the successor node in the grammar. Also, if initial advances have not yet been included in the list of choices, then add to the list of choices advances of type initial to all appropriate initial grammar nodes for the next word in the phrase structure.

Note: The choices algorithm is written carefully to prevent the same initial advance being added to the list of choices more than once. For example when the phrase "c a q" is being parsed with the grammar shown in Figure 2, the choices algorithm must not include in the list of choices an initial advance to node [ a 1 ] more than once after node [ c 1 ] -- when considering successor node [ X 1 ] and when considering successor node [ Y 1 ]. Otherwise the same successful parse of "c a q" would be obtained more than once.

[ example grammar for illustrating choices algorithm ]

Figure 2

ASD Parser algorithm details - Contents

ASD notes and documentation

ASD home page

last modified 2010 Aug 21