A grammar and evaluator for
arithmetic expressions
This example involves an ASD grammar, expression.grm, for
arithmetic expressions like
2.5x[100. - (.3 + 27 / (4-1))]
and a Java application, Evaluator.java, which uses
the grammar and associated semantic computations to evaluate such
expressions.
The expression grammar is best examined with ASDEditor, but here is an
image of the grammar::
NUMBER
is a phrase type reserved by ASDParser. The parser accepts any
string of contiguous digits as a NUMBER [but see Endnote 1
below]. ASDParser permits NUMBER to
occur only in initial nodes
in an ASD grammar. So the grammar redefines NUMBER here
as a DIGITSTRING . [See
also Endnote 2 below.] Consequently, the
component of the grammar that includes node (DIGITSTRING
1) defines any string of digits with an optional leading,
intermediate, or trailing decimal point as a FACTOR
[but see Endnote 3 below].
That is on the assumption that an application like Evaluator.java which uses the
grammar will first scan an utterance and construct a new utterance in
which the decimal points (as well as minus signs) have been separated
by spaces from the digits
surrounding them to become separate lexical items.
The grammar also defines three other kinds of subexpressions as FACTORs:
(1) a negated FACTOR, (2) an EXPRESSION
in square brackets, and (3) an EXPRESSION surrounded by
left and right parentheses. Because of the way in which ASD
grammar files are constructed, using parentheses to delimit items in a
file, parentheses cannot be used directly as lexical items in an ASD
grammar. So this grammar assumes that the application, in its
initial scan of a given utterance, will replace each left parenthesis
by LPAREN and each right
parenthesis by RPAREN. That same
initial scan will separate digit strings, square brackets, and special
characters (like those that represent the arithmetic operators) from
one another by spaces, prior to the utterance being parsed by ASDParser.
Other components of the grammar define a sequence of one or more FACTORs,
separated by operators x or /,
as a TERM, and a sequence of one
or more TERMs, separated by operators
+ or -,
as an EXPRESSION. Notice
how those components of the grammar express repetition and how the
dummy nodes ($$ 1) and ($$ 2)
are used as terminating nodes.
The isolated node (* 1) illustrates how easily
another symbol like the asterisk can be made synonymous with the x
that is used as the multiplication symbol elsewhere in the
grammar. That purpose does not even require a semantic action or semantic value to be associated with
the node.
Most of the nodes in the grammar have been augmented with semantic action or semantic value expressions which
correspond to member functions of the Java application, Evaluator.java,
that will use the grammar. Those can be seen by examining the
grammar with ASDEditor, or by looking at the appropriate fields of the
grammar in its text file, expression.grm. The
following table lists nodes in the grammar and the semantic actions or
semantic values associated with them. The nodes which have no
associated semantic actions or values are not listed in the table.
Grammar
Node Semantic Action
Semantic Value
($$ 1)
expression_$$_1_v
($$ 2)
expression_$$_2_v
($$ 3)
expression_$$_3_v
(+ 1)
expression_plus_1
(- 1)
expression_minus_1
(. 2)
expression_DECIMALPOINT_2
(/ 1)
expression_divided_by_1
(] 1)
expression_RIGHT_BRACKET_v
(DIGITSTRING 1) expression_DIGITSTRING_1
(DIGITSTRING 2)
expression_DIGITSTRING_2_v
(EXPRESSION 1) expression_EXPRESSION
(EXPRESSION 2) expression_EXPRESSION
(FACTOR 1)
expression_FACTOR_1
(FACTOR 2)
expression_FACTOR_2_v
(NUMBER 1)
expression_NUMBER_1_v
(RPAREN 1)
expression_RIGHT_BRACKET_v
(TERM 1)
expression_TERM_1
(x 1)
expression_times_1
For
example, the node (DIGITSTRING 1) has an
associated semantic action field, 'expression_DIGITSTRING_1',
which
corresponds to the member function with the header
public String expression_DIGITSTRING_1()
in Evaluator.java. Similarly, the node ($$ 2) has an associated semantic value field, 'expression_$$_2_v',
which
corresponds to the member function with header
public Object expression_$$_2_v()
in Evaluator.java. Note: ASDParser
requires semantic action
functions to have return type String,
while semantic value functions
must have return type Object.
Notice the naming conventions that have
been used in the example. Each semantic action or semantic value
function begins with the name of the grammar file, in this case expression
(without the .grm extension), in which the corresponding node
occurs. That is done to distinguish functions for similar nodes
which may occur in different grammar files of a modularly constructed
grammar. Underscores in the function names join the file name to
a word or phrase type like DIGITSTRING
and to the instance number of its node (1 or 2)
Another convention is that the names of semantic value functions end with "_v" to
distinguish them from semantic
action functions which may be associated with the same grammar
node. Remember, these are only
conventions. Nothing about ASD grammars or ASDParser
requires them to be followed, but they are useful conventions for
building large and modular grammars. You will notice that some of
the semantic actions and semantic values in the table violate the
naming conventions a bit. For instance, nodes (EXPRESSION 1)
and (EXPRESSION
2) both happen to invoke the same semantic action expression_EXPRESSION.
Similarly, nodes (] 1) and (RPAREN 1)
both invoke the same semantic value function expression_RIGHT_BRACKET_v.
Also, nodes like (+ 1) and (] 1)
that have special characters in them have spelled-out semantic action
and value names like expression_plus_1 and expression_RIGHT_BRACKET_v,
because the special characters can't be used in Java function names.
The actual functions from Evaluator.java
that correspond to the semantic action and semantic value fields
in the expression.grm
grammar are all listed below. [See also Endnote 4.]. These
functions,
invoked by an ASDParser as it parses an expression by following the
grammar, do the work of calculating the numerical value of an
expression. Most of the rest of the source code in Evaluator.java
is for managing the graphical user interface, in addition to some for
invoking, driving, and accessing an instance of ASDParser. You
will notice that several of the functions have bodies identical with
those of other functions. The duplication could be eliminated by
appropriately renaming one of the functions and changing the semantic
actions or semantic values in the relevant nodes of the grammar.
public Object
expression_$$_1_v()
{ return
get("previous");
}
public Object
expression_$$_2_v()
{ return
get("previous");
}
public Object
expression_$$_3_v()
{ String
integerPart = (String) get("integerPart");
return new Double(integerPart);
}
public String
expression_DECIMALPOINT_2()
{
set("integerPart", "0");
return null;
}
public String
expression_DIGITSTRING_1()
{
set("integerPart", nodeValue());
return null;
}
public Object
expression_DIGITSTRING_2_v()
{ String
integerPart = (String) get("integerPart");
String fractionPart = (String) nodeValue();
return new Double(integerPart + "." + fractionPart);
}
public String
expression_divided_by_1()
{
set("operator", currentWord());
return null;
}
public String
expression_EXPRESSION()
{ set("value",
nodeValue());
return null;
}
public String
expression_FACTOR_1()
{ Double
previous = (Double) get("previous");
String operator = (String) get("operator");
Double current = (Double) nodeValue();
if
(previous == null) // first factor
previous = current;
else
{ double resultDouble = 0;
if ("*".equals(operator))
resultDouble = previous.doubleValue() * current.doubleValue();
else if ("/".equals(operator))
{ if (current.doubleValue() == 0)
{ window.getOutputPane().append(
"\n*** Attempt to divide by 0 ***\n");
window.setValueField("*** Attempt to divide by 0 ***");
return parser.QUIT;
}
resultDouble = previous.doubleValue() / current.doubleValue();
}
else // shouldn't happen
System.out.println("Invalid operator " + operator);
previous = new Double(resultDouble);
}
set("previous", previous);
return null;
}
public Object
expression_FACTOR_2_v()
{ Double
factor = (Double) nodeValue();
return new Double(- factor.doubleValue());
}
public Object
expression_NUMBER_1_v()
{ return
currentWord();
}
public String
expression_minus_1()
{
set("operator", currentWord());
return null;
}
public String
expression_plus_1()
{
set("operator", currentWord());
return null;
}
public Object
expression_RIGHT_BRACKET_v()
{ return
get("value");
}
public String
expression_TERM_1()
{ Double
previous = (Double) get("previous");
String operator = (String) get("operator");
Double current = (Double) nodeValue();
if
(previous == null) // first term
previous = current;
else
{ double resultDouble = 0;
if ("+".equals(operator))
resultDouble = previous.doubleValue() + current.doubleValue();
else if ("-".equals(operator))
resultDouble = previous.doubleValue() - current.doubleValue();
else // shouldn't happen
System.out.println("Invalid operator " + operator);
previous = new Double(resultDouble);
}
set("previous", previous);
return null;
}
public String
expression_times_1()
{
set("operator", "*");
return null;
}
Endnotes:
- Actually ASDParser will
accept any string of contiguous digits (limited as described in Endnote
3 below), including an optional
initial minus sign as a NUMBER, but
the Evaluator.java
application separates minus signs from strings of following digits
before it passes an expression to the parser. That is done so the
grammar will not incorrectly accept expressions like 23.-45, with a
minus sign after a decimal point, ambiguously as single numbers as well
as
subtractions.
- You will notice that the keyword NUMBER
does not appear in phrase structure trees that result from parsing with
the example grammar. In place of NUMBER, the parser ensures that the actual
numbers found in the given utterance
appear in their proper places in the phrase structure tree.
- Actually, the maximum length accepted by Java for integer
constants is 9 digits, or in limited cases 10 digits. So Evaluator.java will accept at
most nine or ten digits before a decimal point and at most nine or ten
digits after a decimal point. In any case, you will notice that
long numbers result in small round-off errors when Evaluator converts numbers to
Java double
form.
- The Java functions for computing semantic actions and semantic
values call on several other functions, with names set, get, currentWord,
and nodeValue,
which are short functions defined in Evaluator.java to abbreviate
calls to functions that are defined in ASDParser. For example,
the function call get("previous")
abbreviates the direct call parser.get("previous"),
and the call currentWord()
abbreviates parser.currentNode().word().
[link to ASD home page]
page
last modified 2005 Aug 23