Syntax Directed Definition with ANTLR

Author

Ken Pu

1 Actions

Consider a production:

expr : A '+' B

We would like to register blocks of code to selected symbols so that whenever their respective parse tree node is created, the block of code is executed.

Actions assigned to production rule
  • There are two actions assigned to the symbols \(A\) and \(B\) in the production of expr.

  • When node \(A\) and \(B\) are created in the parse tree, we want to execute the actions.

  • Do we execute actions before or after children of A are constructed?

2 Actions as empty symbols

Empty symbols

If \(X:\epsilon\) is the only production of \(X\), then \(X\) is an empty symbol.

Theorem

Adding empty symbols in the productions of \(G\) does not affect the language of \(G\).

Let’s add actions to a grammar as empty symbols.

Actions as symbols
  • We add empty symbols as place holders for actions.

  • The action gets executed when its empty symbol is constructed by the top-down parser.

3 Actions in ANTLR

Actions are denoted by blocks of code.

{ print("Hello"); }

They can be embedded in the body of productions.

expr
    : A { println("A is created."); }
      '+'
      B { println("B is created."); }
    ;

Actions are treated as syntactic variables that can only derive the empty string \(\epsilon\).

Therefore, the grammar:

expr
    : A { println("A is created."); }
      '+'
      B { println("B is created."); }
    ;

is treated as:

expr    : A action1 '+' B action2 
        ;
action1 :
        ;
action2 :
        ;
Note
  • action1 and action2 trigger code executions.
  • since they are treated as symbols, grammar transformations still work as normal.
  • predictive parsing still works as normal.

4 A complete example in ANTLR

The following grammar counts the number of + and * occurrences in the input token stream.

MyGrammar.g4

grammar MyGrammar;

@members {
    int mult = 0;
    int plus = 0;
}

expr : expr '+' expr { plus += 1; }
     | expr '*' expr { mult += 1; }
     | '(' expr ')'
     | Number
     ;
     
Number : ('0' .. '9')+ ;
WS : (' ' | '\t' | '\r' | '\n') -> skip;

5 Data attributes at nodes

  • We can extend the nodes of a parse tree so that they can hold data.

  • each syntactic variable can have its own set of data attributes.

  • Here is an example where expr has a data attribute $value of type int.

  • Note, we can see that $value is computed from the children nodes.

Data attribute in parse tree

6 Attributes in ANTLR

In ANTLR, this is denoted as follows:

expr returns [int value, ...]
    : expr '+' expr
    | ...
    ;

We want to use actions to compute the $value of the top-level expr from the children expr nodes.

Q: How we do address the two children nodes?

A: use labels.

expr returns [int value, ...]
    : x=expr '+' y=expr
    | ...
    ;

7 Attributes, labels and action!

expr returns [int value]
    : x=expr '+' y=expr { $value = $x.value + $y.value; }
    | '(' expr ')'      { $value = $expr.value; }
    | Number            { $value = Integer.parseInt($Number.text); }
    ;

8 Conclusion