Top-down parsing

Author

Ken Pu

1 Review of first and follow sets

1.1 Definitions and intuitions

First-sets of strings

\[\mathrm{first}: \Sigma^*\to\mathrm{SET}(\Sigma_0\cup\{\epsilon\})\]

It maps any string \(\alpha\) to tokens (or \(\epsilon\)) that can be the first of the sentences in \(L(\alpha)\).

Remember:

If \(c\in\mathrm{first}(\alpha)\), then we have \(\alpha\Rightarrow c\dots\)

Follow-sets of syntactic variables

\[\mathrm{follow}:\Sigma_1\to\mathrm{SET}(\Sigma_0\cup\{$\})\]

\(\mathrm{follow}(X)\) is all the tokens that can possibly follow \(X\) during the derivation of a sentence of a grammar.

Remember:

If \(c\in\mathrm{follow}(X)\), then \(S\Rightarrow \dots Xc\dots\)

1.2 An example

Consider the grammar.

\[ \begin{eqnarray} S &:& a\ |\ ( S ) \end{eqnarray} \]

We can see that:

  • \(S\rightarrow a\): therefore \(a\in\mathrm{first}(S)\).
  • \(S\rightarrow (S)\): therefore \({\color{red}\bf(}\in\mathrm{first}(S)\).

\[\mathrm{first}(S) = \{\ a, \mathbf{\color{red} (}\ \}\]

  • \(S\rightarrow (S\mathbf{\color{red})}\), therefore \(\mathbf{\color{red})}\in\mathrm{follow}(S)\).

\[\mathrm{follow}(S) = \{\ \mathbf{\color{red})}\ \} \]

1.3 Another example

\[ \begin{eqnarray} S &:& XY \\ X &:& aY\ |\ bY\ |\ Y\ |\ Z \\ Y &:& a \\ Z &:& \epsilon\ |\ bZ \end{eqnarray} \]

\(\Sigma_1\) \(\mathrm{first}(\cdot)\) \(\mathrm{follow}(\cdot)\)
S \(a\), \(b\) \(\$\)
X \(a, b, \epsilon\) \(a\)
Y \(a\) \(a,\$\)
Z \(b, \epsilon\) \(a\)

2 Topdown parsing

Recall, we need to construct:

\[ \mathrm{select}: (X, c)\to\mathrm{production\ of\ }G \]

2.1 A manual attempt to select productions

Consider the grammar:

\[ \begin{eqnarray} S &:& XY \tag{1}\\ X &:& aY \tag{2}\\ &|& bY \tag{3}\\ &|& Y \tag{4}\\ &|& Z \tag{5}\\ Y &:& a \tag{6}\\ Z &:& \epsilon \tag{7}\\ &|& bZ \tag{8} \end{eqnarray} \]

Suppose we have the following situation to:

  • Need to expand a \(X\) node in a parse tree to consume a symbol \(a\).
          S
        / | \
       /  |  \
          X
        / | \
       ?  ?  ?   <--- which rule to choose?
       |
       |
       a ....
       
LL(1) Grammar

When selecting the production, we are only allowed to:

  • Pick the left most syntactic variable to expand.
  • We are only allowed to look ahead one token to make the decision.
  • There can only be one rule that can produce the next token.

Note:

  • \(\mathrm{first}(X) = \{a, b, \epsilon\}\)

Therefore, we know that \(X\Rightarrow a\dots\) However, which rule should we pick?

We can compute the first of all the alternations of \(X\), and see which contains \(a\).

  • (2): \(\mathrm{first}(aY) = \{a\}\), ✅
  • (3): \(\mathrm{first}(bY) = \{b\}\)
  • (4): \(\mathrm{first}(Y) = \{a\}\), ✅
  • (5): \(\mathrm{first}(Z) = \{b, \epsilon\}\) ❓ (see below)

By looking at the first sets, we already discovered two productions that can be used to generate the next token \(a\).

But wait, there is more. Let’s look at the follow sets.

\[ \mathrm{follow}(X) = \{a\} \]

Note that \(\epsilon\in\mathrm{first}(X)\), so \(X\Rightarrow\epsilon\). This is done by \(X:Z\). Since \(a\in\mathrm{follow}(X)\), together with the fact \(\epsilon\in\mathrm{first}(X)\), the following is possible:

\[ \begin{eqnarray} S &\Rightarrow& \dots Xa\dots \quad \mathrm{use}\ X:Z\\ &\rightarrow& \dots Za\dots \quad \mathrm{but}\ \epsilon\in\mathrm{first}(Z)\\ &\Rightarrow& \dots a\dots \quad \mathrm{produce}\ a\ \mathrm{from}\ X\\ \end{eqnarray} \]

So, it’s also possible that the derivation uses \(X:Z\) (5).

Therefore, we have three possible choices:

  • \(X: aY\)
  • \(X: Y\)
  • \(X: Z\).

This grammar is not a \(LL(1)\) grammar.

2.2 The \(LL(1)\) grammar

Definition of \(LL(1)\)
  • Given any two alternations: \(Z:\alpha | \beta\) \[ \mathrm{first}(\alpha)\cap\mathrm{first}(\beta) = \emptyset \]

  • If \(\epsilon\in\mathrm{first}(Z)\), then we must have \[ \forall(Z:\alpha)\in G\quad \mathrm{follow}(Z)\cap\mathrm{first}(\alpha)=\emptyset \]

Now, we can complete the select function.

\begin{algorithm} \caption{Select a production} \begin{algorithmic} \Require{$X\in\Sigma_1$} \Require{$a\in\Sigma_0$} \Procedure{select}{$X, a$} \IF{$\exists(X:\alpha)\ \mathrm{such\ at}\ a\in\mathrm{first}(\alpha)$} \RETURN{$X:\alpha$} \ELIF{$\epsilon\in\mathrm{first}(X)$ and $a\in\mathrm{follow}(X)$} \IF{$\exists(X:\alpha)\ \mathrm{such\ at}\ \epsilon\in\mathrm{first}(\alpha)$} \RETURN{$X\in\alpha$} \ELSE \RETURN{parsing error} \ENDIF \ELSE \RETURN{parsing error} \ENDIF \EndProcedure \end{algorithmic} \end{algorithm}

2.3 Predictive Parsing of LL(1) Grammars

Since there are only \(|\Sigma_1|\times |\Sigma_0|\) number of possible inputs to \(\mathbf{select}(X,a)\), we can precompute the selected production for all possible combinations of \(\Sigma_1\times\Sigma_0\).

This is typically represented as a table, known as the predictive parsing table.

\(a\) \(b\) \(c\) \(\dots\)
\(X\) \(\mathbf{select}(X,a)\) \(\mathbf{select}(X,b)\) \(\mathbf{select}(X,c)\) \(\dots\)
\(Y\)
\(Z\)

Thus, each call to \(\mathbf{select}(\dots)\) would only take \(\mathcal{O}(1)\) constant time.