Top-down parsing
1 Review of first and follow sets
1.1 Definitions and intuitions
Remember:
If \(c\in\mathrm{first}(\alpha)\), then we have \(\alpha\Rightarrow c\dots\)
Remember:
If \(c\in\mathrm{follow}(X)\), then \(S\Rightarrow \dots Xc\dots\)
1.2 An example
Consider the grammar.
\[ \begin{eqnarray} S &:& a\ |\ ( S ) \end{eqnarray} \]
We can see that:
- \(S\rightarrow a\): therefore \(a\in\mathrm{first}(S)\).
- \(S\rightarrow (S)\): therefore \({\color{red}\bf(}\in\mathrm{first}(S)\).
\[\mathrm{first}(S) = \{\ a, \mathbf{\color{red} (}\ \}\]
- \(S\rightarrow (S\mathbf{\color{red})}\), therefore \(\mathbf{\color{red})}\in\mathrm{follow}(S)\).
\[\mathrm{follow}(S) = \{\ \mathbf{\color{red})}\ \} \]
1.3 Another example
\[ \begin{eqnarray} S &:& XY \\ X &:& aY\ |\ bY\ |\ Y\ |\ Z \\ Y &:& a \\ Z &:& \epsilon\ |\ bZ \end{eqnarray} \]
\(\Sigma_1\) | \(\mathrm{first}(\cdot)\) | \(\mathrm{follow}(\cdot)\) |
---|---|---|
S | \(a\), \(b\) | \(\$\) |
X | \(a, b, \epsilon\) | \(a\) |
Y | \(a\) | \(a,\$\) |
Z | \(b, \epsilon\) | \(a\) |
2 Topdown parsing
Recall, we need to construct:
\[ \mathrm{select}: (X, c)\to\mathrm{production\ of\ }G \]
2.1 A manual attempt to select productions
Consider the grammar:
\[ \begin{eqnarray} S &:& XY \tag{1}\\ X &:& aY \tag{2}\\ &|& bY \tag{3}\\ &|& Y \tag{4}\\ &|& Z \tag{5}\\ Y &:& a \tag{6}\\ Z &:& \epsilon \tag{7}\\ &|& bZ \tag{8} \end{eqnarray} \]
Suppose we have the following situation to:
- Need to expand a \(X\) node in a parse tree to consume a symbol \(a\).
S
/ | \
/ | \
X
/ | \
? ? ? <--- which rule to choose?
|
|
a ....
Note:
- \(\mathrm{first}(X) = \{a, b, \epsilon\}\)
Therefore, we know that \(X\Rightarrow a\dots\) However, which rule should we pick?
We can compute the first of all the alternations of \(X\), and see which contains \(a\).
- (2): \(\mathrm{first}(aY) = \{a\}\), ✅
- (3): \(\mathrm{first}(bY) = \{b\}\) ❌
- (4): \(\mathrm{first}(Y) = \{a\}\), ✅
- (5): \(\mathrm{first}(Z) = \{b, \epsilon\}\) ❓ (see below)
By looking at the first sets, we already discovered two productions that can be used to generate the next token \(a\).
But wait, there is more. Let’s look at the follow sets.
\[ \mathrm{follow}(X) = \{a\} \]
Note that \(\epsilon\in\mathrm{first}(X)\), so \(X\Rightarrow\epsilon\). This is done by \(X:Z\). Since \(a\in\mathrm{follow}(X)\), together with the fact \(\epsilon\in\mathrm{first}(X)\), the following is possible:
\[ \begin{eqnarray} S &\Rightarrow& \dots Xa\dots \quad \mathrm{use}\ X:Z\\ &\rightarrow& \dots Za\dots \quad \mathrm{but}\ \epsilon\in\mathrm{first}(Z)\\ &\Rightarrow& \dots a\dots \quad \mathrm{produce}\ a\ \mathrm{from}\ X\\ \end{eqnarray} \]
So, it’s also possible that the derivation uses \(X:Z\) (5).
Therefore, we have three possible choices:
- \(X: aY\)
- \(X: Y\)
- \(X: Z\).
This grammar is not a \(LL(1)\) grammar.
2.2 The \(LL(1)\) grammar
Now, we can complete the select function.
2.3 Predictive Parsing of LL(1) Grammars
Since there are only \(|\Sigma_1|\times |\Sigma_0|\) number of possible inputs to \(\mathbf{select}(X,a)\), we can precompute the selected production for all possible combinations of \(\Sigma_1\times\Sigma_0\).
This is typically represented as a table, known as the predictive parsing table.
\(a\) | \(b\) | \(c\) | \(\dots\) | |
---|---|---|---|---|
\(X\) | \(\mathbf{select}(X,a)\) | \(\mathbf{select}(X,b)\) | \(\mathbf{select}(X,c)\) | \(\dots\) |
\(Y\) | ||||
\(Z\) |
Thus, each call to \(\mathbf{select}(\dots)\) would only take \(\mathcal{O}(1)\) constant time.