First and Follow Sets

Author

Ken Pu

1 General overview of top-down parsing algorithm

We want to start with:

a grammar $G$ and its start symbol $S$.
an input string $x = x_1 x_2 \dots x_n$ where $x_i\in\Sigma_0$.

We want to generate a parse tree $T$ such that its root is $S$, and the leaf nodes are $x_1, x_2, \dots, x_n$.

     S
    /|\
   .....
 /        \
|   |   |  |
x1 x2 ... xn

At each intermediate node, we have:

      X
    / | \
  z1 ... zk

where $X:z_1\dots z_k$ is a production in $G$.

The top-down parsing grows the tree from the root and downwards to the leaf nodes.

         S
       / | \
      .  .  .
      ?  ?  ?
     /   |   \
    x0 ...... xn

1.1 Parsing strategy

We need a way to select a production from $G$:

\[ \mathrm{select} : (X, \mathrm{context}) \to (X:\alpha) \]

where:

$X\in\Sigma_1$ is the leaf to expand.
$\mathrm{context}$ is the context information (more on this later) that can help us to choose the production.
$X:\alpha$ is the production to be used to expand $X$.

        S
      / | \
     .  .  .
    /   |   \
        X  <--------- how do we expand X
       /|\
        ?

1.2 Left-most expansion

Let’s first define what a frontier is of a tree:

Definition: frontier

Given a tree $T$, its frontier $\mathrm{frontier}(F)$ is a list of its leaf nodes.

For example:

The frontier would be:

\[ \mathrm{frontier}(T) = [x, y, B, y, A] \]

Heuristics in growing parse tree

We always the grow an incomplete parse tree by expanding the left most syntactic variable in the its frontier: \[ X = \mathrm{leftmost\_synvar}(\mathrm{frontier}(T)) \]

2 First and Follow sets

The first and follow sets help us figure out what tokens a syntactic variable can derive, and this will help us to select productions.

An example grammar:

Tokens: $\Sigma_0=\{\mathrm{id},\ *,\ (,\ )\}$
Syntactic variables: $\Sigma_1 = \{E,\ E',\ T,\ T',\ F\}$

\[\begin{eqnarray} E &:& T E' \\ E' &:& T E'\ |\ \epsilon \\ T &:& F T' \\ T' &:& * F T' \ |\ \epsilon \\ F &:& (\ E\ ) \ |\ \mathrm{id} \end{eqnarray}\]

2.1 First sets of strings

First sets of strings

The $\mathrm{first}$ is a function that maps strings in $(\Sigma_1\cup\Sigma_0)^*$ to sets of terminals.

\[ \mathrm{first}:\Sigma^*\to\mathbf{SET}(\Sigma_0) \]

\[ \mathrm{first}(\alpha) \simeq \{x[0] : x\in L(\alpha)\} \]

The intuition is that $\mathrm{first}(\alpha)$ is all possible terminals that can be the first symbol in $L(x)$.

What if $\alpha\Rightarrow \epsilon$?
It needs some special treatment because $\epsilon[0]$ is undefined.

We will revise the definition to handle this.

First sets (extended)

\[ \mathrm{first}:\Sigma^*\to\mathbf{SET}(\Sigma_0\cup\{\epsilon\}) \]

\[ \mathrm{first}(\alpha) = \left\{\begin{array}{ll} \{x[0] : x\in L(\alpha)\ \mathrm{and}\ x\not=\epsilon\}\cup\{\epsilon\} &\mathrm{if}\ \alpha\Rightarrow\epsilon \\ \{x[0] : x\in L(\alpha)\} &\mathrm{else} \end{array}\right. \]

2.2 Follow-sets of syntactic variables

Follow-sets of syntactic variables

The $\mathrm{follow}$ function maps syntactic variables to sets of terminals.

\[ \mathrm{follow} : \Sigma_1 \to \mathbf{SET}(\Sigma_0) \]

Given a syntactic variable $Z\in\Sigma_1$. If $S\Rightarrow \alpha {\color{red} Z a}\beta$ for some $\alpha, \beta$, and $a\in\Sigma_0$, then $a\in\mathrm{follow}(Z)$.

The intuition is that $\mathrm{follow}(Z)$ is all possible terminals that can follow $Z$ in a sentential form of the grammar $G$.

What if $S\Rightarrow\alpha Z$?
It needs special attention because nothing follows $Z$.

Follow-sets (extended)

\[ \mathrm{follow} : \Sigma_1 \to \mathbf{SET}(\Sigma_0\cup\{\$\}) \]

Given a syntactic variable $Z\in\Sigma_1$.

If $S\Rightarrow \alpha {\color{red} Z a}\beta$ for some $\alpha, \beta$, and $a\in\Sigma_0$, then $a\in\mathrm{follow}(Z).$
If $S\Rightarrow \alpha {\color{red} Z}$, then $\$\in\mathrm{follow}(Z)$

Hint: use derivation of $E$ to show that what follows $E$ can also follow $T$.

3 Algorithms to compute $\mathrm{first}$ and $\mathrm{follow}$

3.1 Computing $\mathrm{FirstSet}$ for a single symbol

\begin{algorithm} \caption{Computing the FIRST sets} \begin{algorithmic} \Require{$X\in\Sigma_1\cup\Sigma_0$} \Procedure{FirstSet}{$X$} \State $F\gets\emptyset$ \IF{$X\in\Sigma_0$} \STATE{$F\gets \{X\}$} \ELSE \FORALL{$X:Y_1 Y_2 \dots Y_k\in G$} \IF{$k = 0$} \STATE $F\gets F\cup\{\epsilon\}$ \ELSE \FORALL{$i\in[1, k]$} \STATE $F_y = \mathrm{FIRSTSET}(Y_i)$ \STATE $F\gets F\cup({\color{red} F_y-\{\epsilon\}})$ \IF{$\epsilon\not\in F_y$} \BREAK \ENDIF \ENDFOR \IF {$i = k$} \STATE $F\gets F\cup\{\epsilon\}$ \ENDIF \ENDIF \ENDFOR \ENDIF \EndProcedure \end{algorithmic} \end{algorithm}

What is the termination condition?
The algorithm terminates if and only if the grammar is not left-recursive.

3.2 $\mathrm{FirstSet}$ for strings

\begin{algorithm} \caption{Computing the FIRST sets} \begin{algorithmic} \Require{$\alpha\in(\Sigma_1\cup\Sigma_0)^*$} \Procedure{FirstSet}{$\alpha$} \STATE let $Y_1 Y_2 \dots Y_n = \alpha$ \STATE $F\gets\emptyset$ \FOR{$i\in[1\to n]$} \STATE $F_y = \mathrm{FIRSTSET}(Y_i)$ \STATE $F\gets F\cup(F_y - \{\epsilon\})$ \IF{$\epsilon\not\in F_y$} \BREAK \ENDIF \ENDFOR \IF{$i = n$} \STATE $F\gets F\cup\{\epsilon\}$ \ENDIF \EndProcedure \end{algorithmic} \end{algorithm}

3.3 Computing $\mathrm{FOLLOW}$ Sets

The follow-sets are computed by an iterative algorithm that uses the $\mathrm{FIRSTSET}$ sets. The algorithm computes all FOLLOW-sets simultaneously.

\begin{algorithm} \caption{Computing the FOLLOW sets} \begin{algorithmic} \Require{$G$ is a set of productions} \Require{$S$ is the start symbol} \Procedure{FollowSets}{$G, S$} \STATE $F\gets\mathrm{empty\ HashMap}\left<\Sigma_1\to\mathrm{SET}(\Sigma_0)\right>$ \STATE $F[S]\gets\{\$\}$ \REPEAT \FORALL{$X:Y_1 Y_2\dots Y_k\in G$} \FORALL{$i\in[1\dots k]$} \IF{$Y_i\in\Sigma_1$} \STATE \COMMENT{Update the follow-set of $Y_i$} \STATE $F'[Y_i]\gets F[Y_i]\cup\mathrm{FirstSet}(Y_{i+1}\dots Y_k)$ \IF{$i = k$ or $\epsilon\in\mathrm{FirstSet}(Y_{i+1}\dots Y_k)$} \STATE $F'[Y_i]\gets F[Y_i]\cup F[X]$ \ENDIF \ENDIF \ENDFOR \ENDFOR \IF{$F' = F$} \BREAK \ELSE \STATE $F\gets F'$ \ENDIF \UNTIL{} \EndProcedure \end{algorithmic} \end{algorithm}

First and Follow Sets

1 General overview of top-down parsing algorithm

1.1 Parsing strategy

1.2 Left-most expansion

2 First and Follow sets

2.1 First sets of strings

2.2 Follow-sets of syntactic variables

3 Algorithms to compute \(\mathrm{first}\) and \(\mathrm{follow}\)

3.1 Computing \(\mathrm{FirstSet}\) for a single symbol

3.2 \(\mathrm{FirstSet}\) for strings

3.3 Computing \(\mathrm{FOLLOW}\) Sets