- Reading CFG Rules Modifications
- Context-Free Grammar FIRST and FOLLOW Set Rules
- Predictive Parsing Table
- Predictive Top-Down Parser and Panic Mode Error Recovery
- Used Data Structures
- Combine Two Phases of Compiler
- Convert Input grammar To LL(1) Grammar Automatically
Refactors the `parseCFGInput` function in the `Utilities` class to improve code organization and readability. The primary changes involve:
- Introducing a lambda function `processProduction` to encapsulate common logic for processing production lines.
- Replacing existing production processing logic with calls to the new lambda function.
- Adjusting the handling of new rules (lines starting with '#') to match the new processing approach.
- Ensuring each production is added to the `grammar` map by copying the vector, preventing issues with empty productions.
- Adding support for processing multiple lines in the grammar.
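As a rough illustration of the shape described above, the following sketch parses rules with a shared `processProduction` lambda. The input format (`# LHS --> symbols | symbols`, with continuation lines that do not start with `#`), the `flush` helper and the type aliases are assumptions made for this example, not the project's actual code.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Production = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Production>>;

// Hypothetical sketch: parses rules of the assumed form "# LHS --> a B | c".
Grammar parseCFGInput(std::istream& in) {
    Grammar grammar;
    std::string currentLhs;
    Production current;  // symbols of the alternative currently being read

    // Stores a copy of the finished alternative, then resets the buffer.
    auto flush = [&]() {
        if (!currentLhs.empty() && !current.empty()) {
            grammar[currentLhs].push_back(current);  // copies the vector
        }
        current.clear();
    };

    // Shared logic: tokenise one chunk of right-hand-side text.
    auto processProduction = [&](const std::string& text) {
        std::istringstream tokens(text);
        std::string symbol;
        while (tokens >> symbol) {
            if (symbol == "|") {
                flush();  // "|" closes the current alternative
            } else {
                current.push_back(symbol);
            }
        }
    };

    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '#') {
            flush();  // a new rule ends the previous one
            std::istringstream header(line.substr(1));
            std::string arrow;
            header >> currentLhs >> arrow;  // e.g. "S" and "-->"
            std::string rest;
            std::getline(header, rest);
            processProduction(rest);
        } else {
            processProduction(line);  // continuation of the current rule
        }
    }
    flush();
    return grammar;
}
```

Because both the rule header and the continuation lines go through the same lambda, a rule can be split over several lines without duplicating the tokenising logic.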
- Terminal Symbol
  - If `x` is a terminal, then `FIRST(x) = { 'x' }`.
- Production Rule with Є (empty string)
  - If `X -> Є` is a production rule, add Є to `FIRST(X)`.
- Production Rule with Non-terminal Symbols
  - If `X -> Y1 Y2 Y3 … Yn` is a production and `FIRST(Y1)` does not contain Є, then `FIRST(X) = FIRST(Y1)`.
- Handling Є in FIRST Sets
  - If `FIRST(Y1)` contains Є, then `FIRST(X) = { FIRST(Y1) – Є } U { FIRST(Y2) }`, and so on for `Y3 … Yn` as long as every preceding symbol can derive Є.
  - If `FIRST(Yi)` contains Є for all `i = 1 to n`, then add Є to `FIRST(X)`.
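The FIRST rules above can be sketched as a fixed-point computation. This is a minimal illustration, not the project's implementation: the type aliases, the function names `firstOfSequence` and `computeFirst`, and the spelling `"Epsilon"` for Є are assumptions, and any symbol without a rule is treated as a terminal.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Production = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Production>>;
using SymbolSets = std::map<std::string, std::set<std::string>>;

const std::string EPSILON = "Epsilon";  // assumed spelling of the empty string

// FIRST of a symbol sequence, using the FIRST sets computed so far.
std::set<std::string> firstOfSequence(const Production& symbols,
                                      const Grammar& grammar,
                                      const SymbolSets& first) {
    std::set<std::string> result;
    for (const std::string& symbol : symbols) {
        if (symbol == EPSILON) continue;  // contributes nothing on its own
        if (!grammar.count(symbol)) {     // terminal: FIRST(x) = { x }
            result.insert(symbol);
            return result;
        }
        bool nullable = false;
        auto it = first.find(symbol);
        if (it != first.end()) {
            for (const std::string& t : it->second) {
                if (t == EPSILON) nullable = true;
                else result.insert(t);
            }
        }
        if (!nullable) return result;  // stop at the first non-nullable symbol
    }
    result.insert(EPSILON);  // every symbol could derive the empty string
    return result;
}

SymbolSets computeFirst(const Grammar& grammar) {
    SymbolSets first;
    for (const auto& rule : grammar) first[rule.first];  // start with empty sets
    bool changed = true;
    while (changed) {  // repeat until no set grows
        changed = false;
        for (const auto& rule : grammar) {
            for (const Production& production : rule.second) {
                for (const std::string& t : firstOfSequence(production, grammar, first)) {
                    changed |= first[rule.first].insert(t).second;
                }
            }
        }
    }
    return first;
}
```

Iterating to a fixed point avoids having to order the non-terminals, at the cost of a few extra passes over the grammar.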
- Starting Non-Terminal
  - `FOLLOW(S)` contains `$`, where `S` is the starting non-terminal.
- Between Non-terminals in a Production
  - If `A -> pBq` is a production, where `B` is a non-terminal and `p` and `q` are strings of grammar symbols, then everything in `FIRST(q)` except Є is in `FOLLOW(B)`.
- At the End of a Production
  - If `A -> pB` is a production, then everything in `FOLLOW(A)` is in `FOLLOW(B)`.
- Handling Є in Productions
  - If `A -> pBq` is a production and `FIRST(q)` contains Є, then `FOLLOW(B)` contains `{ FIRST(q) – Є } U FOLLOW(A)`.
These rules are crucial for constructing predictive parsing tables and resolving parsing conflicts in LL(k) grammars. They define the behavior of a predictive parser by providing information about the possible expansions and token predictions at each step in the parsing process.
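The FOLLOW rules can be sketched in the same fixed-point style. The sketch below takes the FIRST sets as an already computed input; the aliases, the function name `computeFollow` and the literals `"Epsilon"` and `"$"` are assumptions for illustration only.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Production = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Production>>;
using SymbolSets = std::map<std::string, std::set<std::string>>;

const std::string EPSILON = "Epsilon";  // assumed spelling of the empty string
const std::string END_MARKER = "$";

SymbolSets computeFollow(const Grammar& grammar, const SymbolSets& first,
                         const std::string& start) {
    SymbolSets follow;
    for (const auto& rule : grammar) follow[rule.first];
    follow[start].insert(END_MARKER);  // the start symbol is followed by $
    bool changed = true;
    while (changed) {  // repeat until no set grows
        changed = false;
        for (const auto& rule : grammar) {
            const std::string& lhs = rule.first;
            for (const Production& production : rule.second) {
                for (std::size_t i = 0; i < production.size(); ++i) {
                    const std::string& B = production[i];
                    if (!grammar.count(B)) continue;  // only non-terminals have FOLLOW
                    bool restNullable = true;  // can everything after B vanish?
                    for (std::size_t j = i + 1; j < production.size() && restNullable; ++j) {
                        const std::string& q = production[j];
                        if (q == EPSILON) continue;
                        if (!grammar.count(q)) {  // terminal directly follows
                            changed |= follow[B].insert(q).second;
                            restNullable = false;
                        } else {  // add FIRST(q) without Epsilon
                            const std::set<std::string>& firstQ = first.at(q);
                            for (const std::string& t : firstQ) {
                                if (t != EPSILON) changed |= follow[B].insert(t).second;
                            }
                            restNullable = firstQ.count(EPSILON) > 0;
                        }
                    }
                    if (restNullable) {  // B can end the production: add FOLLOW(lhs)
                        const std::set<std::string> followLhs = follow[lhs];  // copy, B may equal lhs
                        for (const std::string& t : followLhs) {
                            changed |= follow[B].insert(t).second;
                        }
                    }
                }
            }
        }
    }
    return follow;
}
```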
Note: The representation of Epsilon can be changed in `constants.h`.
`lookUp(nonTerminal, terminal)`
- Returns the production rule for the given non-terminal and terminal.
- If the given non-terminal and terminal pair is not present in the table, the returned value contains a null production and an enum indicating whether the cell is `Synch` or `Empty`.

`getCellType(nonTerminal, terminal)`
- Returns the type of the cell for the given non-terminal and terminal. The type of the cell can be `Synch`, `Empty` or `Non Empty`.

`hasProduction(nonTerminal, terminal)`
- Returns true if the cell for the given non-terminal and terminal holds a production rule.

`isCellEmpty(nonTerminal, terminal)`
- Returns true if the cell for the given non-terminal and terminal is empty.

`isSynchronizing(nonTerminal, terminal)`
- Returns true if the cell for the given non-terminal and terminal is synchronizing.

`printPredictiveTable()`
- Prints the predictive parsing table.
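One possible shape for this interface is sketched below. The method names come from the list above, but the signatures, the `Cell` struct, the storage layout and the `setProduction`/`setSynch` helpers are assumptions made for illustration; printing is left out.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class CellType { Empty, Synch, NonEmpty };

struct Cell {
    CellType type = CellType::Empty;
    std::vector<std::string> production;  // empty unless type == NonEmpty
};

class PredictiveTable {
public:
    // Hypothetical helpers used to fill the table.
    void setProduction(const std::string& nonTerminal, const std::string& terminal,
                       std::vector<std::string> production) {
        Cell& cell = cells_[{nonTerminal, terminal}];
        cell.type = CellType::NonEmpty;
        cell.production = std::move(production);
    }
    void setSynch(const std::string& nonTerminal, const std::string& terminal) {
        cells_[{nonTerminal, terminal}].type = CellType::Synch;
    }

    // Returns the stored cell, or a default (Empty, no production) cell.
    Cell lookUp(const std::string& nonTerminal, const std::string& terminal) const {
        auto it = cells_.find({nonTerminal, terminal});
        return it == cells_.end() ? Cell{} : it->second;
    }
    CellType getCellType(const std::string& nonTerminal, const std::string& terminal) const {
        return lookUp(nonTerminal, terminal).type;
    }
    bool hasProduction(const std::string& nonTerminal, const std::string& terminal) const {
        return getCellType(nonTerminal, terminal) == CellType::NonEmpty;
    }
    bool isCellEmpty(const std::string& nonTerminal, const std::string& terminal) const {
        return getCellType(nonTerminal, terminal) == CellType::Empty;
    }
    bool isSynchronizing(const std::string& nonTerminal, const std::string& terminal) const {
        return getCellType(nonTerminal, terminal) == CellType::Synch;
    }

private:
    std::map<std::pair<std::string, std::string>, Cell> cells_;
};
```

Routing every query through `lookUp` keeps the "missing entry means empty cell" rule in a single place.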
- Input: First and Follow sets of non-terminals.
- Output: Predictive Parsing Table.
- Method
  - For each production `A -> α` do
    - For each terminal `a` in `FIRST(α)` do
      - Add `A -> α` to `M[A, a]`.
    - If `Є` is in `FIRST(α)` then
      - For each terminal `b` in `FOLLOW(A)` do
        - Add `A -> α` to `M[A, b]`.
    - If `Є` is in `FIRST(α)` and `$` is in `FOLLOW(A)` then
      - Add `A -> α` to `M[A, $]`.
  - If there is more than one production in `M[A, a]`, then the grammar is not LL(1) and the existing production is preferred.
  - Mark the empty cells as error.
  - Mark the empty entries `M[A, a]` and `M[A, $]` as synchronizing instead if `a` (respectively `$`) is in the follow set of `A`.
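The construction method above can be sketched as follows, taking the grammar and its precomputed FIRST and FOLLOW sets as input. The aliases, the nested-map table layout, the `{"Synch"}` marker and the function names are assumptions for illustration. Empty cells are simply absent from the map, and `$` is handled like any other member of `FOLLOW(A)`.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Production = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Production>>;
using SymbolSets = std::map<std::string, std::set<std::string>>;
// table[A][a] holds the production chosen for non-terminal A on terminal a.
using Table = std::map<std::string, std::map<std::string, Production>>;

const std::string EPSILON = "Epsilon";  // assumed spelling of the empty string
const Production SYNCH = {"Synch"};     // assumed marker for synchronizing cells

// FIRST of the right-hand side alpha, from the FIRST sets of its symbols.
std::set<std::string> firstOfSequence(const Production& alpha, const Grammar& grammar,
                                      const SymbolSets& first) {
    std::set<std::string> result;
    for (const std::string& symbol : alpha) {
        if (symbol == EPSILON) continue;
        if (!grammar.count(symbol)) {  // terminal
            result.insert(symbol);
            return result;
        }
        const std::set<std::string>& firstSymbol = first.at(symbol);
        for (const std::string& t : firstSymbol) {
            if (t != EPSILON) result.insert(t);
        }
        if (!firstSymbol.count(EPSILON)) return result;
    }
    result.insert(EPSILON);
    return result;
}

// Fills the table; returns false when some cell received two productions.
bool buildTable(const Grammar& grammar, const SymbolSets& first,
                const SymbolSets& follow, Table& table) {
    bool isLL1 = true;
    auto add = [&](const std::string& A, const std::string& a, const Production& p) {
        auto inserted = table[A].emplace(a, p);  // keeps an existing entry
        if (!inserted.second && inserted.first->second != p) isLL1 = false;
    };
    for (const auto& rule : grammar) {
        const std::string& A = rule.first;
        for (const Production& alpha : rule.second) {
            const std::set<std::string> firstAlpha = firstOfSequence(alpha, grammar, first);
            for (const std::string& a : firstAlpha) {
                if (a != EPSILON) add(A, a, alpha);
            }
            if (firstAlpha.count(EPSILON)) {
                for (const std::string& b : follow.at(A)) add(A, b, alpha);
            }
        }
    }
    // Cells that are still empty become synchronizing for symbols in FOLLOW(A).
    for (const auto& rule : grammar) {
        for (const std::string& a : follow.at(rule.first)) {
            table[rule.first].emplace(a, SYNCH);
        }
    }
    return isLL1;
}
```

On a conflict this sketch keeps whichever production reached the cell first, so the surviving entry depends on the order in which productions are visited.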
S --> i C t S E | a
E --> e S | Epsilon
C --> b
Non-Terminal | First Set | Follow Set |
---|---|---|
S | a i | $ e |
E | Epsilon e | $ e |
C | b | t |
- There is a conflict at `E, e --> e S`, and the preferred production is `E, e --> Epsilon`, so the grammar is not LL(1).
Non-Terminal | a | b | e | i | t | $ |
---|---|---|---|---|---|---|
S | a | | Synch | i C t S E | | Synch |
E | | | Epsilon | | | Epsilon |
C | | b | | | Synch | |
E --> T E`
E` --> + T E` | Epsilon
T --> F T`
T` --> * F T` | Epsilon
F --> ( E ) | id
Non-Terminal | First Set | Follow Set |
---|---|---|
E | ( id | $ ) |
E` | + Epsilon | $ ) |
T | ( id | $ + ) |
T` | * Epsilon | $ + ) |
F | ( id | $ * + ) |
Non-Terminal | id | + | * | ( | ) | $ |
---|---|---|---|---|---|---|
E | T E' | | | T E' | Synch | Synch |
E` | | + T E' | | | Epsilon | Epsilon |
T | F T' | Synch | | F T' | Synch | Synch |
T` | | Epsilon | * F T' | | Epsilon | Epsilon |
F | id | Synch | Synch | ( E ) | Synch | Synch |
`parseInputTokens`
- Get the next token from the lexical analyzer.
- While the stack is not empty
  - If top is a terminal
    - If top == curr_token: match.
    - Else if top == $: handle end of stack.
    - Else: handle missing terminal.
  - Else if top is a non-terminal
    - Cell = M[top, curr_token].
    - If the cell is empty: handle empty cell.
    - Else if the cell is sync: handle sync cell.
    - Else if the cell is a valid production: handle valid production.
- Handle missing terminal
  - If top is a terminal and does not match the input symbol: pop stack.
- Handle Empty cell
  - If top is a non-terminal and M[A, a] is empty: get next token.
- Handle Sync cell
  - If top is a non-terminal and M[A, a] is sync: pop stack.
- Handle end of stack
  - If top == $ and curr_token != $: get next token until $ is encountered.
- Handle end of input
  - If curr_token == $ and stack is not empty: pop stack.

Report an error message in each of the previous cases.
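The parsing loop and its panic mode cases can be sketched as below. This is an illustrative reading of the steps above, not the project's `parseInputTokens`: the signature, the `ParseResult` struct, the nested-map table and the error texts are assumptions. The sketch also assumes the token list ends with `$` and that every non-terminal has a row in the table, so any stack symbol without a row is treated as a terminal.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Production = std::vector<std::string>;
using Table = std::map<std::string, std::map<std::string, Production>>;

const std::string EPSILON = "Epsilon";  // assumed spelling of the empty string
const std::string END_MARKER = "$";
const Production SYNCH = {"Synch"};     // assumed marker for synchronizing cells

struct ParseResult {
    std::vector<std::string> errors;          // one message per recovery step
    std::vector<Production> usedProductions;  // productions applied, in order
};

ParseResult parseInputTokens(const Table& table, const std::string& start,
                             const std::vector<std::string>& tokens) {
    ParseResult result;
    std::vector<std::string> stack = {END_MARKER, start};
    std::size_t pos = 0;  // index of the current token
    while (!stack.empty()) {
        const std::string top = stack.back();
        const std::string& token = tokens[pos];
        auto row = table.find(top);
        if (row == table.end()) {  // top is a terminal or "$"
            if (top == token) {    // match
                stack.pop_back();
                if (top != END_MARKER) ++pos;
            } else if (top == END_MARKER) {  // end of stack: skip extra input
                result.errors.push_back("unexpected token " + token);
                ++pos;
            } else {  // missing terminal: pop it
                result.errors.push_back("missing " + top);
                stack.pop_back();
            }
            continue;
        }
        auto cell = row->second.find(token);
        if (cell == row->second.end()) {  // empty cell
            if (token == END_MARKER) {    // end of input: pop the stack
                result.errors.push_back("input ended while expecting " + top);
                stack.pop_back();
            } else {                      // otherwise discard the token
                result.errors.push_back("illegal " + top + ", discarding " + token);
                ++pos;
            }
        } else if (cell->second == SYNCH) {  // sync cell: pop the non-terminal
            result.errors.push_back("synchronizing, dropping " + top);
            stack.pop_back();
        } else {  // valid production: replace top by its right-hand side
            stack.pop_back();
            result.usedProductions.push_back(cell->second);
            for (auto it = cell->second.rbegin(); it != cell->second.rend(); ++it) {
                if (*it != EPSILON) stack.push_back(*it);
            }
        }
    }
    return result;
}
```

Every recovery step either pops the stack or consumes a token other than `$`, so the loop terminates on any input that ends with `$`.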
- Parsing stack
  - Usage: Each stack item is a struct of a string and a boolean, representing the token and whether it is a terminal.
- Leftmost derivation
  - Structure: A vector of vectors of strings.
  - Usage: It holds the productions used while parsing.
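In C++ these two structures might look like the sketch below. The names `StackItem`, `LeftmostDerivation` and `makeInitialStack` are hypothetical and chosen for this example only.

```cpp
#include <stack>
#include <string>
#include <vector>

// One entry of the parsing stack: the symbol and whether it is a terminal.
struct StackItem {
    std::string token;
    bool isTerminal;
};

// Each inner vector holds the symbols of one production used during parsing.
using LeftmostDerivation = std::vector<std::vector<std::string>>;

// Hypothetical helper: the stack starts with "$" below the start symbol.
std::stack<StackItem> makeInitialStack(const std::string& startSymbol) {
    std::stack<StackItem> stack;
    stack.push({"$", true});
    stack.push({startSymbol, false});
    return stack;
}
```

Storing the terminal flag next to the symbol lets the parser decide between matching and table lookup without consulting the set of non-terminals on every step.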
- Create `PredictiveTopDownParser`, which produces the leftmost derivation for a correct input.
  - It takes the `PredictiveTable` and the set of `non_terminals` as input.
  - If the input grammar is not LL(1), an appropriate error message will be produced.
  - Its main method is `processNextToken(token)`, which matches the input symbol with the top of the stack and applies panic mode error recovery in case of an error.
- Add a new method `getNextToken()` to get the next token; it saves the state of the last position in the buffer and the token so the next tokens can be generated.
- Add a position field to the Token class to save the position of the token in the buffer, so it can be used to generate error messages.
- Return the `$` token when EOF is reached.
- Change the return type of `acceptToken()`, `maximalMunchWithErrorRecovery` and `addToken` to `Token` instead of `void` so the token can be used in the parser.
- Save the symbol table in a markdown file `SymbolTable.md`.
- Add Parser class to parse the tokens generated by the lexical analyzer.
- Save the predictive parsing table in a markdown file `PredictiveTable.md`.
- Refactor the `parseInputTokens()` function.
- Add the position of the curr_token to the error message while handling panic mode error recovery.
- Generate a leftmost derivation markdown file called `LeftmostDerivation.md`.
- Generate a leftmost parsing markdown file called `LL1ParsingOutput.md`.
- Get the file names from command line arguments (e.g. `../input/rules.txt ../input/CFG.txt ../input/program.txt`).
- Use the Lex and Parser classes to generate tokens and parse them.
- Added LeftRecursionRemover class that has a removeLR method that takes a grammar and returns a new grammar that is free of left recursion.
- Added LeftFactorer class that has a leftFactor method that takes a grammar and returns a new left-factored grammar.
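To illustrate the kind of transformation a left recursion remover performs, the sketch below rewrites immediate left recursion only (`A -> A α | β` becomes `A -> β A'` and `A' -> α A' | Epsilon`). It is not the `removeLR` method itself: the free function `removeImmediateLeftRecursion`, the aliases, and the assumption that the name `A'` is unused are all hypothetical, and indirect left recursion is not handled.

```cpp
#include <map>
#include <string>
#include <vector>

using Production = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Production>>;

const std::string EPSILON = "Epsilon";  // assumed spelling of the empty string

Grammar removeImmediateLeftRecursion(const Grammar& grammar) {
    Grammar result;
    for (const auto& rule : grammar) {
        const std::string& A = rule.first;
        std::vector<Production> recursive;  // the alpha parts of A -> A alpha
        std::vector<Production> other;      // the beta alternatives
        for (const Production& p : rule.second) {
            if (!p.empty() && p.front() == A) {
                recursive.emplace_back(p.begin() + 1, p.end());
            } else {
                other.push_back(p);
            }
        }
        if (recursive.empty()) {  // nothing to rewrite for this rule
            result[A] = rule.second;
            continue;
        }
        const std::string fresh = A + "'";  // assumes this name is not in use
        for (Production beta : other) {     // A -> beta A'
            if (beta == Production{EPSILON}) beta.clear();
            beta.push_back(fresh);
            result[A].push_back(beta);
        }
        for (Production alpha : recursive) {  // A' -> alpha A'
            alpha.push_back(fresh);
            result[fresh].push_back(alpha);
        }
        result[fresh].push_back({EPSILON});  // A' -> Epsilon
    }
    return result;
}
```

For example, `E -> E + T | T` becomes `E -> T E'` and `E' -> + T E' | Epsilon`, which is the shape of the expression grammar used in the tables earlier in this document.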