What is Syntax Analysis in Compiler? Definition, Types, Error Handling & Recovery

Syntax analysis is the second phase of a compiler. It takes its input from the lexical analyzer and produces output that serves as input to the semantic analyzer.

The component that performs syntax analysis is called the syntax analyzer or parser. It reads the string of tokens from the lexical analyzer and confirms that the string can be generated by the grammar of the source language.

Content: Syntax Analysis in Compiler Design

What is Syntax Analysis?
Types of Parsing
Syntax Error Handling
Error Recovery Method
Key Takeaways

What is Syntax Analysis?

The syntax analysis phase receives a string of tokens from the lexical analyzer and verifies whether that string is grammatically correct.

If the string of tokens is grammatically incorrect, the parser reports a syntax error.
If the string of tokens is grammatically correct, the parser produces a parse tree for it.

The syntax analyzer then forwards this parse tree to the rest of the compiler's front end for further processing. We also refer to the syntax analyzer as the parser.
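
As a minimal sketch of this flow, the Python snippet below takes a string of tokens and either returns a tree or reports a syntax error. The grammar (expr -> term ('+' term)*, term -> NUMBER | '(' expr ')'), the names tokenize, parse and ParseError, and the tuple-based tree are all assumptions made for illustration. The tree it builds is a simplified one, not a full parse tree.

```python
import re


class ParseError(Exception):
    """Raised when the token string cannot be generated by the grammar."""


def tokenize(text):
    # Stand-in for the lexical analyzer: numbers, '+', and parentheses.
    # Any other character is silently skipped in this sketch.
    return re.findall(r"\d+|[+()]", text)


def parse(tokens):
    """Parse expr -> term ('+' term)* ; term -> NUMBER | '(' expr ')'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    def term():
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            pos += 1
            return ("num", tok)
        if tok == "(":
            pos += 1
            node = expr()
            if peek() != ")":
                raise ParseError(f"expected ')' at token {pos}")
            pos += 1
            return node
        raise ParseError(f"unexpected {tok!r} at token {pos}")

    tree = expr()
    if peek() is not None:
        raise ParseError(f"unexpected {peek()!r} at token {pos}")
    return tree
```

Calling parse(tokenize("1 + (2 + 3)")) returns a nested tuple, while parse(tokenize("1 + + 2")) raises ParseError.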

Besides building the parse tree, the parser can also collect information about each token and store it in the symbol table. In many compilers, further tasks are carried out during parsing as well:

Type checking.
Other kinds of semantic analysis.
Intermediate code generation.

If the syntax analyzer finds a syntax error, it applies an error recovery method. These methods let the parser handle the error and continue.

Types of Parsing

The three common types of parsing are as follows:

Universal Parsing
Top-down Parsing
Bottom-up Parsing

Universal Parsing

Universal parsing can handle any grammar, but it is too inefficient to use in a production compiler. So in practice we use only two methods for parsing: top-down and bottom-up.
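
To give a feel for the cost, here is a rough sketch of one well-known universal method, the Cocke-Younger-Kasami (CYK) algorithm. The article does not name a specific algorithm, so this choice is an assumption, as are the function name cyk and the toy grammar for the language a^n b^n (written in Chomsky normal form, which CYK requires). The three nested loops over length, start position and split point are what make it cubic in the input length.

```python
# Toy grammar in Chomsky normal form (an assumption for this sketch):
#   S -> A B | A C,  C -> S B,  A -> 'a',  B -> 'b'
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "A", "C"), ("C", "S", "B")]


def cyk(tokens, unary=UNARY, binary=BINARY, start="S"):
    """Return True if the grammar can generate the token list."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals that derive tokens[i : i + k + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = {lhs for lhs, terminal in unary if terminal == tok}
    for length in range(2, n + 1):              # substring length
        for i in range(n - length + 1):         # substring start
            for split in range(1, length):      # size of the left part
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, b, c in binary:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]
```

With this grammar, cyk(list("aabb")) is accepted and cyk(list("aab")) is rejected.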

Top-down Parsing

In the top-down method, the parser builds the parse tree starting from the top. That is, it starts at the root of the parse tree and works down towards the leaves.
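
One way to see the root-first order is a small table-driven predictive parser, sketched below. The grammar (E -> T E', E' -> + T E' | empty, T -> num | ( E )), the hand-written TABLE and the name ll1_parse are assumptions for illustration only. The function returns the productions in the order they are applied, so the first entry always expands the start symbol E.

```python
# Parsing table for the assumed grammar: (nonterminal, lookahead) -> right-hand side.
TABLE = {
    ("E", "num"): ["T", "E'"],
    ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", ")"): [],
    ("E'", "$"): [],
    ("T", "num"): ["num"],
    ("T", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T"}


def ll1_parse(tokens):
    """Return the productions applied, root first, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]      # '$' marks the end of the input
    stack = ["$", "E"]                 # start symbol on top of the end marker
    pos = 0
    steps = []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"unexpected {look!r} while expanding {top}")
            steps.append((top, rhs))
            stack.extend(reversed(rhs))    # leftmost symbol ends up on top
        elif top == look:
            pos += 1                       # terminal matched, advance the input
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return steps
```

For the tokens ["num", "+", "num"], the first step returned is the expansion of E and the later steps fill in the subtrees below it.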

Bottom-up Parsing

In the bottom-up method, the parser builds the parse tree starting from the bottom. That is, it starts at the leaves of the parse tree and works up towards the root.
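
The leaves-first order can be sketched with a naive shift-reduce loop. This is not a table-driven LR parser: it simply reduces whenever the top of the stack matches a right-hand side, which happens to be safe for the tiny assumed grammar E -> E + T | T, T -> num | ( E ). The names RULES and shift_reduce are hypothetical.

```python
# Rules are tried in order, so E -> E + T wins over E -> T when both could apply.
RULES = [
    ("E", ["E", "+", "T"]),
    ("T", ["(", "E", ")"]),
    ("T", ["num"]),
    ("E", ["T"]),
]


def shift_reduce(tokens):
    """Return the reductions performed, leaves first, or raise SyntaxError."""
    stack, reductions = [], []
    for tok in tokens:
        stack.append(tok)                       # shift the next input token
        reduced = True
        while reduced:                          # reduce as long as a handle is on top
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    reductions.append((lhs, rhs))
                    reduced = True
                    break
    if stack != ["E"]:
        raise SyntaxError(f"could not reduce the input to E; stack is {stack}")
    return reductions
```

For the tokens ["num", "+", "num"], the first reduction turns a leaf num into T and the last one produces the root E.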

Note: Whichever type of parsing is chosen, the parser scans the input from left to right, one symbol at a time.

Syntax Error Handling

A compiler is expected to assist the programmer in tracking down errors. How syntax errors are detected and handled is left to the compiler designer.

If we plan for error handling right from the start of compiler design:
It helps simplify the structure of the compiler.
It also improves the compiler's error handling capability.

Different Types of Errors

Lexical Error: It occurs when an identifier, keyword or operator is misspelt.
Syntactic Error: It arises when semicolons, braces, commas etc. are missing or misplaced.
Semantic Error: It occurs when there is a type mismatch between an operator and its operands.
Logical Error: It occurs when the programmer's reasoning is incorrect, so the program is well-formed but does not do what was intended.
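
A rough way to tell some of these categories apart is to feed small snippets to Python's own compiler. This is only an illustration under assumptions: the helper name classify and its return strings are made up, and because Python is dynamically typed, an operator/operand mismatch shows up as a TypeError at run time instead of at compile time. A logical error passes both checks, which is exactly why no compiler phase can catch it.

```python
def classify(source):
    """Classify a snippet of Python source for illustration purposes."""
    try:
        code = compile(source, "<example>", "exec")   # lexing and parsing happen here
    except SyntaxError:
        return "rejected by the parser"
    try:
        exec(code, {})                                # run in an empty namespace
    except TypeError:
        return "operator/operand mismatch"
    return "accepted"


# Syntactic error: the closing parenthesis is missing.
print(classify("x = (1 + 2"))
# Semantic error: '+' applied to a number and a string.
print(classify('x = 1 + "a"'))
# Logical error: the average of two numbers divided by 3; nothing complains.
print(classify("avg = (2 + 4) / 3"))
```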

The main task of the parser is to detect syntactic errors efficiently. The goals of error handling in a parser are:

The parser must report the errors in the program accurately.
The parser must recover from each error quickly so that it can detect subsequent errors.
Error handling must add minimal overhead to the processing of correct programs.

The error handler must report the location of the error in the program, usually by pointing to the line at which the error was detected. This helps the programmer find where the error occurred.
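
Python's own compiler is one concrete example of this kind of reporting: the built-in compile() raises SyntaxError, and the exception carries the file name, line number and message. The helper name report and the output format below are assumptions for illustration.

```python
def report(source, filename="<input>"):
    """Return 'file:line: message' for the first syntax error, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        # lineno is the line on which the parser detected the error.
        return f"{err.filename}:{err.lineno}: {err.msg}"
    return None


# The second line ends with a dangling '+', so the error is reported on line 2.
print(report("x = 1\ny = 2 +\n"))
```

The exact wording of the message differs between Python versions, so only the file name and line number should be relied on here.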

Error Recovery Method

Even though programming languages are designed with error handling in mind, errors are inevitable in programs despite the programmer’s efforts. Thus, compilers are designed to track down and locate the errors in a program.

It is the parser that detects syntactic errors in the program. After detecting an error, the parser must either correct it or recover from it.

The simplest approach is for the parser to quit after detecting the first error, leaving an informative message that describes the error and its location.

If the parser recovers and continues instead, errors may keep piling up. The compiler should give up once the number of errors exceeds a particular limit, as it is not useful to produce an avalanche of errors.

Error Recovery Strategies:

1. Panic-Mode Recovery
Here, the parser discards input symbols one at a time until it finds a synchronizing token. Usually, the synchronizing tokens are delimiters such as a semicolon or a closing brace. This method is simple, but it can skip a large amount of input without checking it for additional errors.
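
A minimal sketch of panic-mode recovery follows, assuming a toy language of statements of the form ID = NUM ; with the semicolon as the only synchronizing token. The names parse_program and max_errors are hypothetical; max_errors stands in for the error limit after which the compiler gives up.

```python
def parse_program(tokens, max_errors=10):
    """Parse stmt* where stmt -> ID '=' NUM ';'. Returns (statements, errors)."""
    pos, stmts, errors = 0, [], []

    def expect(kind):
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else "<eof>"
        if kind == "ID":
            ok = tok.isidentifier()
        elif kind == "NUM":
            ok = tok.isdigit()
        else:
            ok = tok == kind
        if not ok:
            raise SyntaxError(f"expected {kind}, found {tok!r} at token {pos}")
        pos += 1
        return tok

    while pos < len(tokens):
        try:
            name = expect("ID")
            expect("=")
            value = expect("NUM")
            expect(";")
            stmts.append((name, value))
        except SyntaxError as err:
            errors.append(str(err))
            if len(errors) >= max_errors:
                break                       # avoid an avalanche of errors
            # Panic mode: discard tokens until the synchronizing token ';'.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1                        # step over the ';' itself
    return stmts, errors
```

For the input "a = 1 ; b = = 2 ; c = 3 ;".split(), the statement for b is reported and skipped, and parsing resumes at c. Everything between the error and the next semicolon is thrown away without being checked.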

2. Phrase-Level Recovery
In phrase-level recovery, the parser performs a local correction on the remaining input.

Once the parser encounters an error, it replaces a prefix of the remaining input with a string that allows it to continue parsing. Typical corrections are replacing a comma with a semicolon, deleting an extra semicolon, or inserting a missing one.

The choice of replacement is left to the compiler designer, who must take care that the replacement does not lead to an infinite loop. This method can correct almost any input string. However, it has difficulty coping with an error that actually occurred before the point of detection.
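
The sketch below shows two such local corrections for the same kind of toy language (statements of the form ID = NUM ;): replacing a comma with a semicolon and inserting a missing semicolon. The function name parse_with_repairs and the repair messages are assumptions. Each repair is followed by consuming at least one token, which is what keeps this version from looping forever.

```python
def parse_with_repairs(tokens):
    """Parse stmt* where stmt -> ID '=' NUM ';', patching the delimiter locally."""
    tokens = list(tokens)              # repairs edit a copy of the token list
    pos, stmts, repairs = 0, [], []
    while pos < len(tokens):
        head = tokens[pos:pos + 3]
        if not (len(head) == 3 and head[0].isidentifier()
                and head[1] == "=" and head[2].isdigit()):
            # This sketch only knows how to repair the delimiter.
            raise SyntaxError(f"cannot repair statement at token {pos}")
        pos += 3
        nxt = tokens[pos] if pos < len(tokens) else None
        if nxt == ",":
            tokens[pos] = ";"          # replace a wrong delimiter
            repairs.append(f"replaced ',' with ';' at token {pos}")
        elif nxt != ";":
            tokens.insert(pos, ";")    # insert a missing delimiter
            repairs.append(f"inserted ';' at token {pos}")
        pos += 1                       # step over the (possibly repaired) ';'
        stmts.append((head[0], head[2]))
    return stmts, repairs
```

For the input "a = 1 b = 2 , c = 3".split(), all three statements are recovered and three repairs are recorded.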

3. Error Production
Some erroneous constructs occur commonly. The grammar for the language can be augmented with productions that generate these erroneous constructs. When the parser uses one of these error productions, it has detected an anticipated error and can issue a specific diagnostic.
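
As a small sketch, assume a single legal production stmt -> ID = NUM ; and two anticipated mistakes: writing := or == where = was intended. The table PRODUCTIONS, the helper matches and the diagnostic texts are all made up for illustration.

```python
PRODUCTIONS = [
    # (right-hand side, diagnostic); the diagnostic is None for a legal production.
    (["ID", "=", "NUM", ";"], None),
    (["ID", ":=", "NUM", ";"], "use '=' for assignment, not ':='"),  # error production
    (["ID", "==", "NUM", ";"], "'==' compares; use '=' to assign"),  # error production
]


def matches(kind, tok):
    """Check a single token against a grammar symbol."""
    if kind == "ID":
        return tok.isidentifier()
    if kind == "NUM":
        return tok.isdigit()
    return tok == kind


def parse_stmt(tokens):
    """Return ((name, value), diagnostic) or raise SyntaxError."""
    for rhs, diagnostic in PRODUCTIONS:
        if len(tokens) == len(rhs) and all(
            matches(kind, tok) for kind, tok in zip(rhs, tokens)
        ):
            return (tokens[0], tokens[2]), diagnostic
    raise SyntaxError(f"no production matches {tokens}")
```

A statement such as x := 1 ; still parses, but it comes back with a targeted diagnostic, whereas an unanticipated mistake falls through to the generic SyntaxError.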

4. Global Correction
Here, the parser makes the fewest possible changes needed to turn an invalid string into a valid one. However, these methods are too costly to implement in practice and are mainly of theoretical interest.
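
The idea of a least-change correction can be sketched with a brute-force search. This is not one of the published least-cost correction algorithms: it is a breadth-first search over single-token insertions, deletions and substitutions that stops at the first string the acceptor allows, so the number of edits it returns is minimal. The toy language (balanced parentheses) and the names is_balanced and global_correction are assumptions, and the search grows exponentially, which illustrates why the approach is impractical.

```python
from collections import deque


def is_balanced(tokens):
    """Acceptor for the toy language of balanced parentheses."""
    depth = 0
    for tok in tokens:
        depth += 1 if tok == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def global_correction(tokens, accepts=is_balanced, alphabet=("(", ")")):
    """Return (closest accepted token list, number of edits)."""
    start = tuple(tokens)
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        current, cost = queue.popleft()
        if accepts(current):
            return list(current), cost
        neighbours = []
        for i in range(len(current) + 1):
            for sym in alphabet:                                   # insertions
                neighbours.append(current[:i] + (sym,) + current[i:])
            if i < len(current):
                neighbours.append(current[:i] + current[i + 1:])   # deletion
                for sym in alphabet:                               # substitutions
                    neighbours.append(current[:i] + (sym,) + current[i + 1:])
        for nxt in neighbours:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, cost + 1))
```

Several corrections can share the same minimal cost, so only the edit count is well defined; which of the closest strings comes back depends on the search order.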

Key Takeaways

Syntax analysis is the second phase of a compiler.
The input provided to the syntax analyzer is a stream of tokens from the lexical analyzer.
The parser verifies that the provided stream of tokens conforms to the grammar of the source language.
If the input string of tokens contains no syntax error, the parser generates a parse tree for it.
If a syntax error occurs, the parser handles it by applying an error recovery method.

So, this is all about the syntax analysis phase of the compiler. We have learned about its role in the compiler and how it examines the input string to check for syntax errors.

We have also discussed how the parser handles errors with error recovery methods.