A compiler is a program that takes a program written in one language, the source language, and translates it into an equivalent program in another language, the target language. Apart from this translation, the compiler also reports the errors present in the source program.
In the sections ahead, we discuss the compiler in detail: the phases that describe how it works, the types of compilers, and the advantages of a compiler.
What is a Compiler in a Computer?
A compiler transforms a program from its source language to a target language. The transformation does not change the meaning of the program: the source program and the target program are equivalent.
Generally, a compiler is used to transform a program in a high-level language into a program in a low-level language. Apart from this transformation, the compiler also reports the errors present in the source program.
However, there is a wide variety of source languages and target languages. Source languages range from general-purpose programming languages such as Java, C#, Perl, Visual Basic, Ruby and Python to the specialized languages that have evolved in almost every field of computer application.
The target language can be another programming language, or it can be the machine language of any computer, from a microprocessor to a supercomputer.
Depending on the structure of a compiler or its purpose, we can classify compilers as single-pass, two-pass and multi-pass. We discuss these types in a later section.
Phases of a Compiler
A compiler does not simply take in a source program and directly generate an equivalent target program. Instead, it operates in phases. Each phase transforms the program from one representation into another, until the target program is produced.
1. Lexical Analysis
Lexical analysis is the initial phase in the analysis of the source program. It is also referred to as linear analysis, because the source program is a stream of characters that is read character by character from left to right. Each sequence of characters that has a collective meaning is grouped into a token.
So how is a stream of characters categorized into tokens?
Let us understand this with the help of an example. Consider a small assignment statement that calculates the simple interest on a principal amount.
Interest = Principal * Rate * Time
The assignment statement above can be broken into the following tokens.
- Identifier -> Interest
- Assignment Symbol -> =
- Identifier -> Principal
- Operator -> *
- Identifier -> Rate
- Operator -> *
- Identifier -> Time
The spaces separating the tokens are eliminated during lexical analysis. The lexical analysis phase is followed by syntax analysis.
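The tokenization above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the statement Interest = Principal * Rate * Time; the token category names and the `tokenize` function are hypothetical names chosen for this sketch, and a real lexer recognizes many more kinds of tokens.

```python
import re

# Hypothetical token categories for this sketch; a real lexer has many more.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("OPERATOR", r"[*+\-/]"),
    ("SKIP", r"\s+"),  # white space separates tokens and is discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan the source left to right and return a list of (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, text in tokenize("Interest = Principal * Rate * Time"):
    print(kind, "->", text)
```

For the example statement, the function returns seven tokens in the same order as the list above, and the spaces between them do not appear in the result.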
2. Syntax Analysis
The syntax analysis phase is also referred to as hierarchical analysis or parsing. In this phase, the tokens formed during lexical analysis are grouped into ‘grammatical phrases’, which are represented with the help of a ‘parse tree’.
So, the tokens formed in the example above are grouped into grammatical phrases. The parse tree organizes the tokens hierarchically so that together they form a meaningful statement.
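To make the hierarchical grouping concrete, here is a small sketch of a parser for statements of the form used in our example. It is an illustration under stated assumptions, not how any particular compiler does it: the function name `parse_assignment`, the nested-tuple tree format, and the tiny grammar (one identifier, an equals sign, then identifiers joined by `*`) are all chosen for this example.

```python
def parse_assignment(tokens):
    """Parse `name = name (* name)*` into a tree of nested tuples."""
    pos = 0

    def expect_identifier():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError(f"expected an identifier at token {pos}")
        name = tokens[pos]
        pos += 1
        return ("id", name)

    target = expect_identifier()
    if pos >= len(tokens) or tokens[pos] != "=":
        raise SyntaxError("expected '=' after the target identifier")
    pos += 1

    # Multiplication groups from the left: (Principal * Rate) * Time.
    tree = expect_identifier()
    while pos < len(tokens) and tokens[pos] == "*":
        pos += 1
        tree = ("*", tree, expect_identifier())

    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("=", target, tree)


# The token texts are split on spaces here only to keep the sketch short.
tree = parse_assignment("Interest = Principal * Rate * Time".split())
print(tree)
```

The root of the resulting tree is the assignment, its left child is the identifier Interest, and its right child is the multiplication expression.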
3. Semantic Analysis
In the above two stages, we divided the source program into tokens and organized them hierarchically in a parse tree. In semantic analysis, certain checks are performed on the components of the program to determine whether they fit together meaningfully.
The most important part of semantic analysis is ‘type checking’. The compiler makes sure that every operator has operands of the types that the source language allows.
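A type check of this kind can be sketched as a walk over the tree of an expression. The declared types, the rule that mixing int and float gives float, and the names `SemanticError` and `type_of` are assumptions made for this illustration; each source language defines its own rules.

```python
class SemanticError(Exception):
    """Raised when a program is grammatical but its types do not fit."""


# Hypothetical declarations; a real compiler reads these from its symbol table.
DECLARED_TYPES = {
    "Principal": "float",
    "Rate": "float",
    "Time": "int",
    "Label": "string",
}
NUMERIC_TYPES = {"int", "float"}


def type_of(node, declared=DECLARED_TYPES):
    """Return the type of an expression tree, or raise SemanticError."""
    if node[0] == "id":
        name = node[1]
        if name not in declared:
            raise SemanticError(f"undeclared identifier {name!r}")
        return declared[name]
    operator, left, right = node
    left_type = type_of(left, declared)
    right_type = type_of(right, declared)
    if left_type not in NUMERIC_TYPES or right_type not in NUMERIC_TYPES:
        raise SemanticError(
            f"operator {operator!r} cannot combine {left_type} and {right_type}"
        )
    # Assumed widening rule: int combined with float gives float.
    return "float" if "float" in (left_type, right_type) else "int"


expression = ("*", ("*", ("id", "Principal"), ("id", "Rate")), ("id", "Time"))
print(type_of(expression))
```

With these declarations, the expression Principal * Rate * Time is accepted, while an expression such as Label * Rate is rejected because a string is not an allowed operand of `*` in this sketch.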
4. Symbol Table Management
To keep a record of all the entities in a program, the compiler creates and maintains a data structure that we refer to as a symbol table.
The symbol table can record entities and their attributes, such as:
- Identifiers
- Functions in the program
- Number of arguments in the functions
- Method of passing the arguments
- Type of the value returned by the function (if any).
The symbol table helps in determining the location of each entity present in the program easily and quickly.
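A symbol table is often built on a hash table so that lookups are fast. The sketch below is one possible layout, assuming a single scope; the `Symbol` and `SymbolTable` classes, their fields, and the `simple_interest` function entry are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str
    kind: str   # "variable" or "function"
    type: str   # variable type, or return type for a function
    # For functions: one (name, type, passing method) triple per argument.
    params: list = field(default_factory=list)


class SymbolTable:
    """A single-scope symbol table backed by a dictionary."""

    def __init__(self):
        self._symbols = {}

    def declare(self, symbol):
        if symbol.name in self._symbols:
            raise KeyError(f"{symbol.name!r} is already declared")
        self._symbols[symbol.name] = symbol

    def lookup(self, name):
        """Return the record for name, or None if it was never declared."""
        return self._symbols.get(name)


table = SymbolTable()
table.declare(Symbol("Rate", "variable", "float"))
table.declare(
    Symbol(
        "simple_interest",
        "function",
        "float",
        params=[
            ("principal", "float", "by value"),
            ("rate", "float", "by value"),
            ("time", "int", "by value"),
        ],
    )
)
print(table.lookup("simple_interest"))
```

Each record holds the attributes listed above: the kind of entity, the number of arguments and how they are passed, and the return type.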
5. Error Detection and Reporting
As the source program passes through the phases of the compiler, errors can be encountered in any phase. Most of the errors are found during the syntax analysis and semantic analysis phases.
The error detection and reporting phase deals with each detected error in a way that lets the compiler proceed and detect further errors in the source program.
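The idea of reporting an error and then carrying on can be sketched as follows. This is a simplified illustration: it assumes one statement per line and uses a regular expression as a stand-in for a real parser, and the name `check_program` is hypothetical.

```python
import re

# Stand-in for a parser: accepts `name = name (* name)*` on one line.
STATEMENT = re.compile(
    r"^\s*[A-Za-z_]\w*\s*=\s*[A-Za-z_]\w*(\s*\*\s*[A-Za-z_]\w*)*\s*$"
)


def check_program(lines):
    """Check every line and collect all errors instead of stopping at the first."""
    errors = []
    for number, line in enumerate(lines, start=1):
        if not STATEMENT.match(line):
            errors.append(f"line {number}: invalid assignment: {line.strip()!r}")
    return errors


program = [
    "Interest = Principal * Rate * Time",
    "Total = * Principal",
    "= Rate",
]
for message in check_program(program):
    print(message)
```

Both faulty lines are reported in a single run, so the programmer can fix them together rather than one compilation at a time.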
6. Intermediate Code Generation
Once the source code has been analyzed syntactically and semantically, the next step is to generate the intermediate code of the source program. The intermediate code is like a program for an abstract machine.
The intermediate code, commonly written as three-address code, has the following properties:
- It must be easy to generate.
- It must be easy to translate into target code.
- Each instruction of the intermediate code must have at most one operator apart from the assignment operator. This helps the compiler determine the order in which the operations must be executed.
- The compiler has to generate temporary variables to store the intermediate values computed at each instruction.
- Some instructions in the intermediate code can have fewer than three operands.
Thus, the intermediate code generation phase produces the intermediate representation of the source program and also handles the flow of control and the procedure calls.
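The properties above can be seen in a small sketch that turns the tree of our example statement into three-address code. The nested-tuple tree format, the temporary names t1, t2 and the function name `generate_tac` are assumptions made for this illustration.

```python
def generate_tac(tree):
    """Return three-address instructions for a tree ("=", target, expression)."""
    code = []
    counter = 0

    def emit(node):
        """Emit code for node and return the name that holds its value."""
        nonlocal counter
        if node[0] == "id":
            return node[1]
        operator, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"  # compiler-generated temporary
        code.append(f"{temp} = {left_name} {operator} {right_name}")
        return temp

    _, target, expression = tree
    result = emit(expression)
    code.append(f"{target[1]} = {result}")
    return code


tree = (
    "=",
    ("id", "Interest"),
    ("*", ("*", ("id", "Principal"), ("id", "Rate")), ("id", "Time")),
)
for instruction in generate_tac(tree):
    print(instruction)
```

Each generated instruction has at most one operator besides the assignment, the temporaries hold the intermediate values, and the final copy into Interest is an instruction with fewer than three operands.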
7. Code Optimization
In the code optimization phase, the intermediate code is improved so that the target code executes faster. Optimization does not change the meaning of the program. However, the amount of optimization performed varies from compiler to compiler.
Note: Code optimization reduces the execution time of the target program, but it should not slow down the compilation itself too much.
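As one small example of an optimization, the sketch below removes a redundant copy at the end of a three-address code sequence. It is only an illustration of the idea: it assumes the instructions are strings of the form `x = a * b`, and that names like t1 and t2 are compiler temporaries that are not used anywhere else. The function name `remove_final_copy` is hypothetical.

```python
import re


def remove_final_copy(code):
    """Merge a trailing `t = a op b` / `x = t` pair into `x = a op b`."""
    if len(code) < 2:
        return list(code)
    temp, _, expression = code[-2].partition(" = ")
    target, _, source = code[-1].partition(" = ")
    # Only merge when the copied value is a temporary defined just before.
    if source == temp and re.fullmatch(r"t\d+", temp):
        return code[:-2] + [f"{target} = {expression}"]
    return list(code)


before = ["t1 = Principal * Rate", "t2 = t1 * Time", "Interest = t2"]
after = remove_final_copy(before)
print(after)
```

The optimized sequence has one instruction fewer but still computes the same value for Interest, so the meaning of the program is unchanged.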
8. Code Generation
The code generation phase is the last phase of compilation. In this phase, relocatable machine code or assembly code that is equivalent to the source program is generated.
An important part of the code generation phase is assigning the variables used in the program to registers and memory locations.
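The sketch below shows the flavor of code generation for our example. The instruction set (LOAD, MUL, STORE) and the register name R1 belong to a made-up machine invented for this illustration, and the function name `generate_assembly` is hypothetical. The translation is deliberately naive: every result is stored back to memory.

```python
import re


def generate_assembly(tac):
    """Translate `dst = a * b` instructions into pseudo-assembly for a made-up machine."""
    assembly = []
    for instruction in tac:
        match = re.fullmatch(r"(\w+) = (\w+) \* (\w+)", instruction)
        if match is None:
            raise ValueError(f"unsupported instruction: {instruction!r}")
        dst, left, right = match.groups()
        assembly.append(f"LOAD R1, {left}")    # bring the first operand into a register
        assembly.append(f"MUL R1, {right}")    # multiply the register by the second operand
        assembly.append(f"STORE {dst}, R1")    # write the result back to memory
    return assembly


for line in generate_assembly(["t1 = Principal * Rate", "Interest = t1 * Time"]):
    print(line)
```

A real code generator would try to keep t1 in a register between the two multiplications instead of storing and reloading it; deciding which values live in registers is exactly the selection problem described above.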
Types of Compiler
Though there are many types of compilers, we will discuss three kinds here, classified by the number of passes they make.
1. Single-Pass Compiler
As we have seen, a compiler has six main phases: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization and code generation (symbol table management and error handling work alongside all of them). If all these phases are implemented in one single module, we refer to it as a single-pass compiler.
- The single-pass compiler takes more space in main memory, as all its phases are implemented in one single module.
- As all the phases are implemented in one module, the single-pass compiler compiles faster, because it does not have to store intermediate code and pass it between modules.
Example: Pascal compiler
2. Two-Pass Compiler
In a two-pass compiler, the six phases of the compiler are implemented in two modules.
- Only one module of a two-pass compiler is placed in the main memory at a time. The source program is first processed by the first module.
- The intermediate code generated by the first module is stored, and the first module is then replaced in the main memory by the second module of the compiler.
- Now the intermediate code generated by the first module is provided as an input to the second module which in turn generates the target program.
So, in this way the target code is generated in two passes, that’s why it is called a two-pass compiler.
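The hand-over between the two modules can be sketched as two functions that communicate only through stored intermediate code. This is a toy illustration under several assumptions: statements have the form `x = a * b * c`, the intermediate code is kept in a temporary text file, and the target is pseudo-assembly for a made-up machine. The names `first_pass`, `second_pass` and `compile_two_pass` are hypothetical.

```python
import os
import tempfile


def first_pass(source, path):
    """First module: turn `x = a * b * c` into intermediate code and store it."""
    target, expression = (part.strip() for part in source.split("=", 1))
    operands = [name.strip() for name in expression.split("*")]
    lines, current = [], operands[0]
    for index, operand in enumerate(operands[1:], start=1):
        lines.append(f"t{index} = {current} * {operand}")
        current = f"t{index}"
    lines.append(f"{target} = {current}")
    with open(path, "w") as handle:
        handle.write("\n".join(lines))


def second_pass(path):
    """Second module: read the stored intermediate code and emit pseudo-assembly."""
    assembly = []
    with open(path) as handle:
        for line in handle.read().splitlines():
            dst, expression = line.split(" = ")
            operands = expression.split(" * ")
            assembly.append(f"LOAD R1, {operands[0]}")
            for operand in operands[1:]:
                assembly.append(f"MUL R1, {operand}")
            assembly.append(f"STORE {dst}, R1")
    return assembly


def compile_two_pass(source):
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "intermediate.txt")
        first_pass(source, path)   # pass 1 finishes completely ...
        return second_pass(path)   # ... before pass 2 starts


for line in compile_two_pass("Interest = Principal * Rate * Time"):
    print(line)
```

Because the second function needs nothing from the first except the stored file, only one of the two has to be in memory at any moment, which is the point of the two-pass design.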
3. Multi-Pass Compiler
In a multi-pass compiler, the phases of the compiler are implemented with multiple modules.
- Only one module is placed in the main memory at a time. The intermediate code generated by that module is stored, and then the module is replaced by its subsequent module.
- The intermediate code generated by the previous module is provided as an input to the subsequent module.
- This process goes on until the target program is generated.
- The multi-pass compiler is slower than a single-pass compiler, but it consumes less space in main memory, as only one of its modules is placed in main memory at a time.
Advantages of a Compiler
- The compiler reports the syntax and semantic errors present in the code.
- The compiler determines the memory locations to store variables in the program and intermediate results that will be generated during program execution.
- The compiler optimizes the code so that the program executes faster.
- The compiler also determines the flow of operations during program execution.
Key Takeaways
- A compiler transforms a program from one language (the source language) to another (the target language).
- The compiler detects the errors in the source program.
- The compiler optimizes the program for faster execution.
- The compiler generates a relocatable target program.
- There are six main phases of a compiler: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization and code generation.
- A compiler can be a single-pass compiler, a two-pass compiler or a multi-pass compiler.
So, this was a basic discussion of what a compiler is, what it is used for, how it works and what types of compilers there are. We will be discussing more on compilers in our future content.