What is Intermediate Code Generation? Definition, Three-Address Code, Types, Benefits

Intermediate code generation is a phase in the compiler. In our earlier content ‘Compiler in Computer’, we have discussed Intermediate code generation. It gets its input from the semantic analysis phase and serves its output to the code optimizer phase.

The intermediate code generator generates some intermediate representation. And from this intermediate representation, the compiler generates the target code.

What is Intermediate Code Generation?

As we know the task of the compiler is to process the source program and translate it into a target program. Well, in this process a compiler may generate one or more intermediate representations. This representation can have various forms.

The intermediate representation must be:

Easy to produce.
Easy to translate into target code.

The intermediate representation can be of various forms. The two most important kinds of intermediate representations are:

Tree i.e. “parse trees” and “syntax tree”.
A Linear representation i.e., “three address code”.

We have discussed “trees” briefly in our previous content ‘Syntax Directed Translation’. In this context, we will focus on the linear representation code i.e. “three address codes”.

Logical Structure of Compiler

The logical structure of the compiler has two ends i.e., the front end and a back end. The front end generates the intermediate representation of the source program. And this intermediate representation helps the back end in generating the target program.

Intermediate Code Generation

The frontend end of the compiler includes:

Parser,
Static checker
Intermediate Code Generator

Earlier we have discussed parser and types of parsing i.e. top-up parsing and bottom parsing. The parser parses the input string and generates the parse tree or the syntax tree. The nodes of the syntax tree represent the operations. And the leaves represent the operands.

As the parsing proceeds some information keeps on attaching to these nodes. And we refer to it as an annotated parse tree.

Static Checker

Static checking confirms that the compiler can compile the program successfully. It identifies the programming errors earlier. This helps the programmer to rectify the error before a program executes. Static checking performs two types of checking:

Syntactic Checking
This kind of checking identifies the syntactic errors present in the program. The compiler examines the sequence of elements of the program. And assures that this sequence forms a valid statement in the source program.
Type Checking
It checks the operations present in the program. And assures that it respect the type system of the source language. And if this is not the case the compiler performs the type conversion.

- Coercion
  In coercion, the type of operands converts according to the type of operator. For example, consider the expression 2 * 3.14. Now 2 is an integer and 3.14 is a floating-point number. The coercion specified by the language converts integer 2 to floating-point 2.0. Now both the operands are floating-point, the compiler will perform the floating-point operation. This operation will provide a floating-point resultant.

- Overloading
  We have studied the concept of overloading in Java. For example, the operator ‘+’ if applied to the integer performs the addition of two integers. And if applied to the string performs concatenation of the two strings.
  Thus, the meaning of the operator changes according to the type of operands specified. In the expression x + y is even if one operand is an integer and the other is a string then there occurs a type error.

Three-Address Code

The three-address code is also a form of intermediate representation like a syntax tree. The parser generates a syntax tree. And the intermediated code generator generates the three-address code. This intermediate representation is a sequence of assembly-like instructions.

Note: It is possible for the compiler to construct a syntax tree and three-address code parallelly.

Three-Address Instructions

Three-address instruction has three operands per instruction. And each operand act like a register.

The three-address assignment instruction must have at least one operand on the right side of the instruction.
To hold the value computed by the three-address code instruction, the compiler generates a temporary name.
The three-address instructions may have less than three operands.

The three-address instruction is of the form:

x = y op z

Here,

y and z are the operands.
op is the operator.
x can be a name or a compiler-generated temporary name (location) that will hold the result.

The three-address instructions are always executed sequentially. They can even perform the conditional or unconditional jump.

Address and Instruction

The three-address code is based on two concepts addresses and the instruction.

Address:

Name: A name specifies the identity of a variable in the instructions or even the name of the source program. In the three-address code, the name in the instruction appears as the address.
Constant: There can be different types of constants that the compiler deal with. The compiler performs the necessary type of conversion when required.
Compiler-generated temporary: t is a name generated by the compiler. The compiler generates it each time there is a need for a temporary variable. This address stores the result or a value.

Instructions:

Assignment Instructions
- x = y op z, here the x, y, and z are addresses and op can be an arithmetic or logical binary operator.|
- x = op y, where x and y are the addresses and op is a unary operator.
Copy Instruction
The copy instruction x = y, where x and y are addresses and the value at the address of y is set at the address of x.
Jump Instruction
- The conditional jump, if x goto L and ifFalse x goto K. Here the instruction with label L is executed if x is true. And if x is false the instruction with label K is executed.
- The conditional jump if x relop y goto L, here relop is the relational operator. And if x stands by the relational operator then the instruction with label L is executed. If not, the instruction next to this conditional instruction is operated.
- Unconditional jump goto L, here the instruction with label L is executed next.

Advanced Instructions:

Procedure Call
The instruction of the form y = call p, n make a procedure or function call. Here p is the name of the procedure and n is the integer that indicates the number of parameters passed. The instruction of the form ‘return y’ here y indicates the returned value.
Indexed Copy Instruction
The instruction of form x = y[i], where the value at the location ith memory unit beyond the location y is set in x. In the instruction x[i] = y, the value of y is set to the ith memory location beyond the location of x.
Address and Pointer Assignment
In the instruction of form x = &y, here the L value of y is set to x. The instruction x = *y, here the r-value of y holds L value of a location say z. And the r-value of z is set to x. The instruction *x = y sets the r-value of y to the location pointed by x.

Types of Intermediate Code

We can classify the intermediate representation into two types:

High-Level Intermediate Representation
This representation is the very first intermediate representation of the source language. The syntax tree generated by the parser is the high-level representation.
The high-level intermediate representation provides a hierarchical structure of the source program. This hierarchical structure helps the task such as a static check.
Low-Level Intermediate Representation
After high-level intermediate representation, the next intermediate representation is low-level intermediate representation. The intermediate code generation phase generates the low-level intermediate representation.
The three-address code is a low-level representation. The three-address code is suitable for machine-dependent tasks. Like register allocation and instruction selection.

Benefits of Intermediate Code Generation

Consider that a compiler translates a source program directly into a target program. And it doesn’t generate an intermediate code between. Here we can execute the generated target code only on the machine on which the compiler was executed.
In this case, we need a native compiler for each of the new machines.
The intermediate code wipes out the need for the full native compiler for each machine.

So, this is all about intermediate code generation. We have discussed from where the intermediate code generation phase gets its input. And what output it generates. Further, we have also discussed the types of intermediate representation. We have concluded the topic the benefits of the intermediate code generator.