Phases of Compiler

 Phases of Compiler

Lexical Analysis

The first phase of compiler is also known as Scanner. The scanner works as a text scanner. This phase scans the source code as a stream of characters and converts it into meaningful lexemes. Lexical analyzer represents these lexemes in the form of tokens as:<Token-name, attribute-value>

  • Token: Token is a sequence of characters that represent lexical unit, which matches with the pattern, such as keywords, operators, identifiers etc.
  • Lexeme: Lexeme is instance of a token i.e., group of characters forming a token.
  • Pattern: Pattern describes the rule that the lexemes of a token takes. It is the structure that must be matched by strings.

Once a token is generated the corresponding entry is made in the symbol table.

 

Syntax Analysis

The next phase is called the Syntax Analysis or Parser. It takes the token produced by lexical analysis, as input and generates a parse tree (or syntax tree). In this phase, token arrangements are checked against the source code grammar, i.e., the parser checks if the expression made by the tokens is syntactically correct or not.

Ex. a = b + i can be represented in syntax tree form as 


Semantic Analysis

Semantic analysis checks whether the parse tree constructed thus follows the rules of language. For example, it checks type casting, type conversions issues and so on. Also, the semantic analyzer keeps track of identifiers, their types and expressions; whether identifiers are declared before use or not, etc. The semantic analyzer produces an annotated syntax tree as an output.

Semantic analysis, analyze the syntax tree to identify external evaluations and then apply rules of validity to the elemental evaluations. In this phase, actual analysis is done to determine meaning of statement. The meaning can be determined only if statement is syntactically correct.
In above example, the statement is semantically wrong. So different steps which will take place are :
  1. Convert i to real
  2.  Add to b to get result T(assume) 

Some compilers have the ability to do such conversion automatically.

Intermediate Code Generation

After semantic analysis, the compiler generates an intermediate code of the source code for the target machine. It represents a program for some abstract machine. It is in between the high-level language and the machine language. This intermediate code should be generated in such a way that it makes it easier to be translated into the target machine code. The intermediate code may be a Three Address code or Assembly code. As each statement requires maximum three addresses, so it is called three address code.

Ex. a = d * b + c

Three address code is,

       temp1 = d * b

      temp2 = temp1 + c

      a = temp2

where temp1 and temp2 are compiler generated temporary names.

Code Optimization

The next phase does code optimization, it is an optional phase. Optimization can be assumed as something that removes unnecessary code lines, and arranges the sequence of statements in order to speed up the program execution without wasting resources like CPU, memory. The output of this phase is an optimized intermediate code.

Code Generation

In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language. The code generator translates the intermediate code into a sequence of re-locatable machine code. Sequence of instructions of machine code performs the task as the intermediate code would do.

Ex. a = b + c     is converted as

LOAD b

ADD c

STORE a

Now if by previous operation b is already loaded then LOAD b is redundant. Such statements can be avoided at code generation stage. To avoid this compiler has to keep track of run time contents of registers.

 Symbol Table

Symbol Table is also known as Book Keeping. It is a data-structure maintained throughout all the phases of a compiler. All the identifiers‟ names along with their information like type, size, etc., are stored here. The symbol table makes it easier for the compiler to quickly search and retrieve the identifier’s record. The symbol table is also used for scope management.

Error Hander

A parser should be able to detect and report any error in the program. It is expected that when an error is encountered, the parser should be able to handle it and do parsing with the rest of the inputs. Mostly it is expected from the parser to check for errors. But errors may be encountered at various stages of the compilation process.

Ex. translation of statement by a compiler :  a := b - c *50