- A compiler is a language translator that takes a source program as input and generates an object program.
- A cross compiler runs on one machine and generates code for another machine.
- The C compiler works in two passes: pass 1 performs analysis and pass 2 performs synthesis.
- The C compiler is written about 80% in C and 20% in assembly language (ALP).
- Writing a compiler in the language it compiles is called bootstrapping.
- The C compiler uses a combination of a recursive descent parser and an operator precedence parser.
Compilation Phases and Passes
- Compilation of a program proceeds through a fixed series of phases.
- Each phase uses an (intermediate) form of the program produced by an earlier phase.
- Subsequent phases operate on lower-level code representations.
- Each phase may consist of a number of passes over the program representation.
- Pascal, FORTRAN, and C were designed for one-pass compilation, which explains the need for function prototypes (forward declarations): a name must be declared before the single pass reaches its first use.
- Single-pass compilers need less memory to operate.
- Java and Ada compilers are multi-pass.
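The link between one-pass compilation and prototypes can be illustrated with a small sketch. It assumes a made-up mini-language in which a program is just a list of `declare`, `define`, and `call` statements; the function name `one_pass_check` and the statement format are invented for this illustration and do not come from any real compiler.

```python
# Hypothetical mini-language: a program is a list of (kind, name) statements
# in source order, where kind is "declare", "define" or "call".

def one_pass_check(statements):
    """Scan the statements once, top to bottom, and report calls to
    functions that have not been declared or defined yet."""
    known = set()
    errors = []
    for kind, name in statements:
        if kind in ("declare", "define"):
            known.add(name)
        elif kind == "call" and name not in known:
            errors.append(f"call to undeclared function '{name}'")
    return errors

without_prototype = [("call", "helper"), ("define", "helper")]
with_prototype = [("declare", "helper"), ("call", "helper"), ("define", "helper")]

print(one_pass_check(without_prototype))  # ["call to undeclared function 'helper'"]
print(one_pass_check(with_prototype))     # []
```

Because the checker never looks ahead, the only way to call `helper` before its definition is to declare it first, which is the role a prototype plays.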
Structure and Phases of a Compiler
Lexical Analysis
- Lexical analysis, the first phase of the compiler, works as a text scanner.
- This phase scans the source code as a stream of characters and converts it into meaningful lexemes.
- It takes the source program as input and, if the elements of the program are well formed, groups them into meaningful units.
- The lexical analyzer represents these lexemes in the form of tokens.
- Example tokens are identifiers, constants, operators, etc.
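A minimal sketch of such a scanner is given here, assuming a tiny C-like expression language. The token names and the `tokenize` function are choices made for this example only; a real lexical analyzer handles many more token classes (keywords, strings, comments).

```python
import re

# Token categories for a small, assumed C-like expression language.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=]"),
    ("PAREN",      r"[()]"),
    ("SKIP",       r"\s+"),   # whitespace separates lexemes but is not a token
    ("MISMATCH",   r"."),     # any other character is a lexical error
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Convert a character stream into a list of (token kind, lexeme) pairs."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("rate = base + 60"))
# [('IDENTIFIER', 'rate'), ('OPERATOR', '='), ('IDENTIFIER', 'base'),
#  ('OPERATOR', '+'), ('NUMBER', '60')]
```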
Syntax Analysis
- The next phase is called syntax analysis or parsing.
- It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree).
- In this phase, token arrangements are checked against the source language grammar, i.e. the parser checks whether the expression formed by the tokens is syntactically correct.
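As a rough illustration of parsing, the following sketch is a small recursive descent parser for arithmetic expressions. The grammar, the `(kind, lexeme)` token format, and the nested-tuple tree shape are all assumptions made for this example, not taken from any particular compiler.

```python
# Assumed grammar:
#   expr   -> term   (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> NUMBER | IDENTIFIER | "(" expr ")"

def parse(tokens):
    """Parse a list of (kind, lexeme) tokens into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    def term():
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def factor():
        kind, lexeme = advance()
        if kind in ("NUMBER", "IDENTIFIER"):
            return (kind, lexeme)
        if lexeme == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {lexeme!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

tokens = [("IDENTIFIER", "a"), ("OPERATOR", "+"), ("NUMBER", "2"),
          ("OPERATOR", "*"), ("IDENTIFIER", "b")]
print(parse(tokens))
# ('+', ('IDENTIFIER', 'a'), ('*', ('NUMBER', '2'), ('IDENTIFIER', 'b')))
```

Each grammar rule becomes one function, and the nesting of `term` inside `expr` is what gives `*` higher precedence than `+` in the resulting tree.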
Semantic Analysis
- Semantic analysis checks whether the parse tree that was constructed follows the rules of the language.
- For example, it checks that values are assigned between compatible data types and reports errors such as adding a string to an integer.
- The semantic analyzer also keeps track of identifiers, their types, and expressions, and checks whether identifiers are declared before use. It produces an annotated syntax tree as output.
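The kind of checks described above can be sketched as a small type checker. The node kinds (`int_literal`, `string_literal`, `name`, `add`, `assign`), the `SemanticError` class, and the rule that both operands of an addition must have the same type are assumptions chosen for this illustration.

```python
class SemanticError(Exception):
    """Raised when the tree breaks a (hypothetical) language rule."""

def check_types(node, declared):
    """Return the type of an expression tree or raise SemanticError.
    `declared` maps identifier names to type names ("int" or "string")."""
    kind = node[0]
    if kind == "int_literal":
        return "int"
    if kind == "string_literal":
        return "string"
    if kind == "name":
        if node[1] not in declared:
            raise SemanticError(f"identifier '{node[1]}' used before declaration")
        return declared[node[1]]
    if kind == "add":
        left = check_types(node[1], declared)
        right = check_types(node[2], declared)
        if left != right:
            raise SemanticError(f"cannot add {left} and {right}")
        return left
    if kind == "assign":
        target = check_types(("name", node[1]), declared)
        value = check_types(node[2], declared)
        if target != value:
            raise SemanticError(f"cannot assign {value} to {target} variable '{node[1]}'")
        return target
    raise ValueError(f"unknown node kind {kind!r}")

declared = {"count": "int", "label": "string"}
good = ("assign", "count", ("add", ("name", "count"), ("int_literal", 1)))
bad = ("add", ("name", "label"), ("int_literal", 1))

print(check_types(good, declared))  # int
try:
    check_types(bad, declared)
except SemanticError as error:
    print(error)                    # cannot add string and int
```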
Intermediate Code Generation
- After semantic analysis, the compiler generates intermediate code for the source program, as a step toward code for the target machine.
- It represents a program for some abstract machine.
- It lies between the high-level language and the machine language.
- The intermediate code should be generated in such a way that it is easy to translate into the target machine code.
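One common form of intermediate code is three-address code, where each instruction has at most one operator and results go into temporaries. The sketch below assumes an expression tree written as nested `(op, left, right)` tuples with plain strings as leaves; the function name `to_three_address` and the `t1`, `t2`, ... naming scheme are illustrative choices.

```python
import itertools

def to_three_address(tree):
    """Flatten a nested (op, left, right) tree into three-address instructions."""
    code = []
    temp_ids = itertools.count(1)

    def emit(node):
        if isinstance(node, str):      # a leaf: variable name or constant
            return node
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        result = f"t{next(temp_ids)}"  # fresh temporary for this sub-expression
        code.append(f"{result} = {left_name} {op} {right_name}")
        return result

    emit(tree)
    return code

for line in to_three_address(("+", "a", ("*", "b", "60"))):
    print(line)
# t1 = b * 60
# t2 = a + t1
```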
Code Optimization
- The next phase optimizes the intermediate code.
- Optimization can be thought of as removing unnecessary code and rearranging the sequence of statements to speed up program execution without wasting resources (CPU, memory).
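Two simple optimizations, constant folding and dead code elimination, can be sketched on straight-line intermediate code. The instruction format `(dest, op, left, right)`, the `"="` copy operation, and both function names are assumptions made for this example; real optimizers work on control-flow graphs and do far more.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(instructions):
    """Evaluate operations on known constants at compile time.
    Each instruction is (dest, op, left, right); ("x", "=", 5, None) is a copy."""
    constants = {}
    result = []
    for dest, op, left, right in instructions:
        left = constants.get(left, left)      # substitute known constant values
        right = constants.get(right, right)
        if op in OPS and isinstance(left, int) and isinstance(right, int):
            op, left, right = "=", OPS[op](left, right), None
        if op == "=" and isinstance(left, int):
            constants[dest] = left
        else:
            constants.pop(dest, None)         # dest no longer holds a known constant
        result.append((dest, op, left, right))
    return result

def remove_dead_code(instructions, outputs):
    """Drop instructions whose result is never needed, scanning backwards."""
    needed = set(outputs)
    kept = []
    for dest, op, left, right in reversed(instructions):
        if dest in needed:
            needed.discard(dest)
            needed.update(x for x in (left, right) if isinstance(x, str))
            kept.append((dest, op, left, right))
    kept.reverse()
    return kept

program = [
    ("t1", "*", 60, 60),
    ("t2", "+", "t1", 24),
    ("unused", "+", "a", 1),
    ("result", "*", "a", "t2"),
]
print(remove_dead_code(fold_constants(program), outputs=["result"]))
# [('result', '*', 'a', 3624)]
```

Four instructions shrink to one: the arithmetic on constants is done at compile time, and the assignment to `unused` disappears because nothing reads it.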
Code Generation
- In the code generation phase, the code generator takes the optimized intermediate code and maps it to the target machine language.
- The code generator translates the intermediate code into a sequence of (generally) relocatable machine code.
- The resulting sequence of machine instructions performs the same task as the intermediate code.
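The mapping from intermediate code to target instructions can be sketched for an invented one-register machine. The mnemonics `LOAD`, `ADD`, `SUB`, `MUL`, `STORE`, the register name `R0`, and the `#` prefix for immediate values are all hypothetical and do not correspond to a real instruction set; symbolic names stand in for addresses that would be resolved later.

```python
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL"}

def operand(value):
    """Format an operand: '#n' for an immediate constant, else a symbolic name."""
    return f"#{value}" if isinstance(value, int) else value

def generate(instructions):
    """Translate (dest, op, left, right) instructions into assembly text
    for an invented machine with a single register R0."""
    asm = []
    for dest, op, left, right in instructions:
        asm.append(f"LOAD R0, {operand(left)}")
        asm.append(f"{MNEMONICS[op]} R0, {operand(right)}")
        asm.append(f"STORE {dest}, R0")
    return asm

for line in generate([("t1", "*", "b", 60), ("a", "+", "c", "t1")]):
    print(line)
# LOAD R0, b
# MUL R0, #60
# STORE t1, R0
# LOAD R0, c
# ADD R0, t1
# STORE a, R0
```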
Symbol Table
- The symbol table is a data structure maintained throughout all the phases of a compiler.
- All identifier names, along with their types, are stored here.
- The symbol table makes it easy for the compiler to quickly search for an identifier's record and retrieve it.
- The symbol table is also used for scope management.
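A minimal sketch of a scoped symbol table, assuming a stack of dictionaries with one dictionary per open scope. The class and method names are chosen for this illustration; production compilers usually store richer records (storage class, line number, and so on) than a bare type name.

```python
class SymbolTable:
    """Stack of scopes; each scope maps identifier names to type names."""

    def __init__(self):
        self.scopes = [{}]                 # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, type_name):
        current = self.scopes[-1]
        if name in current:
            raise KeyError(f"'{name}' already declared in this scope")
        current[name] = type_name

    def lookup(self, name):
        """Search from the innermost scope outwards; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")   # shadows the outer x
print(table.lookup("x"))      # float
table.exit_scope()
print(table.lookup("x"))      # int
print(table.lookup("y"))      # None
```

Dictionary lookups keep identifier searches fast, and pushing or popping a dictionary on block entry and exit is one simple way to get scope management.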
Error Handling
- The error handler is responsible for handling the errors that can occur in any phase of compilation.