Coursify

Lexical Analyzer Output in Compiler Design

May 20, 2026

In compiler design, the output of the lexical analyzer is a stream of tokens, so the correct answer is (iii). A lexical analyzer, also called a scanner, reads the source program as a stream of characters and groups them into meaningful units called tokens. These tokens are then passed to the parser for syntax analysis.

This means the lexical analyzer does not directly produce machine code, intermediate code, or a parse tree:

  • Machine code is produced much later, during code generation.
  • Intermediate code is generated in later compilation stages after syntax and semantic processing.
  • Parse trees are constructed by the syntax analyzer, not the lexical analyzer.

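A compact way to view the pipeline is:

Source characters → lexical analyzer → token stream → syntax analyzer (parser) → parse tree → intermediate code generation → code generation → machine code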

For a source statement such as int sum = 10 + 5;, the lexical analyzer identifies token classes like INT, ID, EQ, NUM, PLUS, and SEMICOLON rather than building grammatical structure itself.
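The statement above can be run through a small regex-based scanner to see the token classes appear. This is a minimal sketch, not a reference implementation: the pattern table, the names TOKEN_SPEC, MASTER, and tokenize, and the choice of Python's re module are all assumptions made for illustration.

```python
import re

# Hypothetical token classes for this one statement.
# INT is listed before ID so the keyword wins over the identifier pattern.
TOKEN_SPEC = [
    ("INT", r"\bint\b"),
    ("ID", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("EQ", r"="),
    ("PLUS", r"\+"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return the token class names found in source, ignoring whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


print(tokenize("int sum = 10 + 5;"))
# ['INT', 'ID', 'EQ', 'NUM', 'PLUS', 'NUM', 'SEMICOLON']
```

The result is a flat list: nothing in it records that 10 + 5 is the value being assigned to sum. That grouping is left to the parser.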

Footnotes

  1. Token, Patterns, and Lexemes - GeeksforGeeks - Explains that lexical analysis outputs a sequence of tokens for the syntax analyzer.

  2. Introduction of Lexical Analysis - GeeksforGeeks - Describes lexical analysis as the first compiler phase that converts source code into tokens.

  3. Lesson 16: Compiler Design and Parsing | BTU - States that the lexer breaks source into tokens and the parser builds the parse tree/AST.

  4. Analysis of the source program, Phases of a compiler, Grouping of ... - Distinguishes token generation, parse trees, intermediate code generation, and final code generation.

  5. Compiler Design - Syntax Analysis - TutorialsPoint - Notes that the parser takes token streams as input and outputs a parse tree.

Lexical Analyzer – Tokenization

Correct Option

The lexical analyzer outputs a stream of tokens, so the correct answer is option (iii).

To understand why option (iii) is correct, it helps to distinguish among four compiler outputs often confused in exam questions:

| Option | Produced by lexical analyzer? | Actual producer | Why |
| --- | --- | --- | --- |
| Machine code | No | Code generator | Final low-level target output appears near the end of compilation. |
| Intermediate code | No | Intermediate code generation phase | Generated after syntax/semantic understanding of the program. |
| Stream of tokens | Yes | Lexical analyzer | Main output of scanning/tokenization. |
| Parse tree | No | Syntax analyzer/parser | Built from token stream using grammar rules. |

The lexeme is the literal text found in the source program, while the token is its category. For example, in count = count + 1;, the lexemes are count, =, count, +, 1, ;, but the token stream may be represented as ID ASSIGN ID PLUS NUM SEMICOLON.

This abstraction is important because syntax analysis usually works on token categories rather than raw character sequences.
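The lexeme/token distinction can be made concrete by pairing each lexeme with its category and comparing two statements that differ only in their lexemes. The sketch below is illustrative: the Token type, the PATTERNS table, and the second statement (total = total + 42;) are assumptions added for the example.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str    # token category, e.g. "ID"
    lexeme: str  # literal source text, e.g. "count"


# Hypothetical patterns covering only this example.
PATTERNS = [
    ("ID", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SEMICOLON", r";"),
]
SCANNER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in PATTERNS))


def scan(source):
    """Pair every lexeme with its token category.

    finditer silently skips text that matches no pattern (here, spaces);
    a real lexer would report unknown characters instead.
    """
    return [Token(m.lastgroup, m.group()) for m in SCANNER.finditer(source)]


first = scan("count = count + 1;")
second = scan("total = total + 42;")

print([t.lexeme for t in first])  # ['count', '=', 'count', '+', '1', ';']
print([t.kind for t in first])    # ['ID', 'ASSIGN', 'ID', 'PLUS', 'NUM', 'SEMICOLON']
print([t.kind for t in first] == [t.kind for t in second])  # True
```

Both statements produce the same sequence of categories, which is why a grammar written over token categories can cover either one.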

Footnotes

  1. Analysis of the source program, Phases of a compiler, Grouping of ... - Distinguishes token generation, parse trees, intermediate code generation, and final code generation.

  2. Token, Patterns, and Lexemes - GeeksforGeeks - Explains that lexical analysis outputs a sequence of tokens for the syntax analyzer.

  3. Introduction of Lexical Analysis - GeeksforGeeks - Describes lexical analysis as the first compiler phase that converts source code into tokens.

  4. Lesson 16: Compiler Design and Parsing | BTU - States that the lexer breaks source into tokens and the parser builds the parse tree/AST.

  5. Compiler Design - Syntax Analysis - TutorialsPoint - Notes that the parser takes token streams as input and outputs a parse tree.

  6. Lexical analysis - Wikipedia - Summarizes tokenization, lexemes, token categories, and the role of the lexer in compiler pipelines.

How Lexical Analysis Produces Output

  1. The analyzer scans the source program from left to right as a character stream.
  2. Character sequences matching language rules are identified as meaningful units such as keywords, identifiers, literals, and operators.
  3. Each recognized lexeme is assigned a token class, often with an attribute value such as a symbol-table pointer.
  4. Whitespace and comments are typically ignored unless the language treats them as significant.
  5. The final output is a token stream consumed by syntax analysis to check grammatical structure.
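The five steps above can be sketched as a small hand-written scanner. Everything specific here is an assumption for illustration: the keyword set, the operator table, the // comment syntax, and the use of a list index as the "symbol-table pointer".

```python
KEYWORDS = {"int", "if", "while"}  # hypothetical keyword set
OPERATORS = {"=": "ASSIGN", "+": "PLUS", "*": "MULT", ";": "SEMICOLON"}
DIGITS = "0123456789"


def lex(source):
    """Return (tokens, symbol_table); each token is a (class, attribute) pair."""
    symbol_table = []  # identifier names in order of first appearance
    tokens = []
    i = 0
    while i < len(source):  # Step 1: scan left to right
        ch = source[i]
        if ch.isspace():  # Step 4: skip whitespace
            i += 1
        elif source.startswith("//", i):  # Step 4: skip a line comment
            end = source.find("\n", i)
            i = len(source) if end == -1 else end
        elif ch.isalpha() or ch == "_":  # Step 2: keyword or identifier
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            lexeme = source[start:i]
            if lexeme in KEYWORDS:  # Step 3: assign a token class
                tokens.append((lexeme.upper(), None))
            else:
                if lexeme not in symbol_table:
                    symbol_table.append(lexeme)
                # The attribute is the identifier's symbol-table index.
                tokens.append(("ID", symbol_table.index(lexeme)))
        elif ch in DIGITS:  # Step 2: integer literal
            start = i
            while i < len(source) and source[i] in DIGITS:
                i += 1
            tokens.append(("NUM", int(source[start:i])))
        elif ch in OPERATORS:  # Step 2: single-character operator
            tokens.append((OPERATORS[ch], None))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens, symbol_table  # Step 5: token stream for the parser


tokens, table = lex("int x = x + 1; // done")
print(tokens)
print(table)
```

Both occurrences of x share attribute 0, so later phases can look up a single symbol-table entry instead of comparing strings.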

Common Exam Trap

Students often confuse the lexical analyzer with the parser. The lexer creates tokens; the parser creates parse trees or syntax trees.

A useful conceptual distinction is:

  • Lexical analysis answers: “What basic units are present?”
  • Syntax analysis answers: “How are those units structured according to grammar?”

For example, consider the source line:

position = initial + rate * 60

The lexical analyzer may produce a stream such as:

<ID, position> <ASSIGN, => <ID, initial> <PLUS, +> <ID, rate> <MULT, *> <NUM, 60>

At this stage, there is no tree showing operator precedence or hierarchical structure. That structural interpretation belongs to parsing.
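To see where the structure comes from, the token stream for position = initial + rate * 60 can be handed to a tiny recursive-descent parser. This is a sketch under assumptions: the grammar (assignment, +, and * only), the nested-tuple tree shape, and the function name parse_assignment are invented for the example and are not taken from any particular compiler.

```python
# Token stream for the example line, as (class, lexeme) pairs.
TOKENS = [
    ("ID", "position"), ("ASSIGN", "="), ("ID", "initial"),
    ("PLUS", "+"), ("ID", "rate"), ("MULT", "*"), ("NUM", "60"),
]


def parse_assignment(tokens):
    """Parse: ID ASSIGN expr, with expr := term (PLUS term)* and term := factor (MULT factor)*."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def take(expected):
        nonlocal pos
        kind, lexeme = tokens[pos] if pos < len(tokens) else ("EOF", "")
        if kind != expected:
            raise SyntaxError(f"expected {expected}, found {kind}")
        pos += 1
        return lexeme

    def factor():
        return take("ID") if peek() == "ID" else take("NUM")

    def term():
        node = factor()
        while peek() == "MULT":  # * binds tighter than +
            take("MULT")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "PLUS":
            take("PLUS")
            node = ("+", node, term())
        return node

    target = take("ID")
    take("ASSIGN")
    tree = ("=", target, expr())
    if peek() != "EOF":
        raise SyntaxError(f"unexpected trailing token {peek()}")
    return tree


print(parse_assignment(TOKENS))
# ('=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

The nesting that places rate * 60 under the addition exists only in the parser's output. The input list carries no such information.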

This division of labor improves compiler modularity and efficiency. Regular expressions and finite automata are well-suited to token recognition, while context-free grammars and parsing algorithms are used for syntactic structure.

Misconception: a lexical analyzer directly creates a parse tree. This is incorrect because grammatical structure is handled by the parser, not the scanner.

[Figure: Compiler Responsibility by Phase (which phase is responsible for which output artifact)]

Final Answer

The correct answer is:

(iii) a stream of tokens

This is the standard output of the lexical analysis phase in a compiler.

Knowledge Check

What is the primary output of a lexical analyzer?

Explore Related Topics

1. Which Automaton Accepts Regular Languages? The Correct Answer Is DFA

The deterministic finite automaton (DFA) is the canonical model that exactly accepts regular languages, whereas PDA, LBA, and Turing machines recognize strictly larger language families.

  • A language is regular iff some DFA $M=(Q,\Sigma,\delta,q_0,F)$ accepts it: $L(M)=\{w\in\Sigma^* \mid \delta^*(q_0,w)\in F\}$.
  • DFA ↔ regular languages; PDA ↔ context-free; LBA ↔ context-sensitive; Turing machine ↔ recursively enumerable.
  • Regular languages form the base of the hierarchy: $\text{Regular} \subset \text{Context-Free} \subset \text{Context-Sensitive} \subset \text{Recursively Enumerable}$.
  • A DFA's finite memory limits it to patterns like “ends with 01” or “even number of 1's”; it cannot handle unbounded counting such as $\{a^n b^n \mid n \ge 0\}$.
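The “ends with 01” pattern mentioned above makes a compact worked example of DFA acceptance. The state names and the transition table below are one possible construction, chosen for illustration: q0 means no useful suffix has been seen, q1 means the input so far ends in 0, and q2 means it ends in 01.

```python
# Transition function delta for a DFA over the alphabet {0, 1}
# that accepts exactly the strings ending in "01".
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}
START = "q0"
ACCEPTING = {"q2"}


def accepts(word):
    """Run the DFA on word and report whether it halts in an accepting state."""
    state = START
    for symbol in word:
        if (state, symbol) not in DELTA:
            raise ValueError(f"symbol {symbol!r} is not in the alphabet")
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


print(accepts("1101"))  # True
print(accepts("0110"))  # False
```

The machine remembers only which of three states it is in, never how many symbols it has read, which is the sense in which its memory is finite.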
2. Understanding the MCQ: Compiler, Interpreter, Loader/Linker, or None?

The MCQ conflates formal-machine concepts (a Turing-like machine $M$ with an unbounded tape) with programming-language tools, making “None of the mentioned” the only academically correct choice.

  • An infinite tape is a modeling assumption; any actual computation uses only a finite portion.
  • Compilers translate whole programs, interpreters execute statements incrementally, and loaders/linkers build/run executables—they do not bound the tape.
  • An “infinite language” is a set of strings, not a single infinite input to $M$.
  • The correct answer is (iv) None of the mentioned.
  • In exams, identify domain mismatches and choose the option that rejects the inconsistency.
3. Lexical Analysis and the Main Structure Used: Finite Automata

Lexical analysis relies on finite automata—typically deterministic finite automata (DFA)—to recognize token patterns defined by regular expressions.

  • Regular expressions for identifiers, numbers, etc., are converted to NFAs then to a DFA for fast scanning.
  • The DFA processes the source character by character, tracking a single current state; a token is typically emitted for the longest match that ends in an accepting state, not at every accepting state reached.
  • Queues, stacks, and trees support other compiler phases (parsing, AST construction) but are not the primary model for token recognition.
  • Lexers output a stream of tokens that the parser consumes for syntax analysis.
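A table-driven scanner makes the DFA view of lexing concrete. The sketch below is an assumption-laden illustration: it hand-writes a DFA for just two patterns (identifiers and unsigned integers) instead of deriving one from regular expressions, and it uses the common longest-match convention of remembering the last accepting state reached.

```python
DIGITS = "0123456789"


def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch in DIGITS:
        return "digit"
    return "other"


# Hypothetical DFA for: ID = letter (letter | digit)*, NUM = digit+
DELTA = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_num", "digit"): "in_num",
}
ACCEPTING = {"in_id": "ID", "in_num": "NUM"}


def scan(source):
    """Return (class, lexeme) pairs using longest-match DFA scanning."""
    tokens = []
    i = 0
    while i < len(source):
        if source[i].isspace():
            i += 1
            continue
        state, j = "start", i
        last_accept = None  # (token class, end index) of the longest match so far
        while j < len(source) and (state, char_class(source[j])) in DELTA:
            state = DELTA[(state, char_class(source[j]))]
            j += 1
            if state in ACCEPTING:
                last_accept = (ACCEPTING[state], j)
        if last_accept is None:
            raise SyntaxError(f"unexpected character {source[i]!r} at offset {i}")
        kind, end = last_accept
        tokens.append((kind, source[i:end]))
        i = end  # restart the DFA just after the emitted lexeme
    return tokens


print(scan("rate60 42"))  # [('ID', 'rate60'), ('NUM', '42')]
```

The input rate60 comes out as a single ID and not as an ID followed by a NUM, because the scanner keeps consuming characters while a transition exists and only then emits the longest accepted lexeme.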