CoursifyCoursify

Syntax-Directed Translation: Infix to Prefix Notation

Syntax-Directed Translation: Infix to Prefix Notation

Verified Sources
May 18, 2026

In compiler design, translating high-level program expressions into machine-friendly intermediate representations is a core task of the front-end parser. One common task is converting infix notation into prefix notation. This conversion is governed by rules of operator precedence and associativity .

To perform this translation systematically, we construct a Syntax-Directed Translation Scheme (SDTS) based on a context-free grammar (CFG) . By defining structural attributes for grammar symbols, we can perform translation in a single pass.

In this section, we will define a robust context-free grammar that handles basic arithmetic operators (++, -, *) while respecting operator precedence. We will establish a Syntax-Directed Definition (SDD) that uses synthesized attributes to construct the prefix string.

Footnotes

  1. Aho, A. V., Lam, M. S., Sethi, R., & Ullman, J. D. (2006). Compilers: Principles, Techniques, and Tools (2nd Edition). Addison-Wesley. (Sections on Syntax-Directed Translation and Parser design).

  2. Cooper, K. D., & Torczon, L. (2011). Engineering a Compiler. Morgan Kaufmann. (Analysis of expression grammars, operator precedence, and syntax trees).

Syntax Directed Translation (SDT) | Scheme & Implementation

The Grammar & Semantic Rules

To handle addition (++), subtraction (-), and multiplication (*), we must design a grammar that naturally implements operator precedence (multiplication has higher precedence than addition and subtraction) and left-associativity (operations of equal precedence are evaluated from left to right) .

We use the standard unambiguous expression grammar:

EE1+TE1TTTT1FFFdigit\begin{aligned} E &\to E_1 + T \mid E_1 - T \mid T \\ T &\to T_1 * F \mid F \\ F &\to \text{digit} \end{aligned}

To translate this grammar into prefix notation, we associate a synthesized string attribute, val, with each non-terminal (EE, TT, and FF). The token digit\text{digit} has a lexical attribute lexval\text{lexval} which represents its character value.

The Syntax-Directed Translation Scheme (SDTS) for this conversion is defined below:

ProductionSemantic Rule / Action
EE1+TE \to E_1 + TE.val=strcat("+",E1.val,"",T.val)E.\text{val} = \text{strcat}("+ ", E_1.\text{val}, " ", T.\text{val})
EE1TE \to E_1 - TE.val=strcat("",E1.val,"",T.val)E.\text{val} = \text{strcat}("- ", E_1.\text{val}, " ", T.\text{val})
ETE \to TE.val=T.valE.\text{val} = T.\text{val}
TT1FT \to T_1 * FT.val=strcat("",T1.val,"",F.val)T.\text{val} = \text{strcat}("* ", T_1.\text{val}, " ", F.\text{val})
TFT \to FT.val=F.valT.\text{val} = F.\text{val}
FdigitF \to \text{digit}F.val=digit.lexvalF.\text{val} = \text{digit}.\text{lexval}

Parse Tree Construction and Evaluation

When a parser processes an input string, it builds a parse tree. In an S-attributed definition like ours, the synthesized attributes are evaluated bottom-up, starting from the leaf nodes (digit\text{digit}) and propagating up to the root (EE) .

Footnotes

  1. Cooper, K. D., & Torczon, L. (2011). Engineering a Compiler. Morgan Kaufmann. (Analysis of expression grammars, operator precedence, and syntax trees).

  2. Grune, D., van Reeuwijk, K., Bal, H. E., Jacobs, C. J., & Langendoen, K. (2012). Modern Compiler Design. Springer. (Detailed explanation of S-attributed and L-attributed Definitions).

Why Use Synthesized Attributes?

Synthesized attributes are perfect for bottom-up parsing algorithms (like LR parsers). Since the attributes of a parent node depend entirely on the attributes of its children, the parser can execute the semantic translation action immediately upon reducing a production rule.

Parsing & Translating 9 - 5 + 2

  1. 1
    Step 1

    The input string 9 - 5 + 2 is tokenized into three digits and two operators: digit1(9)\text{digit}_1 (9), -, digit2(5)\text{digit}_2 (5), ++, and digit3(2)\text{digit}_3 (2).

  2. 2
    Step 2

    First, reduce individual digits to factors (FF) and then terms (TT):

    • F1digit1(9)    F1.val=9F_1 \to \text{digit}_1 (9) \implies F_1.\text{val} = '9'
    • T1F1    T1.val=9T_1 \to F_1 \implies T_1.\text{val} = '9'
    • E1T1    E1.val=9E_1 \to T_1 \implies E_1.\text{val} = '9'
  3. 3
    Step 3

    Next, parse the digit 55:

    • F2digit2(5)    F2.val=5F_2 \to \text{digit}_2 (5) \implies F_2.\text{val} = '5'
    • T2F2    T2.val=5T_2 \to F_2 \implies T_2.\text{val} = '5'

    Apply the reduction E2E1T2E_2 \to E_1 - T_2:

    • E_2.\text{val} = \text{strcat}('- ', '9', ' ', '5') = '- 9 5'
  4. 4
    Step 4

    Parse the final digit 22:

    • F3digit3(2)    F3.val=2F_3 \to \text{digit}_3 (2) \implies F_3.\text{val} = '2'
    • T3F3    T3.val=2T_3 \to F_3 \implies T_3.\text{val} = '2'

    Apply the final reduction at the root EE2+T3E \to E_2 + T_3:

    • E.\text{val} = \text{strcat}('+ ', '- 9 5', ' ', '2') = '+ - 9 5 2'

Parsing & Translating 9 - 5 * 2

  1. 1
    Step 1

    The input is 9 - 5 * 2. Because multiplication (*) has a higher precedence than subtraction (-), the grammar forces 525 * 2 to be grouped together under a term (TT) before resolving the subtraction under an expression (EE).

  2. 2
    Step 2

    Reduce the first digit:

    • F1digit1(9)    F1.val=9F_1 \to \text{digit}_1 (9) \implies F_1.\text{val} = '9'
    • T1F1    T1.val=9T_1 \to F_1 \implies T_1.\text{val} = '9'
    • E1T1    E1.val=9E_1 \to T_1 \implies E_1.\text{val} = '9'
  3. 3
    Step 3

    Reduce the remaining terms:

    • F2digit2(5)    F2.val=5F_2 \to \text{digit}_2 (5) \implies F_2.\text{val} = '5'
    • T2F2    T2.val=5T_2 \to F_2 \implies T_2.\text{val} = '5'
    • F3digit3(2)    F3.val=2F_3 \to \text{digit}_3 (2) \implies F_3.\text{val} = '2'

    Apply the reduction T3T2F3T_3 \to T_2 * F_3:

    • T_3.\text{val} = \text{strcat}('* ', '5', ' ', '2') = '* 5 2'
  4. 4
    Step 4

    Apply the subtraction reduction at the root EE1T3E \to E_1 - T_3:

    • E.\text{val} = \text{strcat}('- ', '9', ' ', '* 5 2') = '- 9 * 5 2'

Annotated Parse Trees

Below are the visual parse trees with annotated synthesized attributes showing exactly how the final prefix string is constructed from bottom-up attribute evaluation .

Annotated Parse Tree for 95+29 - 5 + 2

Annotated Parse Tree for 9529 - 5 * 2

Footnotes

  1. Grune, D., van Reeuwijk, K., Bal, H. E., Jacobs, C. J., & Langendoen, K. (2012). Modern Compiler Design. Springer. (Detailed explanation of S-attributed and L-attributed Definitions).

Left-Recursion and Infix-to-Prefix Translation

While left-recursive grammars are perfectly suited for bottom-up parsers (like LALR/Yacc), they cannot be parsed directly by top-down LL(1) parsers. If you write a top-down translator, you must eliminate left recursion first, which changes the grammar structure and requires using inherited attributes to pass operands down the parse tree.

This tab displays the output generated by our synthesized SDTS:

  • Input: 9 - 5 + 2
    Prefix Output: + - 9 5 2

  • Input: 9 - 5 * 2
    Prefix Output: - 9 * 5 2

In synthesized attributes, strings are accumulated and concatenated as operations are completed up the parse tree.

Parse Tree Nodes and Traversal Step Count

Comparison of node allocations and evaluation overhead between addition/subtraction and multiplication precedence levels.

Deep Dive into Grammar Mechanics

Knowledge Check

Question 1 of 3
Q1Single choice

Which of the following is the correct prefix notation for the infix expression 7 * (3 + 2)?

Explore Related Topics

1

Lexical Analysis and the Main Structure Used: Finite Automata

Lexical analysis relies on finite automata—typically deterministic finite automata (DFA)—to recognize token patterns defined by regular expressions.

  • Regular expressions for identifiers, numbers, etc., are converted to NFAs then to a DFA for fast scanning.
  • The DFA processes the source character by character, tracking a single current state and emitting a token at each accepting state.
  • Queues, stacks, and trees support other compiler phases (parsing, AST construction) but are not the primary model for token recognition.
  • Lexers output a stream of tokens that the parser consumes for syntax analysis.
2

Comparing Imperative and Non-Imperative Programming Languages

Imperative programming describes computation as explicit step‑by‑step commands that change mutable state, while non‑imperative (declarative) styles describe what result is desired, often minimizing direct state manipulation.

  • Imperative languages use assignments, loops, and explicit control flow (e.g., C, Java, Python).
  • Non‑imperative approaches include functional (Haskell), logic (Prolog), and query languages (SQL), focusing on expressions, relations, or rules.
  • Hybrid languages are common; developers mix imperative orchestration with functional cores or declarative queries.
  • Choose imperative when precise sequencing or low‑level control is needed; choose non‑imperative for abstraction, predictability, easier reasoning, and parallelism.
  • Functional code emphasizes immutability and pure functions, while logic code relies on unification and backtracking to infer answers.
3

Converting the Regular Expression $(a+b)^*ab$ into an NFA

The course shows how to turn the regular expression (a+b)ab(a+b)^*ab—the set of all strings over {a,b}\{a,b\} that end with “ab”—into an NFA, first via Thompson’s systematic ε‑construction and then with a minimal three‑state NFA.

  • Thompson’s construction builds ε‑transitions for symbols aa, bb, their union, the Kleene star, and the final concatenations, yielding a 12‑state ε‑NFA.
  • A compact direct NFA uses only three states: q0q_0 loops on a,ba,b, branches on aa to q1q_1, and q1q_1 moves on bb to accepting q2q_2.
  • The compact NFA’s transition table illustrates nondeterministic moves (e.g., δ(q0,a)={q0,q1}δ(q_0,a)=\{q_0,q_1\}) and accepts exactly the strings ending in “ab”.
  • State‑complexity comparison: Thompson’s method needs many states and ε‑moves, while the direct NFA is far smaller and easier to simulate.
Chat with Kiro