Syntax-Directed Translation: Infix to Prefix Notation
In compiler design, translating high-level program expressions into machine-friendly intermediate representations is a core task of the front-end parser. One common task is converting infix notation into prefix notation. This conversion is governed by rules of operator precedence and associativity .
To perform this translation systematically, we construct a Syntax-Directed Translation Scheme (SDTS) based on a context-free grammar (CFG) . By defining structural attributes for grammar symbols, we can perform translation in a single pass.
In this section, we will define a robust context-free grammar that handles basic arithmetic operators (, , ) while respecting operator precedence. We will establish a Syntax-Directed Definition (SDD) that uses synthesized attributes to construct the prefix string.
Footnotes
-
Aho, A. V., Lam, M. S., Sethi, R., & Ullman, J. D. (2006). Compilers: Principles, Techniques, and Tools (2nd Edition). Addison-Wesley. (Sections on Syntax-Directed Translation and Parser design). ↩
-
Cooper, K. D., & Torczon, L. (2011). Engineering a Compiler. Morgan Kaufmann. (Analysis of expression grammars, operator precedence, and syntax trees). ↩
Syntax Directed Translation (SDT) | Scheme & Implementation
The Grammar & Semantic Rules
To handle addition (), subtraction (), and multiplication (), we must design a grammar that naturally implements operator precedence (multiplication has higher precedence than addition and subtraction) and left-associativity (operations of equal precedence are evaluated from left to right) .
We use the standard unambiguous expression grammar:
To translate this grammar into prefix notation, we associate a synthesized string attribute, val, with each non-terminal (, , and ). The token has a lexical attribute which represents its character value.
The Syntax-Directed Translation Scheme (SDTS) for this conversion is defined below:
| Production | Semantic Rule / Action |
|---|---|
Parse Tree Construction and Evaluation
When a parser processes an input string, it builds a parse tree. In an S-attributed definition like ours, the synthesized attributes are evaluated bottom-up, starting from the leaf nodes () and propagating up to the root () .
Footnotes
-
Cooper, K. D., & Torczon, L. (2011). Engineering a Compiler. Morgan Kaufmann. (Analysis of expression grammars, operator precedence, and syntax trees). ↩
-
Grune, D., van Reeuwijk, K., Bal, H. E., Jacobs, C. J., & Langendoen, K. (2012). Modern Compiler Design. Springer. (Detailed explanation of S-attributed and L-attributed Definitions). ↩
Why Use Synthesized Attributes?
Synthesized attributes are perfect for bottom-up parsing algorithms (like LR parsers). Since the attributes of a parent node depend entirely on the attributes of its children, the parser can execute the semantic translation action immediately upon reducing a production rule.
Parsing & Translating 9 - 5 + 2
- 1Step 1
The input string
9 - 5 + 2is tokenized into three digits and two operators: , , , , and . - 2Step 2
First, reduce individual digits to factors () and then terms ():
- 3Step 3
Next, parse the digit :
Apply the reduction :
- E_2.\text{val} = \text{strcat}('- ', '9', ' ', '5') = '- 9 5'
- 4Step 4
Parse the final digit :
Apply the final reduction at the root :
- E.\text{val} = \text{strcat}('+ ', '- 9 5', ' ', '2') = '+ - 9 5 2'
Parsing & Translating 9 - 5 * 2
- 1Step 1
The input is
9 - 5 * 2. Because multiplication () has a higher precedence than subtraction (), the grammar forces to be grouped together under a term () before resolving the subtraction under an expression (). - 2Step 2
Reduce the first digit:
- 3Step 3
Reduce the remaining terms:
Apply the reduction :
- T_3.\text{val} = \text{strcat}('* ', '5', ' ', '2') = '* 5 2'
- 4Step 4
Apply the subtraction reduction at the root :
- E.\text{val} = \text{strcat}('- ', '9', ' ', '* 5 2') = '- 9 * 5 2'
Annotated Parse Trees
Below are the visual parse trees with annotated synthesized attributes showing exactly how the final prefix string is constructed from bottom-up attribute evaluation .
Annotated Parse Tree for
Annotated Parse Tree for
Footnotes
-
Grune, D., van Reeuwijk, K., Bal, H. E., Jacobs, C. J., & Langendoen, K. (2012). Modern Compiler Design. Springer. (Detailed explanation of S-attributed and L-attributed Definitions). ↩
Left-Recursion and Infix-to-Prefix Translation
While left-recursive grammars are perfectly suited for bottom-up parsers (like LALR/Yacc), they cannot be parsed directly by top-down LL(1) parsers. If you write a top-down translator, you must eliminate left recursion first, which changes the grammar structure and requires using inherited attributes to pass operands down the parse tree.
This tab displays the output generated by our synthesized SDTS:
-
Input:
9 - 5 + 2
Prefix Output:+ - 9 5 2 -
Input:
9 - 5 * 2
Prefix Output:- 9 * 5 2
In synthesized attributes, strings are accumulated and concatenated as operations are completed up the parse tree.
Parse Tree Nodes and Traversal Step Count
Comparison of node allocations and evaluation overhead between addition/subtraction and multiplication precedence levels.
Deep Dive into Grammar Mechanics
Knowledge Check
Which of the following is the correct prefix notation for the infix expression 7 * (3 + 2)?
Explore Related Topics
Lexical Analysis and the Main Structure Used: Finite Automata
Lexical analysis relies on finite automata—typically deterministic finite automata (DFA)—to recognize token patterns defined by regular expressions.
- Regular expressions for identifiers, numbers, etc., are converted to NFAs then to a DFA for fast scanning.
- The DFA processes the source character by character, tracking a single current state and emitting a token at each accepting state.
- Queues, stacks, and trees support other compiler phases (parsing, AST construction) but are not the primary model for token recognition.
- Lexers output a stream of tokens that the parser consumes for syntax analysis.
Comparing Imperative and Non-Imperative Programming Languages
Imperative programming describes computation as explicit step‑by‑step commands that change mutable state, while non‑imperative (declarative) styles describe what result is desired, often minimizing direct state manipulation.
- Imperative languages use assignments, loops, and explicit control flow (e.g., C, Java, Python).
- Non‑imperative approaches include functional (Haskell), logic (Prolog), and query languages (SQL), focusing on expressions, relations, or rules.
- Hybrid languages are common; developers mix imperative orchestration with functional cores or declarative queries.
- Choose imperative when precise sequencing or low‑level control is needed; choose non‑imperative for abstraction, predictability, easier reasoning, and parallelism.
- Functional code emphasizes immutability and pure functions, while logic code relies on unification and backtracking to infer answers.
Converting the Regular Expression $(a+b)^*ab$ into an NFA
The course shows how to turn the regular expression —the set of all strings over that end with “ab”—into an NFA, first via Thompson’s systematic ε‑construction and then with a minimal three‑state NFA.
- Thompson’s construction builds ε‑transitions for symbols , , their union, the Kleene star, and the final concatenations, yielding a 12‑state ε‑NFA.
- A compact direct NFA uses only three states: loops on , branches on to , and moves on to accepting .
- The compact NFA’s transition table illustrates nondeterministic moves (e.g., ) and accepts exactly the strings ending in “ab”.
- State‑complexity comparison: Thompson’s method needs many states and ε‑moves, while the direct NFA is far smaller and easier to simulate.
