Computers/Programming Language

ch4. Lexical and syntax analysis

emzei 2012. 3. 15. 14:13

▷ PL에 대해 접근하는 방식

♠ Compilation

- use complier / 고급언어를 기계언어로 translate해줌

- PLs used for large application, such as C++, COBOL

♠ Pure interpreter

- no translate

- usually used for smaller systems in which execution efficiency is not critical, such as scripts embedded in HTML, written in languages such as JavaScript

♠ Hybrid System

- translate programs written in high-level languages into intermediate forms, which are interpreted. 

- JIT(just-in-time) compiler가 전파되면서 실행속도 개선.. Java와 .NET

- JIT컴파일러는 하이브리드 시스템을 지연된 컴파일러 시스템으로 변환시킴.


♠ Formal description of PL syntax - Context-free Grammar (type-2), BNF

* Advantage of BNF

- clear, concise

- used as the direct basis for the syntax analyzer

- easy to maintain because of their modularity

♠ compiler는 문법 분석을 2가지로 나눠서 함 : lexical analysis, syntax analysis

* lexical analysis : small-scale language construct 다룸. such as names & numeric literals.

* syntax analysis : large-scale construct. such as expressions, statements, and program units.


♣ 2가지로 나눈 이유 : simplicity / efficiency / portability

- efficiency : lexical한건 device로부터 읽어와서 블라블라~ 속도 느림.

syntax하나 건 계산만 바로하면 되니까 속도 빠름.

- portability : reuse!!!


 Lexical Analysis : essentially a pattern matcher

※ pattern matcher attempts to find a substring of a given string of characters that matches a given character pattern : 즉 서브스트링 찾는거...

- 초기의 패턴 매처는 유닉스에 도입된 초기 버전의 텍스트 에디터.


♠ serve as the front end of a syntax analyzer

  - syntax analysis at the lowest level of program structure.

  - lexeme : collect logical groupings

  - token : assign internal codes for categories of these groupings

  - skipping comments and blanks (프로그램의 의미와는 전혀 상관없으니까 무시)

  - insert lexems for user-defined names into the symbol table (컴파일러가 나중에 프레이즈 하게...)

  - detect syntactic errors & report errors to the user


◈ 3 approches to building a lexical analyzer

formal description of the token patterns of the language 

(↑↑ regular expression)

- state transition diagram (state diagram)




: directed graph.

: if/switch statement-based program

: table-driven program


◈ state-diagram representation : finite automata 

(lexical analyzer를 사용한 형식의 state-diagram이 수학 머신의 클래스로 다시 표현된거)

- finite automata can be designed to recognize a class of languagese 

=> regular language

- regular expressions

- regular grammar (type-3 grammar)


The Parsing Problem: syntax analysis

♠ Construct parse trees for given program - include all of the syntactic info.

♠ 2 Goals of syntax analysis

> to check whether the input program is syntactically correct

> to produce either a complete parse tree, or trace the structure of the complete parse tree


♠ Parsing algorithms never look ahead more than one token into the input program : sufficient to allow parsing : 토큰은 하나씩 읽는것 만으로 충분

♠  According to building direction...

◑Top-down

 - trace/build a parse tree in pre-order ⇔ leftmost

 - parser's task : to find the next sentential form in that leftmost derivation

* sentential form : xAα (x:terminal/A:non-term/α:mixing string)

* Next choosing correct grammar rule is that has A as its LHS

 - under one toen look ahead 

* must choose the correct RHS on the basis of the next token of the input

  LL Parser : Table-driven parsing

 ♥ Recursive-descent parser 

  : Syntax Diagram (함부로 그리는게 아니얌! 규칙이 이쒀!!!)

  : if-then-else programming technique

  : restrictions in syntax diagram


◑Bottom-up

 - Rightmost derivation

 - Given a right sentential form α,

 - LR algorithm (Left-to-right scan, Rightmost derivation)


▷ Recursive descent parsing

 Recursive-descent parsing process

- collection of subprograms

- recursive (nested) calls

- produce a parse tree in top-down manner

♠ EBNF representation

♠ Parser has a subprogram for each nonterminal in the grammar



▷ LL Grammar Class : using symbol table

 Problem 1 : left recursion 

ex. A:=A+B 하면 recursion 발생하니까

left recursion removal ::: 책참고!!!.

♠ Problem 2 : to choose the correct RHS (PASS)

 LL Parser 따라 풀어보기!


▷ LR Parser : Bottom-up Parsing/Shift-Reduce Algorithm (Rightmost derivation)

♠ Reverse of a right-most derivation

- starts with the last sentential form (input symbol)

- until all that remains is the start symbol

♠ Task is

- To find the specific RHS in the sentential form (handle)

♠ Shift-reduce

* shift action: move the next input token onto the parser's stack

* reduce action : replaces a RHS on top of the parser's stack by its corresponding LHS

♠ Pushdown automata (PDA)

♠ LR parser : use a relatively small amount of code and a parsing table

 LR Parser 따라 풀어보기!


'Computers > Programming Language' 카테고리의 다른 글

ch6-2. Data Type  (0) 2012.03.19
ch6-1. Data Type  (0) 2012.03.17
ch5. Names, bindings, type checking, and scope  (0) 2012.03.16
ch3. describe syntax and semantics  (0) 2012.03.14
ch1. preliminaries  (0) 2012.03.13