Compiler Lecture Note, Inroduction to FL theory 컴파일러 입문 제 2 장 형식 언어
차 례 2.1 언어(Language) 2.2 문법(Grammar) 2.3 문법의 분류 Compiler Lecture Note, Inroduction to FL theory 차 례 2.1 언어(Language) 2.2 문법(Grammar) 2.3 문법의 분류
Language ◈ Basic definitions (1) alphabet - a finite set of symbols. Compiler Lecture Note, Inroduction to FL theory Language ◈ Basic definitions (1) alphabet - a finite set of symbols. - ex) T1 = {ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ} T2 = {A,B,C, … ,Z} T3 = {begin,integer, … ,end} (2) string(or sentence, word) - a sequence of symbols from some alphabet T. (3) length - the number of symbols in the string. - denoted by |ω| 꼭 기억해야 할 세 가지 개념 1. 언어의 정의 2. 문법의 정의 및 개념 3. 인식기의 의미
- a string consisting of no symbols. - denoted by ε or λ. Compiler Lecture Note, Inroduction to FL theory (4) empty string - a string consisting of no symbols. - denoted by ε or λ. (5) T* denotes the set of all strings of symbols over the alphabet T, including the empty string. T+ = T* - {ε} ☞ T* : T star T+ : T dagger (6) Language is any set of strings over an alphabet.(Text p.42) (or A Language L over the alphabet T is a subset of T*.) L ⊆ T*
◈ Two problems 1) How do we represent a language ? Compiler Lecture Note, Inroduction to FL theory ◈ Two problems 1) How do we represent a language ? If the language is finite, the answer is easy. If the language is infinite, we are faced with the problem of finding a finite representation for the language. - Set description - Grammar : Generating Scheme - Recognizer : Recognition Scheme 2) Does there exist a finite representation for every language ? No ! - This is not always possible.
◈ More definitions (1) concatenation Compiler Lecture Note, Inroduction to FL theory ◈ More definitions (1) concatenation - u = a1a2a3...an, v = b1b2b3...bm , u • v = a1a2a3...anb1b2b3...bm - u • v를 보통 uv로 표기. - uε= u = εu - ∀u,v ∈ T*, uv ∈ T*. - |uv| = |u| + |v| (2) an represents n a's. a0 = ε (3) the reversal of a string ω, denoted ωR is the string ω written in reverse order: i.e., if ω = a1a2...an then ωR = anan-1...a1. Text p.40 (ωR)R=ω
(5) The powers of a language L are defined recursively by: L0 = {ε} Compiler Lecture Note, Inroduction to FL theory (4) language product LL' = {xy| x ∈ L and y ∈ L'} (5) The powers of a language L are defined recursively by: L0 = {ε} Ln = LLn-1 for n 1. (6) L* : reflexive transitive closure = L0 ∪ L1 ∪ L2 ∪ ...∪ Ln ∪… = (7) L+ : transitive closure = L1 ∪ L2 ∪... ∪ Ln ∪ ... = L* - L0
Grammar ◈ Language ◈ Grammar 문장(sentence)들을 원소로 갖는 집합 Compiler Lecture Note, Inroduction to FL theory Grammar ◈ Language 문장(sentence)들을 원소로 갖는 집합 언어를 어떻게 표현할 것인가 ? ◈ Grammar terminal : 정의된 언어의 알파벳 nonterminal : 스트링을 생성하는 데 사용되는 중간 과정의 심볼 언어의 구조를 정의하는데 사용 grammar symbol (V)
VN : a finite set of nonterminal symbols Compiler Lecture Note, Inroduction to FL theory ◈ G = (VN, VT, P, S) Text p.45 VN : a finite set of nonterminal symbols VT : a finite set of terminal symbols VN VT = , VN∪ VT = V P : a finite set of production rules α β, α∈ V+, β∈ V* lhs rhs S : start symbol(sentence symbol)
◈ [예] G = ( {S, A}, {a, b}, P, S ) Text p.45 [예 8] Compiler Lecture Note, Inroduction to FL theory ◈ [예] G = ( {S, A}, {a, b}, P, S ) Text p.45 [예 8] P : S aAS S a A SbA A ba A SS => S aAS | a A SbA | ba | SS
◈ Derivation 1. => : "directly produce" or "directly derive" Compiler Lecture Note, Inroduction to FL theory ◈ Derivation 1. => : "directly produce" or "directly derive" if α β∈ P and , δ∈ V* then αδ => βδ 2. => * : Suppose α1,α2,...,αn ∈ V* and α1 =>α2 => … => αn, then α1 =>* αn (zero or more derivations) 3. => + : one or more derivations. cf) : production rule에서 사용. "may be replaced by" => : derivation할 때 사용한다.
◈ L(G) : Language generated by grammar G Compiler Lecture Note, Inroduction to FL theory ◈ L(G) : Language generated by grammar G L(G) = {ω | S =>* ω, ω ∈ VT*} ☞ ω is a sentential form of G if S =>* ω and ω ∈ V*. ω is a sentence of G if S =>* ω and ω ∈ VT*. P : S aA | bB | ε A bS B aS S =>* abba 유도 과정 S => aA (생성규칙 S aA) => abS (생성규칙 A bS) => abbB (생성규칙 S bB) => abbaS (생성규칙 B aS) => abba (생성규칙 S )
◈ G1 = ( {S}, {a}, P, S ) 을 이용하여 L(G1) P : S a | aS Compiler Lecture Note, Inroduction to FL theory ◈ G1 = ( {S}, {a}, P, S ) 을 이용하여 L(G1) P : S a | aS L (G1) = { an | n 1 } ◈ Language design Text p.49 Grammar Language generation design
◈ G = ( {A, B, C}, {a, b, c}, P, A) P : A abc A aBbc Bb bB Bc Cbcc Compiler Lecture Note, Inroduction to FL theory ◈ G = ( {A, B, C}, {a, b, c}, P, A) P : A abc A aBbc Bb bB Bc Cbcc bC Cb aC aaB aC aa L(G) = { anbncn | n 1 }
(===>) ex1) S 0S1 | 01 ex2) S aSb | c ex3) A aB B bB | b Compiler Lecture Note, Inroduction to FL theory (===>) ex1) S 0S1 | 01 ex2) S aSb | c ex3) A aB B bB | b ex4) A abc A aBbc Bb bB Bc Cbcc bC Cb aC aaB aC aa
◈ Grammar Design L = { an | n 0 }일 때 문법 : A aA | ε Compiler Lecture Note, Inroduction to FL theory ◈ Grammar Design L = { an | n 0 }일 때 문법 : A aA | ε L = { an | n 1 }일 때 문법 : A aA | a Embedded production A aAb ex1) L1 = { anbn | n 0 } ex2) L2 = { 0i1j | i j, i,j 1 } ex3) Constructs of Conventional PL
Compiler Lecture Note, Inroduction to FL theory 1) 파스칼 언어의 상수정의 부분 : 상수정의 부분은 CONST라는 예약어로 시작하며 하나의 상수 정의는 a=b의 형태를 갖는다. 여기서, a는 identifier를 b는 상수를 나타내는 terminal 심벌이다. 상수정의부분은 선택적이며 각각의 상수정의는 ;으로 구분한다. 다음은 상수정의 부분의 예이다. CONST ON = TRUE; OFF = FALSE; EPSILON = 1.0E-10;
※ 문법을 고안할 때, nonterminal의 이름은 구문 구조를 대변할 수 있는 명칭으로 쓰는 것이 바람직하다. Compiler Lecture Note, Inroduction to FL theory 2) C 언어의 정수선언 부분 : 정수선언 부분은 여러 개의 정수선언으로 구성되며 하나의 선언은 int a,a,a;와 같은 형태를 갖는다. 여기서 a는 임의의 identifier를 나타낸다. 그리고 ;으로 각각의 선언을 구분한다. 예를 들어, int i,j; int sum;과 같다. ※ 문법을 고안할 때, nonterminal의 이름은 구문 구조를 대변할 수 있는 명칭으로 쓰는 것이 바람직하다.
◈ In order to prove that a grammar generates a language L Compiler Lecture Note, Inroduction to FL theory ◈ In order to prove that a grammar generates a language L i) Every sentence generated by the grammar is in L. ii) Every string in L can be generated by the grammar. 교과서 54쪽 [예16] G = ( { S } , { ( , ) } , {S -> (S)S |ε}, S ) <=> All strings of balanced parentheses. proof) (=>) Every sentence derivable from S is balanced. (<=) Every balanced string is derivable from S.
(=>) Every sentence derivable from S is balanced. Compiler Lecture Note, Inroduction to FL theory (=>) Every sentence derivable from S is balanced. (i.e., S =>* ω, ω: balanced) By induction on the number of steps in a derivation. i) n = 1 일 때, S => ε, ε is surely balanced. ii) Suppose that all derivations of fewer than n steps produce balanced sentences. iii) Consider a leftmost derivation of exactly n steps. S => (S)S =>* (x)S =>* (x)y By the hypothesis x,y : balanced. Thus (x)y balanced.
(<=) Every balanced string is derivable from S. Compiler Lecture Note, Inroduction to FL theory (<=) Every balanced string is derivable from S. By induction on the length of a string. i) |ω| = 0, S => ε (the empty string is derivable from S.) ii) Assume that every balanced string of length less than 2n is derived from S. iii) Consider a balanced string ω of length 2n. Let (x) : shortest prefix of ω being balanced. Thus ω = (x)y, where x,y : balanced. Since |x|, |y|<2n, they are derivable from S by inductive hypothesis. Thus S => (S)S =>* (x)S =>* (x)y = ω Therefore, (x)y is also derivable from S.
Chomsky Hierarchy ◈ Noam Chomsky Compiler Lecture Note, Inroduction to FL theory Chomsky Hierarchy ◈ Noam Chomsky ◈ According to the form of the productions. α -> β ∈ P - Type 0 : No restrictions(unrestricted grammar) - Type 1 : Context-sensitive grammar(CSG). -> β, | | | β| - Type 2 : Context-free grammar(CFG). A -> , where A : nonterminal, ∈ V*. - Type 3 : Regular grammar(RG). A -> tB or A -> t, (right-linear) A -> Bt or A -> t, (left-linear) where, A, B: nonterminal, t ∈ VT*.
◈ REL (Recursively Enumerable Language) Compiler Lecture Note, Inroduction to FL theory ◈ REL (Recursively Enumerable Language) CSL (Context Sensitive Language) CFL (Context Free Language) RL (Regular Language) ◈ Examples of Formal Language simple matching language : Lm = {anbn | n ≥ 0} CFL double matching language : Ldm = {anbncn | n ≥ 0} CSL mirror image language : Lmi = {ωωR | ω ∈ VT*} CFL palindrome language : Lr = {ω | ω = ωR } CFL parenthesis language : Lp = {ω | ω: balenced parenthesis} CFL
context-sensitive language unrestricted language context-free language Compiler Lecture Note, Inroduction to FL theory ◈ The Chomsky Hierarchy of Languages regular language context-sensitive language unrestricted language context-free language
Grammar Language Recognizer Compiler Lecture Note, Inroduction to FL theory ◈ Languages & Recognizers Grammar Language Recognizer Type 0 (unrestricted) recursively enumerable set Turing machine context-sensitive language linear bounded automata type 1 context-sensitive context-free language pushdown automata type 2 context-free type 3 (regular) regular language finite automata