Compiler Lecture Note, Inroduction to FL theory 컴파일러 입문 제 2 장 형식 언어
차 례 2.1 언어(Language) 2.2 문법(Grammar) 2.3 문법의 분류 Compiler Lecture Note, Inroduction to FL theory 차 례 2.1 언어(Language) 2.2 문법(Grammar) 2.3 문법의 분류
Symbol Properties Examples Compiler Lecture Note, Inroduction to FL theory Symbol A “symbol” is an abstract entity that we shall not define formally. Properties Examples ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ A,B,C, … ,Z begin,integer, … ,end
Alphabet Def 2.1: Alphabet Properties Examples Compiler Lecture Note, Inroduction to FL theory Alphabet Def 2.1: Alphabet An alphabet T is a finite set of symbols. Properties An alphabet is a set. An alphabet is finite. Examples T1 = {ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ} T2 = {A,B,C, … ,Z} T3 = {begin,integer, … ,end} T4 = {x | x is an integer}는 알파벳이 아니다. 왜?
String Def 2.2: String Properties Examples Compiler Lecture Note, Inroduction to FL theory String Def 2.2: String A string (or word) over the alphabet T is a finite sequence of symbols from T. Properties A string is not a set. A string is finite. a, b, c represent symbols: u, v, w represent string. For example, w = abba means that the value of the string w is abba. Examples T1 = {0} -> string: 0, 00, 000, … T2 = {a, b}-> string: a, b, aa, ab, ba, bb, aaa, …
Length Def 2.3: length Properties Examples Compiler Lecture Note, Inroduction to FL theory Length Def 2.3: length The length of a string, denoted by |.|, is the number of symbols in the string. Properties Examples w = a1a2a3…ak이면, |w| = k이다.
Concatenation Def 2.4: Concatenation Properties Examples Compiler Lecture Note, Inroduction to FL theory Concatenation Def 2.4: Concatenation The concatenation of two strings is the string formed by writing the first, followed by the second, with no intervening space. Properties u • v를 보통 uv로 표기. u를 prefix, v를 suffix라 부른다. |uv| = |u| + |v| Examples u = a1a2a3...an, v = b1b2b3...bm , u • v = a1a2a3...anb1b2b3...bm
Empty String Def 2.5: empty string Properties Examples Compiler Lecture Note, Inroduction to FL theory Empty String Def 2.5: empty string The empty string e is the identity for the concatenation operator. That is, ew = we = w for each string w. Properties e is a string, not a set. e is a string consisting of no symbols. |e| = 0 Examples
Power & Reversal Properties Def The string an represents n a's. Compiler Lecture Note, Inroduction to FL theory Power & Reversal Def The string an represents n a's. The reversal of a string ω, denoted ωR is the string ω written in reverse order: i.e., if ω = a1a2...an then ωR = anan-1...a1. Properties a0 = ε (ωR) R = ω
T* & T+ Def 2.6: T* (T star), T+ (T dagger) Properties Examples Compiler Lecture Note, Inroduction to FL theory T* & T+ Def 2.6: T* (T star), T+ (T dagger) T* denotes the set of all strings of symbols over the alphabet T, including the empty string. T+ = T* - {ε}. Properties T is a set of symbols, while T* is a set of strings. T* and T+ are infinite sets. ∀u,v ∈ T*, uv ∈ T*. (For every u, v in the set T*, uv is in T*.) T*는 알파벳 T에 속하는 심벌로부터 만들 수 있는 스트림의 전체 집합 (universal set)을 의미한다. Examples ex 4 @ p.41
(Formal) Language Def 2.7: (Formal) Language Properties Examples Compiler Lecture Note, Inroduction to FL theory (Formal) Language Def 2.7: (Formal) Language A (formal) language L over the alphabet T is a subset of T*, that is, L ⊆ T*. Properties 또 다른 정의: Language is any set of strings over an alphabet. 언어는 string의 집합이다. 스트링 수가 유한이면 유한 언어 (finite language), 무한이면 무한언어 (infinite language)라 한다. 여기서 정의한 언어는 의미를 갖고 있지 않다. 형식언어이론은 스트링 집합의 속성에 대한 이론이다. Examples - ex 5 @ p.42
Product of Languages Def 2.8: Product Properties Examples Compiler Lecture Note, Inroduction to FL theory Product of Languages Def 2.8: Product The product of L and L’, denoted LL’, is the set {xy| x ∈ L and y ∈ L'}. Properties 언어의 곱은 교환법칙을 만족하지 않는다. 즉, LL’ ¹ L’L Examples 예 6 @ p. 43
Power of Language Def 2.9: Power Properties Examples Compiler Lecture Note, Inroduction to FL theory Power of Language Def 2.9: Power The powers of L are defined recursively by: L0 = {ε} Ln = LLn-1 for n 1. Properties power는 곱으로 표현된다. Examples 예 7 @ p. 43
Closure Def 2.10: Reflexive transitive closure (or just closure) Compiler Lecture Note, Inroduction to FL theory Closure Def 2.10: Reflexive transitive closure (or just closure) The reflexive transitive closure of language L, denoted L*, is defined by: L* = L0 ∪ L1 ∪ L2 ∪ ...∪ Ln ∪… = Properties L* denotes strings constructed by concatenating any number of words from L. Examples {10, 11}* = {e, 10, 11, 1010, 1011, 1110, 1111, …}
Transitive Closure Def 2.11: Transitive closure Properties Examples Compiler Lecture Note, Inroduction to FL theory Transitive Closure Def 2.11: Transitive closure The transitive closure of language L, denoted L+, is defined by: L+ = L1 ∪ L2 ∪ ...∪ Ln ∪… = Properties L+ contain e iff L does. L+ ¹ L* - L0 Examples
◈ Two problems 1) How do we represent a language ? Compiler Lecture Note, Inroduction to FL theory ◈ Two problems 1) How do we represent a language ? If the language is finite, the answer is easy. If the language is infinite, we are faced with the problem of finding a finite representation for the language. - Set description - Grammar : Generating Scheme - Recognizer : Recognition Scheme 2) Does there exist a finite representation for every language ? No ! - This is not always possible.
Grammar ◈ Language ◈ Grammar 문장(sentence)들을 원소로 갖는 집합 Compiler Lecture Note, Inroduction to FL theory Grammar ◈ Language 문장(sentence)들을 원소로 갖는 집합 언어를 어떻게 표현할 것인가 ? ◈ Grammar terminal : 정의된 언어의 알파벳 nonterminal : 스트링을 생성하는 데 사용되는 중간 과정의 심볼 언어의 구조를 정의하는데 사용 grammar symbol (V)
Grammar Def 2.12: Grammar A grammar is denoted Compiler Lecture Note, Inroduction to FL theory Grammar Def 2.12: Grammar A grammar is denoted G = (VN, VT, P, S), where 1. VN : a finite set of nonterminal symbols 2. VT : a finite set of terminal symbols VN VT = , VN∪ VT = V 3. P : a finite set of production rules α®β, α∈ V+, β∈ V* lhs rhs 4. S : start symbol (sentence symbol)
Grammar Example 8 G = ( {S, A}, {a, b}, P, S ) Text p.45 [예 8] Compiler Lecture Note, Inroduction to FL theory Grammar Example 8 G = ( {S, A}, {a, b}, P, S ) Text p.45 [예 8] P : S ® aAS S ® a A ® SbA A ® ba A ® SS => S ® aAS | a A ® SbA | ba | SS
Grammar Notes on grammar α®β는 a를 b로 바꾼다는 의미이다. Compiler Lecture Note, Inroduction to FL theory Grammar Notes on grammar α®β는 a를 b로 바꾼다는 의미이다. α®β1, α®β2를 α®β1|β2로 표기한다. VN VT = 일 경우 VN 과 VT 은 disjoint라 한다. α∈V+는 α∈(VN∪ VT)+ 을 의미한다. 앞의 예제에서 aAS ∈V+ 이다. Convention A, B, C ∈ VN : Nonterminal symbols a, b, c ∈VT : terminal symbols a, b, g ∈V* w ∈VT*
◈ Derivation 1. => : "directly produce" or "directly derive" Compiler Lecture Note, Inroduction to FL theory ◈ Derivation 1. => : "directly produce" or "directly derive" if α β∈ P and , δ∈ V* then αδ => βδ 2. => * : just “drive” Suppose α1,α2,...,αn ∈ V* and α1 =>α2 => … => αn, then α1 =>* αn (zero or more derivations) 3. => + : one or more derivations. Cf. ® :production rule에서 사용. "may be replaced by" => : derivation할 때 사용한다.
Example P : S ® aA | bB | ε A ® bS B ® aS S =>* abba 유도 과정 Compiler Lecture Note, Inroduction to FL theory Example P : S ® aA | bB | ε A ® bS B ® aS S =>* abba 유도 과정 S => aA (생성규칙 S aA) => abS (생성규칙 A bS) => abbB (생성규칙 S bB) => abbaS (생성규칙 B aS) => abba (생성규칙 S )
Sentential Form & Sentence Compiler Lecture Note, Inroduction to FL theory Sentential Form & Sentence Def 2.14: Sentential Form and Sentence Let S be the start symbol of a grammar G and ω be a string such that S =>* ω. The string ω is called a sentential form if ω ∈ V*, and a sentence if ω ∈ VT*. Properties Both sentences and sentential forms are strings. A sentence is a sentential form solely of terminal symbols.
Language Generated by Grammar Compiler Lecture Note, Inroduction to FL theory Language Generated by Grammar Def 2.15: Language generated by grammar The language generated by G, denoted L[G], is the set: L(G) = {ω | S =>* ω, ω ∈ VT*}. Properties L[G] is the set of sentences. The string of L[G] consists solely of terminal. The string of L[G] can be derived from S. Examples ex 11-16
◈ G1 = ( {S}, {a}, P, S ) 을 이용하여 L(G1) P : S a | aS Compiler Lecture Note, Inroduction to FL theory ◈ G1 = ( {S}, {a}, P, S ) 을 이용하여 L(G1) P : S a | aS L (G1) = { an | n 1 } ◈ Language design Text p.49 Grammar Language generation design
◈ G = ( {A, B, C}, {a, b, c}, P, A) P : A abc A aBbc Bb bB Bc Cbcc Compiler Lecture Note, Inroduction to FL theory ◈ G = ( {A, B, C}, {a, b, c}, P, A) P : A abc A aBbc Bb bB Bc Cbcc bC Cb aC aaB aC aa L(G) = { anbncn | n 1 }
(===>) ex1) S 0S1 | 01 ex2) S aSb | c ex3) A aB B bB | b Compiler Lecture Note, Inroduction to FL theory (===>) ex1) S 0S1 | 01 ex2) S aSb | c ex3) A aB B bB | b ex4) A abc A aBbc Bb bB Bc Cbcc bC Cb aC aaB aC aa
◈ Grammar Design L = { an | n 0 }일 때 문법 : A aA | ε Compiler Lecture Note, Inroduction to FL theory ◈ Grammar Design L = { an | n 0 }일 때 문법 : A aA | ε L = { an | n 1 }일 때 문법 : A aA | a Embedded production A aAb ex1) L1 = { anbn | n 0 } ex2) L2 = { 0i1j | i j, i,j 1 } ex3) Constructs of Conventional PL
Compiler Lecture Note, Inroduction to FL theory 1) 파스칼 언어의 상수정의 부분 : 상수정의 부분은 CONST라는 예약어로 시작하며 하나의 상수 정의는 a=b의 형태를 갖는다. 여기서, a는 identifier를 b는 상수를 나타내는 terminal 심벌이다. 상수정의부분은 선택적이며 각각의 상수정의는 ;으로 구분한다. 다음은 상수정의 부분의 예이다. CONST ON = TRUE; OFF = FALSE; EPSILON = 1.0E-10;
※ 문법을 고안할 때, nonterminal의 이름은 구문 구조를 대변할 수 있는 명칭으로 쓰는 것이 바람직하다. Compiler Lecture Note, Inroduction to FL theory 2) C 언어의 정수선언 부분 : 정수선언 부분은 여러 개의 정수선언으로 구성되며 하나의 선언은 int a,a,a;와 같은 형태를 갖는다. 여기서 a는 임의의 identifier를 나타낸다. 그리고 ;으로 각각의 선언을 구분한다. 예를 들어, int i,j; int sum;과 같다. ※ 문법을 고안할 때, nonterminal의 이름은 구문 구조를 대변할 수 있는 명칭으로 쓰는 것이 바람직하다.
◈ In order to prove that a grammar generates a language L Compiler Lecture Note, Inroduction to FL theory ◈ In order to prove that a grammar generates a language L i) Every sentence generated by the grammar is in L. ii) Every string in L can be generated by the grammar. 교과서 54쪽 [예16] G = ( { S } , { ( , ) } , {S -> (S)S |ε}, S ) <=> All strings of balanced parentheses. proof) (=>) Every sentence derivable from S is balanced. (<=) Every balanced string is derivable from S.
(=>) Every sentence derivable from S is balanced. Compiler Lecture Note, Inroduction to FL theory (=>) Every sentence derivable from S is balanced. (i.e., S =>* ω, ω: balanced) By induction on the number of steps in a derivation. i) n = 1 일 때, S => ε, ε is surely balanced. ii) Suppose that all derivations of fewer than n steps produce balanced sentences. iii) Consider a leftmost derivation of exactly n steps. S => (S)S =>* (x)S =>* (x)y By the hypothesis x,y : balanced. Thus (x)y balanced.
(<=) Every balanced string is derivable from S. Compiler Lecture Note, Inroduction to FL theory (<=) Every balanced string is derivable from S. By induction on the length of a string. i) |ω| = 0, S => ε (the empty string is derivable from S.) ii) Assume that every balanced string of length less than 2n is derived from S. iii) Consider a balanced string ω of length 2n. Let (x) : shortest prefix of ω being balanced. Thus ω = (x)y, where x,y : balanced. Since |x|, |y|<2n, they are derivable from S by inductive hypothesis. Thus S => (S)S =>* (x)S =>* (x)y = ω Therefore, (x)y is also derivable from S.
Chomsky Hierarchy ◈ Noam Chomsky Compiler Lecture Note, Inroduction to FL theory Chomsky Hierarchy ◈ Noam Chomsky ◈ According to the form of the productions. α -> β ∈ P - Type 0 : No restrictions(unrestricted grammar) - Type 1 : Context-sensitive grammar(CSG). -> β, | | | β| - Type 2 : Context-free grammar(CFG). A -> , where A : nonterminal, ∈ V*. - Type 3 : Regular grammar(RG). A -> tB or A -> t, (right-linear) A -> Bt or A -> t, (left-linear) where, A, B: nonterminal, t ∈ VT*.
◈ REL (Recursively Enumerable Language) Compiler Lecture Note, Inroduction to FL theory ◈ REL (Recursively Enumerable Language) CSL (Context Sensitive Language) CFL (Context Free Language) RL (Regular Language) ◈ Examples of Formal Language simple matching language : Lm = {anbn | n ≥ 0} CFL double matching language : Ldm = {anbncn | n ≥ 0} CSL mirror image language : Lmi = {ωωR | ω ∈ VT*} CFL palindrome language : Lr = {ω | ω = ωR } CFL parenthesis language : Lp = {ω | ω: balenced parenthesis} CFL
context-sensitive language unrestricted language context-free language Compiler Lecture Note, Inroduction to FL theory ◈ The Chomsky Hierarchy of Languages regular language context-sensitive language unrestricted language context-free language
Grammar Language Recognizer Compiler Lecture Note, Inroduction to FL theory ◈ Languages & Recognizers Grammar Language Recognizer Type 0 (unrestricted) recursively enumerable set Turing machine context-sensitive language linear bounded automata type 1 context-sensitive context-free language pushdown automata type 2 context-free type 3 (regular) regular language finite automata