Compiler Lecture Note, Inroduction to FL theory

Slides:



Advertisements
Similar presentations
CHAPTER 5 TOP-DOWN PARSING SUNG-DONG KIM, DEPT. OF COMPUTER ENGINEERING, HANSUNG UNIVERSITY.
Advertisements

 수학 10- 나  1 학년 2 학기  Ⅰ. 도형의 방정식 1. 평면좌표 (1/24) 두 점 사이의 거리 수업 계획 수업 활동.
언어와 문법 (languages, grammar)
컴파일러 입문 제 5 장 Context-Free 문법.
Programming Languages Type Conversion Classifying Type Conversions According to the notation –Implicit Conversions (coercion) –Explicit Conversions (casting)
이산수학(Discrete Mathematics)
(Mathematical Induction)
C++ Tutorial 1 서강대학교 데이터베이스 연구실.
Sources of the Magnetic Field
DS020 오토마타형식언어 Chapter 3. Regular Languages and Regular Grammars
Compiler Lecture Note, Inroduction to FL theory
제 4 장 구문 분석.
제 7 장  LR 파서.
제 1장 C 언어의 소개.
기본 컴퓨터 프로그래밍 Lecture #6.
4장 구문(Syntax).
과목 홈페이지  전산학개론 이메일 숙제를 제출할 경우, 메일 제목은 반드시 ‘[전산학개론]’으로 시작.
제 6 장 데이터 타입 6.1 데이터 타입 및 타입 정보 6.2 타입의 용도 6.3 타입 구성자 6.4 사례 연구
Ruby 프로그래밍 1 문자열 입출력 제어구조 looping 함수 정의
Discrete Math II Howon Kim
2. 형식언어 (Formal Language)
문법과 언어.
프로그래밍언어론 2nd edition Tucker and Noonan
오토메타 형식언어 2003년도 제 2학기.
3장 (2) 구문과 의미론 순천향대학교 컴퓨터공학과 하상호.
DS020 오토마타형식언어 Chapter 6. Simplification of Context-Free Grammars and Normal Forms Exercises October 16, 2003.
장. 문법 구조(Syntax) 컴퓨터공학과 권기태 프로그래밍언어론.
제 5장. Context-Free Languages
이산수학(Discrete Mathematics) 수학적 귀납법 (Mathematical Induction)
학습목표 Turing Machine의 작동원리와 일반적인 계산모형으로서의 가능성 이해
Tel : Office : 2공학관 408호 오토마타 및 형식언어 김 현 성 Tel : Office : 2공학관 408호
Sung-Hae Jun 자연어 처리의 이해 Sung-Hae Jun
Chapter 7. PUSHDOWN AUTOMATA Exercises
재귀 혹은 귀납 Recursive or Inductive Definition 집합을 정의하는 방법
Chapter 2. Finite Automata Exercises
Discrete Math II Howon Kim
주요 내용 형식 언어와 문법 정규식과 정규 집합 유한 상태 기계 정규 문법과 유한 상태 기계와 정규집합.
계수와 응용 (Counting and Its Applications)
이산수학 논리∙명제에서 알고리즘까지 √ 원리를 알면 IT가 맛있다 ehanbit.net.
2. 형식언어 (Formal Language)
학습목표 CFL을 accept하는 오토메타인 pda를 이해 npda와 dpda의 차이를 compiler의 측면에서 학습
7. LL 구문 분석 7-1. 결정적 구문 분석 7-2. Recursive-descent 파서 7-3. Predictive 파서 7-4. Predictive 파싱 테이블의 구성 7-5. Strong LL(k) 문법과 LL(k)문법.
이산수학(Discrete Mathematics)  증명 전략 (Proof Strategy)
제 11장. 형식언어와 오토메타의 Hierarchy
제 4 장. Regular Language의 특성
학습목표 Pumping Lemma와 Closure 특성을 통해 CFL와 Language Family간의 관계 이해
Mathematical Description of Continuous-Time Signals
Introduction to Programming Language
Discrete Math II Howon Kim
이산수학(Discrete Mathematics)  증명 방법 (Methods of Proof)
문제 다음의 서술적 언어로 표현된 요구사항을 DFD로 완성하시오
프로그래밍언어론 2nd edition Tucker and Noonan
3. 정규 언어(Regular Language)
약속 November 9th, 2012.
Discrete Math II Howon Kim
4. 어휘 분석(Lexical analysis)
Chapter 5. Context-Free Language Exercises
이산수학(Discrete Mathematics)
제12장. Algorithmic Computation의 한계
이산수학(Discrete Mathematics)  집합 연산 (Set Operations)
점화와 응용 (Recurrence and Its Applications)
창 병 모 숙명여대 전산학과 자바 언어를 위한 CFA 창 병 모 숙명여대 전산학과
1. 관계 데이터 모델 (1) 관계 데이터 모델 정의 ① 논리적인 데이터 모델에서 데이터간의 관계를 기본키(primary key) 와 이를 참조하는 외래키(foreign key)로 표현하는 데이터 모델 ② 개체 집합에 대한 속성 관계를 표현하기 위해 개체를 테이블(table)
이산수학(Discrete Mathematics)
진리 나무 Truth-tree  ∧ ∨ → ↔  =.
이산수학(Discrete Mathematics)  증명 전략 (Proof Strategy)
이산수학(Discrete Mathematics) 수열과 합 (Sequences and Summations)
For regex_compile function in grep.c
제10장. Other Models of TM’s 학습목표
[CPA340] Algorithms and Practice Youn-Hee Han
Presentation transcript:

Compiler Lecture Note, Inroduction to FL theory 컴파일러 입문 제 2 장 형식 언어

차 례 2.1 언어(Language) 2.2 문법(Grammar) 2.3 문법의 분류 Compiler Lecture Note, Inroduction to FL theory 차 례 2.1 언어(Language) 2.2 문법(Grammar) 2.3 문법의 분류

Symbol Properties Examples Compiler Lecture Note, Inroduction to FL theory Symbol A “symbol” is an abstract entity that we shall not define formally. Properties Examples ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ A,B,C, … ,Z begin,integer, … ,end

Alphabet Def 2.1: Alphabet Properties Examples Compiler Lecture Note, Inroduction to FL theory Alphabet Def 2.1: Alphabet An alphabet T is a finite set of symbols. Properties An alphabet is a set. An alphabet is finite. Examples T1 = {ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ} T2 = {A,B,C, … ,Z} T3 = {begin,integer, … ,end} T4 = {x | x is an integer}는 알파벳이 아니다. 왜?

String Def 2.2: String Properties Examples Compiler Lecture Note, Inroduction to FL theory String Def 2.2: String A string (or word) over the alphabet T is a finite sequence of symbols from T. Properties A string is not a set. A string is finite. a, b, c represent symbols: u, v, w represent string. For example, w = abba means that the value of the string w is abba. Examples T1 = {0} -> string: 0, 00, 000, … T2 = {a, b}-> string: a, b, aa, ab, ba, bb, aaa, …

Length Def 2.3: length Properties Examples Compiler Lecture Note, Inroduction to FL theory Length Def 2.3: length The length of a string, denoted by |.|, is the number of symbols in the string. Properties Examples w = a1a2a3…ak이면, |w| = k이다.

Concatenation Def 2.4: Concatenation Properties Examples Compiler Lecture Note, Inroduction to FL theory Concatenation Def 2.4: Concatenation The concatenation of two strings is the string formed by writing the first, followed by the second, with no intervening space. Properties u • v를 보통 uv로 표기. u를 prefix, v를 suffix라 부른다. |uv| = |u| + |v| Examples u = a1a2a3...an, v = b1b2b3...bm , u • v = a1a2a3...anb1b2b3...bm

Empty String Def 2.5: empty string Properties Examples Compiler Lecture Note, Inroduction to FL theory Empty String Def 2.5: empty string The empty string e is the identity for the concatenation operator. That is, ew = we = w for each string w. Properties e is a string, not a set. e is a string consisting of no symbols. |e| = 0 Examples

Power & Reversal Properties Def The string an represents n a's. Compiler Lecture Note, Inroduction to FL theory Power & Reversal Def The string an represents n a's. The reversal of a string ω, denoted ωR is the string ω written in reverse order: i.e., if ω = a1a2...an then ωR = anan-1...a1. Properties a0 = ε (ωR) R = ω

T* & T+ Def 2.6: T* (T star), T+ (T dagger) Properties Examples Compiler Lecture Note, Inroduction to FL theory T* & T+ Def 2.6: T* (T star), T+ (T dagger) T* denotes the set of all strings of symbols over the alphabet T, including the empty string. T+ = T* - {ε}. Properties T is a set of symbols, while T* is a set of strings. T* and T+ are infinite sets. ∀u,v ∈ T*, uv ∈ T*. (For every u, v in the set T*, uv is in T*.) T*는 알파벳 T에 속하는 심벌로부터 만들 수 있는 스트림의 전체 집합 (universal set)을 의미한다. Examples ex 4 @ p.41

(Formal) Language Def 2.7: (Formal) Language Properties Examples Compiler Lecture Note, Inroduction to FL theory (Formal) Language Def 2.7: (Formal) Language A (formal) language L over the alphabet T is a subset of T*, that is, L ⊆ T*. Properties 또 다른 정의: Language is any set of strings over an alphabet. 언어는 string의 집합이다. 스트링 수가 유한이면 유한 언어 (finite language), 무한이면 무한언어 (infinite language)라 한다. 여기서 정의한 언어는 의미를 갖고 있지 않다. 형식언어이론은 스트링 집합의 속성에 대한 이론이다. Examples - ex 5 @ p.42

Product of Languages Def 2.8: Product Properties Examples Compiler Lecture Note, Inroduction to FL theory Product of Languages Def 2.8: Product The product of L and L’, denoted LL’, is the set {xy| x ∈ L and y ∈ L'}. Properties 언어의 곱은 교환법칙을 만족하지 않는다. 즉, LL’ ¹ L’L Examples 예 6 @ p. 43

Power of Language Def 2.9: Power Properties Examples Compiler Lecture Note, Inroduction to FL theory Power of Language Def 2.9: Power The powers of L are defined recursively by: L0 = {ε} Ln = LLn-1 for n 1. Properties power는 곱으로 표현된다. Examples 예 7 @ p. 43

Closure Def 2.10: Reflexive transitive closure (or just closure) Compiler Lecture Note, Inroduction to FL theory Closure Def 2.10: Reflexive transitive closure (or just closure) The reflexive transitive closure of language L, denoted L*, is defined by: L* = L0 ∪ L1 ∪ L2 ∪ ...∪ Ln ∪… = Properties L* denotes strings constructed by concatenating any number of words from L. Examples {10, 11}* = {e, 10, 11, 1010, 1011, 1110, 1111, …}

Transitive Closure Def 2.11: Transitive closure Properties Examples Compiler Lecture Note, Inroduction to FL theory Transitive Closure Def 2.11: Transitive closure The transitive closure of language L, denoted L+, is defined by: L+ = L1 ∪ L2 ∪ ...∪ Ln ∪… = Properties L+ contain e iff L does. L+ ¹ L* - L0 Examples

◈ Two problems 1) How do we represent a language ? Compiler Lecture Note, Inroduction to FL theory ◈ Two problems 1) How do we represent a language ? If the language is finite, the answer is easy. If the language is infinite, we are faced with the problem of finding a finite representation for the language. - Set description - Grammar : Generating Scheme - Recognizer : Recognition Scheme 2) Does there exist a finite representation for every language ? No ! - This is not always possible.

Grammar ◈ Language ◈ Grammar 문장(sentence)들을 원소로 갖는 집합 Compiler Lecture Note, Inroduction to FL theory Grammar ◈ Language 문장(sentence)들을 원소로 갖는 집합 언어를 어떻게 표현할 것인가 ? ◈ Grammar terminal : 정의된 언어의 알파벳 nonterminal : 스트링을 생성하는 데 사용되는 중간 과정의 심볼 언어의 구조를 정의하는데 사용 grammar symbol (V)

Grammar Def 2.12: Grammar A grammar is denoted Compiler Lecture Note, Inroduction to FL theory Grammar Def 2.12: Grammar A grammar is denoted G = (VN, VT, P, S), where 1. VN : a finite set of nonterminal symbols 2. VT : a finite set of terminal symbols VN  VT =  , VN∪ VT = V 3. P : a finite set of production rules α®β, α∈ V+, β∈ V* lhs rhs 4. S : start symbol (sentence symbol)

Grammar Example 8 G = ( {S, A}, {a, b}, P, S ) Text p.45 [예 8] Compiler Lecture Note, Inroduction to FL theory Grammar Example 8 G = ( {S, A}, {a, b}, P, S ) Text p.45 [예 8] P : S ® aAS S ® a A ® SbA A ® ba A ® SS => S ® aAS | a A ® SbA | ba | SS

Grammar Notes on grammar α®β는 a를 b로 바꾼다는 의미이다. Compiler Lecture Note, Inroduction to FL theory Grammar Notes on grammar α®β는 a를 b로 바꾼다는 의미이다. α®β1, α®β2를 α®β1|β2로 표기한다. VN  VT =  일 경우 VN 과 VT 은 disjoint라 한다. α∈V+는 α∈(VN∪ VT)+ 을 의미한다. 앞의 예제에서 aAS ∈V+ 이다. Convention A, B, C ∈ VN : Nonterminal symbols a, b, c ∈VT : terminal symbols a, b, g ∈V* w ∈VT*

◈ Derivation 1. => : "directly produce" or "directly derive" Compiler Lecture Note, Inroduction to FL theory ◈ Derivation 1. => : "directly produce" or "directly derive" if α β∈ P and  , δ∈ V* then  αδ =>  βδ 2. => * : just “drive” Suppose α1,α2,...,αn ∈ V* and α1 =>α2 => … => αn, then α1 =>* αn (zero or more derivations) 3. => + : one or more derivations. Cf. ® :production rule에서 사용. "may be replaced by" => : derivation할 때 사용한다.

Example P : S ® aA | bB | ε A ® bS B ® aS S =>* abba 유도 과정 Compiler Lecture Note, Inroduction to FL theory Example P : S ® aA | bB | ε A ® bS B ® aS S =>* abba 유도 과정 S => aA (생성규칙 S aA) => abS (생성규칙 A bS) => abbB (생성규칙 S bB) => abbaS (생성규칙 B aS) => abba (생성규칙 S )

Sentential Form & Sentence Compiler Lecture Note, Inroduction to FL theory Sentential Form & Sentence Def 2.14: Sentential Form and Sentence Let S be the start symbol of a grammar G and ω be a string such that S =>* ω. The string ω is called a sentential form if ω ∈ V*, and a sentence if ω ∈ VT*. Properties Both sentences and sentential forms are strings. A sentence is a sentential form solely of terminal symbols.

Language Generated by Grammar Compiler Lecture Note, Inroduction to FL theory Language Generated by Grammar Def 2.15: Language generated by grammar The language generated by G, denoted L[G], is the set: L(G) = {ω | S =>* ω, ω ∈ VT*}. Properties L[G] is the set of sentences. The string of L[G] consists solely of terminal. The string of L[G] can be derived from S. Examples ex 11-16

◈ G1 = ( {S}, {a}, P, S ) 을 이용하여 L(G1) P : S a | aS Compiler Lecture Note, Inroduction to FL theory ◈ G1 = ( {S}, {a}, P, S ) 을 이용하여 L(G1) P : S a | aS L (G1) = { an | n  1 } ◈ Language design Text p.49 Grammar Language generation design

◈ G = ( {A, B, C}, {a, b, c}, P, A) P : A abc A aBbc Bb bB Bc Cbcc Compiler Lecture Note, Inroduction to FL theory ◈ G = ( {A, B, C}, {a, b, c}, P, A) P : A abc A aBbc Bb bB Bc Cbcc bC Cb aC aaB aC aa L(G) = { anbncn | n  1 }

(===>) ex1) S 0S1 | 01 ex2) S aSb | c ex3) A aB B bB | b Compiler Lecture Note, Inroduction to FL theory (===>) ex1) S 0S1 | 01 ex2) S aSb | c ex3) A aB B bB | b ex4) A abc A aBbc Bb bB Bc Cbcc bC Cb aC aaB aC aa

◈ Grammar Design L = { an | n  0 }일 때 문법 : A aA | ε Compiler Lecture Note, Inroduction to FL theory ◈ Grammar Design L = { an | n  0 }일 때 문법 : A aA | ε L = { an | n  1 }일 때 문법 : A aA | a Embedded production A aAb ex1) L1 = { anbn | n  0 } ex2) L2 = { 0i1j | i  j, i,j  1 } ex3) Constructs of Conventional PL

Compiler Lecture Note, Inroduction to FL theory 1) 파스칼 언어의 상수정의 부분 : 상수정의 부분은 CONST라는 예약어로 시작하며 하나의 상수 정의는 a=b의 형태를 갖는다. 여기서, a는 identifier를 b는 상수를 나타내는 terminal 심벌이다. 상수정의부분은 선택적이며 각각의 상수정의는 ;으로 구분한다. 다음은 상수정의 부분의 예이다. CONST ON = TRUE; OFF = FALSE; EPSILON = 1.0E-10;

※ 문법을 고안할 때, nonterminal의 이름은 구문 구조를 대변할 수 있는 명칭으로 쓰는 것이 바람직하다. Compiler Lecture Note, Inroduction to FL theory 2) C 언어의 정수선언 부분 : 정수선언 부분은 여러 개의 정수선언으로 구성되며 하나의 선언은 int a,a,a;와 같은 형태를 갖는다. 여기서 a는 임의의 identifier를 나타낸다. 그리고 ;으로 각각의 선언을 구분한다. 예를 들어, int i,j; int sum;과 같다. ※ 문법을 고안할 때, nonterminal의 이름은 구문 구조를 대변할 수 있는 명칭으로 쓰는 것이 바람직하다.

◈ In order to prove that a grammar generates a language L Compiler Lecture Note, Inroduction to FL theory ◈ In order to prove that a grammar generates a language L i) Every sentence generated by the grammar is in L. ii) Every string in L can be generated by the grammar. 교과서 54쪽 [예16] G = ( { S } , { ( , ) } , {S -> (S)S |ε}, S ) <=> All strings of balanced parentheses. proof) (=>) Every sentence derivable from S is balanced. (<=) Every balanced string is derivable from S.

(=>) Every sentence derivable from S is balanced. Compiler Lecture Note, Inroduction to FL theory (=>) Every sentence derivable from S is balanced. (i.e., S =>* ω, ω: balanced) By induction on the number of steps in a derivation. i) n = 1 일 때, S => ε, ε is surely balanced. ii) Suppose that all derivations of fewer than n steps produce balanced sentences. iii) Consider a leftmost derivation of exactly n steps. S => (S)S =>* (x)S =>* (x)y By the hypothesis x,y : balanced. Thus (x)y balanced.

(<=) Every balanced string is derivable from S. Compiler Lecture Note, Inroduction to FL theory (<=) Every balanced string is derivable from S. By induction on the length of a string. i) |ω| = 0, S => ε (the empty string is derivable from S.) ii) Assume that every balanced string of length less than 2n is derived from S. iii) Consider a balanced string ω of length 2n. Let (x) : shortest prefix of ω being balanced. Thus ω = (x)y, where x,y : balanced. Since |x|, |y|<2n, they are derivable from S by inductive hypothesis. Thus S => (S)S =>* (x)S =>* (x)y = ω Therefore, (x)y is also derivable from S.

Chomsky Hierarchy ◈ Noam Chomsky Compiler Lecture Note, Inroduction to FL theory Chomsky Hierarchy ◈ Noam Chomsky ◈ According to the form of the productions. α -> β ∈ P - Type 0 : No restrictions(unrestricted grammar) - Type 1 : Context-sensitive grammar(CSG).  -> β, |  |  | β| - Type 2 : Context-free grammar(CFG). A -> , where A : nonterminal,  ∈ V*. - Type 3 : Regular grammar(RG). A -> tB or A -> t, (right-linear) A -> Bt or A -> t, (left-linear) where, A, B: nonterminal, t ∈ VT*.

◈ REL (Recursively Enumerable Language) Compiler Lecture Note, Inroduction to FL theory ◈ REL (Recursively Enumerable Language) CSL (Context Sensitive Language) CFL (Context Free Language) RL (Regular Language) ◈ Examples of Formal Language  simple matching language : Lm = {anbn | n ≥ 0} CFL  double matching language : Ldm = {anbncn | n ≥ 0} CSL  mirror image language : Lmi = {ωωR | ω ∈ VT*} CFL  palindrome language : Lr = {ω | ω = ωR } CFL  parenthesis language : Lp = {ω | ω: balenced parenthesis} CFL

context-sensitive language unrestricted language context-free language Compiler Lecture Note, Inroduction to FL theory ◈ The Chomsky Hierarchy of Languages regular language context-sensitive language unrestricted language context-free language

Grammar Language Recognizer Compiler Lecture Note, Inroduction to FL theory ◈ Languages & Recognizers Grammar Language Recognizer Type 0 (unrestricted) recursively enumerable set Turing machine context-sensitive language linear bounded automata type 1 context-sensitive context-free language pushdown automata type 2 context-free type 3 (regular) regular language finite automata