7. Computational Learning Theory
신수용
Outline
Introduction
Probably Learning an Approximately Correct Hypothesis (PAC)
Sample Complexity for Finite Hypothesis Spaces
Sample Complexity for Infinite Hypothesis Spaces
The Mistake Bound Model of Learning
7.1 Introduction
Questions that arise when studying machine learning:
Can we determine, independently of the learning algorithm, whether a learning problem is inherently easy or hard?
Can we determine how many training examples are needed for successful learning?
What effect does allowing the learner to pose queries to the trainer have?
How many mistakes will the learner make before it finishes learning?
Can we characterize the computational complexity of a learning problem?
7.1 (2) Our Goal
Sample complexity: the number of training examples the learner needs in order to arrive at a successful hypothesis
Computational complexity: the computation time the learner needs in order to arrive at a successful hypothesis
Mistake bound: the number of training examples the learner misclassifies before arriving at a successful hypothesis
7.2 Probably Learning an Approximately Correct Hypothesis
Problem Setting
X : the set of all possible instances
C : the set of target concepts the learner may be asked to learn
D : the probability distribution from which instances are drawn
Assumption: D is stationary (it does not change over time)
7.2 PAC (2) Error of a Hypothesis
True error error_D(h) of h with respect to target concept c and distribution D: the probability that h will misclassify an instance drawn at random according to D, i.e. error_D(h) ≡ Pr_{x ∈ D}[c(x) ≠ h(x)]
Training error: the fraction of training examples misclassified by h
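Since error_D(h) is defined with respect to the unknown distribution D, in practice it can only be estimated. Below is a minimal sketch (all names are illustrative, not from the slides) that estimates the true error by sampling instances from D:

```python
import random

def estimate_true_error(h, c, sample_from_D, m=100_000):
    """Monte Carlo estimate of error_D(h): the probability that h
    disagrees with the target concept c on an instance drawn from D."""
    mistakes = 0
    for _ in range(m):
        x = sample_from_D()
        if h(x) != c(x):
            mistakes += 1
    return mistakes / m

# Illustration: D uniform on [0, 1), c(x) = (x > 0.5), h(x) = (x > 0.6).
# The true error is the probability mass of (0.5, 0.6], i.e. about 0.1.
print(estimate_true_error(lambda x: x > 0.6, lambda x: x > 0.5, random.random))
```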
7.2 PAC (3) PAC Learnability
Sample error of h with respect to a set S of samples: the fraction of S misclassified by h
When S is the set of training examples, the sample error equals the training error
Demanding error_D(h) = 0 is unworkable: unless every possible instance is provided as a training example, many different hypotheses remain consistent with the training examples given, and the learner has no way to single out the correct one
7.2 PAC (4)
Because training examples are drawn at random, a misleading set of training examples may be selected
Weaken the requirements:
The error need not be zero; an error smaller than some sufficiently small constant ε is allowed
The learner need not succeed on every sequence of training examples; a failure probability smaller than some sufficiently small constant δ is allowed
In other words, the learner probably learns a hypothesis that is approximately correct (PAC learning)
7.2 PAC (5) PAC-Learnable
When C is PAC-learnable, a polynomial-time algorithm and a polynomial sample-size function exist
C is PAC-learnable by L using H if, for all c ∈ C, all distributions D over X, all ε with 0 < ε < 1/2, and all δ with 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a hypothesis h ∈ H such that error_D(h) ≤ ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c)
7.3 Sample complexity for finite hypothesis spaces
Consistent learner: a learner that outputs hypotheses that perfectly fit the training data
Version space VS_H,D: the set of all hypotheses in H that correctly classify the training examples D
7.3 Sample complexity for finite hypothesis spaces (1)
The number of examples a consistent learner needs is the number of examples required for the version space to contain only acceptable hypotheses
ε-exhausted: the version space VS_H,D is said to be ε-exhausted with respect to c and D if every hypothesis h in VS_H,D has error less than ε with respect to c and D
Equivalently, all the hypotheses consistent with the observed training examples happen to have true error less than ε
7.3 Sample complexity for finite hypothesis spaces (3)
Theorem (ε-exhausting the version space): if the hypothesis space H is finite and D is a sequence of m ≥ 1 independent, randomly drawn examples of some target concept c, then for any 0 ≤ ε ≤ 1 the probability that the version space VS_H,D is not ε-exhausted is less than or equal to |H| e^(−εm)
Introducing the PAC parameters: requiring |H| e^(−εm) ≤ δ gives the sample-complexity bound m ≥ (1/ε)(ln|H| + ln(1/δ))
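Setting |H| e^(−εm) ≤ δ and solving for m yields the bound above; a minimal sketch (the function name is mine, not from the slides) that computes the required number of examples:

```python
import math

def sample_complexity_finite_H(H_size, epsilon, delta):
    """Smallest m with m >= (1/epsilon) * (ln|H| + ln(1/delta)).
    With probability at least 1 - delta, every hypothesis consistent
    with m such examples has true error at most epsilon."""
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / epsilon)

# e.g. |H| = 973, epsilon = 0.1, delta = 0.05  ->  99 examples suffice
print(sample_complexity_finite_H(973, 0.1, 0.05))
```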
7.3 (4) Agnostic Learning and Inconsistent Hypotheses
Agnostic learner: a learner that makes no assumption about whether the target concept is representable by H and simply searches for the hypothesis with minimum training error
h_best: the hypothesis from H having the lowest training error over the training examples
7.3 (5) Error in actual learning: Hoeffding bounds
Restriction so far: error_D(h_best) = 0, i.e. the target concept is representable in H; in agnostic learning this need not hold
Hoeffding bounds characterize the deviation between the true probability of some event and its observed frequency over m independent trials: Pr[error_D(h) > error_S(h) + ε] ≤ e^(−2mε²)
Taking a union bound over all |H| hypotheses and requiring failure probability at most δ gives m ≥ (1/(2ε²))(ln|H| + ln(1/δ))
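Because the dependence is now on 1/ε² rather than 1/ε, the agnostic setting needs more examples; a minimal sketch (name is mine) mirroring the function above:

```python
import math

def sample_complexity_agnostic(H_size, epsilon, delta):
    """Smallest m with m >= (1 / (2 * epsilon**2)) * (ln|H| + ln(1/delta)),
    which guarantees error_D(h) <= error_S(h) + epsilon for every h in H
    with probability at least 1 - delta."""
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / (2 * epsilon ** 2))

# Same |H|, epsilon, delta as before: the requirement grows from 99 to 494.
print(sample_complexity_agnostic(973, 0.1, 0.05))
```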
7.3 (6) Conjunctions of Boolean Literals Are PAC-Learnable
|H| = 3^n, so the bound above gives m ≥ (1/ε)(n ln 3 + ln(1/δ))
Using the FIND-S algorithm, conjunctions of boolean literals are PAC-learnable
PAC-Learnability of Other Concept Classes: Unbiased Learners
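As a quick worked example (numbers are mine, not from the slides): with n = 10 boolean variables, ε = 0.1, and δ = 0.05, the bound gives m ≥ 10 · (10 · ln 3 + ln 20) ≈ 10 · (10.99 + 3.00) ≈ 140 training examples.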
7.3 (7) k-term DNF and k-CNF concepts
The set C of all definable target concepts corresponds to the power set of X: an unbiased class with exponential sample complexity
k-term DNF concepts: a case with polynomial sample complexity that is nevertheless not learnable in polynomial time (the computational complexity is believed not to be polynomial)
k-CNF: a larger class that contains k-term DNF and has both polynomial sample complexity and a polynomial-time algorithm, so k-term DNF is PAC-learnable using H = k-CNF
7.4 Sample Complexity for Infinite Hypothesis Spaces
Drawbacks of expressing sample complexity using the |H| term:
The bounds can be quite weak
Infinite hypothesis spaces cannot be handled
Another measure of H's complexity (instead of |H|): the Vapnik-Chervonenkis dimension of H, written VC dimension or VC(H)
Provides tighter bounds
Measured by the number of distinct instances from X that can be completely discriminated using H
7.4 Sample Complexity for Infinite Hypothesis Spaces (2)
Shattering a Set of Instances
A set of instances S is shattered by hypothesis space H if and only if for every dichotomy of S there exists some hypothesis in H consistent with this dichotomy
The Vapnik-Chervonenkis dimension VC(H): the size of the largest finite subset of X shattered by H
If arbitrarily large finite sets of X can be shattered by H, then VC(H) ≡ ∞
7.4 Sample Complexity for Infinite Hypothesis Spaces (3)
7.4 Sample Complexity for Infinite Hypothesis Spaces (4)
Illustrative Examples
X: the set of real numbers; H: the set of intervals on the real number line
S = {3.1, 5.7}: four hypotheses suffice to realize every dichotomy, e.g. (1 < x < 2), (1 < x < 4), (4 < x < 7), (1 < x < 7), so VC(H) ≥ 2 (at least)
S = {1.5, 4.2, 5.1}: not shattered, since no interval can contain the two outer points while excluding the middle one, so VC(H) = 2 (final)
Note: H is infinite, yet VC(H) is finite
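The interval example can be verified by brute force; a minimal sketch (helper names are mine) that enumerates every dichotomy of S and checks whether some choice of interval endpoints realizes it:

```python
from itertools import product

def intervals_shatter(points, endpoints):
    """True if the hypothesis space of open intervals (a < x < b), with
    a and b drawn from `endpoints`, realizes every dichotomy of `points`."""
    for labels in product([False, True], repeat=len(points)):
        realized = any(
            all((a < x < b) == want for x, want in zip(points, labels))
            for a in endpoints for b in endpoints if a < b
        )
        if not realized:
            return False
    return True

endpoints = [0, 1, 2, 4, 5, 6, 7]
print(intervals_shatter([3.1, 5.7], endpoints))       # True  -> VC(H) >= 2
print(intervals_shatter([1.5, 4.2, 5.1], endpoints))  # False -> this 3-point set is not shattered
```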
7.4 (5) X: the set of points on the x-y plane
H: all linear decision surfaces in the plane
VC(H) = 3
In general, the VC dimension of linear decision surfaces in an r-dimensional space is r + 1
Figure: (a) a set of three non-collinear points that can be shattered by linear surfaces; (b) three collinear points that cannot
7.4 (6) Sample Complexity and the VC Dimension
Upper bound on the number of training examples (a sufficient number for successful learning):
m ≥ (1/ε)(4 log₂(2/δ) + 8 VC(H) log₂(13/ε))
Lower bound on sample complexity (a necessary number for successful learning):
m ≥ max[(1/ε) log(1/δ), (VC(C) − 1)/(32ε)]
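A minimal sketch (name is mine) of the upper bound, evaluated here for the linear decision surfaces of the previous slide:

```python
import math

def sample_complexity_vc(vc_dim, epsilon, delta):
    """Smallest m with m >= (1/epsilon) * (4*log2(2/delta) + 8*VC(H)*log2(13/epsilon))."""
    return math.ceil((4 * math.log2(2 / delta)
                      + 8 * vc_dim * math.log2(13 / epsilon)) / epsilon)

# VC(H) = 3 (linear decision surfaces in the plane), epsilon = 0.1, delta = 0.05
print(sample_complexity_vc(3, 0.1, 0.05))  # 1899
```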
7.5 The Mistake Bound Model of Learning
The mistake bound model asks: what is the total number of mistakes the learner makes before it converges to the correct hypothesis?
Mistake Bound for the FIND-S Algorithm
FIND-S:
Initialize h to the most specific hypothesis l1 ∧ ¬l1 ∧ l2 ∧ ¬l2 ∧ … ∧ ln ∧ ¬ln
For each positive training instance x, remove from h any literal that is not satisfied by x
Output hypothesis h
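A minimal runnable sketch of FIND-S for conjunctions of boolean literals (the set-of-literals representation and the names are mine):

```python
def find_s(positive_examples, n):
    """FIND-S over n boolean variables. A hypothesis is a set of literals
    (i, value); the most specific hypothesis contains every literal and
    its negation, so it classifies nothing as positive."""
    h = {(i, v) for i in range(n) for v in (True, False)}
    for x in positive_examples:        # x is a tuple of n booleans
        # Drop every literal the positive example does not satisfy.
        h = {(i, v) for (i, v) in h if x[i] == v}
    return h

# Positives consistent with the concept x0 AND NOT x2 (n = 3):
print(sorted(find_s([(True, True, False), (True, False, False)], 3)))
# -> [(0, True), (2, False)], i.e. x0 AND NOT x2
```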
7.5 The Mistake Bound Model of Learning (2)
Worst-case mistake bound for FIND-S: n + 1 (the first mistake removes n of the 2n literals in the initial hypothesis; each later mistake removes at least one more)
Mistake Bound for the HALVING Algorithm
Halving algorithm: maintain the version space and predict by majority vote over it
Worst-case mistake bound: ⌊log₂ |H|⌋, since every mistake eliminates at least half of the surviving hypotheses
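A minimal sketch of the Halving algorithm (names are mine): keep the version space explicitly, predict by majority vote, and observe that every mistake removes at least half of the surviving hypotheses.

```python
def halving_learner(hypotheses, stream, target):
    """Halving algorithm over a finite list of hypotheses.
    Each hypothesis and the target map an instance to a bool."""
    version_space = list(hypotheses)
    mistakes = 0
    for x in stream:
        votes_true = sum(h(x) for h in version_space)
        prediction = 2 * votes_true >= len(version_space)   # majority vote
        y = target(x)
        if prediction != y:
            mistakes += 1
        # Keep only the hypotheses consistent with the revealed label.
        version_space = [h for h in version_space if h(x) == y]
    return mistakes, version_space

# Threshold concepts over 0..9, target threshold 7: at most floor(log2(10)) = 3 mistakes.
thresholds = [lambda x, t=t: x >= t for t in range(10)]
mistakes, _ = halving_learner(thresholds, range(10), lambda x: x >= 7)
print(mistakes)
```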
7.5 The Mistake Bound Model of Learning (3)
Optimal Mistake Bounds
M_A(c): the maximum number of mistakes algorithm A makes in order to learn the target concept c; M_A(C) is the maximum of M_A(c) over all c in C
Optimal mistake bound Opt(C): the minimum, over all possible learning algorithms A, of M_A(C)
7.5 (4) WEIGHTED-MAJORITY Algorithm
Relationship between the VC dimension and the mistake bound: VC(C) ≤ Opt(C) ≤ M_Halving(C) ≤ log₂(|C|)
WEIGHTED-MAJORITY predicts by a weighted vote over a pool of prediction algorithms, reducing the weight of a predictor whenever it errs
Relative mistake bound for WEIGHTED-MAJORITY: with β = 1/2, the number of mistakes is at most 2.4 (k + log₂ n), where k is the number of mistakes made by the best of the n predictors in the pool
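A minimal sketch of WEIGHTED-MAJORITY (parameter names are mine): every predictor in the pool keeps a weight, the learner predicts by weighted vote, and each predictor that errs has its weight multiplied by β.

```python
def weighted_majority(predictors, stream, target, beta=0.5):
    """WEIGHTED-MAJORITY over a pool of boolean predictors."""
    weights = [1.0] * len(predictors)
    mistakes = 0
    for x in stream:
        y = target(x)
        vote_true = sum(w for w, p in zip(weights, predictors) if p(x))
        vote_false = sum(w for w, p in zip(weights, predictors) if not p(x))
        if (vote_true >= vote_false) != y:
            mistakes += 1
        # Penalize every predictor that was wrong on this instance.
        weights = [w * beta if p(x) != y else w
                   for w, p in zip(weights, predictors)]
    return mistakes

# With beta = 1/2 the learner's mistakes are at most 2.4 * (k + log2(n)),
# where k is the mistake count of the best of the n predictors.
```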