7. Computational Learning Theory 신수용 2018-11-27
Overview
- Introduction
- PAC
- Sample Complexity for Finite Hypothesis Spaces
- Sample Complexity for Infinite Hypothesis Spaces
- Mistake Bound Model
7.1 Introduction
Questions that arise when we study machine learning:
- Can we tell whether a learning problem is inherently easy or hard, independently of the learning algorithm?
- Can we tell how many training examples are needed for successful learning?
- What effect does allowing the learner to pose queries to the trainer have?
- How many mistakes will the learner make before it finishes learning?
- Can we characterize the computational complexity of a learning problem?
7.1 (2)
Our goal is to answer these questions in terms of:
- Sample complexity: the number of training examples the learner needs in order to converge to a successful hypothesis
- Computational complexity: the computation time the learner needs in order to converge to a successful hypothesis
- Mistake bound: the number of training examples the learner misclassifies before converging to a successful hypothesis
7.2 Probably Learning an Approximately Correct Hypothesis
Problem setting
- X: the set of all possible instances
- C: the set of target concepts
- D: the probability distribution from which instances are drawn; condition: D is stationary (it does not change over time)
7.2 PAC (2)
Error of a Hypothesis
- True error error_D(h): the true error of h with respect to target concept c and distribution D is the probability that h will misclassify an instance drawn at random according to D:
  error_D(h) ≡ Pr_{x ~ D}[c(x) ≠ h(x)]
- Training error: the fraction of training examples misclassified by h
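To make the distinction between the true error and the training error concrete, here is a minimal Python sketch; the uniform instance distribution, the threshold concept, and the names target_c, hypothesis_h, error_rate are illustrative assumptions, not anything from the slides.

```python
import random

# Toy setup: instances are integers in [0, 100), drawn uniformly (this plays the role of D).
def target_c(x):          # the target concept c
    return x < 50

def hypothesis_h(x):      # a hypothesis h that disagrees with c exactly on [45, 50)
    return x < 45

def error_rate(h, c, instances):
    """Fraction of the given instances that h misclassifies."""
    return sum(h(x) != c(x) for x in instances) / len(instances)

random.seed(0)
train = [random.randrange(100) for _ in range(20)]       # a small training sample
large = [random.randrange(100) for _ in range(100_000)]  # approximates drawing from D

print("training error:", error_rate(hypothesis_h, target_c, train))
print("estimated true error error_D(h):", error_rate(hypothesis_h, target_c, large))
# Here error_D(h) is exactly 5/100 = 0.05, while the training error measured on only
# 20 examples can easily differ from 0.05 because the sample is small.
```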
7.2 PAC (3)
Sample error
- The sample error of h with respect to a set S of samples is the fraction of S misclassified by h.
- When S is the set of training examples, the sample error equals the training error.
PAC Learnability
- Requiring error_D(h) = 0 is not workable: unless every possible instance is provided as a training example, several hypotheses remain consistent with the given training examples, and the learner cannot be certain it has chosen the one that matches the target concept.
7.2 PAC (4)
- Because the training examples are chosen at random, a misleading (unrepresentative) training sample may be selected.
- We therefore weaken the requirements:
  - The error need not be zero; an error smaller than some sufficiently small constant ε is allowed.
  - The learner need not succeed on every training sequence; a failure probability smaller than some sufficiently small constant δ is allowed.
- In short, we only require that the learner probably learn a hypothesis that is approximately correct (PAC learning).
7.2 PAC (5)
PAC-learnable
- C is PAC-learnable when a polynomial-time learning algorithm (and a polynomial bounding function) exists.
- Definition: C is PAC-learnable by L using H if, for every c ∈ C, every distribution D over X, every ε with 0 < ε < 1/2, and every δ with 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a hypothesis h ∈ H such that error_D(h) ≤ ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c).
7.3 Sample Complexity for Finite Hypothesis Spaces
- Consistent learner: a learner that outputs hypotheses that perfectly fit the training data.
- Version space VS_{H,D}: the set of all hypotheses in H that correctly classify the training examples D.
7.3 Sample Complexity for Finite Hypothesis Spaces (1)
- The number of examples a consistent learner needs equals the number of examples required before the version space contains only acceptable hypotheses.
- ε-exhausted version space: VS_{H,D} is said to be ε-exhausted with respect to c and D if every hypothesis h in VS_{H,D} has error less than ε with respect to c and D, i.e., all the hypotheses consistent with the observed training examples happen to have true error less than ε.
7.3 Sample Complexity for Finite Hypothesis Spaces (3)
ε-exhausting the version space
- Theorem: if the hypothesis space H is finite and D is a sequence of m ≥ 1 independently drawn random examples of the target concept c, then for any 0 ≤ ε ≤ 1, the probability that the version space VS_{H,D} is not ε-exhausted is less than or equal to |H|·e^(−εm).
- Introducing the PAC requirement: bounding this probability by δ and solving |H|·e^(−εm) ≤ δ for m gives the sample complexity m ≥ (1/ε)(ln|H| + ln(1/δ)).
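A small Python sketch of this bound (the numbers |H| = 100, ε = 0.1, δ = 0.05 are illustrative assumptions, not values from the slides):

```python
import math

def finite_H_sample_complexity(H_size, eps, delta):
    """Smallest m satisfying m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

# With |H| = 100, eps = 0.1 and delta = 0.05, 77 training examples suffice for a
# consistent learner to be probably (1 - delta) approximately (eps) correct.
print(finite_H_sample_complexity(100, eps=0.1, delta=0.05))   # -> 77
```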
7.3 (4) Agnostic Learning and Inconsistent Hypotheses
- Agnostic learner: a learner that makes no assumption about whether the target concept is representable in H and simply finds the hypothesis with the minimum training error.
- h_best: the hypothesis from H having the lowest training error over the training examples.
7.3 (5)
- Error in practical learning: Hoeffding bounds.
- The restriction so far: error_D(h_best) = 0, i.e. the target concept is representable in H; agnostic learning drops this assumption.
- Hoeffding bounds characterize the deviation between the true probability of some event and its observed frequency over m independent trials: Pr[error_D(h) > error_S(h) + ε] ≤ e^(−2mε²).
- Applied to all |H| hypotheses in H, this yields the sample complexity m ≥ (1/(2ε²))(ln|H| + ln(1/δ)).
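A hedged numerical illustration of the agnostic bound (the parameters are the same assumed values used in the previous sketch, not values from the slides); note how the 1/ε² dependence makes it far more demanding than the consistent-learner bound:

```python
import math

def m_agnostic(H_size, eps, delta):
    """Smallest m satisfying m >= (1/(2*eps**2)) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / (2 * eps ** 2))

# For |H| = 100, eps = 0.1, delta = 0.05 the consistent-learner bound gave 77 examples;
# the agnostic bound, which only promises error within eps of the best hypothesis in H,
# requires many more.
print(m_agnostic(100, eps=0.1, delta=0.05))   # -> 381
```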
7.3 (6) Conjunctions of Boolean Literals Are PAC-Learnable
- For conjunctions of up to n boolean literals, |H| = 3^n, so the sample complexity m ≥ (1/ε)(n·ln 3 + ln(1/δ)) is polynomial in n, 1/ε, and 1/δ.
- Using the FIND-S algorithm, boolean conjunctions are therefore PAC-learnable.
PAC-Learnability of Other Concept Classes
- Unbiased learners
7.3 (7) k-term DNF and k-CNF Concepts
- Unbiased learners: when the set C of all definable target concepts corresponds to the power set of X, the class is unbiased and the sample complexity is exponential in n.
- k-term DNF concepts: a class with polynomial sample complexity that nevertheless cannot be learned in polynomial time (unless RP = NP).
- k-CNF: every k-term DNF expression can be rewritten as a k-CNF expression, and k-CNF is learnable in polynomial time, so k-term DNF is PAC-learnable using H = k-CNF.
7.4 Sample Complexity for Infinite Hypothesis Spaces
- Drawbacks of expressing sample complexity in terms of |H|:
  - The resulting bounds are quite weak.
  - An infinite hypothesis space cannot be handled at all.
- An alternative measure of H's complexity (instead of |H|): the Vapnik-Chervonenkis dimension of H (VC dimension, or VC(H)).
  - Gives tighter bounds.
  - Measured by the number of distinct instances from X that can be completely discriminated using H.
7.4 Sample Complexity for Infinite Hypothesis Spaces (2)
Shattering a Set of Instances
- A set of instances S is shattered by hypothesis space H if and only if for every dichotomy of S there exists some hypothesis in H consistent with this dichotomy.
The Vapnik-Chervonenkis Dimension
- VC(H) is the size of the largest finite subset of X shattered by H.
- If arbitrarily large finite sets of X can be shattered by H, then VC(H) ≡ ∞.
7.4 Sample Complexity for Infinite Hypothesis Spaces (3)
7.4 Sample Complexity for Infinite Hypothesis Spaces (4)
Illustrative Examples
- X: the set of real numbers; H: the set of intervals on the real number line.
- S = {3.1, 5.7}: four hypotheses suffice to realize every dichotomy, e.g. (1 < x < 2), (1 < x < 4), (4 < x < 7), (1 < x < 7), so VC(H) ≥ 2 (at least 2).
- S = {1.5, 4.2, 5.1}: not shattered (no interval can include the two outer points while excluding the middle one), so VC(H) = 2 (final).
- Note: H is infinite, yet VC(H) is finite.
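The interval example can be checked mechanically. Below is a brute-force Python sketch; the particular candidate intervals are assumptions added for illustration (the four from the slide plus a few extra endpoints):

```python
def interval_shatters(points, candidates):
    """True if every dichotomy of `points` is realized by some interval (a, b)
    in `candidates`, where a point is labeled positive iff a < x < b."""
    n = len(points)
    for mask in range(2 ** n):                        # enumerate all 2^n dichotomies
        wanted = {p for i, p in enumerate(points) if mask & (1 << i)}
        if not any(wanted == {p for p in points if a < p < b} for a, b in candidates):
            return False
    return True

# Candidate intervals: the four from the slide plus a few extra endpoints.
candidates = [(1, 2), (1, 4), (4, 7), (1, 7), (4, 6), (3, 5), (1, 5)]

print(interval_shatters([3.1, 5.7], candidates))       # True  -> VC(H) >= 2
print(interval_shatters([1.5, 4.2, 5.1], candidates))  # False
# The second call fails on the dichotomy {1.5, 5.1} positive / 4.2 negative, which no
# interval whatsoever can realize, so no set of three points is shattered and VC(H) = 2.
```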
7.4 (5)
- X: points on the x-y plane; H: all linear decision surfaces in the plane.
- VC(H) = 3: three non-collinear points can be shattered, but no set of four points can.
- In general, the VC dimension of linear decision surfaces in an r-dimensional space is r + 1.
7.4 (6) Sample Complexity and the VC Dimension
- Upper bound on the number of training examples (a number sufficient for successful learning): m ≥ (1/ε)(4·log₂(2/δ) + 8·VC(H)·log₂(13/ε)).
- Lower bound on sample complexity (a number necessary for successful learning): m ≥ max[(1/ε)·log(1/δ), (VC(C) − 1)/(32ε)].
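A small Python sketch of the VC-based upper bound (the parameters VC(H) = 3, ε = 0.1, δ = 0.05 are illustrative assumptions; VC(H) = 3 matches the linear-decision-surface example above):

```python
import math

def vc_upper_bound(vc_dim, eps, delta):
    """Smallest m satisfying m >= (1/eps) * (4*log2(2/delta) + 8*VC(H)*log2(13/eps))."""
    return math.ceil((4 * math.log2(2 / delta) + 8 * vc_dim * math.log2(13 / eps)) / eps)

print(vc_upper_bound(3, eps=0.1, delta=0.05))   # -> 1899 examples suffice
```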
7.5 The Mistake Bound Model of Learning
- Question: the total number of mistakes the learner makes before it converges to the correct hypothesis.
Mistake Bound for the FIND-S Algorithm
- FIND-S:
  - Initialize h to the most specific hypothesis (the conjunction of every literal and its negation).
  - For each positive training instance x, remove from h any literal that is not satisfied by x.
  - Output hypothesis h.
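A minimal runnable sketch of FIND-S for conjunctions of boolean literals; the instance encoding (tuples of booleans) and the example data are assumptions added for illustration:

```python
def find_s(examples, n):
    """examples: list of (x, label) where x is a tuple of n booleans.
    The hypothesis h is kept as a set of literals ('x1', '~x1', ...); the initial,
    most specific hypothesis contains every literal and its negation."""
    h = {f"x{i+1}" for i in range(n)} | {f"~x{i+1}" for i in range(n)}
    for x, label in examples:
        if label:                                  # only positive examples change h
            for i, value in enumerate(x):
                # drop whichever literal this instance fails to satisfy
                h.discard(f"~x{i+1}" if value else f"x{i+1}")
    return h

examples = [((True, False, True), True),
            ((True, True, True), True),
            ((False, False, True), False)]
print(sorted(find_s(examples, 3)))   # -> ['x1', 'x3'], i.e. the hypothesis x1 AND x3
```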
7.5 The Mistake Bound Model of Learning (2)
- Worst-case mistake bound for FIND-S: n + 1 (the first mistake removes n of the 2n initial literals; each subsequent mistake removes at least one more).
Mistake Bound for the HALVING Algorithm
- Halving algorithm: maintain the version space and predict by majority vote of its members, then eliminate the hypotheses that voted incorrectly.
- Worst-case mistake bound: log₂|H|, since every mistake removes at least half of the surviving hypotheses.
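A runnable sketch of the Halving algorithm; the explicit hypothesis space of threshold functions and the example stream are assumptions chosen so the bound is easy to check:

```python
def halving(hypotheses, stream):
    """hypotheses: callables x -> bool (the initial version space).
    stream: iterable of (x, true_label). Returns the number of mistakes made."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = [h(x) for h in version_space]
        prediction = votes.count(True) >= votes.count(False)   # majority vote
        if prediction != y:
            mistakes += 1
        # keep only the hypotheses consistent with the revealed label
        version_space = [h for h, v in zip(version_space, votes) if v == y]
    return mistakes

# |H| = 8 threshold hypotheses h_t(x) = (x >= t); the labels come from t = 5.
hypotheses = [lambda x, t=t: x >= t for t in range(8)]
stream = [(x, x >= 5) for x in (3, 6, 4, 5, 7, 2)]
print(halving(hypotheses, stream))   # -> 1, within the worst-case bound log2(8) = 3
```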
7.5 The Mistake Bound Model of Learning (3)
Optimal Mistake Bounds
- M_A(C): the maximum number of mistakes algorithm A makes while learning a concept from C.
- Optimal mistake bound Opt(C): the minimum over all possible learning algorithms A of M_A(C).
7.5 (4) WEIGHTED-MAJORITY Algorithm
- Relation between the VC dimension and the mistake bound: VC(C) ≤ Opt(C) ≤ M_Halving(C) ≤ log₂(|C|).
- WEIGHTED-MAJORITY generalizes the Halving algorithm: it takes a weighted vote among a pool of prediction algorithms and down-weights, rather than eliminates, the ones that make mistakes.
- Relative mistake bound for WEIGHTED-MAJORITY: the number of mistakes is bounded in terms of the number of mistakes k made by the best predictor in the pool; with β = 1/2 the bound is 2.4(k + log₂ n) for a pool of n predictors.
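A runnable sketch of WEIGHTED-MAJORITY with β = 1/2; the pool of threshold "experts" and the data stream are assumptions added for illustration:

```python
def weighted_majority(experts, stream, beta=0.5):
    """experts: callables x -> bool. stream: iterable of (x, true_label).
    Every expert starts with weight 1; an expert that errs is multiplied by beta."""
    weights = [1.0] * len(experts)
    mistakes = 0
    for x, y in stream:
        predictions = [e(x) for e in experts]
        yes = sum(w for w, p in zip(weights, predictions) if p)
        no = sum(w for w, p in zip(weights, predictions) if not p)
        prediction = yes >= no                       # weighted vote
        if prediction != y:
            mistakes += 1
        weights = [w * beta if p != y else w         # penalize the experts that erred
                   for w, p in zip(weights, predictions)]
    return mistakes

# n = 10 threshold experts; the labels follow the expert x >= 5 exactly, so k = 0.
experts = [lambda x, t=t: x >= t for t in range(10)]
stream = [(x, x >= 5) for x in (1, 8, 5, 3, 9, 4, 6, 2)]
print(weighted_majority(experts, stream))
# -> 0 here; the relative bound guarantees at most 2.4 * (k + log2(n)) ≈ 8 mistakes.
```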