1 Hierarchical Classification: Comparison with Flat Method
Yongwook Yoon, NLP Lab., POSTECH (Jun 12, 2003)

2 Contents
Hierarchical vs. Flat Classifier
System Overview
Experiment & Results
Comparison with Flat Methods
Future Work

3 Hierarchical Classification
A natural method for very large sets of classes
Top-down classification
Promotes human readability
Better performance than the flat method: much better recall and precision
Flexible strategy: applicable at different levels of the hierarchy

4 Flat vs. Hierarchical Classification
[Diagram: the flat classifier assigns documents D1 … Dn directly to classes such as Grain and Oil; the hierarchical classifier routes D1 … Dn from the Root through intermediate nodes such as Business down to the same leaf classes.]

5 System Overview (1)
Baseline system currently:
Bag-of-Words (BOW) approach; no syntactic or semantic features yet
Naïve Bayesian classifier
Many feature selection methods: pruning the vocabulary by document count or word count, and information gain (see the sketch below)
Hierarchical classification added
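The talk relies on the Bow toolkit for these steps; purely as an illustration, here is a minimal Python sketch (all function names are mine, not Bow's) of pruning the vocabulary by document count and ranking the surviving words by information gain:

```python
import math
from collections import Counter, defaultdict

def information_gain(docs, labels, vocab):
    """Score each word by the information gain of its presence/absence
    with respect to the class label (binary word features)."""
    n = len(docs)
    class_counts = Counter(labels)
    # Entropy of the class prior, H(C)
    h_class = -sum((c / n) * math.log2(c / n) for c in class_counts.values())
    df = defaultdict(Counter)            # df[word][label] = document frequency
    for words, label in zip(docs, labels):
        for w in set(words):
            df[w][label] += 1
    scores = {}
    for w in vocab:
        n_w = sum(df[w].values())        # number of docs containing w
        h_cond = 0.0                     # H(C | w present) + H(C | w absent)
        for present, total in ((True, n_w), (False, n - n_w)):
            if total == 0:
                continue
            for label, c in class_counts.items():
                k = df[w][label] if present else c - df[w][label]
                if k > 0:
                    h_cond -= (total / n) * (k / total) * math.log2(k / total)
        scores[w] = h_class - h_cond     # IG(w) = H(C) - H(C | w)
    return scores

def select_features(docs, labels, min_doc_count=3, top_k=900):
    """Prune rare words by document count, then keep the top-k by infogain."""
    doc_freq = Counter(w for words in docs for w in set(words))
    vocab = [w for w, c in doc_freq.items() if c >= min_doc_count]
    scores = information_gain(docs, labels, vocab)
    return sorted(vocab, key=scores.get, reverse=True)[:top_k]
```

Here `docs` is a list of token lists; the 900-word default mirrors the cv3_info900 setting reported later in the results.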

6 System Overview (2)
Many extensions are possible.
The Bow library (by McCallum) supplies:
Support Vector Machine (batch only, not on-line)
Maximum Entropy
K-nearest neighbor
Pr-TFIDF
Bow does not supply:
On-line learning methods
Syntactic or semantic classification features

7 System Overview (3)
Functions implemented for hierarchical classification (sketches follow):
Construct the hierarchy prototype as a directory structure
Division of documents into the different levels of the hierarchy
Training of each classifier in the hierarchy
Testing of documents from the root to the leaf classifiers automatically
Logging and evaluation
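The implementation itself is not shown in the slides; a minimal sketch of the recursive per-node training step, assuming the hierarchy lives on disk as nested class directories and a hypothetical `train_classifier` callback builds one model per internal node:

```python
import os

def train_hierarchy(node_dir, train_classifier):
    """Recursively train one classifier per internal node. The hierarchy
    is a directory tree: each subdirectory is a child class; documents
    are files stored under the leaf directories."""
    children = [d for d in os.listdir(node_dir)
                if os.path.isdir(os.path.join(node_dir, d))]
    if not children:
        return {}                        # leaf class: nothing to train here
    # Train this node's classifier to distinguish its immediate children,
    # using all documents found anywhere below each child directory.
    classifiers = {node_dir: train_classifier(node_dir, children)}
    for child in children:               # recurse into the internal nodes
        classifiers.update(
            train_hierarchy(os.path.join(node_dir, child), train_classifier))
    return classifiers
```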

8 Construct Hierarchical Structure
[Diagram: make up the logical hierarchy (e.g., Business → Grain, Oil), divide the documents into the hierarchy, then train each classifier with its training parameters.]
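A sketch of the document-division step, assuming a hypothetical `class_path` table that maps each class name to its position in the hierarchy:

```python
import os
import shutil

def divide_into_hierarchy(labeled_docs, root_dir, class_path):
    """Copy each document into the directory for its class.
    class_path maps a class name to its path in the hierarchy, e.g.
    'comp.sys.mac.hardware' -> ('comp', 'sys', 'mac.hardware')."""
    for doc_file, label in labeled_docs:
        target = os.path.join(root_dir, *class_path[label])
        os.makedirs(target, exist_ok=True)
        shutil.copy(doc_file, target)
```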

9 Classifying documents
[Diagram: a document enters the root classifier at Level_0; from the intermediate result it goes further down (e.g., Grain, Steel, Oil) until a final classification into one of the leaf classes Class_1 … Class_N.]
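A minimal sketch of that top-down routing, pairing with the training sketch above; the `predict()` interface returning one child name is my assumption, not Bow's API:

```python
import os

def classify_top_down(doc, root_dir, classifiers):
    """Route one document from the root classifier down to a leaf class.
    classifiers maps a node directory (as built by train_hierarchy above)
    to a model whose predict(doc) returns the name of one child class."""
    node = root_dir
    while node in classifiers:           # stop once we reach a leaf
        node = os.path.join(node, classifiers[node].predict(doc))
    return os.path.relpath(node, root_dir)   # e.g. 'comp/sys/mac.hardware'
```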

10 Experiment
Data: 20 Newsgroups documents
20 classes, 1,000 documents per class
Intrinsic hierarchy (e.g., rec.sport.hockey, talk.politics.mideast)
Two major trials: Flat vs. Hierarchy
Evaluation and comparison

11 Experiment detail
Experiments run under the same setting as SIGIR-2001 (Bekkerman et al.):
News headers are removed, except that the Subject line is kept (a preprocessing sketch follows)
All characters are lowercased
Multi-labeled classes are allowed
4-fold cross-validation
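A minimal sketch of that preprocessing, assuming messages arrive as raw news text with the headers separated from the body by a blank line:

```python
def preprocess(raw_message):
    """Drop all news headers except the Subject line, then lowercase."""
    headers, _, body = raw_message.partition('\n\n')
    subject = next((line for line in headers.splitlines()
                    if line.lower().startswith('subject:')), '')
    return (subject + '\n' + body).lower()
```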

12 4-fold cross validation
Applied identically to the flat and hierarchical cases
The 20,000 documents are randomly divided into 4 sets of 5,000 documents each (see the sketch below)
Each run trains on 3 of the 4 sets and tests on the remaining one, for 4 runs in total
The final evaluation is the average over the four runs
Same evaluation method as SIGIR-2001
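As an illustration only, a minimal sketch of that split (names are mine):

```python
import random

def four_fold_splits(documents, seed=0):
    """Randomly split the 20,000 documents into 4 sets of 5,000 and
    yield (train, test) pairs: 3 sets for training, 1 for testing."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    fold_size = len(docs) // 4
    folds = [docs[i * fold_size:(i + 1) * fold_size] for i in range(4)]
    for i in range(4):
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, folds[i]
# The final evaluation averages the scores of the four runs.
```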

13 [Diagram: the class hierarchy built over the 20 Newsgroups]
root → alt.atheism, comp, misc.forsale, rec, sci, soc.religion.christian, talk
comp → graphics, os.ms-windows.misc, sys, windows.x
comp.sys → ibm.pc.hardware, mac.hardware
rec → autos, motorcycles, sport
rec.sport → baseball, hockey
sci → crypt, electronics, med, space
talk → politics, religion.misc
talk.politics → guns, mideast, misc
A total of 8 classifiers are needed (one per internal node: root, comp, comp.sys, rec, rec.sport, sci, talk, talk.politics).
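The same structure written as data that the sketches above could consume; the exact representation is my choice, but the eight keys match the eight classifiers the slide calls for:

```python
# The 8 internal nodes of the hierarchy, each needing one classifier.
HIERARCHY = {
    'root': ['alt.atheism', 'comp', 'misc.forsale', 'rec', 'sci',
             'soc.religion.christian', 'talk'],
    'comp': ['comp.graphics', 'comp.os.ms-windows.misc', 'comp.sys',
             'comp.windows.x'],
    'comp.sys': ['comp.sys.ibm.pc.hardware', 'comp.sys.mac.hardware'],
    'rec': ['rec.autos', 'rec.motorcycles', 'rec.sport'],
    'rec.sport': ['rec.sport.baseball', 'rec.sport.hockey'],
    'sci': ['sci.crypt', 'sci.electronics', 'sci.med', 'sci.space'],
    'talk': ['talk.politics', 'talk.religion.misc'],
    'talk.politics': ['talk.politics.guns', 'talk.politics.mideast',
                      'talk.politics.misc'],
}
```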

14 Result – Flat method
The result of 4-fold cross-validation: ./test_out/cv3_info900.stats
Correct: (counts not preserved): 82.88% accuracy
Per-class accuracies were reported for all 20 classes: alt.atheism, comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x, misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey, sci.crypt, sci.electronics, sci.med, sci.space, soc.religion.christian, talk.politics.guns, talk.politics.mideast, talk.politics.misc, talk.religion.misc (individual figures not preserved).

15 Evaluation measure
From the entries α_i, β_i, γ_i of the confusion matrix we compute:
Precision P_i = α_i / (α_i + β_i)
Recall R_i = α_i / (α_i + γ_i)
Micro-averaged BEP = (P + R) / 2
Overall recall and precision are the same.
β_i: the number of documents that are not in C_i but were classified as C_i; γ_i: the number of documents that are in C_i but were classified as some other class C_j.
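A minimal sketch of this computation, taking the per-class counts as parallel lists (the helper name is mine):

```python
def micro_bep(alpha, beta, gamma):
    """Micro-averaged break-even point from per-class confusion counts.
    alpha[i]: documents of C_i correctly assigned to C_i
    beta[i]:  documents not of C_i that were assigned to C_i
    gamma[i]: documents of C_i that were assigned to some other class"""
    a, b, g = sum(alpha), sum(beta), sum(gamma)
    precision = a / (a + b)
    recall = a / (a + g)
    return (precision + recall) / 2
```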

16 Result – Hierarchical (1)
The result of 4-fold cross-validation.
# level_0: ./test_out/cv3_info20000.stats
Correct: (counts not preserved): 92.22% accuracy
Classes: alt.atheism, comp, misc.forsale, rec, sci, soc.religion.christian, talk
# level_1/comp.: ./test_out/cv3_info900.stats
Correct: 4276 out of 4807: 88.95% accuracy
Classes: comp.graphics, comp.os.ms-windows.misc, comp.sys, comp.windows.x

17 Result – Hierarchical (2)
# level_1/sci.: ./test_out/cv3_info900.stats
Correct: 3596 out of 3730: 96.41% accuracy
Classes: sci.crypt, sci.electronics, sci.med, sci.space
# level_1/talk.
Correct: 3014 out of 3276: 92.00% accuracy
Classes: talk.politics, talk.religion.misc
# level_2/comp.sys.: ./test_out/cv3_info10.stats
Correct: 1709 out of 1743: 98.05% accuracy
Classes: comp.sys.ibm.pc.hardware, comp.sys.mac.hardware

18 Evaluation measure in Hierarchical
Recall: the same as in the flat case; count the correct classifications at the leaf nodes.
Precision: must also take into account the documents misclassified at the upper levels.
The incorrect count at a level is divided by the number of classes at the level below, so each lower class inherits an averaged share of the error.
Finally, the computation at a leaf node takes all of the averaged incorrect counts from the upper levels into account (a sketch follows).
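A minimal sketch of this precision rule as I read it; the names are mine and the talk's actual evaluation code is not shown:

```python
def leaf_precision(correct, wrong_here, upper_errors):
    """Precision at one leaf class under the rule described above.
    correct:      documents correctly assigned to this leaf
    wrong_here:   documents wrongly assigned to this leaf at its own level
    upper_errors: list of (incorrect_count, n_lower_classes) pairs, one
                  per ancestor level, each contributing an averaged share"""
    inherited = sum(err / n for err, n in upper_errors)
    return correct / (correct + wrong_here + inherited)
```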


20 Evaluation measure in Hierarchical – Cont’
Worked example, reusing the level_0 and level_1/comp. results:
# level_0 (./test_out/cv3_info20000.stats): 92.22% accuracy over alt.atheism, comp, misc.forsale, rec, sci, soc.religion.christian, talk
# level_1/comp. (./test_out/cv3_info900.stats): Correct 4276 out of 4807: 88.95% accuracy over comp.graphics, comp.os.ms-windows.misc, comp.sys, comp.windows.x
(incorrect count inherited from level_0 for comp) / 5 = 68.2
Precision of comp.graphics = 802 / (…) = 82.5

21 Comparison of Flat vs. Hierarchical

Algorithm          | BEP           | Condition
Flat               | 82.9          | Naïve Bayesian, infogain: 900 words
Hierarchical       | 85.8          | Infogain: 10 ~ 20,000 words
Sigir-2001 (Flat)  | 86.3 (unfair) | SVM, MI: 15,000 words
Sigir-2001 (Flat)  | 88.6          | IB clustering: 300 groups

22 Analysis
Hierarchical gives far better performance than the flat case
This is the result with only Naïve Bayesian and word features; other methods such as SVM and Maximum Entropy should also be applied
In the hierarchical setting, a different classifier and a different feature selection method can be used at each level
More flexible application is possible, with much room for tuning to improve performance
In the experiments a different number of features was used per level: Level_0: 20,000; Level_1: 900; Level_2: 10 ~ 900 words
Adding or deleting a class has only a local effect: only the relevant level needs to be retrained

23 Future Work
Apply other classifiers in place of Naïve Bayesian: Support Vector Machine, Maximum Entropy, K-nearest neighbor, Pr-TFIDF, etc.
Hierarchy + adaptive learning: find adaptive methods well suited to the hierarchy
Apply other feature selection methods

24 Discussion
Is it coincidence or necessity that overall precision and recall coincide? Is it due to micro-averaging, or a consequence of every class having the same number of documents? (a short derivation follows)
In the hierarchical classification stages, instead of passing only the correctly classified documents of one stage down to the next, all documents should be passed down; this is the realistic, reasonable setting, since in practice the correct answers are not known at the intermediate stages.
Also, when computing hierarchical precision, rather than averaging the upper-level errors and pushing them down, would it not be more correct to pass every document, including the misclassified ones, down to the leaf nodes and compute precision there against the true labels? This needs to be compared with the current calculation method.
On Future Work: the research direction after the baseline needs to be settled. Covering class addition and deletion with incremental learning is a hard problem; compare with simple hierarchical + adaptive learning.
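On the first question, a short derivation, assuming every test document receives exactly one predicted label and has exactly one true label (note that slide 11 allows multi-labeled classes, where the equality need not hold):

```latex
% Each misclassified document is exactly one false positive (for the class
% it was assigned to) and one false negative (for its true class), so
\sum_i \beta_i = \sum_i \gamma_i
\;\Longrightarrow\;
P = \frac{\sum_i \alpha_i}{\sum_i \alpha_i + \sum_i \beta_i}
  = \frac{\sum_i \alpha_i}{\sum_i \alpha_i + \sum_i \gamma_i}
  = R .
```

Under this assumption the equality is a necessity of micro-averaging, independent of the documents-per-class balance.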

25 The End

