Extracting Schedule Information from Korean

1 Extracting Schedule Information from Korean Email
(Draft note: Is this title acceptable?) Kyoungryol Kim

2 Table of Contents
1. Purpose of Utilization
2. Annotated Data Analysis
3. Reference for NER Tagging
4. Baseline System

3 1. Purpose of Utilization

4 Purpose of Utilization
Extract accurate schedule information, such as the "Speaker" and "Meeting Location", from Korean email and register it in an online calendar.
Recover the semantics of the extracted information:
- Meeting Location: geographical location recognition
- Speaker: person recognition (contacts of the )
Example:
- Meeting Location: 대전 유성구 한국과학기술원 전산학과 1층 세미나실 (1st-floor seminar room, Department of Computer Science, KAIST, Yuseong-gu, Daejeon)
- Speaker: 김 아나톨리, 박광희
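
The last step, registering the extracted fields in an online calendar, could target the iCalendar format (RFC 5545), which most calendar services import. The sketch below assumes a hypothetical `ScheduleTemplate` holding already-resolved start and end times; the slides do not say which calendar or format the system uses, and the `PRODID` and `UID` domain are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
import uuid

KST = timezone(timedelta(hours=9))  # Korea Standard Time (UTC+9)


@dataclass
class ScheduleTemplate:
    """Hypothetical container for fields extracted from one email."""
    summary: str
    start: datetime  # must be timezone-aware
    end: datetime
    location: str
    speakers: list[str] = field(default_factory=list)


def _escape(text: str) -> str:
    # RFC 5545 TEXT escaping: backslash first, then ; , and newlines
    return (text.replace("\\", "\\\\").replace(";", "\\;")
            .replace(",", "\\,").replace("\n", "\\n"))


def _utc(dt: datetime) -> str:
    # RFC 5545 UTC date-time form, e.g. 20110105T050000Z
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def to_icalendar(t: ScheduleTemplate) -> str:
    """Render one VEVENT; long-line folding (75 octets) is omitted here."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//schedule-extractor//EN",  # placeholder product id
        "BEGIN:VEVENT",
        f"UID:{uuid.uuid4()}@example.invalid",
        f"DTSTAMP:{_utc(datetime.now(timezone.utc))}",
        f"DTSTART:{_utc(t.start)}",
        f"DTEND:{_utc(t.end)}",
        f"SUMMARY:{_escape(t.summary)}",
        f"LOCATION:{_escape(t.location)}",
        f"DESCRIPTION:{_escape('Speakers: ' + ', '.join(t.speakers))}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"
```

Resolving a relative phrase such as "this Wednesday, 2 to 4 p.m." into concrete `start`/`end` values is outside this sketch; the date used when testing it is arbitrary.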

5 (Figure: processing pipeline with a worked example)
Pipeline: INPUT TEXT -> Tokenization -> Named Entity Recognition -> Information Type Classification -> Template Generation -> Semantics Recognition -> OUTPUT

Input text:
안녕하세요, 금주 수요일 오후 2시~4시에, 카이스트 전산동 1층 세미나실에서 세미나를 진행합니다. CI LAB과 TC LAB 이 공동으로 주관하는 세미나이며, 지도교수님께서 참석하실 예정입니다. 석사과정학생들은 꼭 참석바랍니다. 발표자는 김 아나톨리, 박광희 학생이니 준비해주십시오. 문의사항은 박상원 학생에게 문의바랍니다. 감사합니다.
(English: "Hello. This Wednesday from 2 to 4 p.m., a seminar will be held in the 1st-floor seminar room of the KAIST Computer Science building. The seminar is jointly organized by CI LAB and TC LAB, and our advisor will attend. All master's students are asked to attend. The speakers are students 김 아나톨리 and 박광희, so please prepare. Please direct any questions to student 박상원. Thank you.")

Tokenization and Named Entity Recognition (B/I/O tags, excerpt):
... 시/O 에/O ,/O 카이스트/B-Location 전산동/I-Location 1층/I-Location 세미나실/I-Location 에서/O 세미나/O 를/O 진행/O 합니다/O 발표자/O 는/O 김/B-Person 아나톨리/I-Person ,/O 박광희/B-Person 학생/O ...

Information Type Classification and Template Generation:
Meeting Location: 카이스트 전산동 1층 세미나실
Speaker: 김 아나톨리, 박광희

Semantics Recognition (OUTPUT):
Meeting Location isHeldAt geographical coordinates
Speaker hasReference 김아나톨리, hasReference 박광희 (contact entries)
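
Going from the B/I/O-tagged tokens on this slide to the filled template amounts to grouping each B- tag with the I- tags that follow it. A minimal sketch, assuming tokens are re-joined with single spaces and using a hypothetical label-to-slot mapping (`Location` to Meeting Location, `Person` to Speaker) as a stand-in for the Information Type Classification step:

```python
from collections import defaultdict


def bio_to_spans(tokens, tags):
    """Group B/I/O-tagged tokens into (label, [tokens]) entity spans."""
    spans = []
    label, current = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # B- always opens a new entity; a stray I- is treated the same way
            if current:
                spans.append((label, current))
            label, current = tag[2:], [token]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open entity
        else:
            # "O" closes any open entity
            if current:
                spans.append((label, current))
            label, current = None, []
    if current:
        spans.append((label, current))
    return spans


def fill_template(spans, slot_for_label):
    """Map entity labels to template slots (stand-in for type classification)."""
    template = defaultdict(list)
    for label, toks in spans:
        slot = slot_for_label.get(label)
        if slot is not None:
            template[slot].append(" ".join(toks))
    return dict(template)
```

Applied to the whole email, this label-to-slot shortcut would also put 박상원 (the contact for questions) under Speaker, which is why a real classification step is needed rather than a fixed mapping.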

6 2. Annotated Data Analysis

7 Annotated Data
The contents are included in the accompanying Word file.

8 3. Reference for NER Tagging

9 Reference for NER tagging
[Lee et al. 2010] Named Entity Recognition with Structural SVMs and Pegasos Algorithm
- State-of-the-art Korean NER performance (F-measure): CRFs 84.99%, structural SVMs 85.14%, modified Pegasos 85.43%
- Boundary tags: IOB2 model (B/I/O)
- Corpus domains: TV (2900:100 docs), Sports (3500:100 docs)
- Features:
  - Morpheme at positions -2, -1, 0, 1, 2
  - Suffix at positions -2, -1, 0, 1, 2
  - POS tag at positions -2, -1, 0, 1, 2
  - POS tag + length
  - Position of the morpheme in its eojeol (space-delimited word unit): Start / Center / End
  - NE dictionary match (true or false) + length
  - NE dictionary feature (index) + length
  - 15 regular expressions: [A-Z]*, [0-9]*, [0-9][0-9], [0-9][0-9][0-9][0-9], [A-Za-z0-9]*, ...
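
The feature list above can be read as a per-morpheme feature extractor whose output a CRF or structural SVM would consume. The sketch below is an illustration under assumptions, not the paper's code: the suffix is taken to be the morpheme's last character, the NE dictionary is a hypothetical set of strings, the dictionary-index feature is omitted, and only the five regular expressions that are visible on the slide are used (the fifth read as `[A-Za-z0-9]*`).

```python
import re

WINDOW = (-2, -1, 0, 1, 2)

# Five of the fifteen regular expressions; the other ten are not listed
# on the slide, so they are left out.
REGEXES = [r"[A-Z]*", r"[0-9]*", r"[0-9][0-9]", r"[0-9][0-9][0-9][0-9]", r"[A-Za-z0-9]*"]


def token_features(morphs, pos_tags, eojeol_pos, i, ne_dict):
    """Return feature strings for morpheme i of one sentence.

    morphs, pos_tags, eojeol_pos: parallel lists; eojeol_pos holds
    "Start", "Center" or "End". ne_dict: set of known NE strings (hypothetical).
    """
    feats = []
    for k in WINDOW:
        j = i + k
        if 0 <= j < len(morphs):
            # assumption: "suffix" = last character of the morpheme
            word, tag, suffix = morphs[j], pos_tags[j], morphs[j][-1]
        else:
            word = tag = suffix = "<PAD>"  # position falls outside the sentence
        feats += [f"M[{k}]={word}", f"S[{k}]={suffix}", f"P[{k}]={tag}"]
    word = morphs[i]
    feats.append(f"P+len={pos_tags[i]}/{len(word)}")
    feats.append(f"EJ={eojeol_pos[i]}")
    feats.append(f"DICT={word in ne_dict}/{len(word)}")
    # one binary feature per pattern that matches the whole morpheme
    feats += [f"RE{n}" for n, rx in enumerate(REGEXES) if re.fullmatch(rx, word)]
    return feats
```

For a four-digit token such as "2010", the patterns `[0-9]*`, `[0-9][0-9][0-9][0-9]` and `[A-Za-z0-9]*` fire, which is the kind of signal that helps separate years from other numbers.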

10 Reference for NER tagging
[Kim et al. 2008] Korean Named Entity Recognition Using Two-level Maximum Entropy Model
- POS tagging
- Noun-sequence extraction
- NE boundary recognition
- NE candidate selection (recognition)
- Boundary tags: S (Start), M (Middle), E (End), U (Uniterm), NONE
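
Unlike B/I/O, the S/M/E/U/NONE scheme marks where an entity ends as well as where it starts. A small encoder/decoder makes the difference concrete; it assumes "Uniterm" means a single-token entity and represents spans as hypothetical `(start, end_exclusive)` token indices. It covers only the tag encoding, not the two-level maximum entropy model itself.

```python
def spans_to_smeu(n_tokens, spans):
    """Encode (start, end_exclusive) entity spans with S/M/E/U/NONE tags."""
    tags = ["NONE"] * n_tokens
    for start, end in spans:
        if end - start == 1:
            tags[start] = "U"  # single-token ("uniterm") entity
        else:
            tags[start] = "S"
            for k in range(start + 1, end - 1):
                tags[k] = "M"
            tags[end - 1] = "E"
    return tags


def smeu_to_spans(tags):
    """Decode S/M/E/U/NONE tags back into (start, end_exclusive) spans."""
    spans, start = [], None
    for k, tag in enumerate(tags):
        if tag == "U":
            spans.append((k, k + 1))
            start = None
        elif tag == "S":
            start = k
        elif tag == "E" and start is not None:
            spans.append((start, k + 1))
            start = None
        elif tag == "NONE":
            start = None  # an unfinished S...M run is discarded
    return spans
```

For the tokens 김 / 아나톨리 / , / 박광희, the two Person entities become `S E NONE U`, whereas B/I/O would give `B I O B`.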

11 Reference for NER tagging
[Seon et al. 2001] Korean Named Entity Recognition Using Machine Learning Methods and Pattern-Selection Rules
- Selects target words using POS tags and a clue-word dictionary
- Searches for the target words in the NE dictionary
- Handles unknown words using a maximum entropy model (MEM) with lexical sub-pattern information and a clue-word dictionary
- Resolves ambiguity using a neural network (NN)
- Converts adjacent words into NE tags using pattern-selection rules
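
The dictionary and rule parts of this pipeline can be illustrated with a toy tagger. Everything here is hypothetical: the tiny `NE_DICT` and `CLUE_WORDS`, the "noun before a clue word" rule, and the single pattern-selection rule that merges a run of adjacent nouns containing a Location. POS tags are assumed to be Sejong-style (`NN*` for nouns), and the MEM and NN steps are not reproduced.

```python
# Hypothetical resources; the paper's dictionaries are not reproduced here.
NE_DICT = {"카이스트": "Location"}
CLUE_WORDS = {"학생": "Person", "교수": "Person", "세미나실": "Location"}


def is_noun(pos):
    return pos.startswith("NN")  # Sejong-style noun tags (NNG, NNP, ...)


def rule_based_tags(tokens, pos_tags):
    tags = ["O"] * len(tokens)
    for i, (tok, pos) in enumerate(zip(tokens, pos_tags)):
        if not is_noun(pos):
            continue  # only nouns are treated as target words here
        if tok in NE_DICT:
            tags[i] = NE_DICT[tok]  # dictionary hit
        elif i + 1 < len(tokens) and tokens[i + 1] in CLUE_WORDS:
            tags[i] = CLUE_WORDS[tokens[i + 1]]  # unknown noun before a clue word
    # Hypothetical pattern-selection rule: a run of adjacent nouns that
    # contains a Location is merged into one Location.
    i = 0
    while i < len(tokens):
        if not is_noun(pos_tags[i]):
            i += 1
            continue
        j = i
        while j < len(tokens) and is_noun(pos_tags[j]):
            j += 1
        if "Location" in tags[i:j]:
            tags[i:j] = ["Location"] * (j - i)
        i = j
    return tags
```

On 김 아나톨리 , 박광희 학생, only 박광희 is found (through the clue word 학생); 김 아나톨리 is neither in the dictionary nor next to a clue word, which is exactly the unknown-word case the paper handles with an MEM.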

12 4. Baseline System

13 Baseline System
[Min et al. 2005] Information Extraction Using Context and Position
- Corpus: 245 meeting announcements
- Targets: Attendee, Meeting Location, Time, Date
- Performance (F-measure): Attendee 36%, Meeting Location 57%, Time 92.5%, Date 91%
- Method:
  - Sentence to LSP
  - NE recognition: ME, NN, pattern selection
  - Instance disambiguation: ML (Naive Bayes), score calculation
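
Instance disambiguation chooses which recognized entity fills a slot, for example which Person in an email is a speaker rather than the contact for questions. A multinomial Naive Bayes classifier over the words around each candidate is one way to sketch this; the labels `speaker`/`other`, the context words, and the training pairs below are hypothetical, and Min et al.'s actual features and score calculation are not reproduced.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over context words, with add-one smoothing."""

    def fit(self, examples):
        # examples: list of (context_words, label) pairs
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for words, label in examples:
            self.word_counts[label].update(words)
        self.vocab = {w for words, _ in examples for w in words}
        self.n_examples = len(examples)
        return self

    def log_score(self, words, label):
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.label_counts[label] / self.n_examples)  # log prior
        for w in words:
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[w] + 1) / denom)
        return score

    def predict(self, words):
        return max(self.label_counts, key=lambda lab: self.log_score(words, lab))
```

Trained on a handful of labeled contexts, a candidate preceded by 발표자 ("speaker") scores higher for `speaker`, while one followed by 에게 in a 문의사항 ("inquiries") sentence scores higher for `other`, which is how 박상원 in the slide 5 email would be kept out of the Speaker slot.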

