Database Summarization Using Fuzzy ISA Hierarchies 전산학과 이도헌 박사 논문 발표 : 신명근
Table of Content 1. Introduction 2. Generalized tuples & Support Degree 3. Fuzzy Domain Knowledge 4. Summary Discovery Process 5. Example 6. Conclusion
Introduction Database Summary Database Summarization 임의의 database에서 찾아야 할 지식(knowledge) 사용자가 이해할 수 있는 정보의 요약(essence) Database Summarization A task that reduces a large number of actual database tuples into a relatively small number of generalized descriptions. i.e., generalized tuples
Example of Databse Summary Program usage log in UNIX Database Summary Program User 사용시간 Generalized tuple : < editor, 프로그래머, 자정근처 > 설명 : editor 프로그램을 프로그래머들 이 자정 근처에 많이 사용한다. vi 류형규 23:30 emacs 장준영 23:55 vi 이근우 00:30
Fuzzy ISA Hierarchy Crisp ISA Hierarchy Fuzzy ISA Hierarchy All programs All programs editor Documentation editor documentation 0.1 0.3 emacs vi word emacs vi word
Comparison of Crisp and Fuzzy ISA Hierarchies Computer Uasge Log Using Crisp ISA Hierarchies - emacs : 120 records - vi : 680 records - word : 25 records Editor usage = 120 + 680 = 800 Documentation usage = 120 + 680 + 25 = 825 Using Fuzzy ISA Hierarchies Editor usage = 120 + 680 = 800 Documentation usage = 120*0.1 + 680*0.3 + 25 = 241
Fuzzy set primer A fuzzy set f on a domain D is defined by its membership function uf(x), which assign a fraction number between 0 and 1. 예, On domain < emacs, vi, word > documentaion = { 0.1, 0.3, 1.0 } f1 is called a subset of f2 and denoted f1 ⊆ f2, iff ∀x ∈ D, uf1(x) ≤ uf2(x). w denotes a special fuzzy set such that ∀x ∈ D, uf(x) = 1
Definition: Generalized tuple An m-ary tuple <f1, ..., fm> on an attribute scheme (A1, ..., Am), where fj’s are suzzy sets and Aj’s are attributes
Example of Generalized Tuple Generalized Tuples <vi, 류형규> <emacs, 장준영> <editor, 프로그래머> Program User vi 류형규 emacs 장준영 Database의 모든 튜플은 generalized tuple이고, 두 개 이상 튜플의 특성을 총괄적으로 표현하는 것도 Generalized Tuple이다.
Definition : Support Degree The support degree of a generalized tuple g = <f1, ..., fm> with respect to a given collection of database tuples C whose attribute scheme is (A1, ..., Am), is defined as SD(g|C) = [∑ti ∈ C [uf1(ti.A1), ..., ufm(ti.Am)]]/|C| wehre is a t-norm operator, usually MIN. ufj(ti.Aj) denotes the membership degree of an attribute Aj of a tuple ti withrespect to a fuzzy set fj. |C| denotes the cardinality of the collection C
Support Degree Support Degree는 해당 generalized tuple을 지지하는 튜플의 비율이다. 일정 threshold 값 이상의 Support Degree를 가지는 Generalized tuples을 qualified generalized tuples 라고 한다. Database Summarization == qualified generalized tuples를 찾는 것
Example of Support Degree Program User Support <documentation, 사무원> vi 사무원1 (0.3, 1.0) = 0.3 word 사무원2 (1.0, 1.0) = 1.0 vi 류형규 (0.3, 0.0) = 0.0 SD(<documentation, 사무원> | C) = [ (0.3, 1.0) + (1.0, 1.0) ] / 3 = (0.3 + 1.0) / 3 = 0.43 = 43 % is a t-norm operator, 보통 MIN 을 사용
Fuzzy Set Hierarchy Fuzzy ISA Hierarchy ==> Fuzzy Set Hierarchy Fuzzy ISA hierarchy to be used directly. All programs All programs 변환 editor Documentation editor documentation 0.1 0.3 emacs vi word emacs vi word
Summary Discovery Process
Example Collection & Fuzzy Set Hierarchies
Example(1/3) 1. Initail start < -, - > 2. Specialization <engineering, -> <business, -> <game, -> <-, developer> <-, marketer> 0.833 0.24 0.09 0.833 0.411 3. Support Degree 계산 (t = 0.4)
Example(2/3) The 2’nd specialization of The 1’st specialization of generalized tuples The 1’st specialization of the root generalized tuple
The 3’rd specialization of Example(3/3) The 3’rd specialization of generalized tuple The final heiearchy of generalized tuples
결론 Proposed an Interactive top-down summary discovery process. Which utilizes fuzzy ISA hierarchies as the domain knowledge. Defined generalized tuples as a representational form of a database summary.
Future Work Fuzzy ISA Hierarchy 대신에 KAH를 이용한 database summarization 방법을 찾을 수 있다. KAH에 Fuzzy ISA Hierarchy의 개념을 응용한다.