Data Mining Lab March 4, 2016 Jae-Gil Lee, Associate Professor


1 Data Mining Lab March 4, 2016 Jae-Gil Lee, Associate Professor
Department of Knowledge Service Engineering, KAIST

2

3 Brief Bio
Areas of Interest: Big Data & Data Mining
PhD: KAIST (2005)
Professional Experience
KAIST, Dept. of Knowledge Service Engineering, Associate Professor (Dec ~ Present)
IBM Almaden Research Center, Postdoctoral Research Staff (Sept ~ Nov. 2010)
University of Illinois at Urbana-Champaign, Dept. of Computer Science, Postdoc Research Associate (July 2006 ~ Aug. 2008)

4

5 Knowledge
Diagram levels: Wisdom, Knowledge, Data
Diagram label: Research Scope

6 Data Mining
Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data [W. Frawley]
Confluence of multiple disciplines: Statistics, Pattern Recognition, Databases, Machine Learning, Algorithms

7 Data Mining Lab
Initiated in 2011
Consisting of 9 PhD and 6 master's students (as of March 2016)
Graduated 8 master's students (as of March 2016)
Working on data mining methods for advanced data sets, specifically trajectory data and social network data

8 Data Mining Vision
Scaling up algorithms to cope with Big Data (trajectory data, social network data)
Improving the knowledge quality by combining multiple data sources
Modeling human behaviors precisely from human activity data

9 Trajectory Data
A trajectory is a sequence of (location, timestamp) pairs recorded for a moving object
Examples: hurricanes, turtles, vessels, vehicles
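The definition on this slide maps directly onto a small data structure. The following is a minimal sketch, not code from the lab: the `Point` class, the helper names, and the sample `vessel` track are all hypothetical, and locations are assumed to be latitude/longitude in degrees with timestamps in seconds.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Point:
    """One trajectory sample: where the object was, and when (assumed layout)."""
    lat: float  # degrees
    lon: float  # degrees
    t: float    # seconds since an arbitrary epoch


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two samples, in kilometres."""
    earth_radius_km = 6371.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(h))


def path_length_km(trajectory: list[Point]) -> float:
    """Sum of the distances between consecutive samples."""
    return sum(haversine_km(p, q) for p, q in zip(trajectory, trajectory[1:]))


def average_speed_kmh(trajectory: list[Point]) -> float:
    """Path length divided by elapsed time; 0.0 if no time has elapsed."""
    elapsed_h = (trajectory[-1].t - trajectory[0].t) / 3600.0
    return path_length_km(trajectory) / elapsed_h if elapsed_h > 0 else 0.0


# Hypothetical vessel track: three hourly samples along the equator.
vessel = [Point(0.0, 0.0, 0), Point(0.0, 1.0, 3600), Point(0.0, 2.0, 7200)]
```

Keeping the timestamp inside each sample, rather than in a parallel list, makes it harder for location and time to drift out of alignment when a trajectory is sliced or filtered.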

10 (Social) Network Data
A social network (e.g., Facebook, Twitter) is usually modeled as a graph
A node → an actor
An edge → a relationship or an interaction
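As a rough illustration of the node/edge model described on this slide, the sketch below stores a tiny network as adjacency sets. The actor names and interactions are invented, and treating every edge as undirected is an assumption made only to keep the example short.

```python
from collections import defaultdict

# Hypothetical interactions: each pair is one relationship between two actors.
interactions = [("ann", "bob"), ("bob", "cho"), ("ann", "cho"), ("cho", "dan")]


def build_graph(edges):
    """Undirected adjacency sets: node = actor, edge = relationship."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def common_neighbors(graph, u, v):
    """Actors connected to both u and v."""
    return graph[u] & graph[v]


graph = build_graph(interactions)
# Degree = number of distinct actors each actor is connected to.
degree = {actor: len(neighbors) for actor, neighbors in graph.items()}
```

A directed network (for example, "follows" on Twitter) would keep only `graph[u].add(v)`, so that an edge from `u` to `v` does not imply one from `v` to `u`.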

11 Research Interests
High-performance data mining for big data [BigComp14 (Best Paper), VLDB14]
Mobility pattern mining from large-scale trajectory data [IEEE TKDE11, ACM TIST11, IEEE TKDE15]
Community detection from complex (e.g., multi-layer) networks [ASONAM12, SIGMOD Record, ACM TIST16]
Expertise finding in social networks and Q&A services [AAAI ICWSM13 (Best Paper), AAAI ICWSM14]
Theoretical foundation on community detection [IEEE ICDE14, IEEE ICDE16]
Data mining for emerging platforms and services [IEEE ICDE16]

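12 Dong-A Ilbo article, May 11, 2015
Still at an early stage, but … "We know exactly which store you are heading to"
[Retail + Technology, Toward the Era of R-Tech] <2> Using location information and big data
The United States, 2054: whenever people walk into a store, electronic billboards recommend a different product to each of them. After recognizing a shopper by iris, they recommend products the shopper is likely to buy, matched to his or her tastes and lifestyle. According to experts, this scene from the film "Minority Report" is already technically feasible in 2015. The only difference is that the film identifies individual consumers by iris, whereas today customers are distinguished through the Wi-Fi or Bluetooth functions of their smartphones.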

13 Courses
KSE525: Data Mining and Knowledge Discovery (Spring semester each year; open this semester)
KSE526: Analytical Methodologies for Big Data (Fall semester)
KSE625: Data Mining for Social Networks (Fall semester)

14 On-Going Projects
Interactive Analysis of Spatial Big Data (Spatial Big Data) ⇒ up to 4 years left
Big Data Mining for Social Networks (Social Network) ⇒ up to 2 years left
Data Mining on Mobile Devices (Smart Cloudlet) ⇒ up to 2 years left
Self-Evolving Knowledge Base (ExoBrain) ⇒ 1 year left

15 Spatial Big Data
The goal is to develop an interactive analytics platform for spatial big data
Focusing on real-time spatial data, especially from smartphones and sensors (Internet of Things)
We are developing two core engines based on open source software: complex event processing and online analytical processing
Funded by the Ministry of Land, Infrastructure and Transport

16 Spatial Big Data (cont'd)
Architecture diagram. Components shown: various real-time streaming data sources (Data Source 1 to N; JSON and GeoJSON file formats); Input Data Manager (Kafka); Scheduler (data flow management, queuing); Distributed System (Storm cluster) with Spout, Filter, S-CEP, Aggregator, and Delivery components; Storage (Redis); User Interface (5th year)
Legend: Spout = data source; Bolt = data processing unit; S-CEP = spatial CEP
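The spout, filter, and aggregator roles named in this diagram can be illustrated with plain Python generators. This is only a sketch of the dataflow idea: it does not use the Storm or Kafka APIs, and the messages, the bounding box, and all function names are hypothetical. The one external fact it relies on is that GeoJSON point coordinates are ordered longitude first, then latitude.

```python
import json
from collections import Counter

# Hypothetical GeoJSON messages, standing in for what a message queue would deliver.
RAW_MESSAGES = [
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [127.36, 36.37]},'
    ' "properties": {"source": "bus"}}',
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [126.98, 37.57]},'
    ' "properties": {"source": "taxi"}}',
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [127.38, 36.35]},'
    ' "properties": {"source": "bus"}}',
]

# Assumed area of interest as (min_lon, min_lat, max_lon, max_lat).
BBOX = (127.30, 36.30, 127.45, 36.45)


def spout(messages):
    """Spout role: turn raw messages into parsed records."""
    for raw in messages:
        yield json.loads(raw)


def bbox_filter(features, bbox):
    """Filter role: keep only point features inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    for feature in features:
        lon, lat = feature["geometry"]["coordinates"]
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
            yield feature


def aggregate(features):
    """Aggregator role: count surviving features per data source."""
    return Counter(f["properties"]["source"] for f in features)


counts = aggregate(bbox_filter(spout(RAW_MESSAGES), BBOX))
```

Because each stage is a generator, records flow through one at a time, which loosely mirrors how tuples pass between processing units in a streaming topology.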

17 Spatial Big Data (cont'd)
Truck real-time monitoring service
Real-time Vehicle Event Stream (location, speed, acceleration, axle load, tire pressure, etc.)
Real-time Traffic Flow Event Stream (traffic volume, traffic accident information, etc.)
Real-time Weather Event Stream (temperature, precipitation, presence of fog, humidity, etc.)
Real-time Freight Event Stream (cargo type, cargo hazard level, cargo storage condition, etc.)
Diagram labels: Scheduler; Input; Distributed Computation; Spatially Enabled Complex Event Processing
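To make "spatially enabled complex event processing" concrete, here is a minimal sketch of one possible rule over two of the streams listed on this slide: raise an alert when a truck is speeding in a grid cell where fog is currently reported. The rule itself, the grid resolution, the speed threshold, and every class and field name are assumptions for illustration, not the project's actual design.

```python
from dataclasses import dataclass

CELL_DEG = 0.1          # assumed grid resolution in degrees
FOG_SPEED_LIMIT = 60.0  # assumed km/h threshold while fog is reported


def cell_of(lat: float, lon: float) -> tuple[int, int]:
    """Map a coordinate to a coarse grid cell used as the spatial join key."""
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))


@dataclass(frozen=True)
class WeatherEvent:
    t: int
    lat: float
    lon: float
    fog: bool


@dataclass(frozen=True)
class VehicleEvent:
    t: int
    truck_id: str
    lat: float
    lon: float
    speed_kmh: float


def detect_fog_speeding(events):
    """Consume a time-ordered merged stream and yield (t, truck_id) alerts."""
    foggy_cells = set()
    for e in events:
        if isinstance(e, WeatherEvent):
            cell = cell_of(e.lat, e.lon)
            if e.fog:
                foggy_cells.add(cell)      # fog reported for this cell
            else:
                foggy_cells.discard(cell)  # fog cleared for this cell
        elif cell_of(e.lat, e.lon) in foggy_cells and e.speed_kmh > FOG_SPEED_LIMIT:
            yield (e.t, e.truck_id)


stream = [
    VehicleEvent(1, "T1", 36.37, 127.36, 80.0),  # fast, but no fog reported yet
    WeatherEvent(2, 36.35, 127.34, fog=True),    # fog reported in the same cell
    VehicleEvent(3, "T1", 36.38, 127.37, 80.0),  # fast inside a foggy cell
    VehicleEvent(4, "T2", 36.38, 127.37, 40.0),  # slow inside a foggy cell
    WeatherEvent(5, 36.35, 127.34, fog=False),   # fog clears
    VehicleEvent(6, "T1", 36.38, 127.37, 80.0),  # fast, but fog has cleared
]
alerts = list(detect_fog_speeding(stream))
```

A grid cell is a deliberately crude spatial key; a real system would more likely use a distance test or a spatial index, and would bound how long a fog report stays valid.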

18 Smart Cloudlet
The goal is to develop a distributed, parallel data analysis platform on mobile devices
Hadoop on mobile devices and an interface for data mining algorithms
Focusing on similar photo retrieval as well as the k-means and k-NN algorithms
Funded by the Institute for Information & Communication Technology Promotion
Diagram labels: Analysis Request; Sub-Task and Data; Sub-Task Results; Mobile Device; Data Generation (e.g., Photos)
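The sub-task and result flow on this slide can be sketched for k-means, one of the algorithms it names. In the sketch below, each device computes per-cluster sums and counts over its own data, and the requesting device merges those partial results into new centroids. It covers one iteration only, uses no Hadoop API, and the function names, device data, and starting centroids are all hypothetical.

```python
def assign(point, centroids):
    """Index of the nearest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def local_step(points, centroids):
    """Sub-task run on one device: per-cluster coordinate sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for pt in points:
        k = assign(pt, centroids)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += pt[d]
    return sums, counts


def merge_and_update(partials, centroids):
    """Requesting device: combine sub-task results into new centroids."""
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in partials)
        if total == 0:
            new_centroids.append(list(old))  # keep an empty cluster's centroid
            continue
        new_centroids.append(
            [sum(sums[k][d] for sums, _ in partials) / total for d in range(len(old))]
        )
    return new_centroids


device_data = [
    [(0.0, 0.0), (0.0, 2.0)],      # local data on hypothetical device A
    [(10.0, 10.0), (12.0, 10.0)],  # local data on hypothetical device B
    [(2.0, 0.0), (2.0, 2.0)],      # local data on hypothetical device C
]
centroids = [[1.0, 0.0], [9.0, 9.0]]
partials = [local_step(points, centroids) for points in device_data]
centroids = merge_and_update(partials, centroids)
```

Only the sums and counts leave each device, so the raw data (photos, in the slide's example) stays where it was generated and the amount sent back depends on the number of clusters rather than the number of points.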

19 Smart Cloudlet (cont’d)

20 Thank You! Any Questions?

