Hive. Part of Hadoop Ecosystems MapReduce Runtime (Dist. Programming Framework) Hadoop Distributed File System (HDFS) Zookeeper (Coordination) Hbase (Column.


1 Hive

2 Part of the Hadoop Ecosystem: MapReduce Runtime (distributed programming framework); Hadoop Distributed File System (HDFS); ZooKeeper (coordination); HBase (column NoSQL DB); Sqoop/Flume (data integration); Oozie (job workflow and scheduling); Pig/Hive (analytical language); Hue (web console); Mahout (data mining)

3 Data import
Sqoop commands for loading the movie data stored in MySQL into HDFS (for testing purposes).
Import the movie table into HDFS:
sqoop import --connect jdbc:mysql://localhost/movielens --table movie --fields-terminated-by '\t' --username training --password training
Import the movierating table into HDFS:
sqoop import --connect jdbc:mysql://localhost/movielens --table movierating --fields-terminated-by '\t' --username training --password training

4 Create Table & Load Data: create the movie table, then load the movie data into the movie table.
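The statements shown on this slide are not preserved in the transcript. A minimal HiveQL sketch of the two steps might look like the following. The `id` and `name` columns appear in the join queries later in the deck; the `year` column, the column order, and the HDFS path `/user/training/movie` (Sqoop's default target directory for the `training` user) are assumptions, so adjust them to match the imported files.

```sql
-- Hypothetical schema: id and name are used later in the deck; year is assumed.
CREATE TABLE movie (
  id   INT,
  name STRING,
  year INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'   -- matches the --fields-terminated-by '\t' used by Sqoop
STORED AS TEXTFILE;

-- Assumed path: the directory Sqoop wrote for the movie table.
-- LOAD DATA INPATH moves the files into the Hive warehouse rather than copying them.
LOAD DATA INPATH '/user/training/movie' INTO TABLE movie;
```

Because the load moves the files, the Sqoop output directory no longer holds the data afterwards.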

5 Describe & select: view the basic structure of the movie table, then view only 5 rows of the movie table's data.
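The slide's own statements are not preserved in the transcript; a plausible sketch of the two steps, assuming the table is named `movie` as elsewhere in the deck:

```sql
-- Show the column names and types of the movie table.
DESCRIBE movie;

-- Return at most 5 rows; without ORDER BY, which rows come back is not guaranteed.
SELECT * FROM movie LIMIT 5;
```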

6 Data: movierating table and movie table

7 Where
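The query from this slide is not preserved in the transcript. As an illustration only, a `WHERE` filter on the movie table could be written as below. It uses the `id` and `name` columns that appear in the later join queries; the predicate itself is a made-up example, not the one from the slide.

```sql
-- Illustrative filter: keep only movies whose id is below 10.
SELECT id, name
FROM movie
WHERE id < 10;
```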

8 Join: create and load a table for the movie ratings, to be used in the join.
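The statements for this step are likewise missing from the transcript. A minimal sketch, assuming the same tab-delimited layout as the Sqoop import: `movieid` and `rating` are the columns used by the join queries in the deck, while the `userid` column, the column order, and the HDFS path `/user/training/movierating` are assumptions.

```sql
-- Hypothetical schema: movieid and rating are used in the join; userid is assumed.
CREATE TABLE movierating (
  userid  INT,
  movieid INT,
  rating  INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Assumed path: the directory Sqoop wrote for the movierating table.
LOAD DATA INPATH '/user/training/movierating' INTO TABLE movierating;
```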

9 Join
Join the movie table and the movierating table using the movie ID as the key (movie.id = movierating.movieid).
Extract only 5 rows of movie name and rating from the joined result:
select movie.name, movierating.rating from movie join movierating on (movie.id = movierating.movieid) limit 5;
Compute the average rating for each movie:
select movie.name, avg(movierating.rating) from movie join movierating on (movie.id = movierating.movieid) group by movie.name limit 5;
Sort the average ratings in descending order:
select movie.name, avg(movierating.rating) as avg_rating from movie join movierating on (movie.id = movierating.movieid) group by movie.name order by avg_rating desc limit 5;

