Download presentation
Presentation is loading. Please wait.
1
Data Warehouse 구축 (Star Schema)
2
Drill across process of linking tow or more fact tables at the same granularity dimension table공유
3
Star Schema : Time dimension
시간차원의 중요성 시간차원이 없는 질의는 meaningless! What are the sales volumes for all products available in store 52? “for what period of time” Time의 속성 day of week, week of month, work day, weekend, holiday, season, fiscal period
4
Star Schema : Space dimension
Marketing 정책 수립에 유용한 분석 보고서 작성 “What is the average driving distance for all customers who made a purchase at store 52 during the Christmas holiday season?” “What is the average income and family size of the neighborhoods where my customers exists?” effectively expand analysis domain Space 차원의 속성 위/경도, street address, street block, city quadrant, zip code, street, city, country, state, demographic information household income, education, family size, home value, etc customer address ~ store(school)간 driving time
5
Star Schema : Space dimension
Spatial industry petroleum, telecommunications, government agencies 질의 예 “What is all those customers who bought from stores 1 & 2 during the Christmas holiday season?” (result) All customers who drove 3 minutes or less to a store in Tampa during holiday season “See the market for all stores in the surrounding area presented in concentric rings of 1, 2, & 3 miles” “See total potential customers who drive time to each of your store in the Tampa area is 4, 8, or 12 minutes
6
Star Schema : Space dimension
4개 상점위치에서 시장조사 2개 상점에서 고객들의 운행거리
7
Star Schema : Space dimension
Time Dimension date_id customer_id product_id store_id customer_geo_id item_price item_qty item_cost product_id product_code description unit_price unit_cost selling_unit stocking_unit life_stamp ... customer_id customer_code customer_name address1 address2 city state zip telephone life_stamp * customer_geo_code customer_geo_id customer_geo_code zip_code census_track block_group country state avg_income avg_house_size avg_age ...
8
Star Schema : Space dimension
Target Mailing 에 응용 예 for tuning AD. expenditure targeting upper-middle class, middle-aged males “What was our total sales revenue from customers who bought from store 52 during Christmas season and who live in neighborhood earning an average of $50,000 per year and are more than 30 years old?”
9
Star Schema : Extensibility
Causal dimension advisory dimension that should not change the fundamental grain of a table 어떤 event가 일어나게 된 이유를 설명 promotion, store condition “Was my promotion profitable?”
10
New causal dimension Causal_key Condition_name Price_treatment_type
Price_discount Ad_type Ad_media_name Ad_size Display_type Display_provider Display_size
11
Star Schema : Extensibility
Building the fact table at a granular level 만약, 주단위로 요약된 fact table의 경우, monthly data로 확장할 수는 없다.
12
Star Schema : Helper In HealthCare billing Helper Table
grain : individual line item on a doctor bill 한 환자가 여러 개의 진단명을 가질 수 있음. choose one value (“primary” diagnosis) and omit the other values Diagnosis data의 유용성이 떨어짐 create a fixed number of additional Diagnosis dimension slots in the fact table key Helper Table weighting factor : 한 grain을 형성하는 각 요소의 중요도 할당 예) 3개의 진단명이 있는 경우, 각 weighting factor는 1/3
13
Star Schema : Helper
14
Star Schema :Factless Fact Tables
단지, dimension table과 link되는 multi-part key만을 가지고 있음 유형 I : table that record event (event-tracking table) 예) fact table for recording student attendance on a daily basis at a college grain : individual student attendance event “Which classes were the most heavily attended?” “Which classes were the most consistently attended?” “Which teachers taught the most students?” “Which teachers taught classes in facilities belonging to other departments?” “What was the average total walking distance of a student in a given day?”
15
Star Schema :Factless Fact Tables
16
Star Schema :Factless Fact Tables
유형 II : coverage table 일어나지 않은 사건에 대한 질의를 수행 “Which products were on promotion that didn’t sell?” fact table의 내용이 sparse할 때 사용 예) coverage table for each product in each store that is on promotion in each time period (9609)
17
Star Schema :Factless Fact Tables
18
Star Schema : Slowly Changing Dimension
Example 품목 설명이 종종 바뀌는 ‘품목’ 차원 테이블 처리 방법 Overwriting 변경된 dimension 속성의 과거 이력을 보존하지 못함. 그래서, 이전 value가 중요하지 않은 경우에 사용 Creating Another Dimension Record Creating Current Value Field
19
Star Schema : Slowly Changing Dimension
Creating Another Dimension Record generalize the key to changed dimension 예) primary key + version digit product dimension의 경우, SKU#+01, SKU#+02 fact table의 레코드들을 dimension attribute의 history에 따라 partition이 가능 과거 이력을 수직적으로 보존 단점 dimension key의 generalization dimension table의 크기(레코드 수) 증가 Creating a Current Value Field “current value” 필드를 추가 과거 이력을 수평적으로 보존
20
Star Schema : Slowly Changing Dimension
*예) Pkg_type변동 glued box => pasted box SKU#-01 => SKU#-02
21
Star Schema : Big Dimension
millions or tens of millions records 예) customer dimension 개개인(고객)의 정보를 기록 압축, 요약이 힘들고, 다른 차원 테이블에 비해 변동이 심함. MiniDimension의 생성 어느 정도 static한 dimension과 계속적으로 변하는 dimension을 분리 예) customer dimension의 경우에 demographic minidimension 계속 변화하는 demographic measure (income, purchase_behavior)가 일정 간격을 가지는 구획값으로 변형 demographic dimension attribute의 모든 가능한 값의 조합을 결정하여 키값을 생성
22
Demographic Dimension
demographics_key age_level income_level marital_status sex purchase_behavior Sales Fact time_key demographics_key customer_key product_key promotion_key sales_units sales_dollars Customer Dimension customer_key first_name last_name street_address city state zip demographics_key
23
Star Schema : MiniDimension
minimize the joins necessary thought the fact table when you want to retrieve spatial data for customers without fact aggregation Big Dimension
24
Star Schema : MiniDimension
Advantages support frequent snapshotting of customer profiles with no increase in data storage or data complexity as you increase the number of snapshots demographic dimension itself cannot be allowed to grow too large Disadvantages browsing performance 저하 demographics data can only be browsed along with the more constant customer data by liinking through the fact table if fact table is empty, (아무런 event가 발생하지 않음) cannot link the demographics to the customer dummy sales event 삽입
25
Star Schema : Voyage 여행관련 분석질의
How many trips are there in a typical overall tickets? Between which trip city pairs do we have the most customer satisfaction problem? cf) telephone cable
26
Star Schema 예 : Voyage
27
Star Schema 예 : Insurance
Transaction, Claim Processing
28
Star Schema : 인적자원 관리
29
Star Schema : 인적자원 관리 분석질의
report summary statuses of the entire employee base on a regular (monthly) basis # of employees, total salary paid during the month cumulative salary paid this year cumulative vacation days taken, vacation days accrued, number of new hires, # of promotions profile the employee population at any precise instant in time how many employees we have what their detailed profiles were on that date monitoring every action taken on a given employee transaction sequence/time
Similar presentations