Statistics for Thesis Writing: Regression Analysis. 하성욱, Graduate School, Hansung University.

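Regression I: Regression analysis
Regression analysis is a technique for identifying the relationship between one or more independent variables and a single dependent variable.
- Simple linear regression: a linear regression with one independent variable.
- Multiple linear regression: a linear regression with several independent variables.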

Regression II: Simple linear regression
- The regression coefficient (β, beta) of a simple linear regression is the slope of the fitted line.
- The correlation coefficient (Pearson r) indicates the prediction accuracy.
[Figure: two scatter plots of y against x, one with a low positive correlation and one with a high positive correlation]

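Regression III: Slope and correlation in simple linear regression
The regression coefficient (β, beta) of a simple linear regression is the slope:
$\beta = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
The correlation coefficient (Pearson r) is the prediction accuracy:
$r_{xy} = \frac{\mathrm{Cov}(x, y)}{\mathrm{STD}(x)\,\mathrm{STD}(y)} = \frac{S_{xy}}{S_x S_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$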
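The two formulas share the same numerator, which a short sketch can make concrete. The data below are hypothetical and chosen so that both series have a similar slope but very different scatter; the function name `slope_and_r` is an illustration, not part of the original material.

```python
import math

def slope_and_r(x, y):
    """Return the simple-regression slope and Pearson r for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    beta = sxy / sxx                 # slope
    r = sxy / math.sqrt(sxx * syy)   # prediction accuracy
    return beta, r

# Hypothetical data: similar slopes, different scatter around the line
x = [1, 2, 3, 4, 5]
y_tight = [2.1, 3.9, 6.2, 7.8, 10.1]
y_noisy = [4.0, 1.5, 8.0, 5.5, 11.0]

beta_tight, r_tight = slope_and_r(x, y_tight)
beta_noisy, r_noisy = slope_and_r(x, y_noisy)
print(f"tight: beta={beta_tight:.2f}, r={r_tight:.3f}")
print(f"noisy: beta={beta_noisy:.2f}, r={r_noisy:.3f}")
```

The slopes are close to each other while r differs clearly, which matches the distinction between slope and prediction accuracy made on the slide.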

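Regression IV: Simple linear regression exercise
Exercise using the file <노인만족도.sav>.
- Correlation coefficient (shown as R in SPSS)
- Unstandardized regression coefficient (shown as B in SPSS)
- Standardized regression coefficient (shown as β in SPSS)
- t-value = unstandardized regression coefficient / standard error of the regression coefficient
Multiple linear regression exercise
- Explanatory power: $R^2$
- Adjusted explanatory power: adjusted $R^2$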
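As a rough sketch of how the quantities in this exercise relate to one another, the following computes them by hand for a one-predictor model. The data are hypothetical (the .sav file is not reproduced here), and the adjusted $R^2$ formula $1 - (1 - R^2)(n-1)/(n-k-1)$ is the usual textbook definition rather than something stated on the slide.

```python
import math

def simple_regression_summary(x, y):
    """Return B, SE, standardized beta, t, R2 and adjusted R2 for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    r2 = 1 - sse / syy
    k = 1                                     # number of predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return {
        "B": b1,
        "SE": se_b1,
        "beta": b1 * math.sqrt(sxx / syy),    # standardized coefficient
        "t": b1 / se_b1,                      # t = B / SE
        "R2": r2,
        "adj_R2": adj_r2,
    }

# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.5, 5.0, 8.5, 9.0, 12.5]
out = simple_regression_summary(x, y)
for name, value in out.items():
    print(f"{name}: {value:.3f}")
```

With a single predictor the standardized coefficient equals Pearson r, so its square equals $R^2$; with several predictors that shortcut no longer holds.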

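Regression V: OLS (ordinary least squares) assumptions
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- X and Y are linearly related.
- The explanatory variable X is non-stochastic.
- $\varepsilon_i \sim N(\mu_i, \sigma_i^2)$
- $E(\varepsilon_i) = \mu_i = 0$
- $\mathrm{Var}(\varepsilon_i) = \sigma_i^2 = \sigma^2$ (homoskedasticity); the violation is heteroskedasticity.
- $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$ (independent error terms); the violation is autocorrelation.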

Regression VI: Normality of residuals
In SPSS: Descriptive Statistics, Explore.
- Shapiro-Wilk test or Kolmogorov-Smirnov test ($H_0$: normal), especially when the sample is small.
- Graphical methods, e.g. a Q-Q plot (quantile-quantile plot), because with a large sample the tests reject normality easily.
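The graphical check can be sketched without any plotting library by computing the coordinates of a normal Q-Q plot. The residuals below are hypothetical, and the plotting position $(i - 0.5)/n$ is one common convention chosen here as an assumption; statistical packages may use a slightly different one.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs for a normal Q-Q plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    ordered = sorted((e - m) / s for e in residuals)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n: an assumed convention
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals
residuals = [-1.2, 0.3, 0.8, -0.4, 1.5, -0.9, 0.1, -0.2]
points = qq_points(residuals)
for q, e in points:
    print(f"{q:6.2f} {e:6.2f}")
```

If the residuals are roughly normal, the printed pairs lie close to the line y = x.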

Regression VII: Heteroskedasticity
Heteroskedasticity does not bias the OLS coefficient estimates. However, the variance of the coefficients tends to be underestimated, which inflates the t-scores and can make insignificant variables appear statistically significant.
- SAS: asymptotic covariance matrix
- SAS: chi-square test for homoskedasticity, $H_0$: homoskedasticity

Regression VIII: Autocorrelation (correlation among residuals)
Durbin-Watson statistic: $DW \approx 2(1 - \rho)$, with $-1 < \rho < 1$ and $0 < DW < 4$.
Autocorrelation does not bias the OLS coefficient estimates, but the standard errors tend to be underestimated (and the t-scores overestimated) when the autocorrelations of the errors at low lags are positive.
$H_0$: no autocorrelation. The bounds $d_l$ and $d_u$ come from separate statistical tables (Savin and White, 1977).
- $0 < DW < d_l$: positive (+) serial correlation
- $d_l < DW < d_u$: inconclusive
- $d_u < DW < 2$: no serial correlation (0)
- $2 < DW < 4 - d_u$: no serial correlation (0)
- $4 - d_u < DW < 4 - d_l$: inconclusive
- $4 - d_l < DW < 4$: negative (-) serial correlation
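A minimal sketch of the statistic itself, using its standard definition $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$ (the definition is not written on the slide, only the approximation). The residual series is hypothetical and built from two long runs, so it is positively autocorrelated.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def lag1_rho(residuals):
    """Lag-1 autocorrelation estimate, assuming residuals have mean zero."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals with long runs of the same sign
res = [1.0] * 6 + [-1.0] * 6
dw = durbin_watson(res)
rho = lag1_rho(res)
print(f"DW = {dw:.3f}, rho = {rho:.3f}, 2(1 - rho) = {2 * (1 - rho):.3f}")
```

DW falls well below 2 here, pointing toward positive serial correlation. The gap between DW and $2(1-\rho)$ comes from the first and last residuals and shrinks as the series gets longer.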

Regression IX: Multicollinearity (high correlation among independent variables)
A statistical problem that arises when correlations between independent variables are extremely high, producing large standard errors of the regression coefficients (beta, β) and unstable coefficients (Venkatraman, 1989).
Consequences
- (1) Variance inflation leads to a low t-value, c.f. t-value = [β - hypothesized β] / s.e.(β)
- (2) Large p-value of β
- (3) Cannot reject $H_0$: β = 0
Typical symptoms
- When many variables are highly correlated, only a particular one of the independent variables turns out significant.
- The regression coefficients are unstable.

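Regression X: Four indicators for detecting multicollinearity, and remedies
Indicators
- Tolerance: 0.1 or less
- Variance inflation factor (VIF): 10 or more
- Eigenvalue: 0.01 or less
- Condition index: 100 or more
Remedies
- Remove some of the independent variables.
- Collect additional samples.
- In some cases, use mean-centered variables.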
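For the first two indicators, a small sketch may help. It uses the standard definitions tolerance $= 1 - R_j^2$ and VIF $= 1/\text{tolerance}$, where $R_j^2$ comes from regressing predictor j on the other predictors; with only two predictors, $R_j^2$ is simply their squared correlation. The data and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two predictors."""
    r2 = pearson_r(x1, x2) ** 2
    tol = 1 - r2
    return tol, 1 / tol

# Hypothetical, nearly collinear predictors
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
tol, vif = tolerance_and_vif(x1, x2)
flagged = tol <= 0.1 or vif >= 10   # thresholds from the slide
print(f"tolerance={tol:.4f}, VIF={vif:.1f}, flagged={flagged}")
```

Tolerance and VIF are reciprocals, so the two thresholds on the slide (0.1 and 10) express the same cutoff.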

Control Variable
Concept of a control variable
- A predictor of the dependent variable, but not of interest in this study.
- Including it shows the impact of the independent variables more distinctly.
- e.g. in a study of kindness and satisfaction, control for the level of the interior design.
In statistical terms
- $Y_i = b_0 + b_s CVs_i + e_i$
- $Y_i = b_0 + b_s CVs_i + b_1 X_i + b_2 Z_i + e_i$
- The coefficients of the CVs are not interpreted.

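Dummy Regression I
A regression whose independent variables are qualitative (nonmetric) measures such as nominal or ordinal scales.
e.g. differences in air-conditioner sales by season (c.f. ANOVA with the Duncan test)
- Convert the season variable (1, 2, 3, 4) into three dummy variables.
- With winter as the reference, each coefficient measures how much more is sold in that season than in winter.
$Y_i = b_0 + b_1 D_{1i} + b_2 D_{2i} + b_3 D_{3i} + e_i$
Converting the original variable into dummy variables:
Season | Dummy 1 (spring) | Dummy 2 (summer) | Dummy 3 (autumn)
Spring (1) | 1 | 0 | 0
Summer (2) | 0 | 1 | 0
Autumn (3) | 0 | 0 | 1
Winter (4) | 0 | 0 | 0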

Dummy Regression II
$Y_i = b_0 + b_1 D_{1i} + b_2 D_{2i} + b_3 D_{3i} + e_i$
- An original variable with N categories can be converted into (N-1) dummy variables.
- Exercise using the file <노인만족도.sav>: add dummy variables to the simple linear regression exercise, e.g. married or not, widowed or not, female or not.
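The (N-1) coding rule can be sketched in a few lines. The helper `to_dummies` and the English season labels are illustrative names, with winter as the reference category as in the air-conditioner example.

```python
def to_dummies(value, categories, reference):
    """Code one categorical value as N-1 dummies, omitting the reference category."""
    return [1 if value == c else 0 for c in categories if c != reference]

seasons = ["spring", "summer", "autumn", "winter"]
coded = {s: to_dummies(s, seasons, reference="winter") for s in seasons}
for season, dummies in coded.items():
    print(season, dummies)
```

The reference category is coded as all zeros, so $b_0$ is the expected value for winter and each $b_k$ is the difference from winter.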

Quadratic Regression I
Quadratic equation: $Y = a x^2 + b x + c$
[Figure: curves of Y against X for a > 0, a = 0, and a < 0; the turning point is at $X = -b/(2a)$]

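Quadratic Regression II
Quadratic equation: $Y = a x^2 + b x + c$
[Figure: curves of Y against X for six sign combinations: (a > 0, b > 0), (a > 0, b < 0), (a = 0, b > 0), (a = 0, b < 0), (a < 0, b < 0), (a < 0, b > 0)]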

Quadratic Regression III
$Y_i = b_0 + b_1 x_i + b_2 x_i^2 + e_i$
$b_2 = 0$: linear relationship
- $b_1 > 0$: positive relationship
- $b_1 < 0$: negative relationship
- $b_1 = 0$: no relationship
$b_2 > 0$
- $b_1 < 0$: U-shaped relationship
- $b_1 \geq 0$: resembles a positive relationship in the region $x_i > 0$
$b_2 < 0$
- $b_1 > 0$: reverse (inverted) U-shaped relationship
- $b_1 \leq 0$: resembles a negative relationship in the region $x_i > 0$
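The case analysis above can be written as a small lookup, together with the turning point $x = -b_1/(2 b_2)$ from the quadratic formula. This is only a sketch: in practice "equal to zero" means "not statistically significant", which the exact comparisons below do not capture, and the function names are hypothetical.

```python
def describe_quadratic(b1, b2):
    """Map the signs of b1 and b2 to the shape labels used on the slide."""
    if b2 == 0:
        if b1 > 0:
            return "positive linear relationship"
        if b1 < 0:
            return "negative linear relationship"
        return "no relationship"
    if b2 > 0:
        return "U-shaped" if b1 < 0 else "similar to positive for x > 0"
    return "inverted U-shaped" if b1 > 0 else "similar to negative for x > 0"

def turning_point(b1, b2):
    """x-coordinate of the vertex of b0 + b1*x + b2*x**2 (requires b2 != 0)."""
    return -b1 / (2 * b2)

# Hypothetical coefficients
print(describe_quadratic(-2.0, 1.0), turning_point(-2.0, 1.0))
print(describe_quadratic(3.0, -0.5), turning_point(3.0, -0.5))
```

When $b_1$ and $b_2$ have opposite signs the turning point lies at a positive x, which is why those cases are the ones labeled U-shaped or inverted U-shaped.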

Quadratic Regression IV
Model comparison
Original model: $Y_i = b_0 + b_1 X_i + b_2 X_i^2 + e_i$
Mean-centered model: let $mcX_i = X_i - \bar{X}$.
- New model: $Y_i = mcb_0 + mcb_1\, mcX_i + mcb_2\, mcX_i^2 + mce_i$
Substituting $X_i = mcX_i + \bar{X}$ into the original model:
- $Y_i = b_0 + b_1 (mcX_i + \bar{X}) + b_2 (mcX_i + \bar{X})^2 + e_i$
- $Y_i = (b_0 + b_1 \bar{X} + b_2 \bar{X}^2) + (b_1 + 2 b_2 \bar{X})\, mcX_i + b_2\, mcX_i^2 + e_i$
i.e. $mcb_0 = b_0 + b_1 \bar{X} + b_2 \bar{X}^2$; $mcb_1 = b_1 + 2 b_2 \bar{X}$; $mcb_2 = b_2$
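The coefficient mapping can be checked numerically: both parameterizations must give the same predicted value at every X. The coefficients and the mean below are hypothetical.

```python
def center_quadratic(b0, b1, b2, x_bar):
    """Convert raw quadratic coefficients to their mean-centered equivalents."""
    mcb0 = b0 + b1 * x_bar + b2 * x_bar ** 2
    mcb1 = b1 + 2 * b2 * x_bar
    mcb2 = b2
    return mcb0, mcb1, mcb2

# Hypothetical raw coefficients and sample mean of X
b0, b1, b2, x_bar = 1.0, -3.0, 0.5, 4.0
mcb0, mcb1, mcb2 = center_quadratic(b0, b1, b2, x_bar)
print(mcb0, mcb1, mcb2)

max_gap = 0.0
for x in (0.0, 2.5, 7.0):
    original = b0 + b1 * x + b2 * x ** 2
    mc_x = x - x_bar
    centered = mcb0 + mcb1 * mc_x + mcb2 * mc_x ** 2
    max_gap = max(max_gap, abs(original - centered))
print("largest difference between the two models:", max_gap)
```

In this example $b_1$ is negative while $mcb_1$ is positive: the linear coefficient describes the slope at X = 0 in the raw model but the slope at the mean of X in the centered model.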

Quadratic Regression V
$Y_i = mcb_0 + mcb_1\, mcX_i + mcb_2\, mcX_i^2 + mce_i$
$mcb_2 = 0$: linear relationship
- $mcb_1 > 0$: positive relationship
- $mcb_1 < 0$: negative relationship
- $mcb_1 = 0$: no relationship
$mcb_2 > 0$
- $mcb_1 \leq 0$: U-shaped relationship
- $mcb_1 > 0$: U-shaped or positive relationship
$mcb_2 < 0$
- $mcb_1 \geq 0$: reverse (inverted) U-shaped relationship
- $mcb_1 < 0$: reverse (inverted) U-shaped or negative relationship
Exercise with <SP-5 EE_data.xls>.

Statistics for Thesis Writing: Regression Analysis, Mediation. 하성욱, Graduate School, Hansung University

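Mediation I: Mediation
Total effect model (total effect: c)
- $Y_i = k_0 + c X_i + e_i$
Mediation model
- $M_i = k_1 + a X_i + e_{1i}$
- $Y_i = k_2 + c' X_i + b M_i + e_{2i}$
- i.e. $Y_i = k_2 + b k_1 + c' X_i + ab X_i + b e_{1i} + e_{2i}$
Indirect (mediated) effect: ab. Direct effect: c'.
[Figure: path diagrams. Total effect: Stress (X) to Aggression (Y) with path c. Mediation model: Stress (X) to Worthlessness (M) with path a, Worthlessness (M) to Aggression (Y) with path b, and Stress (X) to Aggression (Y) with direct path c']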

Mediation II: Total effect = direct effect + indirect (mediated) effect
$c = c' + ab$
- Mediation exists: ab is significant.
- Full mediation: ab is significant and c' is not significant.
- Partial mediation: ab is significant and c' is significant.

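Mediation III: Testing mediation following Baron & Kenny (1986)
- Step 1: c is significant (i.e. X => Y).
- Step 2: a is significant (i.e. X => M).
- Step 3: b is significant after controlling for the effect of X (i.e. M => Y).
- Step 4: in the Step 3 regression model, which controls for the effect of M, examine c'.
  If c' is not significant, mediation is full: only the indirect effect exists.
  If c' is significant, mediation is partial: both the indirect and the direct effect exist.
Exercise with <SP-5 Mediation.sav>.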

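Mediation IV: Limitations of Baron & Kenny (1986)
Mediation is possible even when c is not significant in Step 1.
- This happens when c' and ab have opposite signs, i.e. with a suppressor variable.
- Example: X is employee intelligence, Y is the number of errors in assembly work, and M is the degree of boredom with assembly work. [X => M] a > 0, [M => Y] b > 0, [X => Y] c' < 0.
The procedure cannot compute the size of the mediation effect or its standard error.
- Size of the mediation effect: ab
- Standard error of the mediation effect: $\sqrt{b^2 s_a^2 + a^2 s_b^2}$, derived using Var(f) = D'VD
- Sobel test: $Z = ab / \sqrt{b^2 s_a^2 + a^2 s_b^2}$, under the assumption that the mediation effect is normally distributed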
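The Sobel test is a one-line formula, sketched here with the standard library only. The path estimates and standard errors are hypothetical values chosen for illustration, and the two-sided p-value relies on the normality assumption stated on the slide.

```python
import math
from statistics import NormalDist

def sobel_test(a, s_a, b, s_b):
    """Return the indirect effect, its standard error, Sobel Z and two-sided p."""
    indirect = a * b
    se = math.sqrt(b ** 2 * s_a ** 2 + a ** 2 * s_b ** 2)
    z = indirect / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return indirect, se, z, p

# Hypothetical estimates: a (X => M) and b (M => Y, controlling for X)
indirect, se, z, p = sobel_test(a=0.5, s_a=0.1, b=0.4, s_b=0.1)
print(f"ab={indirect:.3f}, SE={se:.4f}, Z={z:.3f}, p={p:.4f}")
```

Here a and b should be unstandardized coefficients with their own standard errors, taken from the Step 2 and Step 3 regressions respectively.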

Mediation V: Example of a research model that includes mediation effects

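Mediation VI: Regression results for a research model that includes mediation effects
Each cell gives the unstandardized coefficient (standard error), the standardized coefficient, and the t-value.
Variable | Step 1 (DV: turnover intention) | Step 2 (DV: burnout) | Step 3
Control variables | ... | ... | ...
OCB | -.480 (.102), -.259, -4.718 *** | .312 (.085), .201, 3.666 | -.560 (.101), -.303, -5.518
Second predictor in Step 3 (row label not recoverable; presumably burnout, the mediator) | | | .258 (.066), .216, 3.929
Adj. $R^2$ | .135 | .137 | .172
F-value | 10.991 | 11.150 | 12.150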
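The reported coefficients can be checked against the decomposition $c = c' + ab$. The assignment of the four unstandardized coefficients to the paths c, a, c' and b is an assumption inferred from the three-step layout of the table, not something the transcript states explicitly.

```python
# Unstandardized coefficients from the table; the role of each is an assumption
c = -0.480         # Step 1: OCB => turnover intention (total effect)
a = 0.312          # Step 2: OCB => burnout
c_prime = -0.560   # Step 3: OCB => turnover intention, controlling for the mediator
b = 0.258          # Step 3: mediator => turnover intention

indirect = a * b
reconstructed_total = c_prime + indirect
print(f"ab = {indirect:.4f}")
print(f"c' + ab = {reconstructed_total:.4f} (reported c = {c:.3f})")
```

Under this reading the decomposition reproduces the reported total effect to within rounding, and the direct and indirect effects carry opposite signs, which is the pattern described for suppressor variables in Mediation IV.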

Mediation VII 심덕섭·양동민·하성욱 (2010)

Mediation VIII

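Mediation IX: Decomposition of complex mediation effects
Exercise with <SP-5 SHRM Mediation.sav>.
Independent variable | Total effect ($c_i$) | Direct effect ($c_i'$) | Sum of mediation effects ($\sum_j a_{ij} b_j$) | Meaning ($a_{i1} b_1$) | Competence ($a_{i2} b_2$) | Self-determination ($a_{i3} b_3$) | Impact ($a_{i4} b_4$)
Job characteristics (i=1) | .249*** | .167*** | .083* | .016 | .043* | -.010 | .034*
Locus of control (i=2) | .351*** | .214*** | .136** | .023 | .087** | -.008 | .034✝
LMX (i=3) | .109✝ | .020 | .089** | .012, -.002, .045* (only three of the four component values survive, and their columns are unclear)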
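For the two rows that are complete in the table, the decomposition into a direct effect plus four mediator-specific indirect effects can be verified with a short sketch. The dictionary keys are English labels chosen here for illustration; the numbers are the ones reported in the table.

```python
# Values as reported for the two complete rows of the table
effects = {
    "job characteristics": {
        "total": 0.249, "direct": 0.167,
        "indirect": [0.016, 0.043, -0.010, 0.034],
    },
    "locus of control": {
        "total": 0.351, "direct": 0.214,
        "indirect": [0.023, 0.087, -0.008, 0.034],
    },
}

summary = {}
for name, e in effects.items():
    indirect_sum = sum(e["indirect"])                     # sum over mediators j of a_ij * b_j
    summary[name] = (indirect_sum, e["direct"] + indirect_sum)
    print(f"{name}: indirect sum={indirect_sum:.3f}, "
          f"direct + indirect={summary[name][1]:.3f}, reported total={e['total']:.3f}")
```

The sums match the reported mediation totals, and direct plus indirect matches the reported total effect to within rounding in the third decimal.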

Statistics for Thesis Writing: Regression Analysis, Moderation. 하성욱, Graduate School, Hansung University

Moderation I
The impact that a predictor (independent) variable has on a criterion (dependent) variable depends on the level of a third variable (the moderator).
- Criterion-specific: fit is defined through its influence on Y.
- Interaction term: independent variable * moderator
Moderated regression analysis
- $Y_i = b_0 + b_1 X_i + b_2 Z_i + b_3 X_i Z_i + e_i$
- $Y_i = b_0 + (b_1 + b_3 Z_i) X_i + b_2 Z_i + e_i$
[Figure: research model. Independent variable: Kindness of Employees (X), path $b_1$. Moderator: Appearance of Employees (Z), path $b_2$ to Y and $b_3 Z_i$ acting on the X => Y path. Dependent variable: Sales Volume of Store (Y)]

Moderation II: Example of research results
[Figure: Sales (vertical axis) against Kindness (horizontal axis), with separate lines for Appearance: H, Appearance: M, and Appearance: L]

Moderation III: Mean-centering I
Use the mean-centered score of each variable, not the raw score (Venkatraman, 1989).
Mean-centering: $mcX_i = X_i - \bar{X}$, $mcZ_i = Z_i - \bar{Z}$
Multicollinearity issue
- With X and Z bivariate normal, Corr(mcX*mcZ, mcX) = 0.
Interpretation issue
- $Y_i = b_0 + b_1 X_i + b_2 Z_i + b_3 X_i Z_i + e_i$
- $Y_i = b_0 + (b_1 + b_3 Z_i) X_i + b_2 Z_i + e_i$
- $b_1$ is the impact of X on Y when Z = 0; $b_2$ is the impact of Z on Y when X = 0.
- If Z never takes the value zero, $b_1$ has no meaningful interpretation. If X never takes the value zero, $b_2$ has no meaningful interpretation.
- $mcb_1$ is the impact of X on Y when $Z = \bar{Z}$; $mcb_2$ is the impact of Z on Y when $X = \bar{X}$.

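Moderation IV: Mean-centering II
Model comparison
Original model
- $Y_i = b_0 + b_1 X_i + b_2 Z_i + b_3 X_i Z_i + e_i$
- $Y_i = b_0 + (b_1 + b_3 Z_i) X_i + b_2 Z_i + e_i$
Mean-centered model: let $mcX_i = X_i - \bar{X}$ and $mcZ_i = Z_i - \bar{Z}$.
- New model: $Y_i = mcb_1\, mcX_i + mcb_2\, mcZ_i + mcb_3\, mcX_i\, mcZ_i + mcb_0 + mce_i$
- Original model: $Y_i = b_1 X_i + b_2 Z_i + b_3 X_i Z_i + b_0 + e_i$
Substituting $X_i = mcX_i + \bar{X}$ and $Z_i = mcZ_i + \bar{Z}$ into the original model:
- $Y_i = b_1 (mcX_i + \bar{X}) + b_2 (mcZ_i + \bar{Z}) + b_3 (mcX_i + \bar{X})(mcZ_i + \bar{Z}) + b_0 + e_i$
- $Y_i = (b_1 + b_3 \bar{Z})\, mcX_i + (b_2 + b_3 \bar{X})\, mcZ_i + b_3\, mcX_i\, mcZ_i + (b_0 + b_1 \bar{X} + b_2 \bar{Z} + b_3 \bar{X}\bar{Z}) + e_i$
i.e. $mcb_1 = b_1 + b_3 \bar{Z}$; $mcb_2 = b_2 + b_3 \bar{X}$; $mcb_3 = b_3$; $mcb_0 = b_0 + b_1 \bar{X} + b_2 \bar{Z} + b_3 \bar{X}\bar{Z}$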
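The same mapping can be checked numerically: the raw and the mean-centered model are two parameterizations of one fitted surface, so they must predict the same Y at every (X, Z). All coefficients and means below are hypothetical.

```python
def center_moderation(b0, b1, b2, b3, x_bar, z_bar):
    """Convert raw moderated-regression coefficients to mean-centered ones."""
    mcb0 = b0 + b1 * x_bar + b2 * z_bar + b3 * x_bar * z_bar
    mcb1 = b1 + b3 * z_bar
    mcb2 = b2 + b3 * x_bar
    mcb3 = b3
    return mcb0, mcb1, mcb2, mcb3

# Hypothetical raw coefficients and sample means
b0, b1, b2, b3 = 2.0, -0.5, 0.1, 0.2
x_bar, z_bar = 3.0, 5.0
mcb0, mcb1, mcb2, mcb3 = center_moderation(b0, b1, b2, b3, x_bar, z_bar)
print(f"mcb0={mcb0:.2f}, mcb1={mcb1:.2f}, mcb2={mcb2:.2f}, mcb3={mcb3:.2f}")

max_gap = 0.0
for x, z in ((0.0, 0.0), (3.0, 5.0), (6.5, 1.5)):
    raw = b0 + b1 * x + b2 * z + b3 * x * z
    mc_x, mc_z = x - x_bar, z - z_bar
    centered = mcb0 + mcb1 * mc_x + mcb2 * mc_z + mcb3 * mc_x * mc_z
    max_gap = max(max_gap, abs(raw - centered))
print("largest difference between the two models:", max_gap)
```

Only the interaction coefficient is unchanged by centering; the first-order coefficients change because they now describe effects at the means rather than at zero.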

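Moderation V: Moderated regression analysis procedure
Step 1: Mean-center all variables.
Step 2: Run a hierarchical regression with these variables.
- Model 1: Y = f(mcCVs, mcX, mcZ)
- Model 2: Y = f(mcCVs, mcX, mcZ, mcX*mcZ)
Step 3: Examine the unstandardized coefficients. The standardized coefficients that statistical packages compute for $mcb_1$ and $mcb_2$ are accurate, but the standardized coefficient for $mcb_3$ is not.
Step 4: Draw the simple regression lines, i.e. the regression lines obtained at specific values of Z.
- $\hat{Y}_H$ at $Z_H$ (i.e. mcZ = +1 SD of Z), the 84th percentile
- $\hat{Y}_M$ at $Z_M$ (i.e. mcZ = 0, $Z = \bar{Z}$)
- $\hat{Y}_L$ at $Z_L$ (i.e. mcZ = -1 SD of Z), the 16th percentile
Whether or not the variables are centered, the simple regression lines have the same slopes but different intercepts.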

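Moderation VI: Steps 5 and 6
Step 5: Test the slope of each simple regression line.
- $H_0$: $mcb_1 + mcb_3\, mcZ = 0$
- Option 1: compute the t-value directly using Var(f) = D'VD.
- Option 2: compute the t-value with a statistical package.
  (step 1) $Z_k = mcZ - k$ (shift by a further k)
  (step 2) run the regression with mcX, $Z_k$, and mcX*$Z_k$
  (step 3) read the t-value of mcX
Step 6: If a moderation effect exists, compute the crossing point.
- Setting $\hat{Y}_H = \hat{Y}_L$ gives $mcX_{cross} = -mcb_2 / mcb_3$.
- $mcX_{cross}$ and $X_{cross}$ have different values because the intercepts of the simple regression equations differ.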
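Simple slopes and the crossing point follow directly from the mean-centered coefficients. The sketch below uses the mean-centered estimates reported for <SP-5 Moderation.sav> in this deck (7.015, .222, .102, .029); the standard deviation of Appearance is not given in the source, so `sd_z = 2.0` is a purely hypothetical value.

```python
# Mean-centered coefficients reported for <SP-5 Moderation.sav>
mcb0, mcb1, mcb2, mcb3 = 7.015, 0.222, 0.102, 0.029
sd_z = 2.0  # hypothetical SD of Appearance; not given in the source

def y_hat(mc_x, mc_z):
    """Predicted Y from the mean-centered moderated regression."""
    return mcb0 + mcb1 * mc_x + mcb2 * mc_z + mcb3 * mc_x * mc_z

def simple_slope(mc_z):
    """Slope of Y on mcX at a fixed value of mcZ."""
    return mcb1 + mcb3 * mc_z

slopes = {label: simple_slope(k) for label, k in (("H", sd_z), ("M", 0.0), ("L", -sd_z))}
mc_x_cross = -mcb2 / mcb3   # where the simple regression lines intersect

for label, slope in slopes.items():
    print(f"slope at Z_{label}: {slope:.3f}")
print(f"mcX_cross = {mc_x_cross:.3f}")
```

At `mc_x_cross` the predicted values for high and low Z coincide, whatever value is chosen for `sd_z`, because the terms involving mcZ cancel there.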

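Moderation VII: Results for <SP-5 Moderation.sav>
Raw variables, without the interaction term
Variable | Unstandardized coefficient | Standard error | Standardized coefficient | t | Significance
Constant | 5.252 | .272 | | 19.289 | .000
Kindness | .224 | .032 | .167 | 7.057 |
Appearance | .102 | .026 | .092 | 3.893 |
Raw variables, with the interaction term
Variable | Unstandardized coefficient | Standard error | Standardized coefficient | t | Significance
Constant | 6.238 | .561 | | | .000
Kindness | -.046 | .138 | .166 | 6.984 |
Appearance | -.004 | .059 | .092 | 3.892 |
K*A | .029 | .014 | .048 | 2.009 | .045
Mean-centered variables
Variable | Unstandardized coefficient | Standard error | Standardized coefficient | t | Significance
Constant | 7.015 | .056 | | | .000
mc_Kindness | .222 | .032 | .166 | 6.984 |
mc_Appearance | .102 | .026 | .092 | 3.892 |
mc_K*mc_A | .029 | .014 | .048 | 2.009 | .045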

Moderation VIII: Simple regression lines
[Figure: Sales (vertical axis) against Kindness (horizontal axis)]
Data: <SP-5 EE_data.xls>

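Moderation IX: Nonmetric moderator
Convert the moderator into dummy variables: an original variable with N categories can be converted into (N-1) dummy variables.
e.g. analyzing how the effect of GPA on salary depends on the major
- $Y_i$: annual salary
- Major dummies, with the humanities major as the reference: $D_{1i}$ is the engineering major dummy and $D_{2i}$ is the business major dummy.
- $X_i$: GPA
$Y_i = b_0 + b_1 D_{1i} + b_2 D_{2i} + b_3 X_i + b_4 D_{1i} X_i + b_5 D_{2i} X_i + e_i$
Analysis procedure: the dummies do not need to be mean-centered; apply the MRA procedure using the mean-centered $mcX_i$.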

Moderation X: Subgroup analysis
Used for a nonmetric moderator, or as an alternative to MRA.
Fisher's Z-test (or a t-statistic or chi-squared statistic):
$Z = \frac{\frac{1}{2}\ln\frac{1 + r_1}{1 - r_1} - \frac{1}{2}\ln\frac{1 + r_2}{1 - r_2}}{\sqrt{\frac{1}{n_1 - 1} + \frac{1}{n_2 - 1}}}$
where $n_i$ is the sample size of group i and $r_i$ is the correlation coefficient in group i.
Correlation between Kindness of Employees and Sales Volume of Store:
Low Appearance group ($n_1 = 24$) | High Appearance group ($n_2 = 18$) | Z-test
0.316 | 0.816 *** | -2.56 **
Note: + : p<0.1; * : p<0.05; ** : p<0.01; *** : p<0.001
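The reported Z can be reproduced from the two correlations and sample sizes. The sketch below follows the denominator as written on the slide, with $1/(n_i - 1)$. Many textbooks instead use $1/(n_i - 3)$ as the variance of Fisher's transformed correlation, so the function takes that offset as a parameter; the parameter name `offset` is hypothetical.

```python
import math

def fisher_z_difference(r1, n1, r2, n2, offset=1):
    """Z statistic for the difference between two independent correlations."""
    z1 = 0.5 * math.log((1 + r1) / (1 - r1))   # Fisher transformation of r1
    z2 = 0.5 * math.log((1 + r2) / (1 - r2))   # Fisher transformation of r2
    se = math.sqrt(1 / (n1 - offset) + 1 / (n2 - offset))
    return (z1 - z2) / se

# Values from the slide: low vs. high Appearance groups
z_slide = fisher_z_difference(0.316, 24, 0.816, 18)              # 1/(n-1), as on the slide
z_n3 = fisher_z_difference(0.316, 24, 0.816, 18, offset=3)       # common 1/(n-3) variant
print(f"Z with n-1: {z_slide:.2f}")
print(f"Z with n-3: {z_n3:.2f}")
```

The n-1 version reproduces the -2.56 shown in the table. The n-3 version gives a somewhat smaller absolute value, which matters for borderline cases in small groups like these.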

Multicollinearity Issues
Mean centering: estimate with mcX [i.e. X - mean(X)] instead of X.
- Reduction of correlation: with X and Z bivariate normal, Corr(mcX*mcZ, mcX) = 0.
- Fit regression (moderation): estimate with mcZ [i.e. Z - mean(Z)] instead of Z, and with mcX*mcZ [i.e. (X - mean(X))*(Z - mean(Z))] instead of X*Z.
- Quadratic regression: estimate with $mcX^2$ [i.e. $(X - \mathrm{mean}(X))^2$] instead of $X^2$.
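The reduction in correlation is easy to see for the quadratic case. The values of X below are hypothetical and symmetric around their mean, which is why the centered correlation comes out as exactly zero; with skewed data, centering typically reduces the correlation without eliminating it.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor, symmetric around its mean
x = [1, 2, 3, 4, 5]
x_bar = sum(x) / len(x)
mc_x = [xi - x_bar for xi in x]

r_raw = pearson_r(x, [xi ** 2 for xi in x])            # Corr(X, X^2)
r_centered = pearson_r(mc_x, [v ** 2 for v in mc_x])   # Corr(mcX, mcX^2)
print(f"Corr(X, X^2)     = {r_raw:.3f}")
print(f"Corr(mcX, mcX^2) = {r_centered:.3f}")
```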

Matching I
Matching: the match between one independent variable and another independent variable.
Example: the match between environment and organizational structure (Burns and Stalker, 1961), measured as the negative absolute difference (NAD).
Environmental uncertainty | Mechanistic structure | Organic structure
Uncertain | Misfit | Fit
Stable | Fit | Misfit
Fit corresponds to NAD = 0.

Matching II
- Criterion-free: the definition of fit is independent of Y.
- Negative absolute difference term: $-|EU - OS|$
- Use the standardized score of each variable, not the raw score (Bourgeois, 1985; Venkatraman, 1989).
Deviation score analysis
$\mathrm{Performance}_i = b_0 + b_1 EU_i + b_2 OS_i - b_3 |EU_i - OS_i| + e_i$
[Figure: research model. Environmental Uncertainty (path $b_1$), Organizational Structure (path $b_2$), and their match (path $b_3$) point to the dependent variable, Firm Performance]
Data: <SP-5 EE_data.xls>
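The matching term can be built in two steps: standardize each variable, then take the negative absolute difference. The scores below are hypothetical, and standardizing with the sample standard deviation is an assumption, since the slide only says "standardized score".

```python
from statistics import mean, stdev

def standardize(values):
    """Convert raw scores to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def negative_absolute_difference(eu, os_):
    """Matching term -|z(EU) - z(OS)| for each observation; 0 means perfect fit."""
    z_eu, z_os = standardize(eu), standardize(os_)
    return [-abs(a - b) for a, b in zip(z_eu, z_os)]

# Hypothetical scores measured on different scales
eu = [1, 2, 3, 4, 5]
os_matched = [10, 20, 30, 40, 50]      # same rank order as EU
os_mismatched = [50, 40, 30, 20, 10]   # reversed rank order

nad_matched = negative_absolute_difference(eu, os_matched)
nad_mismatched = negative_absolute_difference(eu, os_mismatched)
print([round(v, 2) for v in nad_matched])
print([round(v, 2) for v in nad_mismatched])
```

Standardizing first is what makes the difference meaningful when the two variables are measured on different scales; the resulting column would then enter the regression as the matching term.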

Fit effect: Concept of fit
Related terms: contingency, match, consistency, fit, congruence, coalignment (Venkatraman, 1989).
Classification of fit perspectives
Precision of functional form | Number of variables involved | Criterion-specific | Criterion-free
Low | Many | Fit as profile deviation | Fit as gestalts (e.g. cluster analysis)
 | | Fit as mediation | Fit as covariation (e.g. factor analysis)
High | Few | Fit as moderation | Fit as matching
Criterion-specific: fit anchored to a particular criterion (e.g. effectiveness).

Other Fit Concepts I: Gestalts
- Miles and Snow's (1978) strategy types: Prospectors, Analyzers, Defenders, Reactors
- Identified by cluster analysis, i.e. by the distance between samples

Other Fit Concepts II: Profile deviation
- Deviation from an ideal profile

Other Fit Concepts III: Covariation (internal consistency)
- Second-order confirmatory factor analysis

Questions
Regression
- Simple linear regression, multiple linear regression
- OLS assumptions: normality, heteroskedasticity, autocorrelation, multicollinearity
- Control variable
- Dummy regression
- Quadratic regression
Mediation
- Baron & Kenny (1986), Sobel test
Moderation
- Moderated regression analysis (MRA)
- Nonmetric moderator, subgroup analysis
Matching
Fit effect
