Mean and Variance.

Presentation transcript: "Mean and Variance"

1 Mean and Variance

2 Distribution ?

3

4 Statistics
Population distribution: (population) parameter
Distribution of a sample: (sample) statistic

5 Population distribution and distribution of a sample
Population distribution
X         %freq
Head (1)  0.5
Tail (0)  0.5
Total     1.0
Distribution of a sample
X         freq  %freq
Head (1)  20    0.4
Tail (0)  30    0.6
Total     50    1.0
Distribution of a sample
X         %freq
Head (1)  0.35
Tail (0)  0.65
Total     1.0

6 Distribution of a sample
Y      freq  %freq
1      10    0.1
2      20    0.2
3
4
5
6
Total  100   1.0
Population distribution
Y      %freq
1      1/6
2      1/6
3      1/6
4      1/6
5      1/6
6      1/6
Total  1.0

7 A new variable X from mseg of credit card data
mseg              X
Low Spender       1
Med Low Spender   2
Average Spender   3
Med High Spender  4
High Spender      5

8 Variable X of credit card data
Distribution of a sample
X      freq  %freq
1      26    0.26
2      20    0.20
3      11    0.11
4      25    0.25
5      18    0.18
Total  100   1.00
Population distribution
X      %freq
1      ?
2      ?
3      ?
4      ?
5      ?
Total  1.00

9 Measures of location (center): mean, median, mode, (truncated, winsorized) mean

10 Mean

11 Median

12 50% 50% Median

13 Mode

14

15 Hit/Stop Bust

16 Dealer's hidden card ?

17 2 - 9 1,11 10

18 Outlier

19 Truncated mean / Winsorized mean
5 6 6 4

20 Truncated mean / Winsorized mean
5 6 6 4 6 4 5 1 9 6 4 5 6 4 5
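
The two robust means named on slides 19 and 20 can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's own computation: the function names, the parameter `k` (how many values to treat as extreme at each end), and the data list are assumptions made for the example.

```python
from statistics import mean


def truncated_mean(values, k=1):
    """Mean after dropping the k smallest and the k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample")
    return mean(ordered[k:len(ordered) - k])


def winsorized_mean(values, k=1):
    """Mean after pulling the k smallest and k largest values in to the
    nearest remaining value instead of dropping them."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this sample")
    low, high = ordered[k], ordered[-k - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return mean(clipped)


# Hypothetical data with one extreme value (not taken from the slides).
data = [1, 4, 5, 6, 19]

plain = mean(data)                 # uses every value, including 19
trimmed = truncated_mean(data)     # mean of 4, 5, 6
winsor = winsorized_mean(data)     # mean of 4, 4, 5, 6, 6

print(plain, trimmed, winsor)
```

The truncated mean shrinks the sample, while the winsorized mean keeps the sample size and only replaces the extreme values.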

21 Quartiles
Q1 = 25th percentile (25% below, 75% above)
Q2 = 50th percentile = median (50% below, 50% above)
Q3 = 75th percentile (75% below, 25% above)
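
As a rough illustration of the quartiles on slide 21, the sketch below uses `statistics.quantiles` from the Python standard library. The data values are hypothetical, and the function's default interpolation rule is only one of several conventions for percentiles, so other software may report slightly different quartiles for the same data.

```python
from statistics import median, quantiles

# Hypothetical sample (not taken from the slides).
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1  # interquartile range: spread of the middle half of the data

print(q1, q2, q3, iqr)
```

Here `q2` coincides with the median, as the slide states.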

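22 Wrong housing statistics make wrong real estate policy.
The median is a better statistic than the mean for representing house prices, yet the Korean government publishes house price statistics based on the mean. A mean price can be distorted by just one or two extreme prices.
Newspaper clipping (illustration: Yoo Jae-il (유재일), staff reporter): "Off-target housing statistics, misfiring real estate policy." Korea's PIR (price-to-income ratio) is calculated from the mean house price and the mean household income of urban workers. The US PIR, by contrast, is based on the median price and the median income. The median price is found by lining up the homes sold in an area from the cheapest to the most expensive and taking the one in the middle. Kim Sun-deok (김선덕), head of the Construction Industry Strategy Institute (건설산업전략연구소), said, "Mean prices or mean incomes can be distorted when a few very expensive homes or extremely high earners are included." Moreover, Korean house prices are asking prices, whereas US house prices are based on actual transaction prices. Reporter: Cha Hak-bong (차학봉), posted : :31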
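
The point of the clipping, that one extreme price moves the mean far more than the median, can be checked with a small sketch. The prices below are hypothetical numbers chosen for the example, not figures from the article.

```python
from statistics import mean, median

# Hypothetical house prices in some common unit (not from the article).
prices = [3, 4, 5, 6, 7]

# The same market with a single very expensive home added.
with_outlier = prices + [100]

mean_before, median_before = mean(prices), median(prices)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

In this example the mean roughly quadruples, while the median only moves from 5 to 5.5.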

23 Percentile
p-th percentile: p% of the values lie below it and (100-p)% lie above it.

24 Measures of variability: range, interquartile range (IQR), variance, standard deviation

25 Range

26

27 variance, standard deviation

28 Mean of Y
Distribution of a sample: Mean (Y) = 1*0.1 + 2*0.2 + ... + 6*0.2 = 3.8
Population distribution: Mean (Y) = 1*(1/6) + 2*(1/6) + ... + 6*(1/6) = 3.5
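
The population mean on slide 28 is a probability-weighted sum. A minimal sketch, assuming a fair die with probability 1/6 on each face as the slide's formula indicates; the helper name `dist_mean` is made up for this example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def dist_mean(dist):
    """Mean of a distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


# Population distribution of a fair die: each face has probability 1/6.
die = {y: Fraction(1, 6) for y in range(1, 7)}

assert sum(die.values()) == 1  # probabilities must add up to 1

mean_y = dist_mean(die)
print(mean_y, float(mean_y))  # 7/2 3.5
```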

29 Mean of X
X                     freq  %freq
Low Spender (1)       26    0.26
Med Low Spender (2)   20    0.20
Average Spender (3)   11    0.11
Med High Spender (4)  25    0.25
High Spender (5)      18    0.18
Total                 100   1.00
Mean (X) = 1*0.26 + 2*0.20 + 3*0.11 + 4*0.25 + 5*0.18 = 2.89

30

31

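32 A new variable Q = (X - 3)^2
X                     Q       %freq
Low Spender (1)       (-2)^2  0.26
Med Low Spender (2)   (-1)^2  0.20
Average Spender (3)   0^2     0.11
Med High Spender (4)  1^2     0.25
High Spender (5)      2^2     0.18
Total                         1.00
Mean (Q) = (-2)^2*0.26 + (-1)^2*0.20 + 0^2*0.11 + 1^2*0.25 + 2^2*0.18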
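
Mean (Q) can be evaluated with the relative frequencies of X from slide 8 (0.26, 0.20, 0.11, 0.25, 0.18). The sketch below is only an illustration: the helper name `mean_sq_dev` is an assumption, and exact fractions are used so the result carries no rounding error. For comparison it also evaluates the same quantity around 2.89, the sample mean of X, which is the center used on slide 40.

```python
from fractions import Fraction

# Relative frequencies of X from the credit card sample (slide 8).
pct = {
    1: Fraction(26, 100),
    2: Fraction(20, 100),
    3: Fraction(11, 100),
    4: Fraction(25, 100),
    5: Fraction(18, 100),
}


def mean_sq_dev(dist, center):
    """Mean of (X - center)^2 under the distribution {value: probability}."""
    return sum((x - center) ** 2 * p for x, p in dist.items())


mean_q = mean_sq_dev(pct, 3)                          # Q = (X - 3)^2
around_mean = mean_sq_dev(pct, Fraction(289, 100))    # centered at 2.89

print(float(mean_q), float(around_mean))  # 2.21 2.1979
```

Centering at the mean gives the smaller value of the two.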

33 Let ,

34 Distribution of a sample

35 Sample mean

36 (O) Sample variance

37 For large n, large enough

38

39 Standard deviation

40 V = (X - 2.89)^2
X                     V           freq
Low Spender (1)       (1-2.89)^2  26
Med Low Spender (2)   (2-2.89)^2  20
Average Spender (3)   (3-2.89)^2  11
Med High Spender (4)  (4-2.89)^2  25
High Spender (5)      (5-2.89)^2  18
Total                             100
Var*(X) = (1/99)[(1-2.89)^2*26 + … + (5-2.89)^2*18] = 2.22
sd*(X) = 1.49
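
The sample variance and standard deviation on slide 40 can be reproduced from the frequency table of X on slide 8. This is a small sketch of the n-1 formula shown on the slide; the variable names are chosen for the example, and the `statistics.variance` call is included only as a cross-check on the expanded data.

```python
from math import sqrt
from statistics import variance

# Frequency table of X from the credit card sample (slide 8).
freq = {1: 26, 2: 20, 3: 11, 4: 25, 5: 18}

n = sum(freq.values())                                # 100 observations
mean_x = sum(x * f for x, f in freq.items()) / n      # 2.89

# Sum of squared deviations from the sample mean, weighted by frequency.
ss = sum((x - mean_x) ** 2 * f for x, f in freq.items())

var_x = ss / (n - 1)   # divide by n - 1 = 99, as on the slide
sd_x = sqrt(var_x)

# Cross-check: expand the table into 100 raw values and use the library.
expanded = [x for x, f in freq.items() for _ in range(f)]
check = variance(expanded)

print(round(var_x, 2), round(sd_x, 2))  # 2.22 1.49
```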

41 Statistics
Population distribution   Distribution of a sample
population median         sample median
population mean           sample mean
population variance       sample variance
….                        ….

42 no. of teeth no. of phone calls weight of body

43 no. of teeth weight of body no. of phone calls

44

45

46

47

48 Expected value

49

50 X f(xi) Head 1 0.5 Tail 1

51
Y  f(yi)
1  1/6
2  1/6
3  1/6
4  1/6
5  1/6
6  1/6

52 X f(xi) 1 1/2 1/4 1/8

53
X  3X  f(xi)
1  3   1/2
2  6   1/4
3  9   1/8
4  12
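
The table on slide 53 pairs each value of X with 3X under the same probabilities, which suggests the relation E(3X) = 3 E(X). The sketch below checks that relation numerically. One value is an assumption: the probability for X = 4 is missing from the transcript, and 1/8 is used here only because it makes the probabilities add up to 1.

```python
from fractions import Fraction

# Distribution of X from slide 53. The probability for X = 4 is ASSUMED
# to be 1/8 so that the total is 1; the transcript does not show it.
f = {
    1: Fraction(1, 2),
    2: Fraction(1, 4),
    3: Fraction(1, 8),
    4: Fraction(1, 8),  # assumption
}

assert sum(f.values()) == 1

e_x = sum(x * p for x, p in f.items())         # expected value of X
e_3x = sum(3 * x * p for x, p in f.items())    # expected value of 3X

print(e_x, e_3x)  # 15/8 45/8
```

Multiplying every value by 3 multiplies the expected value by 3, whatever the probabilities are.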

54

55

56 100 x + 10 x

57 100X + 10Y
X      Y  100X  10Y  100X+10Y  f
1 (H)  1  100   10   110       1/12
       2        20   120
       6        60   160
0 (T)
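
The table on slide 57 can be enumerated in full. The sketch assumes a fair coin (X = 1 for heads, 0 for tails) and a fair die (Y = 1 to 6) tossed independently, which matches the probability 1/12 shown for each combination; the variable names are chosen for the example.

```python
from fractions import Fraction
from itertools import product

coin = [1, 0]           # X: 1 = head (H), 0 = tail (T)
die = range(1, 7)       # Y: faces 1 to 6
p = Fraction(1, 12)     # each (X, Y) pair is equally likely

# Every outcome of the payoff 100X + 10Y with its probability.
outcomes = [(x, y, 100 * x + 10 * y, p) for x, y in product(coin, die)]

e_payoff = sum(value * prob for _, _, value, prob in outcomes)

# The same number from the separate expected values of X and Y.
e_x = Fraction(1, 2)
e_y = Fraction(7, 2)
by_linearity = 100 * e_x + 10 * e_y

print(e_payoff, by_linearity)  # 85 85
```

Both routes give 85, so the expected payoff can be found without listing all twelve rows.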

58

59 For any constant

60

61 Thank you !!

