< Problem source >

이공학도를 위한 확률과 통계 (Probability and Statistics for Science and Engineering Students), 3rd edition, Korean edition

 

 

 

 

Data file: ds13.2.3-oil-well-drilling-costs.txt

 

 

 

a. 

 

 

> raw_datas <- read.table("ds13.2.3-oil-well-drilling-costs.txt", header = TRUE)
> model <- lm(Cost ~ Depth + Geology + Downtime + Rig_index, data = raw_datas)
> summary(model)

Call:
lm(formula = Cost ~ Depth + Geology + Downtime + Rig_index, data = raw_datas)

Residuals:
    Min      1Q  Median      3Q     Max 
-931.10 -615.98  -29.55  544.06 1307.84 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3238.5755   863.8390  -3.749  0.00322 ** 
Depth           0.9615     0.1977   4.863  0.00050 ***
Geology         0.7315     2.1243   0.344  0.73706    
Downtime        2.8889     1.1736   2.462  0.03159 *  
Rig_index     389.9436   330.9654   1.178  0.26358    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 780.5 on 11 degrees of freedom
Multiple R-squared:  0.9432,	Adjusted R-squared:  0.9225 
F-statistic: 45.63 on 4 and 11 DF,  p-value: 8.751e-07

> coef(model)
  (Intercept)         Depth       Geology      Downtime     Rig_index 
-3238.5755047     0.9614660     0.7315475     2.8889063   389.9436061 

 

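 - Estimates

$\hat{\beta}_0 = -3238.6$ (Intercept)

$\hat{\beta}_1 = 0.9615$ (Depth)

$\hat{\beta}_2 = 0.732$ (Geology)

$\hat{\beta}_3 = 2.889$ (Downtime)

$\hat{\beta}_4 = 389.9$ (Rig_index)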
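As a check on the coefficient table, each t value is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with the 11 residual degrees of freedom. A minimal sketch that uses only the numbers printed by summary(model) above (the variable names are illustrative):

```r
# Estimates and standard errors copied from summary(model) for the full model
est <- c(Intercept = -3238.5755, Depth = 0.9615, Geology = 0.7315,
         Downtime = 2.8889, Rig_index = 389.9436)
se  <- c(Intercept = 863.8390, Depth = 0.1977, Geology = 2.1243,
         Downtime = 1.1736, Rig_index = 330.9654)
df_resid <- 11  # residual degrees of freedom of the full model

# t statistic for H0: beta_j = 0
t_val <- est / se

# Two-sided p-value from the t distribution with df_resid degrees of freedom
p_val <- 2 * pt(abs(t_val), df = df_resid, lower.tail = FALSE)
names(p_val) <- names(est)

print(round(cbind(t_val, p_val), 4))
```

Because the inputs are rounded, the last digit can differ slightly from the summary output.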

 

 

 

 

 

> plot(model, 1)

[Figure: Residuals vs Fitted plot for the full model]

b. 

 

From the summary output, the p-value for Geology is 0.737.

Since this is far above 0.05, Geology is not significant once Depth, Downtime, and Rig_index are in the model.
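The same conclusion can be seen from a 95% confidence interval for the Geology coefficient: if it contains 0, the two-sided test at the 5% level does not reject. A minimal sketch built from the rounded estimate and standard error in the summary; with the data loaded, confint(model) computes these intervals for every coefficient directly.

```r
# 95% confidence interval for the Geology coefficient in the full model,
# using the estimate and standard error printed by summary(model).
est_geo  <- 0.7315
se_geo   <- 2.1243
df_resid <- 11

t_crit <- qt(0.975, df = df_resid)              # two-sided 95% critical value
ci_geo <- est_geo + c(-1, 1) * t_crit * se_geo  # estimate +/- t_crit * SE
print(round(ci_geo, 3))

# An interval that straddles 0 agrees with p = 0.737 > 0.05
contains_zero <- ci_geo[1] < 0 && ci_geo[2] > 0
print(contains_zero)
```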

 

> cor.test(raw_datas$Geology, raw_datas$Cost)

	Pearson's product-moment correlation

data:  raw_datas$Geology and raw_datas$Cost
t = 7.2013, df = 14, p-value = 4.556e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6992388 0.9605522
sample estimates:
      cor 
0.8873679

Correlation coefficient between Geology and Cost = 0.8873679

 

> cor.test(raw_datas$Depth, raw_datas$Geology)

	Pearson's product-moment correlation

data:  raw_datas$Depth and raw_datas$Geology
t = 8.9112, df = 14, p-value = 3.819e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.7851826 0.9730106
sample estimates:
    cor 
0.92202 

Correlation coefficient between Geology and Depth = 0.92202
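The t values reported by cor.test() follow from the sample correlation alone: for H0: rho = 0, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. A minimal sketch reproducing both outputs above, assuming n = 16 (implied by df = 14):

```r
# Test statistic used by cor.test() for H0: rho = 0
cor_t <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

n <- 16  # df = 14 in the cor.test() output, so n = 14 + 2

t_geo_cost  <- cor_t(0.8873679, n)  # Geology vs Cost
t_depth_geo <- cor_t(0.92202, n)    # Depth vs Geology

# Two-sided p-value for the Geology vs Cost correlation
p_geo_cost <- 2 * pt(abs(t_geo_cost), df = n - 2, lower.tail = FALSE)

print(round(c(t_geo_cost, t_depth_geo), 4))
print(signif(p_geo_cost, 4))
```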

 

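On its own, Geology is strongly correlated with Cost (r = 0.887), yet it is not significant in the full model. The reason is that Geology is very highly correlated with Depth (r = 0.922), which is already in the model. Once Depth is included, Geology adds little new information, so it is not needed.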
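This overlap can be quantified with the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. A minimal sketch using only the printed correlation: regressing Geology on Depth alone gives R^2 = r^2, and adding Downtime and Rig_index can only increase R^2, so 1 / (1 - r^2) is a lower bound on the VIF of Geology. With the data loaded, vif() from the car package computes the exact values for the fitted model.

```r
# Lower bound on the VIF of Geology from its correlation with Depth.
# R^2 of Geology ~ Depth equals r^2; R^2 of Geology ~ (all other predictors)
# is at least that large, so the true VIF is at least 1 / (1 - r^2).
r_depth_geo <- 0.92202
vif_lower   <- 1 / (1 - r_depth_geo^2)
print(round(vif_lower, 2))
```

A VIF above roughly 5 to 10 is a common rule of thumb for problematic collinearity, though the cutoff varies by author.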

 

 

 

 

 

c. 

 

A more appropriate final model drops Geology and Rig_index, the two variables with large p-values (0.737 and 0.264).

Refitting the model without these two variables gives:

> raw_datas <- read.table("ds13.2.3-oil-well-drilling-costs.txt", header = TRUE)
> model <- lm(Cost ~ Depth + Downtime, data = raw_datas)
> summary(model)

Call:
lm(formula = Cost ~ Depth + Downtime, data = raw_datas)

Residuals:
    Min      1Q  Median      3Q     Max 
-922.78 -609.41  -22.18  497.20 1311.61 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.011e+03  7.343e+02  -4.100  0.00125 ** 
Depth        1.039e+00  7.567e-02  13.734 4.08e-09 ***
Downtime     2.673e+00  1.134e+00   2.356  0.03482 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 764.4 on 13 degrees of freedom
Multiple R-squared:  0.9356,	Adjusted R-squared:  0.9257 
F-statistic:  94.4 on 2 and 13 DF,  p-value: 1.814e-08
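Whether Geology and Rig_index can be dropped together can be checked with a partial F test of H0: beta_Geology = beta_Rig_index = 0. A minimal sketch, assuming only the rounded R-squared values printed by the two summary() calls, so the F value is approximate. With the data loaded, anova(reduced_model, full_model) gives the exact test, where reduced_model and full_model are hypothetical names for the two lm fits (the post reuses the name model for both).

```r
# Partial F test from R-squared values. Both models share the same response,
# so SSE = (1 - R^2) * SST and the R^2 form of the F statistic is valid.
r2_full    <- 0.9432  # Cost ~ Depth + Geology + Downtime + Rig_index
r2_reduced <- 0.9356  # Cost ~ Depth + Downtime
n      <- 16          # observations
p_full <- 4           # predictors in the full model
q      <- 2           # predictors dropped

df_full <- n - p_full - 1  # 11 residual df, as in the full-model output
f_stat  <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)
print(round(c(F = f_stat, p = p_value), 3))

# Adjusted R^2 = 1 - (1 - R^2) (n - 1) / (n - p - 1) for each model
adj_r2 <- function(r2, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(c(full = adj_r2(r2_full, 4), reduced = adj_r2(r2_reduced, 2)), 4))
```

A large p-value here, together with a slightly higher adjusted R-squared for the reduced model (matching the two summary outputs), supports dropping both variables.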

 

$\widehat{\text{Cost}} = -3011 + 1.039 \times \text{Depth} + 2.673 \times \text{Downtime}$
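To use the final equation for a point prediction, plug values into the fitted formula. A minimal sketch with the rounded coefficients above; predict_cost is a hypothetical helper, and the inputs Depth = 5000 and Downtime = 100 are made-up values chosen only to show the arithmetic. With the fitted object, predict(model, newdata = data.frame(Depth = 5000, Downtime = 100)) does the same with unrounded coefficients.

```r
# Hypothetical helper: point prediction from the reduced model's fitted
# equation, using the rounded coefficient estimates from summary(model).
predict_cost <- function(depth, downtime) {
  -3011 + 1.039 * depth + 2.673 * downtime
}

# Hypothetical inputs, for illustration only
print(predict_cost(depth = 5000, downtime = 100))
```

Predictions are only trustworthy for Depth and Downtime values within the range of the observed data.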

 

 
