1 Correlation analysis
1.1 Correlation coefficient
> x=c(0.051,0.926,0.209,0.358,1.672,-1.191,1.404,1.112,0.108,-0.429)
> y=c(-0.488,-0.466,2.464,-0.959,0.592,-0.925,0.0206,0.048,-1.857,0.708)
> cor(x,y)
[1] 0.1884933
> cor.test(x,y)
Pearson's product-moment correlation
data: x and y
t = 0.54287, df = 8, p-value = 0.602
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5005369 0.7313256
sample estimates:
cor
0.1884933
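Note: the sample correlation between x and y is 0.1884933, with p-value = 0.602 > 0.05. We therefore fail to reject the null hypothesis of zero correlation: with only 10 observations, the data show no significant linear correlation between x and y. The 95% confidence interval (-0.50 to 0.73) includes 0, which leads to the same conclusion.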
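The t statistic and p-value printed by cor.test() follow from the correlation coefficient through t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below recomputes both by hand from the same x and y; the names r, n, t_manual, p_manual and ct are introduced here for illustration only.

```r
# Same data as in the transcript above
x <- c(0.051, 0.926, 0.209, 0.358, 1.672, -1.191, 1.404, 1.112, 0.108, -0.429)
y <- c(-0.488, -0.466, 2.464, -0.959, 0.592, -0.925, 0.0206, 0.048, -1.857, 0.708)

r <- cor(x, y)        # Pearson correlation
n <- length(x)        # sample size (10)

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(abs(t_manual), df = n - 2, lower.tail = FALSE)

# Compare with the values reported by cor.test()
ct <- cor.test(x, y)
print(c(t_manual = t_manual, t_cor_test = unname(ct$statistic)))
print(c(p_manual = p_manual, p_cor_test = ct$p.value))
```

The two pairs of values should agree, which makes explicit that the test depends only on r and the sample size.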
1.2 Scatter plot
> plot(x, y, main="Scatter plot", xlab="x", ylab="y", pch=19, col="#88ADA6", xlim=c(-2,2.5), ylim=c(-2,3), bty="n")
> abline(lm(y~x), col="red")
2 Regression analysis
2.1 Initial regression
> library(haven)
> carsale=read_sav("D:/Desktop/car_sales.sav")
> View(carsale)
> lm_result=lm(lnsales~type+price+engine_s+horsepow+wheelbas+width+length+curb_wgt+fuel_cap+mpg,data=carsale)
> summary(lm_result)
Call:
lm(formula = lnsales ~ type + price + engine_s + horsepow + wheelbas +
width + length + curb_wgt + fuel_cap + mpg, data = carsale)
Residuals:
Min 1Q Median 3Q Max
-4.4507 -0.5087 0.0545 0.6471 2.0692
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.016773 2.740832 -1.101 0.272912
type 0.883034 0.330767 2.670 0.008484 **
price -0.046396 0.012904 -3.596 0.000447 ***
engine_s 0.356266 0.190379 1.871 0.063368 .
horsepow -0.002152 0.004227 -0.509 0.611436
wheelbas 0.041623 0.023324 1.785 0.076489 .
width -0.028087 0.041534 -0.676 0.499991
length 0.014603 0.014153 1.032 0.303948
curb_wgt 0.156488 0.349762 0.447 0.655265
fuel_cap -0.056668 0.047100 -1.203 0.230932
mpg 0.081221 0.040140 2.023 0.044915 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9896 on 141 degrees of freedom
(5 observations deleted due to missingness)
Multiple R-squared: 0.4855, Adjusted R-squared: 0.449
F-statistic: 13.31 on 10 and 141 DF, p-value: 3.135e-16
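> lnsales_res=residuals(lm_result) ## extract the residual series
Note: type, price, and mpg have p-values below 0.05 and are therefore statistically significant predictors of log sales (lnsales) at the 5% level; engine_s and wheelbas are significant only at the 10% level. R-squared is 0.4855 (adjusted 0.449). The F-statistic is 13.31 with p-value 3.135e-16 < 0.05, so the overall linear relationship is significant.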
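The significant terms and the overall F-test can also be read from the fitted object rather than from the printed summary. The following is a minimal sketch that assumes the built-in mtcars data as a stand-in, since car_sales.sav is not available here; the model mpg ~ wt + hp and the names fit, coefs, sig, fs and p_overall are illustrative and not part of the original analysis.

```r
# Stand-in model on a built-in dataset (assumption: NOT the car sales data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients

# Names of the terms whose p-value is below 0.05
sig <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
print(sig)

# Overall F-test: summary() stores the statistic and its degrees of freedom,
# and the p-value is the upper tail of the F distribution
fs <- summary(fit)$fstatistic
p_overall <- unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE))
print(p_overall)
```

Applied to lm_result, the same two extractions should reproduce the significance stars and the p-value on the F-statistic line of the summary above.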
2.2 Stepwise regression
> carsale_1=na.omit(carsale)
> lm_result1=lm(lnsales~type+price+engine_s+horsepow+wheelbas+width+length+curb_wgt+fuel_cap+mpg,data=carsale_1)
> lm_step=step(lm_result1)
Start: AIC=24.35
lnsales ~ type + price + engine_s + horsepow + wheelbas + width +
length + curb_wgt + fuel_cap + mpg
Df Sum of Sq RSS AIC
- length 1 0.0176 119.39 22.370
- width 1 0.0241 119.40 22.377
- curb_wgt 1 0.0242 119.40 22.377
- horsepow 1 0.0681 119.44 22.420
- engine_s 1 1.2766 120.65 23.598
- mpg 1 1.5934 120.97 23.905
<none> 119.38 24.353
- fuel_cap 1 2.6187 122.00 24.892
- type 1 4.9556 124.33 27.112
- price 1 5.5770 124.95 27.695
- wheelbas 1 5.6307 125.01 27.746
Step: AIC=22.37
lnsales ~ type + price + engine_s + horsepow + wheelbas + width +
curb_wgt + fuel_cap + mpg
Df Sum of Sq RSS AIC
- width 1 0.0184 119.41 20.388
- curb_wgt 1 0.0445 119.44 20.414
- horsepow 1 0.0582 119.45 20.428
- engine_s 1 1.2759 120.67 21.614
- mpg 1 1.5792 120.97 21.908
<none> 119.39 22.370
- fuel_cap 1 2.7105 122.11 22.997
- type 1 6.0258 125.42 26.131
- price 1 6.2983 125.69 26.385
- wheelbas 1 13.5696 132.96 32.965
Step: AIC=20.39
lnsales ~ type + price + engine_s + horsepow + wheelbas + curb_wgt +
fuel_cap + mpg
Df Sum of Sq RSS AIC
- curb_wgt 1 0.0347 119.45 18.423
- horsepow 1 0.0614 119.47 18.449
- engine_s 1 1.2752 120.69 19.631
- mpg 1 1.5753 120.99 19.922
<none> 119.41 20.388
- fuel_cap 1 2.9315 122.34 21.226
- price 1 6.4889 125.90 24.580
- type 1 6.7950 126.21 24.864
- wheelbas 1 14.3539 133.77 31.669
Step: AIC=18.42
lnsales ~ type + price + engine_s + horsepow + wheelbas + fuel_cap +
mpg
Df Sum of Sq RSS AIC
- horsepow 1 0.0806 119.53 16.501
- engine_s 1 1.5373 120.98 17.919
- mpg 1 1.5784 121.03 17.958
<none> 119.45 18.423
- fuel_cap 1 2.9895 122.44 19.315
- price 1 6.8517 126.30 22.948
- type 1 6.8749 126.32 22.970
- wheelbas 1 19.2049 138.65 33.866
Step: AIC=16.5
lnsales ~ type + price + engine_s + wheelbas + fuel_cap + mpg
Df Sum of Sq RSS AIC
- mpg 1 1.6623 121.19 16.117
<none> 119.53 16.501
- engine_s 1 2.1807 121.71 16.617
- fuel_cap 1 2.9513 122.48 17.355
- type 1 7.8948 127.42 21.985
- price 1 15.8928 135.42 29.107
- wheelbas 1 19.1617 138.69 31.898
Step: AIC=16.12
lnsales ~ type + price + engine_s + wheelbas + fuel_cap
Df Sum of Sq RSS AIC
- engine_s 1 0.9749 122.17 15.055
<none> 121.19 16.117
- type 1 6.3662 127.56 20.108
- fuel_cap 1 6.4829 127.67 20.214
- price 1 16.8657 138.06 29.362
- wheelbas 1 21.0277 142.22 32.837
Step: AIC=15.05
lnsales ~ type + price + wheelbas + fuel_cap
Df Sum of Sq RSS AIC
<none> 122.17 15.055
- fuel_cap 1 5.6627 127.83 18.356
- type 1 6.2185 128.38 18.864
- price 1 17.8652 140.03 29.024
- wheelbas 1 23.9246 146.09 33.980
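The AIC column in the trace is the value returned by extractAIC(), which for a linear model is n * log(RSS / n) + 2 * p, where p is the number of estimated coefficients. Plugging in the starting values from the trace (n = 117, RSS = 119.38, 11 coefficients) gives about 24.36, which matches the reported 24.35 up to rounding of RSS. The sketch below checks the formula and runs a backward search on the built-in mtcars data, used here only as a stand-in; the model and the names full, aic_manual, aic_step and sel are assumptions for illustration.

```r
# Stand-in full model (assumption: mtcars, not the car sales data)
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

n   <- nrow(mtcars)              # number of observations
rss <- sum(residuals(full)^2)    # residual sum of squares
p   <- length(coef(full))        # number of coefficients, incl. intercept

# AIC as used by step(): n * log(RSS / n) + 2 * p
aic_manual <- n * log(rss / n) + 2 * p
aic_step   <- extractAIC(full)[2]
print(c(manual = aic_manual, extractAIC = aic_step))

# Backward elimination by AIC; trace = 0 suppresses the step-by-step printout
sel <- step(full, direction = "backward", trace = 0)
print(formula(sel))
```

At each step the search drops the term whose removal gives the lowest AIC and stops when `<none>` is at the top of the table, which is the pattern visible in the trace above.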
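Note: stepwise selection (backward elimination by AIC, which is what step() performs when no scope is given) retains four variables: fuel_cap, type, price, and wheelbas. The step() trace reports no significance tests, so lm() is run again to estimate the effect of these four variables on lnsales.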
> lm_result2=lm(lnsales~type+price+fuel_cap+wheelbas,data=carsale_1)
> summary(lm_result2)
Call:
lm(formula = lnsales ~ type + price + fuel_cap + wheelbas, data = carsale_1)
Residuals:
Min 1Q Median 3Q Max
-4.4197 -0.5928 0.0711 0.6575 2.2405
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.373310 1.456755 -1.629 0.1061
type 0.736757 0.308565 2.388 0.0186 *
price -0.035458 0.008761 -4.047 9.57e-05 ***
fuel_cap -0.111819 0.049076 -2.278 0.0246 *
wheelbas 0.079240 0.016919 4.683 7.98e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.044 on 112 degrees of freedom
Multiple R-squared: 0.4116, Adjusted R-squared: 0.3906
F-statistic: 19.59 on 4 and 112 DF, p-value: 3.044e-12
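Note: in the refitted model all four variables (fuel_cap, type, price, wheelbas) are significant at the 5% level, so this model is taken as the final estimate.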
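The models in this subsection are fitted on 117 observations (112 residual degrees of freedom plus 5 coefficients), whereas the regression in 2.1 used 152 (141 plus 11). The likely reason is that na.omit() drops every row with a missing value in any column of the data frame, including columns that are not in the model. The sketch below shows the effect on a small made-up data frame and one way to restrict the complete-case filter to the model variables; df, unused, model_vars and df_cc are hypothetical names, and the numbers are invented for illustration.

```r
# Toy data: 'unused' has missing values but is not part of the model
df <- data.frame(
  y      = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
  x      = c(1, 2, 3, 4, 5, 6),
  unused = c(NA, 1, NA, 1, 1, NA)
)

# na.omit() on the whole data frame drops rows 1, 3 and 6
n_all_cols <- nrow(na.omit(df))

# Restrict to the variables that appear in the model formula first
model_vars <- all.vars(y ~ x)
df_cc <- na.omit(df[, model_vars])
n_model_cols <- nrow(df_cc)

print(c(all_columns = n_all_cols, model_columns = n_model_cols))

# The model can then be fitted (and passed to step()) on df_cc
fit <- lm(y ~ x, data = df_cc)
```

Filtering on the model variables only keeps more observations while still giving step() a data set without missing values, which it needs so that every candidate model is fitted to the same rows.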
2.3 Overall model significance test
> anova(lm_result2) ## the original call also passed test = "Chisq", but the table below reports F tests
Analysis of Variance Table
Response: lnsales
Df Sum Sq Mean Sq F value Pr(>F)
type 1 14.567 14.567 13.3545 0.0003939 ***
price 1 46.129 46.129 42.2903 2.276e-09 ***
fuel_cap 1 0.832 0.832 0.7628 0.3843273
wheelbas 1 23.925 23.925 21.9338 7.979e-06 ***
Residuals 112 122.166 1.091
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
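Note: this table is a sequential (Type I) analysis of variance: each row tests a term as it is added to the model after the terms listed above it. The p-value 7.979e-06 in the last row therefore tests wheelbas given type, price, and fuel_cap; it is not a test of the model as a whole. Overall significance is given by the F-statistic reported by summary(lm_result2): F = 19.59 on 4 and 112 DF, p-value = 3.044e-12 < 0.01, so the model as a whole is significant. Because the tests are sequential, fuel_cap has p = 0.384 here even though its t-test in the full model gives p = 0.0246.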
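One way to obtain the overall test directly from anova() is to compare the fitted model with an intercept-only model; the resulting F value equals the F-statistic printed by summary(). The sketch below uses the built-in mtcars data as a stand-in, and the names null, full, cmp and f_overall are illustrative assumptions. It also refits the model with the terms in the opposite order to show that the sequential table depends on term order.

```r
# Stand-in models (assumption: mtcars, not the car sales data)
full <- lm(mpg ~ wt + hp, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)      # intercept only

# Overall F-test: intercept-only model against the full model
cmp <- anova(null, full)
print(cmp)

# The same F statistic as reported on the last line of summary(full)
f_overall <- unname(summary(full)$fstatistic["value"])
print(c(anova_F = cmp$F[2], summary_F = f_overall))

# Sequential (Type I) tables change with the order of the terms
print(anova(lm(mpg ~ wt + hp, data = mtcars)))
print(anova(lm(mpg ~ hp + wt, data = mtcars)))
```

For lm_result2 the analogous comparison would use lm(lnsales ~ 1, data = carsale_1) as the null model.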