Correlation and Regression Analysis

1 Correlation analysis

1.1 Correlation coefficient

> x=c(0.051,0.926,0.209,0.358,1.672,-1.191,1.404,1.112,0.108,-0.429)
> y=c(-0.488,-0.466,2.464,-0.959,0.592,-0.925,0.0206,0.048,-1.857,0.708)
> cor(x,y)
[1] 0.1884933
> cor.test(x,y)

    Pearson's product-moment correlation

data:  x and y
t = 0.54287, df = 8, p-value = 0.602
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5005369  0.7313256
sample estimates:
      cor 
0.1884933 

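Note: the correlation coefficient between x and y is 0.1884933 and the p-value is 0.602 > 0.05, so we fail to reject the null hypothesis that the true correlation is 0: the data give no evidence of a linear correlation between x and y. This does not prove that they are uncorrelated; with only 10 observations the 95% confidence interval (-0.50 to 0.73) is very wide.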
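
The numbers reported by cor.test() can be reproduced by hand, which makes the test less of a black box. A minimal sketch using the same x and y as above: the test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and the 95% interval comes from Fisher's z-transform, atanh(r) +/- 1.96 / sqrt(n - 3), mapped back with tanh().

```r
x <- c(0.051, 0.926, 0.209, 0.358, 1.672, -1.191, 1.404, 1.112, 0.108, -0.429)
y <- c(-0.488, -0.466, 2.464, -0.959, 0.592, -0.925, 0.0206, 0.048, -1.857, 0.708)

n <- length(x)
r <- cor(x, y)                                   # Pearson correlation

# t statistic with n - 2 degrees of freedom and its two-sided p-value
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z-transform
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

ct <- cor.test(x, y)                             # built-in test, for comparison
print(c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2]))
```

The hand-computed values agree with the cor.test() output above (t = 0.54287, p-value = 0.602, interval -0.5005 to 0.7313).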

1.2 Scatter plot

> plot(x, y, main="Scatter plot", xlab="x", ylab="y", pch=19, col="#88ADA6",
+      xlim=c(-2,2.5), ylim=c(-2,3), bty="n")
> abline(lm(y~x), col="red")     ## add the fitted least-squares line
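
The red line drawn by abline(lm(y~x)) is the least-squares fit, and for a single predictor its coefficients follow directly from the correlation: slope = r * s_y / s_x and intercept = mean(y) - slope * mean(x). A short sketch with the same data checks this against coef():

```r
x <- c(0.051, 0.926, 0.209, 0.358, 1.672, -1.191, 1.404, 1.112, 0.108, -0.429)
y <- c(-0.488, -0.466, 2.464, -0.959, 0.592, -0.925, 0.0206, 0.048, -1.857, 0.708)

fit <- lm(y ~ x)                             # the model passed to abline()
slope     <- cor(x, y) * sd(y) / sd(x)       # b1 = r * s_y / s_x
intercept <- mean(y) - slope * mean(x)       # b0 = ybar - b1 * xbar

print(rbind(by_hand = c(intercept, slope), lm = unname(coef(fit))))
```

Because the slope is r rescaled by the ratio of standard deviations, a weak correlation such as 0.19 gives a fitted line that is nearly flat relative to the scatter of the points.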

2 Regression analysis

2.1 Initial regression

> library(haven)     ## read_sav() reads SPSS .sav files
> carsale=read_sav("D:/Desktop/car_sales.sav")
> View(carsale)

> lm_result=lm(lnsales~type+price+engine_s+horsepow+wheelbas+width+length+curb_wgt+fuel_cap+mpg,data=carsale)
> summary(lm_result)

Call:
lm(formula = lnsales ~ type + price + engine_s + horsepow + wheelbas + 
    width + length + curb_wgt + fuel_cap + mpg, data = carsale)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4507 -0.5087  0.0545  0.6471  2.0692 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.016773   2.740832  -1.101 0.272912    
type         0.883034   0.330767   2.670 0.008484 ** 
price       -0.046396   0.012904  -3.596 0.000447 ***
engine_s     0.356266   0.190379   1.871 0.063368 .  
horsepow    -0.002152   0.004227  -0.509 0.611436    
wheelbas     0.041623   0.023324   1.785 0.076489 .  
width       -0.028087   0.041534  -0.676 0.499991    
length       0.014603   0.014153   1.032 0.303948    
curb_wgt     0.156488   0.349762   0.447 0.655265    
fuel_cap    -0.056668   0.047100  -1.203 0.230932    
mpg          0.081221   0.040140   2.023 0.044915 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9896 on 141 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.4855,	Adjusted R-squared:  0.449 
F-statistic: 13.31 on 10 and 141 DF,  p-value: 3.135e-16

> lnsales_res=residuals(lm_result)     ## get the residual series

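Note: type, price and mpg have p-values below 0.05, so they are significant at the 5% level and can be regarded as factors affecting lnsales. R² is 0.4855, so the model explains about 49% of the variation in lnsales. The F-statistic is 13.31 with p-value 3.135e-16 < 0.05, so the linear relationship is significant for the model as a whole, i.e. at least one slope coefficient differs from zero.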
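
R², adjusted R² and the F-statistic can all be recomputed from the residual and total sums of squares. The car_sales data are not bundled here, so this sketch uses R's built-in mtcars data as a stand-in (the model mpg ~ wt + hp + disp is purely illustrative); the same formulas apply to lm_result.

```r
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)   # illustrative stand-in model
s   <- summary(fit)

y   <- model.response(model.frame(fit))   # response values actually used in the fit
n   <- nobs(fit)                          # number of observations used
p   <- length(coef(fit)) - 1              # number of predictors, excluding the intercept
rss <- sum(residuals(fit)^2)              # residual sum of squares
tss <- sum((y - mean(y))^2)               # total sum of squares

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
f_pval <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

print(c(R2 = r2, adjR2 = adj_r2, F = f_stat, p = f_pval))
```

Plugging in the numbers from summary(lm_result) (n = 152 after 5 incomplete rows are dropped, p = 10, R² = 0.4855) gives adjusted R² = 1 - 0.5145 * 151/141 ≈ 0.449 and F = (0.4855/10) / (0.5145/141) ≈ 13.3, matching the output above.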

2.2 Stepwise regression

> carsale_1=na.omit(carsale)     ## drop incomplete rows so every candidate model in step() uses the same data
> lm_result1=lm(lnsales~type+price+engine_s+horsepow+wheelbas+width+length+curb_wgt+fuel_cap+mpg,data=carsale_1)
> lm_step=step(lm_result1)
Start:  AIC=24.35
lnsales ~ type + price + engine_s + horsepow + wheelbas + width + 
    length + curb_wgt + fuel_cap + mpg

           Df Sum of Sq    RSS    AIC
- length    1    0.0176 119.39 22.370
- width     1    0.0241 119.40 22.377
- curb_wgt  1    0.0242 119.40 22.377
- horsepow  1    0.0681 119.44 22.420
- engine_s  1    1.2766 120.65 23.598
- mpg       1    1.5934 120.97 23.905
<none>                  119.38 24.353
- fuel_cap  1    2.6187 122.00 24.892
- type      1    4.9556 124.33 27.112
- price     1    5.5770 124.95 27.695
- wheelbas  1    5.6307 125.01 27.746

Step:  AIC=22.37
lnsales ~ type + price + engine_s + horsepow + wheelbas + width + 
    curb_wgt + fuel_cap + mpg

           Df Sum of Sq    RSS    AIC
- width     1    0.0184 119.41 20.388
- curb_wgt  1    0.0445 119.44 20.414
- horsepow  1    0.0582 119.45 20.428
- engine_s  1    1.2759 120.67 21.614
- mpg       1    1.5792 120.97 21.908
<none>                  119.39 22.370
- fuel_cap  1    2.7105 122.11 22.997
- type      1    6.0258 125.42 26.131
- price     1    6.2983 125.69 26.385
- wheelbas  1   13.5696 132.96 32.965

Step:  AIC=20.39
lnsales ~ type + price + engine_s + horsepow + wheelbas + curb_wgt + 
    fuel_cap + mpg

           Df Sum of Sq    RSS    AIC
- curb_wgt  1    0.0347 119.45 18.423
- horsepow  1    0.0614 119.47 18.449
- engine_s  1    1.2752 120.69 19.631
- mpg       1    1.5753 120.99 19.922
<none>                  119.41 20.388
- fuel_cap  1    2.9315 122.34 21.226
- price     1    6.4889 125.90 24.580
- type      1    6.7950 126.21 24.864
- wheelbas  1   14.3539 133.77 31.669

Step:  AIC=18.42
lnsales ~ type + price + engine_s + horsepow + wheelbas + fuel_cap + 
    mpg

           Df Sum of Sq    RSS    AIC
- horsepow  1    0.0806 119.53 16.501
- engine_s  1    1.5373 120.98 17.919
- mpg       1    1.5784 121.03 17.958
<none>                  119.45 18.423
- fuel_cap  1    2.9895 122.44 19.315
- price     1    6.8517 126.30 22.948
- type      1    6.8749 126.32 22.970
- wheelbas  1   19.2049 138.65 33.866

Step:  AIC=16.5
lnsales ~ type + price + engine_s + wheelbas + fuel_cap + mpg

           Df Sum of Sq    RSS    AIC
- mpg       1    1.6623 121.19 16.117
<none>                  119.53 16.501
- engine_s  1    2.1807 121.71 16.617
- fuel_cap  1    2.9513 122.48 17.355
- type      1    7.8948 127.42 21.985
- price     1   15.8928 135.42 29.107
- wheelbas  1   19.1617 138.69 31.898

Step:  AIC=16.12
lnsales ~ type + price + engine_s + wheelbas + fuel_cap

           Df Sum of Sq    RSS    AIC
- engine_s  1    0.9749 122.17 15.055
<none>                  121.19 16.117
- type      1    6.3662 127.56 20.108
- fuel_cap  1    6.4829 127.67 20.214
- price     1   16.8657 138.06 29.362
- wheelbas  1   21.0277 142.22 32.837

Step:  AIC=15.05
lnsales ~ type + price + wheelbas + fuel_cap

           Df Sum of Sq    RSS    AIC
<none>                  122.17 15.055
- fuel_cap  1    5.6627 127.83 18.356
- type      1    6.2185 128.38 18.864
- price     1   17.8652 140.03 29.024
- wheelbas  1   23.9246 146.09 33.980

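Note: with no scope argument, step() performs backward elimination: at each step it drops the term whose removal lowers AIC the most, and it stops when no removal lowers AIC further. (It needs every candidate model to be fitted to the same rows, which is why missing values were removed with na.omit() first.) The final model keeps four variables: fuel_cap, type, price and wheelbas. The trace above reports only AIC, not significance tests, so the coefficients and p-values of the reduced model are obtained below by fitting lm() on these four variables. The object returned by step() is itself a fitted lm, so summary(lm_step) would give the same estimates.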
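
Because na.omit() on the whole data frame also drops rows whose only missing values lie in columns the model never uses, a tighter alternative is to remove incomplete rows among the model variables only. A sketch on mtcars with artificially injected NAs (the column choices and NA positions are hypothetical, chosen only for illustration) shows the difference and confirms that the step() result is an ordinary lm fit whose coefficients match a manual refit:

```r
cars_df <- mtcars
cars_df$qsec[c(3, 7)] <- NA   # hypothetical NAs in a column the model does not use
cars_df$wt[5] <- NA           # hypothetical NA in a model column

model_vars <- c("mpg", "wt", "hp", "disp", "drat")
all_cc   <- na.omit(cars_df)                 # drops rows with an NA in ANY column
model_cc <- na.omit(cars_df[, model_vars])   # drops rows with an NA in a model column only
print(c(all_columns = nrow(all_cc), model_columns = nrow(model_cc)))

full_fit <- lm(mpg ~ wt + hp + disp + drat, data = model_cc)
step_fit <- step(full_fit, trace = 0)        # backward elimination by AIC, trace suppressed
print(formula(step_fit))

# step() returns an lm fit, so summary() works on it directly;
# refitting the selected formula gives identical coefficients
refit <- lm(formula(step_fit), data = model_cc)
print(summary(step_fit)$coefficients)
```

Here the whole-data-frame version keeps 29 rows while the model-columns version keeps 31, so restricting na.omit() to the model variables retains more data.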

> lm_result2=lm(lnsales~type+price+fuel_cap+wheelbas,data=carsale_1)
> summary(lm_result2)

Call:
lm(formula = lnsales ~ type + price + fuel_cap + wheelbas, data = carsale_1)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4197 -0.5928  0.0711  0.6575  2.2405 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.373310   1.456755  -1.629   0.1061    
type         0.736757   0.308565   2.388   0.0186 *  
price       -0.035458   0.008761  -4.047 9.57e-05 ***
fuel_cap    -0.111819   0.049076  -2.278   0.0246 *  
wheelbas     0.079240   0.016919   4.683 7.98e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.044 on 112 degrees of freedom
Multiple R-squared:  0.4116,	Adjusted R-squared:  0.3906 
F-statistic: 19.59 on 4 and 112 DF,  p-value: 3.044e-12

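Note: after refitting, all four variables (fuel_cap, type, price and wheelbas) are significant at the 5% level, so this is taken as the final model. Keep in mind that na.omit(carsale) removed every row with a missing value in any column, not only in the model variables: this fit uses 117 cars (112 residual df + 5 coefficients) versus 152 in section 2.1, so its R² (0.4116) is not directly comparable with the earlier 0.4855.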
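
Because the response is on a log scale, each coefficient is easiest to read as an approximate percentage change in sales. A worked sketch, assuming lnsales is the natural logarithm of sales (the variable name suggests this, but the post does not state it): a one-unit increase in a predictor, holding the others fixed, multiplies predicted sales by exp(b), i.e. changes them by 100 * (exp(b) - 1) percent. The estimates below are copied from summary(lm_result2).

```r
# Estimates copied from summary(lm_result2)
b <- c(type = 0.736757, price = -0.035458, fuel_cap = -0.111819, wheelbas = 0.079240)

# Percentage change in predicted sales for a one-unit increase, others held fixed
# (assumes lnsales = log(sales) with the natural logarithm)
pct_change <- 100 * (exp(b) - 1)
print(round(pct_change, 2))
```

So, roughly, one more unit of price goes with about 3.5% lower sales and one more unit of wheelbas with about 8.2% higher sales, in whatever units the data set records those variables. The reading of type depends on how it is coded in car_sales.sav; if it is a 0/1 indicator, its coefficient corresponds to roughly 2.09 times the sales of the baseline category.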

2.3 Testing model significance

> anova(lm_result2)     ## sequential (Type I) F-tests; a test= argument has no effect for a single lm fit
Analysis of Variance Table

Response: lnsales
           Df  Sum Sq Mean Sq F value    Pr(>F)    
type        1  14.567  14.567 13.3545 0.0003939 ***
price       1  46.129  46.129 42.2903 2.276e-09 ***
fuel_cap    1   0.832   0.832  0.7628 0.3843273    
wheelbas    1  23.925  23.925 21.9338 7.979e-06 ***
Residuals 112 122.166   1.091                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

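Note: anova() on a single lm fit gives a sequential (Type I) table: each row tests whether that term improves the fit given the terms listed above it, so the results depend on the order of terms in the formula. This is why fuel_cap has p = 0.384 here but p = 0.0246 in summary(lm_result2). Only the last row, wheelbas (p = 7.979e-06), is conditional on all the other terms, and it matches the t-test for wheelbas in summary(lm_result2) (F = t², 21.93 ≈ 4.683²). It is therefore not a test of the whole model. The overall significance of the model is given by the F-statistic in summary(lm_result2): F = 19.59 on 4 and 112 DF, p-value = 3.044e-12 < 0.01, so the model as a whole is significant.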
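
Two checks make the distinction concrete: comparing the fitted model with an intercept-only model reproduces the overall F-statistic from summary(), and the last row of the sequential table equals the squared t statistic of the last term. A sketch on the built-in mtcars data, where the model is only an illustrative stand-in for lm_result2:

```r
fit  <- lm(mpg ~ wt + hp + disp, data = mtcars)   # illustrative stand-in model
null <- lm(mpg ~ 1, data = mtcars)                # intercept-only model

# Overall F-test: does the full model fit better than the intercept-only model?
overall   <- anova(null, fit)
overall_F <- overall$F[2]
summary_F <- unname(summary(fit)$fstatistic["value"])

# Sequential (Type I) table: each row is conditional on the rows above it
seq_tab <- anova(fit)
t_disp  <- summary(fit)$coefficients["disp", "t value"]

print(c(overall_F = overall_F, summary_F = summary_F))
print(c(last_row_F = seq_tab["disp", "F value"], t_squared = t_disp^2))
```

Applied to the model in this post, anova(lm(lnsales ~ 1, data = carsale_1), lm_result2) should reproduce F = 19.59 on 4 and 112 DF, since the four sequential sums of squares above add up to the model sum of squares.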



To get the case data, follow the WeChat official account and reply: R_dt8

