Logistic Regression

1 Data Description

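The task is to judge, from a borrower's personal information, whether he or she will default on the loan. The data look like this: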

[Figure: screenshot of the data in SPSS]

2 Reading and Preparing the Data

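> library(haven)                                # provides read_sav() for SPSS .sav files
> bankData=read_sav("D:/Desktop/bankloan.sav")
> bankData=data.frame(bankData)[,1:9]           # keep the first 9 columns: 8 predictors + 违约
> bankData=na.omit(bankData)                    # drop rows with any missing value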

3 Logistic Model
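
When glm() is called with family = "binomial", it uses the logit link by default. The model in this section therefore writes the default probability as a logistic function of a linear predictor. The notation below is an assumption made for exposition. $y_i = 1$ means borrower $i$ defaulted (违约), and $x_{i1}, \dots, x_{ip}$ are that borrower's predictor values:

```latex
\Pr(y_i = 1 \mid \mathbf{x}_i) = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip})\}},
\qquad
\log\frac{\Pr(y_i = 1 \mid \mathbf{x}_i)}{1 - \Pr(y_i = 1 \mid \mathbf{x}_i)} = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}.
```

The Estimate column in the summaries that follow reports the fitted $\beta_j$ on this log-odds scale.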

3.1 Preliminary Regression

> fit_logit=glm(违约~.,data=bankData,family = "binomial")
> summary(fit_logit)

Call:
glm(formula = 违约 ~ ., family = "binomial", data = bankData)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3680  -0.6431  -0.2920   0.2450   3.0019  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.553623   0.619272  -2.509   0.0121 *  
年龄         0.034407   0.017370   1.981   0.0476 *  
教育         0.090563   0.123056   0.736   0.4618    
工龄        -0.258227   0.033159  -7.787 6.84e-15 ***
地址        -0.105004   0.023224  -4.521 6.15e-06 ***
收入        -0.008567   0.007956  -1.077   0.2816    
负债率       0.067330   0.030532   2.205   0.0274 *  
信用卡负债   0.625581   0.112827   5.545 2.95e-08 ***
其他负债     0.062704   0.077485   0.809   0.4184    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 804.36  on 699  degrees of freedom
Residual deviance: 551.67  on 691  degrees of freedom
AIC: 569.67

Number of Fisher Scoring iterations: 6
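
In the Coefficients table, z value is the Wald statistic Estimate / Std. Error. Pr(>|z|) is its two-sided p-value under the standard normal distribution. The minimal sketch below recomputes the 年龄 and 工龄 rows from the printed, rounded numbers, so a difference in the last digit is expected. The names `age` and `employ` are illustrative labels, not columns of the data. With the fitted object, the full table is available as coef(summary(fit_logit)).

```r
# Wald z tests recomputed from the rounded values printed by summary(fit_logit).
# "age" = 年龄 and "employ" = 工龄 are illustrative labels only.
est <- c(age = 0.034407, employ = -0.258227)  # Estimate column
se  <- c(age = 0.017370, employ = 0.033159)   # Std. Error column

z <- est / se              # Wald z statistics
p <- 2 * pnorm(-abs(z))    # two-sided p-values against N(0, 1)

round(z, 3)   # close to the printed 1.981 and -7.787
signif(p, 3)  # close to the printed 0.0476 and 6.84e-15
```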

3.2 Stepwise Regression

> step_logit=step(object = fit_logit,trace = 0)
> summary(step_logit)

Call:
glm(formula = 违约 ~ 年龄 + 工龄 + 地址 + 负债率 + 信用卡负债, 
    family = "binomial", data = bankData)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3555  -0.6521  -0.2949   0.2592   2.9132  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.63128    0.51268  -3.182  0.00146 ** 
年龄         0.03256    0.01717   1.896  0.05799 .  
工龄        -0.26076    0.03011  -8.662  < 2e-16 ***
地址        -0.10365    0.02309  -4.490 7.13e-06 ***
负债率       0.08926    0.01855   4.813 1.49e-06 ***
信用卡负债   0.57265    0.08723   6.565 5.20e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 804.36  on 699  degrees of freedom
Residual deviance: 553.18  on 694  degrees of freedom
AIC: 565.18

Number of Fisher Scoring iterations: 6

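Note: after stepwise selection, only five variables remain in the model: 年龄 (age), 工龄 (years employed), 地址 (address), 负债率 (debt ratio) and 信用卡负债 (credit card debt). 教育, 收入 and 其他负债 were dropped. All remaining variables are significant at the 1% level except 年龄, which is significant only at the 10% level (p = 0.058).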
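
step() compares candidate models by AIC. At each step it tries dropping (or re-adding) single terms and stops when no such change lowers the AIC. For a 0/1 response, the residual deviance equals -2 × log-likelihood, so AIC = residual deviance + 2k, where k counts the coefficients including the intercept. The sketch below checks this against the printed numbers. It also runs the likelihood-ratio test for the three dropped terms. With the fitted objects, that test corresponds to anova(step_logit, fit_logit, test = "Chisq").

```r
# AIC = residual deviance + 2 * k (valid here because the response is 0/1)
aic <- function(dev, k) dev + 2 * k

aic_full <- aic(551.67, 9)  # fit_logit: intercept + 8 predictors
aic_step <- aic(553.18, 6)  # step_logit: intercept + 5 predictors
c(full = aic_full, step = aic_step)  # 569.67 and 565.18, as printed

# Likelihood-ratio test of the 3 dropped terms (教育, 收入, 其他负债):
# the increase in deviance is compared with a chi-square on 694 - 691 = 3 df.
lr <- 553.18 - 551.67
p_drop <- pchisq(lr, df = 694 - 691, lower.tail = FALSE)
p_drop  # about 0.68: dropping the three terms does not significantly worsen the fit
```

The smaller AIC is why step() keeps the five-variable model. The large p_drop agrees that the three removed variables add little once the other five are in the model.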

3.3 Overall Significance Test of the Model

> anova(object = step_logit,test = "Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: 违约

Terms added sequentially (first to last)


           Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                         699     804.36              
年龄        1   13.635       698     790.73  0.000222 ***
工龄        1   51.487       697     739.24 7.209e-13 ***
地址        1    9.220       696     730.02  0.002394 ** 
负债率      1  111.462       695     618.56 < 2.2e-16 ***
信用卡负债  1   65.385       694     553.18 6.162e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

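Note: the analysis of deviance table adds the variables one at a time. Each row tests whether that variable significantly reduces the deviance, given the variables listed above it. The last row (p = 6.162e-16) therefore tests only 信用卡负债 given the other four variables. The overall test compares the null model with the final model. The deviance drops from 804.36 to 553.18, a likelihood-ratio statistic of 251.18 on 699 - 694 = 5 degrees of freedom. Its p-value is far below 0.01, so the model as a whole is significant.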
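
The sketch below recomputes the overall likelihood-ratio test from the printed table. It also checks that the sequential Deviance column adds up to the same total, up to rounding. With the fitted object, the p-value can be obtained without retyping numbers: `with(step_logit, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))` uses the standard components of a glm object.

```r
# Sequential deviance reductions from anova(step_logit, test = "Chisq"),
# in the order 年龄, 工龄, 地址, 负债率, 信用卡负债
seq_dev <- c(13.635, 51.487, 9.220, 111.462, 65.385)

# Overall likelihood-ratio statistic: null deviance minus residual deviance
lr_total <- 804.36 - 553.18   # 251.18
df_total <- 699 - 694         # 5 predictors added

sum(seq_dev)                  # 251.189, equal to lr_total up to rounding
p_overall <- pchisq(lr_total, df = df_total, lower.tail = FALSE)
p_overall                     # far below 0.01
```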



To get the example data, follow the WeChat official account and reply: R_dt9

