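1 Data description
The task is to predict whether a borrower defaults from their personal information. After the cleaning in Section 2, the data contain 700 complete observations with a binary response, 违约 (default), and eight predictors: 年龄 (age), 教育 (education), 工龄 (length of employment), 地址 (address), 收入 (income), 负债率 (debt ratio), 信用卡负债 (credit card debt) and 其他负债 (other debt).
2 Reading and cleaning the data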
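> library(haven)  # provides read_sav() for SPSS .sav files
> bankData=read_sav("D:/Desktop/bankloan.sav")
> bankData=data.frame(bankData)[,1:9]  # keep the first nine columns
> bankData=na.omit(bankData)  # drop rows that contain missing values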
3 Logistic model
3.1 Initial regression
> fit_logit=glm(违约~.,data=bankData,family = "binomial")
> summary(fit_logit)
Call:
glm(formula = 违约 ~ ., family = "binomial", data = bankData)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.3680  -0.6431  -0.2920   0.2450   3.0019
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.553623   0.619272  -2.509   0.0121 *
年龄         0.034407   0.017370   1.981   0.0476 *
教育         0.090563   0.123056   0.736   0.4618
工龄        -0.258227   0.033159  -7.787 6.84e-15 ***
地址        -0.105004   0.023224  -4.521 6.15e-06 ***
收入        -0.008567   0.007956  -1.077   0.2816
负债率       0.067330   0.030532   2.205   0.0274 *
信用卡负债   0.625581   0.112827   5.545 2.95e-08 ***
其他负债     0.062704   0.077485   0.809   0.4184
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 804.36 on 699 degrees of freedom
Residual deviance: 551.67 on 691 degrees of freedom
AIC: 569.67
Number of Fisher Scoring iterations: 6
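One way to read the coefficient table is on the odds scale: exponentiating a logistic regression coefficient gives the multiplicative change in the odds of default for a one-unit increase in that predictor, holding the others fixed. The sketch below does this for the three most significant estimates, copied by hand from the output above. The English labels are glosses introduced here for illustration and are not column names in the data.

```r
# Estimates copied from the summary(fit_logit) output above.
# The English names are illustrative glosses, not names in bankData.
est <- c(
  work_years       = -0.258227,  # 工龄
  address          = -0.105004,  # 地址
  credit_card_debt =  0.625581   # 信用卡负债
)

# exp(coefficient) = odds ratio for a one-unit increase in the predictor
odds_ratio <- exp(est)
print(round(odds_ratio, 3))
```

With the fitted object available, exp(coef(fit_logit)) gives the same quantities for every term at once. Values below 1 (工龄, 地址) lower the odds of default, and values above 1 (信用卡负债) raise them.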
3.2 Stepwise regression
> step_logit=step(object = fit_logit,trace = 0)
> summary(step_logit)
Call:
glm(formula = 违约 ~ 年龄 + 工龄 + 地址 + 负债率 + 信用卡负债,
family = "binomial", data = bankData)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.3555  -0.6521  -0.2949   0.2592   2.9132
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.63128    0.51268  -3.182  0.00146 **
年龄         0.03256    0.01717   1.896  0.05799 .
工龄        -0.26076    0.03011  -8.662  < 2e-16 ***
地址        -0.10365    0.02309  -4.490 7.13e-06 ***
负债率       0.08926    0.01855   4.813 1.49e-06 ***
信用卡负债   0.57265    0.08723   6.565 5.20e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 804.36 on 699 degrees of freedom
Residual deviance: 553.18 on 694 degrees of freedom
AIC: 565.18
Number of Fisher Scoring iterations: 6
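Note: step() is called here without a scope, so it performs backward elimination by AIC. Five variables remain in the model: 年龄, 工龄, 地址, 负债率 and 信用卡负债. 年龄 is significant only at the 10% level (p = 0.058); the other four are significant at the 1% level. The AIC falls from 569.67 for the full model to 565.18 for the reduced model.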
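The AIC values that step() compares can be reproduced from the printed output. For a 0/1 response the saturated log-likelihood is zero, so the AIC equals the residual deviance plus twice the number of estimated coefficients. The following is a minimal sketch under that assumption; aic_from_deviance is a helper name introduced here, not an R built-in.

```r
# AIC for a binomial GLM with a 0/1 response:
# residual deviance + 2 * (number of estimated coefficients).
# aic_from_deviance is a hypothetical helper defined for this illustration.
aic_from_deviance <- function(resid_dev, n_coef) resid_dev + 2 * n_coef

aic_full <- aic_from_deviance(551.67, 9)  # intercept + 8 predictors
aic_step <- aic_from_deviance(553.18, 6)  # intercept + 5 predictors

print(c(full = aic_full, stepwise = aic_step))
```

Dropping the three variables raises the deviance only slightly (551.67 to 553.18), while the penalty falls by 6, which is why the smaller model has the lower AIC.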
3.3 Overall model significance test
> anova(object = step_logit,test = "Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: 违约
Terms added sequentially (first to last)
           Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                         699     804.36
年龄        1   13.635       698     790.73  0.000222 ***
工龄        1   51.487       697     739.24 7.209e-13 ***
地址        1    9.220       696     730.02  0.002394 **
负债率      1  111.462       695     618.56 < 2.2e-16 ***
信用卡负债  1   65.385       694     553.18 6.162e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
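Note: anova() adds the terms sequentially, so each p-value tests the term on that row given the terms listed above it. The last value, 6.162e-16, therefore tests 信用卡负债 given the other four variables, not the model as a whole. For the overall test, compare the null and residual deviances: 804.36 - 553.18 = 251.18 on 699 - 694 = 5 degrees of freedom. This is far above the 1% chi-square critical value of 15.09, so the model as a whole is significant.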
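The overall likelihood-ratio test can be computed directly from the null and residual deviances in the table. The sketch below uses the printed values rather than the fitted object, so it runs without the data file.

```r
# Deviances and degrees of freedom copied from the anova() output above
null_dev  <- 804.36; null_df  <- 699
resid_dev <- 553.18; resid_df <- 694

lr_stat <- null_dev - resid_dev  # likelihood-ratio chi-square statistic
lr_df   <- null_df - resid_df    # number of predictors in the model

# Upper-tail chi-square probability and the 1% critical value
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
crit_01 <- qchisq(0.99, df = lr_df)

print(c(statistic = lr_stat, df = lr_df, critical_1pct = crit_01))
print(p_value)
```

With the fitted object available, the same test can be written as with(step_logit, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE)).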