1 Descriptive statistics with summary()
> library(readxl)  # read_excel() comes from the readxl package
> mydata1=read_excel("D:/Desktop/EconomicData.xlsx",sheet="Sheet2")
> summary(mydata1)
     prov                year           pgdp           eduyear            pfdi
 Length:240         Min.   :2011   Min.   : 1.644   Min.   : 7.514   Min.   :0.0004388
 Class :character   1st Qu.:2013   1st Qu.: 3.505   1st Qu.: 8.719   1st Qu.:0.0403062
 Mode  :character   Median :2014   Median : 4.373   Median : 9.100   Median :0.0825892
                    Mean   :2014   Mean   : 5.255   Mean   : 9.186   Mean   :0.1247431
                    3rd Qu.:2016   3rd Qu.: 6.338   3rd Qu.: 9.452   3rd Qu.:0.1389363
                    Max.   :2018   Max.   :15.368   Max.   :12.555   Max.   :0.8509067
      open               area      educost          college
 Min.   : 0.05078   Min.   :1   Min.   : 131.4   Min.   :  9.00
 1st Qu.: 0.34682   1st Qu.:1   1st Qu.: 642.8   1st Qu.: 57.00
 Median : 0.55283   Median :2   Median : 908.6   Median : 84.50
 Mean   : 1.77374   Mean   :2   Mean   :1051.9   Mean   : 84.77
 3rd Qu.: 2.01176   3rd Qu.:3   3rd Qu.:1304.5   3rd Qu.:118.00
 Max.   :13.26505   Max.   :3   Max.   :4268.4   Max.   :167.00
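As a quick check of what summary() reports for a numeric column, here is a minimal sketch on a small made-up data frame. The name toy and its values are illustrative assumptions and are not taken from EconomicData.xlsx. It compares the entries of summary() with quantile() and mean().

```r
# Toy data frame; column names and values are illustrative only
toy = data.frame(
  prov = c("A", "A", "B", "B", "C"),
  pgdp = c(1.5, 2.5, 3.0, 4.5, 6.0),
  stringsAsFactors = FALSE
)

s = summary(toy$pgdp)    # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
sv = unclass(s)          # plain named numeric vector
q = quantile(toy$pgdp)   # 0%, 25%, 50%, 75%, 100%

# The five quantile-based entries of summary() should match quantile()
same_quantiles = isTRUE(all.equal(
  unname(sv[c("Min.", "1st Qu.", "Median", "3rd Qu.", "Max.")]),
  unname(q)
))
# The Mean entry should match mean()
same_mean = isTRUE(all.equal(unname(sv["Mean"]), mean(toy$pgdp)))

print(s)
```

For a character column such as prov, summary() reports only Length, Class and Mode, which is why the prov column in the output above has no quantiles.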
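2 Grouped descriptive statistics with by()
The data in the example above are panel data, and we want descriptive statistics grouped by region. The steps are:
1. Write a function that computes the statistics we want.
2. Use sapply() to loop over the variables in the data frame.
3. Use by() to compute the statistics for each group.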
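mystat=function(inputdata){
  # Statistics for a single column (vector)
  n=length(inputdata)      # number of observations
  mean=mean(inputdata)     # mean
  min=min(inputdata)       # minimum
  max=max(inputdata)       # maximum
  sd=sd(inputdata)         # standard deviation
  return(c(n=n,mean=mean,min=min,max=max,sd=sd))
}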
> mystats=function(x)sapply(x, mystat)
> by(mydata1,mydata1$prov,mystats)
mydata1$prov: 安徽
     prov   year               pgdp                eduyear             pfdi
n    "8"    "8"                "8"                 "8"                 "8"
mean NA     "2014.5"           "3.659224625"       "8.6248075"         "0.1303890875"
min  "安徽" "2011"             "2.563782"          "8.24831"           "0.0717416"
max  "安徽" "2018"             "5.378408"          "8.94283"           "0.1779193"
sd   NA     "2.44948974278318" "0.891766530379519" "0.210751866607562" "0.038245004714806"
     open                area educost           college
n    "8"                 "8"  "8"               "8"
mean "0.444225675"       "2"  "1145.7525"       "118.125"
min  "0.3282538"         "2"  "817.2"           "116"
max  "0.6216082"         "2"  "1501.18"         "119"
sd   "0.098407796964842" "0"  "220.31628905151" "1.1259916264596"
------------------------------------------------------------------------------
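Note: only the statistics for 安徽 (Anhui) are shown here; the output for the other provinces is omitted. Because prov is a character column, its mean and sd are NA, and c() coerces every value in the result to a string, which is why the numbers are printed in quotes.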
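One way to keep the grouped result numeric is to pass only the numeric columns to by(). This is a minimal sketch on a made-up panel. The data frame toy, its values, and the helper vector num_cols are illustrative assumptions; mystat and mystats follow the definitions above.

```r
# Toy panel; column names and values are illustrative only
toy = data.frame(
  prov = c("A", "A", "A", "B", "B", "B"),
  year = c(2011, 2012, 2013, 2011, 2012, 2013),
  pgdp = c(1, 2, 3, 4, 6, 8),
  stringsAsFactors = FALSE
)

# Statistics for a single column
mystat = function(inputdata) {
  c(n = length(inputdata), mean = mean(inputdata), min = min(inputdata),
    max = max(inputdata), sd = sd(inputdata))
}
# Apply mystat to every column of a data frame
mystats = function(x) sapply(x, mystat)

# Select the numeric columns so the character column prov is left out
num_cols = sapply(toy, is.numeric)

# Group by prov; each group yields a numeric matrix of statistics
res = by(toy[, num_cols, drop = FALSE], toy$prov, mystats)
print(res)
```

The grouping variable is still taken from the full data frame (toy$prov), so dropping prov from the first argument does not affect how the rows are split.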
3 Plotting the distribution with hist()
> hist(mydata1$eduyear)
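Of course, a histogram drawn with the default arguments of hist() is not very attractive; adjust the arguments and redraw it.
> hist(mydata1$eduyear,breaks=20,col="#88ADA6",xlab="Years of education",main="Distribution of years of education",xlim=c(7,13))
Note: breaks=20 asks for about 20 bins (hist() treats a single number as a suggestion and may choose a nearby "pretty" number of bins); col="#88ADA6" sets the fill color; xlim=c(7,13) sets the range of the horizontal axis.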
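If an exact number of bins is needed, one option is to pass a vector of break points rather than a single number. This is a minimal sketch that uses simulated values as a stand-in for eduyear; the vector x and the names h_suggest and h_exact are illustrative assumptions. plot=FALSE returns the bin information without drawing anything.

```r
set.seed(1)                          # reproducible toy data
x = runif(240, min = 7, max = 13)    # made-up stand-in for eduyear

# A single number is only a suggestion for the number of bins
h_suggest = hist(x, breaks = 20, plot = FALSE)

# A vector of 21 break points gives exactly 20 bins on [7, 13]
h_exact = hist(x, breaks = seq(7, 13, length.out = 21), plot = FALSE)

length(h_suggest$counts)   # number of bins hist() actually chose
length(h_exact$counts)     # number of bins from the explicit break points
```

The explicit break points must cover the full range of the data, otherwise hist() stops with an error.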