Descriptive Statistics

1 Descriptive statistics with summary()

> library(readxl)  # provides read_excel()
> mydata1=read_excel("D:/Desktop/EconomicData.xlsx",sheet="Sheet2")
> summary(mydata1)
     prov                year           pgdp           eduyear            pfdi          
 Length:240         Min.   :2011   Min.   : 1.644   Min.   : 7.514   Min.   :0.0004388  
 Class :character   1st Qu.:2013   1st Qu.: 3.505   1st Qu.: 8.719   1st Qu.:0.0403062  
 Mode  :character   Median :2014   Median : 4.373   Median : 9.100   Median :0.0825892  
                    Mean   :2014   Mean   : 5.255   Mean   : 9.186   Mean   :0.1247431  
                    3rd Qu.:2016   3rd Qu.: 6.338   3rd Qu.: 9.452   3rd Qu.:0.1389363  
                    Max.   :2018   Max.   :15.368   Max.   :12.555   Max.   :0.8509067  
      open               area      educost          college      
 Min.   : 0.05078   Min.   :1   Min.   : 131.4   Min.   :  9.00  
 1st Qu.: 0.34682   1st Qu.:1   1st Qu.: 642.8   1st Qu.: 57.00  
 Median : 0.55283   Median :2   Median : 908.6   Median : 84.50  
 Mean   : 1.77374   Mean   :2   Mean   :1051.9   Mean   : 84.77  
 3rd Qu.: 2.01176   3rd Qu.:3   3rd Qu.:1304.5   3rd Qu.:118.00  
 Max.   :13.26505   Max.   :3   Max.   :4268.4   Max.   :167.00  
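
Note that summary() reports quartiles and a mean for area because it is stored as a number. If area is a category code, which its values 1 to 3 suggest, converting it to a factor makes summary() report a count per level instead. A minimal sketch on a small made-up data frame (the name toy and all of its values are assumptions, not the article's data):

```r
# Hypothetical toy data standing in for mydata1; values are made up.
toy <- data.frame(
  area = c(1, 1, 2, 3, 3, 3),
  pgdp = c(2.1, 3.4, 4.0, 5.2, 6.8, 7.5)
)

# Stored as numbers, area gets quartiles and a mean.
summary(toy$area)

# Converted to a factor, area gets a count per level instead.
toy$area <- factor(toy$area)
area_counts <- summary(toy$area)
area_counts
```

Whether the conversion is appropriate depends on what the codes mean in the real data set.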

2 Grouped descriptive statistics with by()

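The data in the example above is panel data, and we want descriptive statistics for each region (here, each value of prov). The steps are:

1. Write a function that computes the statistics you want
2. Use sapply() to apply that function to every variable in the data frame
3. Use by() to run the whole thing once per group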

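# Compute the statistics for one variable (one column of the data frame)
mystat=function(inputdata){
  n=length(inputdata)      # number of observations
  mean=mean(inputdata)     # mean
  min=min(inputdata)       # minimum
  max=max(inputdata)       # maximum
  sd=sd(inputdata)         # standard deviation
  return(c(n=n,mean=mean,min=min,max=max,sd=sd))
}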

> mystats=function(x)sapply(x, mystat)
> by(mydata1,mydata1$prov,mystats) 

mydata1$prov: 安徽
     prov   year               pgdp                eduyear             pfdi               
n    "8"    "8"                "8"                 "8"                 "8"                
mean NA     "2014.5"           "3.659224625"       "8.6248075"         "0.1303890875"     
min  "安徽" "2011"             "2.563782"          "8.24831"           "0.0717416"        
max  "安徽" "2018"             "5.378408"          "8.94283"           "0.1779193"        
sd   NA     "2.44948974278318" "0.891766530379519" "0.210751866607562" "0.038245004714806"
     open                area educost           college          
n    "8"                 "8"  "8"               "8"              
mean "0.444225675"       "2"  "1145.7525"       "118.125"        
min  "0.3282538"         "2"  "817.2"           "116"            
max  "0.6216082"         "2"  "1501.18"         "119"            
sd   "0.098407796964842" "0"  "220.31628905151" "1.1259916264596"    
------------------------------------------------------------------------------ 

Note: only the statistics for Anhui (安徽) are shown here; the output for the other groups is omitted.
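
In the output above every value is quoted, and the mean and sd of prov are NA. This happens because prov is a character column, so sapply() has to coerce the whole result to a character matrix. One possible way around it is to pass only the numeric columns to by(). The sketch below uses a small made-up panel (toy, its column values, and the group labels "A" and "B" are assumptions, not the article's data):

```r
# Hypothetical toy panel standing in for mydata1; values are made up.
toy <- data.frame(
  prov = rep(c("A", "B"), each = 3),
  year = rep(2011:2013, times = 2),
  pgdp = c(1, 2, 3, 4, 6, 8),
  stringsAsFactors = FALSE
)

# Same statistics as mystat in the article.
mystat <- function(x) {
  c(n = length(x), mean = mean(x), min = min(x), max = max(x), sd = sd(x))
}

# Keep only the numeric columns so each group's result stays a numeric matrix.
num_cols <- sapply(toy, is.numeric)
res <- by(toy[num_cols], toy$prov, function(d) sapply(d, mystat))
res
```

The grouping variable is still taken from the full data frame (toy$prov), so dropping it from the columns being summarized does not affect the grouping.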

3 Plotting distributions with hist()

> hist(mydata1$eduyear)

[Figure r-65: histogram of eduyear drawn with the default arguments]

The histogram that hist() draws with its default arguments is not very attractive, so you can adjust the arguments and redraw it.

> hist(mydata1$eduyear,breaks=20,col="#88ADA6",xlab="Years of education",main="Distribution of years of education",xlim=c(7,13))

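Note: breaks=20 asks for about 20 bins (R treats a single number as a suggestion and may pick a nearby count); col="#88ADA6" sets the fill color of the bars; xlim=c(7,13) sets the range of the horizontal axis.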
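
When the exact number of bins matters, one option is to pass a vector of break points to breaks instead of a single number. The sketch below uses simulated values in place of mydata1$eduyear (the simulated data and the name edges are assumptions for illustration):

```r
# Simulated values standing in for mydata1$eduyear; not the article's data.
set.seed(1)
eduyear <- runif(240, min = 7.5, max = 12.5)

# A vector of break points fixes the bins: 20 bins of width 0.3 from 7 to 13.
edges <- seq(7, 13, length.out = 21)
h <- hist(eduyear, breaks = edges, plot = FALSE)  # plot = FALSE only computes the bins

length(h$counts)  # number of bins
sum(h$counts)     # number of observations placed in a bin

# Draw the histogram with the same fixed bins.
hist(eduyear, breaks = edges, col = "#88ADA6",
     xlab = "Years of education", main = "Distribution of years of education")
```

The break points must cover the full range of the data, otherwise hist() stops with an error.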

[Figure r-66: histogram of eduyear redrawn with the adjusted arguments]



To get the example data, follow the WeChat official account and reply: R_dt3

