1 读取Excel中的数据
1.1 需要在cmd中安装openpyxl库
pip install openpyxl
1.2 读取第一行包含列名的Excel数据
>>> import pandas as pd
>>> data_excel = pd.read_excel('D:\\Desktop\\EconomicData.xlsx', sheet_name='Sheet2', na_values='n/a')
>>> data_excel
prov year pgdp eduyear pfdi open area educost college
0 安徽 2011 2.563782 8.24831 0.071742 0.328254 2 817.20 116
1 安徽 2012 2.874424 8.51635 0.091082 0.347451 2 991.76 118
2 安徽 2013 3.189051 8.52448 0.109776 0.399829 2 1041.30 117
3 安徽 2014 3.427436 8.72823 0.124615 0.436284 2 1045.78 118
4 安徽 2015 3.581879 8.79599 0.138070 0.430795 2 1157.85 119
.. ... ... ... ... ... ... ... ... ...
235 重庆 2014 4.767868 8.95551 0.086924 1.695338 3 698.00 63
236 重庆 2015 5.210346 8.94135 0.077882 1.212252 3 797.10 64
237 重庆 2016 5.820404 9.07319 0.060801 1.129954 3 886.32 65
238 重庆 2017 6.316657 9.23159 0.049620 1.242047 3 948.35 65
239 重庆 2018 6.960110 9.30048 0.069336 1.454839 3 1021.63 65
[240 rows x 9 columns]
1.3 读取第一行不包含列名的Excel数据
如果第一行不是列名,则需要加参数header = None
,同时用names指定列名。
>>> data_excel = pd.read_excel('D:\\Desktop\\EconomicData.xlsx', sheet_name='Sheet3', na_values='n/a',header = None,names=['prov','year','pgdp','eduyear','pfdi','open','area','educost','college'])
>>> data_excel
prov year pgdp eduyear pfdi open area educost college
0 安徽 2011 2.563782 8.24831 0.071742 0.328254 2 817.20 116
1 安徽 2012 2.874424 8.51635 0.091082 0.347451 2 991.76 118
2 安徽 2013 3.189051 8.52448 0.109776 0.399829 2 1041.30 117
3 安徽 2014 3.427436 8.72823 0.124615 0.436284 2 1045.78 118
4 安徽 2015 3.581879 8.79599 0.138070 0.430795 2 1157.85 119
.. ... ... ... ... ... ... ... ... ...
235 重庆 2014 4.767868 8.95551 0.086924 1.695338 3 698.00 63
236 重庆 2015 5.210346 8.94135 0.077882 1.212252 3 797.10 64
237 重庆 2016 5.820404 9.07319 0.060801 1.129954 3 886.32 65
238 重庆 2017 6.316657 9.23159 0.049620 1.242047 3 948.35 65
239 重庆 2018 6.960110 9.30048 0.069336 1.454839 3 1021.63 65
[240 rows x 9 columns]
1.4 将一列作为行索引
>>> data_excel = pd.read_excel('D:\\Desktop\\EconomicData.xlsx', sheet_name='Sheet2', na_values='n/a',index_col='year')
>>> data_excel
prov pgdp eduyear pfdi open area educost college
year
2011 安徽 2.563782 8.24831 0.071742 0.328254 2 817.20 116
2012 安徽 2.874424 8.51635 0.091082 0.347451 2 991.76 118
2013 安徽 3.189051 8.52448 0.109776 0.399829 2 1041.30 117
2014 安徽 3.427436 8.72823 0.124615 0.436284 2 1045.78 118
2015 安徽 3.581879 8.79599 0.138070 0.430795 2 1157.85 119
... ... ... ... ... ... ... ... ...
2014 重庆 4.767868 8.95551 0.086924 1.695338 3 698.00 63
2015 重庆 5.210346 8.94135 0.077882 1.212252 3 797.10 64
2016 重庆 5.820404 9.07319 0.060801 1.129954 3 886.32 65
2017 重庆 6.316657 9.23159 0.049620 1.242047 3 948.35 65
2018 重庆 6.960110 9.30048 0.069336 1.454839 3 1021.63 65
[240 rows x 8 columns]
1.5 将多列作为行索引,跳过n行,读取n行
>>> data_excel = pd.read_excel('D:\\Desktop\\EconomicData.xlsx', sheet_name='Sheet3', na_values='n/a',header = None,names=['prov','year','pgdp','eduyear','pfdi','open','area','educost','college'],index_col=[0,1],skiprows=8,nrows=8)
>>> data_excel
pgdp eduyear pfdi open area educost college
prov year
北京 2011 8.051090 11.5550 0.225703 4.137171 1 737.38 87
2012 8.640313 11.8360 0.245325 3.925019 1 871.75 89
2013 9.362970 12.0280 0.249626 3.852769 1 999.84 89
2014 9.913939 11.8540 0.258120 4.085807 1 1093.74 89
2015 10.603360 12.0807 0.372929 3.752679 1 1117.12 91
2016 11.813310 12.3037 0.398281 3.739240 1 1193.47 91
2017 12.905950 12.5024 0.756735 3.782837 1 1251.27 92
2018 15.368120 12.5550 0.531770 3.914133 1 1352.54 92
2 读取CSV中的数据
>>> data_csv = pd.read_csv('D:\\Desktop\\EconomicData.csv',encoding='gbk')
>>> data_csv
prov year pgdp eduyear pfdi open area educost college
0 安徽 2011 2.563782 8.24831 0.071742 0.328254 2 817.20 116
1 安徽 2012 2.874424 8.51635 0.091082 0.347451 2 991.76 118
2 安徽 2013 3.189051 8.52448 0.109776 0.399829 2 1041.30 117
3 安徽 2014 3.427436 8.72823 0.124615 0.436284 2 1045.78 118
4 安徽 2015 3.581879 8.79599 0.138070 0.430795 2 1157.85 119
.. ... ... ... ... ... ... ... ... ...
235 重庆 2014 4.767868 8.95551 0.086924 1.695338 3 698.00 63
236 重庆 2015 5.210346 8.94135 0.077882 1.212252 3 797.10 64
237 重庆 2016 5.820404 9.07319 0.060801 1.129954 3 886.32 65
238 重庆 2017 6.316657 9.23159 0.049620 1.242047 3 948.35 65
239 重庆 2018 6.960110 9.30048 0.069336 1.454839 3 1021.63 65
[240 rows x 9 columns]
说明:在读取中文时经常会遇到编码错误,可以尝试加入encoding='gbk'
参数!
>>> data_csv = pd.read_csv('D:\\Desktop\\EconomicData.csv',encoding='gbk',header=0,names=['index1','index2','x1','x2','x3','x4','x5','x6','x7'],index_col=[0,1])
>>> data_csv
x1 x2 x3 x4 x5 x6 x7
index1 index2
安徽 2011 2.563782 8.24831 0.071742 0.328254 2 817.20 116
2012 2.874424 8.51635 0.091082 0.347451 2 991.76 118
2013 3.189051 8.52448 0.109776 0.399829 2 1041.30 117
2014 3.427436 8.72823 0.124615 0.436284 2 1045.78 118
2015 3.581879 8.79599 0.138070 0.430795 2 1157.85 119
... ... ... ... ... .. ... ...
重庆 2014 4.767868 8.95551 0.086924 1.695338 3 698.00 63
2015 5.210346 8.94135 0.077882 1.212252 3 797.10 64
2016 5.820404 9.07319 0.060801 1.129954 3 886.32 65
2017 6.316657 9.23159 0.049620 1.242047 3 948.35 65
2018 6.960110 9.30048 0.069336 1.454839 3 1021.63 65
[240 rows x 7 columns]
3 读取txt数据
>>> pd.read_csv('D:\Desktop\DRUG.txt')
Age Sex BP Cholesterol Na K Drug
0 23 F HIGH HIGH 0.792535 0.031258 drugY
1 47 M LOW HIGH 0.739309 0.056468 drugC
2 47 M LOW HIGH 0.697269 0.068944 drugC
3 28 F NORMAL HIGH 0.563682 0.072289 drugX
4 61 F LOW HIGH 0.559294 0.030998 drugY
.. ... .. ... ... ... ... ...
195 56 F LOW HIGH 0.848774 0.073380 drugC
196 16 M LOW HIGH 0.743021 0.061886 drugC
197 52 M NORMAL HIGH 0.549945 0.055581 drugX
198 23 M NORMAL NORMAL 0.784520 0.055959 drugX
199 40 F LOW NORMAL 0.683503 0.060226 drugX
[200 rows x 7 columns]
>>> data_txt=pd.read_csv('D:\\Desktop\\news.txt',encoding='gbk')
>>> data_txt.head(5)
10万预算的新款小型SUV 这4款很快就登场
0 一条[蜗牛车志]后台最新的昨天留言提问:“我在规划人生的第一台车,预算10万元左右。今年底能...
1 10万元预算,要SUV、要新车、还要有科技感…。你的要求有点多哈。不过没关系,有问题咱们...
2 10万预算的新款小型SUV 这4款很快就登场
3 说来也巧,就在节目前,我还正好看到一则新车的官方消息,很符合今天的话题,所以不妨作为我们...
4 当然它其实是一台标准的燃油动力车,配备1.5T涡轮增压发动机加上湿式7速双离合双速器的动...