200字范文,内容丰富有趣,生活中的好帮手!
200字范文 > Pandas统计分析基础(2):Pandas之数据的读写(读取csv和excel文件)

Pandas统计分析基础(2):Pandas之数据的读写(读取csv和excel文件)

时间:2022-10-29 12:01:08

相关推荐

Pandas统计分析基础(2):Pandas之数据的读写(读取csv和excel文件)

✅作者简介:大家好我是Xlong,一枚正在学习COMSOL、Python的工科研究僧

📃个人主页: Xlong的个人博客主页

🔥系列专栏: Python大数据分析

💖如果觉得博主的文章还不错的话,请👍支持一下博主哦🤞

目录

一、数据的读写

1.1 读写csv:pd.read_csv和pd.DataFrame.to_csv

1.2 读写excel:pd.read_excel 和 pd.DataFrame.to_excel

一、数据的读写

1.1 读写csv:pd.read_csv和pd.DataFrame.to_csv

注意区分pandas和numpy在读写csv方面的区别下面代码中的‘GDP.csv’文件可私聊博主获取!!!

(1)错误示范

import numpy as npgdp_np=np.loadtxt('GDP.csv',delimiter=',')print(gdp_np)

运行结果:

ValueError: could not convert string to float: ''

显示报错了!

(2)正确操作

import numpy as npgdp_np=np.loadtxt('GDP.csv',delimiter=',',dtype='str')print(gdp_np)

运行结果:

[['' '北京市' '天津市' '河北省' '山西省' '内蒙古自治区' '辽宁省' '吉林省' '黑龙江省' '上海市' '江苏省'

'浙江省' '安徽省' '福建省' '江西省' '山东省' '河南省' '湖北省' '湖南省' '广东省' '广西壮族自治区' '海南省'

'重庆市' '四川省' '贵州省' '云南省' '西藏自治区' '陕西省' '甘肃省' '青海省' '宁夏回族自治区' '新疆维吾尔自治区']

['' '30319.98' '18809.64' '36010.27' '16818.11' '17289.22'

'25315.35' '15074.62' '16361.62' '32679.87' '92595.4' '56197.15'

'30006.82' '35804.04' '21984.78' '76469.67' '48055.86' '39366.55'

'36425.78' '97277.77' '20352.51' '4832.05' '20363.19' '40678.13'

'14806.45' '17881.12' '1477.63' '24438.32' '8246.07' '2865.23'

'3705.18' '12199.08']

['' '28014.94' '18549.19' '34016.32' '15528.42' '16096.21'

'23409.24' '14944.53' '15902.68' '30632.99' '85869.76' '51768.26'

'27018' '32182.09' '20006.31' '72634.15' '44552.83' '35478.09'

'33902.96' '89705.23' '18523.26' '4462.54' '19424.73' '36980.22'

'13540.83' '16376.34' '1310.92' '21898.81' '7459.9' '2624.83' '3443.56'

'10881.96']

['' '25669.13' '17885.39' '32070.45' '13050.41' '18128.1' '22246.9'

'14776.8' '15386.09' '28178.65' '77388.28' '47251.36' '24407.62'

'28810.58' '18499' '68024.49' '40471.79' '32665.38' '31551.37'

'80854.91' '18317.64' '4053.2' '17740.59' '32934.54' '11776.73'

'14788.42' '1151.41' '19399.59' '7200.37' '2572.49' '3168.59' '9649.7']

['' '23014.59' '16538.19' '29806.11' '12766.49' '17831.51'

'28669.02' '14063.13' '15083.67' '25123.45' '70116.38' '42886.49'

'2.63' '25979.82' '16723.78' '63002.33' '37002.16' '29550.19'

'28902.21' '72812.55' '16803.12' '3702.76' '15717.27' '30053.1'

'10502.56' '13619.17' '1026.39' '18021.86' '6790.32' '2417.05'

'2911.77' '9324.8']

['' '21330.83' '15726.93' '29421.15' '12761.49' '17770.19'

'28626.58' '13803.14' '15039.38' '23567.7' '65088.32' '40173.03'

'20848.75' '24055.76' '15714.63' '59426.59' '34938.24' '27379.22'

'27037.32' '67809.85' '15672.89' '3500.72' '14262.6' '28536.66'

'9266.39' '12814.59' '920.83' '17689.94' '6836.82' '2303.32' '2752.1'

'9273.46']

['' '19800.81' '14442.01' '28442.95' '12665.25' '16916.5'

'27213.22' '13046.4' '14454.91' '21818.15' '59753.37' '37756.59'

'19229.34' '21868.49' '14410.19' '55230.32' '32191.3' '24791.83'

'24621.67' '62474.79' '14449.9' '3177.56' '12783.26' '26392.07'

'8086.86' '11832.31' '815.67' '16205.45' '6330.69' '2122.06' '2577.57'

'8443.84']

['' '17879.4' '12893.88' '26575.01' '12112.83' '15880.58'

'24846.43' '11939.24' '13691.58' '1.72' '54058.22' '34665.33'

'17212.05' '19701.78' '12948.88' '50013.24' '29599.31' '22250.45'

'22154.23' '57067.92' '13035.1' '2855.54' '11409.6' '23872.8' '6852.2'

'10309.47' '701.03' '14453.68' '5650.2' '1893.54' '2341.29' '7505.31']

['' '16251.93' '11307.28' '24515.76' '11237.55' '14359.88'

'22226.7' '10568.83' '12582' '19195.69' '49110.27' '32318.85'

'15300.65' '17560.18' '11702.82' '45361.85' '26931.03' '19632.26'

'19669.56' '53210.28' '11720.87' '2522.66' '10011.37' '21026.68'

'5701.84' '8893.12' '605.83' '12512.3' '5020.37' '1670.44' '2102.21'

'6610.05']

['' '14113.58' '9224.46' '20394.26' '9200.86' '11672' '18457.27'

'8667.58' '10368.6' '17165.98' '41425.48' '27722.31' '12359.33'

'14737.12' '9451.26' '39169.92' '23092.36' '15967.61' '16037.96'

'46013.06' '9569.85' '2064.5' '7925.58' '17185.48' '4602.16' '7224.18'

'507.46' '10123.48' '4120.75' '1350.43' '1689.65' '5437.47']

['' '12153.03' '7521.85' '17235.48' '7358.31' '9740.25' '15212.49'

'7278.75' '8587' '15046.45' '34457.3' '22990.35' '10062.82' '12236.53'

'7655.18' '33896.65' '19480.46' '12961.1' '13059.69' '39482.56'

'7759.16' '1654.21' '6530.01' '14151.28' '3912.68' '6169.75' '441.36'

'8169.8' '3387.56' '1081.27' '1353.31' '4277.05']

['' '11115' '6719.01' '16011.97' '7315.4' '8496.2' '13668.58'

'6426.1' '8314.37' '14069.86' '30981.98' '21462.69' '8851.66'

'10823.01' '6971.05' '30933.28' '18018.53' '11328.92' '11555'

'36796.71' '7021' '1503.06' '5793.66' '12601.23' '3561.56' '5692.12'

'394.85' '7314.58' '3166.82' '1018.62' '1203.92' '4183.21']

['' '9846.81' '5252.76' '13607.32' '6024.45' '6423.18' '11164.3'

'5284.69' '7104' '12494.01' '26018.48' '18753.73' '7360.92' '9248.53'

'5800.25' '25776.91' '15012.46' '9333.4' '9439.6' '31777.01' '5823.41'

'1254.17' '4676.13' '10562.39' '2884.11' '4772.52' '341.43' '5757.29'

'2703.98' '797.35' '919.11' '3523.16']

['' '8117.78' '4462.74' '11467.6' '4878.61' '4944.25' '9304.52'

'4275.12' '6211.8' '10572.24' '21742.05' '15718.47' '6112.5' '7583.85'

'4820.53' '21900.19' '12362.79' '7617.47' '7688.67' '26587.76'

'4746.16' '1065.67' '3907.23' '8690.24' '2338.98' '3988.14' '290.76'

'4743.61' '2277.35' '648.5' '725.9' '3045.26']

['' '6969.52' '3905.64' '10012.11' '4230.53' '3905.03' '8047.26'

'3620.27' '5513.7' '9247.66' '18598.69' '13417.68' '5350.17' '6554.69'

'4056.76' '18366.87' '10587.42' '6590.19' '6596.1' '22557.37' '3984.1'

'918.75' '3467.72' '7385.1' '.42' '3462.73' '248.8' '3933.72'

'1933.98' '543.32' '612.61' '2604.19']

['' '6033.21' '3110.97' '8477.63' '3571.37' '3041.07' '6672'

'3122.01' '4750.6' '8072.83' '15003.6' '11648.7' '4759.3' '5763.35'

'3456.7' '15021.84' '8553.79' '5633.24' '5641.94' '18864.62' '3433.5'

'819.66' '3034.58' '6379.63' '1677.8' '3081.91' '220.34' '3175.58'

'1688.49' '466.1' '537.11' '2209.09']

['' '5007.21' '2578.03' '6921.29' '2855.23' '2388.38' '6002.54'

'2662.08' '4057.4' '6694.23' '12442.87' '9705.02' '3923.11' '4983.67'

'2807.41' '12078.15' '6867.7' '4757.45' '4659.99' '15844.64' '2821.11'

'713.96' '2555.72' '5333.09' '1426.34' '2556.02' '185.09' '2587.72'

'1399.83' '390.2' '445.36' '1886.35']

['2002年' '4315' '2150.76' '6018.28' '2324.8' '1940.94' '5458.22'

'2348.54' '3637.2' '5741.03' '10606.85' '8003.67' '3519.72' '4467.55'

'2450.48' '10275.5' '6035.48' '4212.82' '4151.54' '13502.42' '2523.73'

'642.73' '2232.86' '4725.01' '1243.43' '2312.82' '162.04' '2253.39'

'1232.03' '340.65' '377.16' '1612.65']

['2001年' '3707.96' '1919.09' '5516.76' '2029.53' '1713.81' '5033.08'

'2120.35' '3390.1' '5210.12' '9456.84' '6898.34' '3246.71' '4072.85'

'2175.68' '9195.04' '5533.01' '3880.53' '3831.9' '12039.25' '2279.34'

'579.17' '1976.86' '4293.49' '1133.27' '2138.31' '139.16' '.62'

'1125.37' '300.13' '337.44' '1491.6']

['2000年' '3161.66' '1701.88' '5043.96' '1845.72' '1539.12' '4669.06'

'1951.51' '3151.4' '4771.17' '8553.69' '6141.03' '2902.09' '3764.54'

'.07' '8337.47' '5052.99' '3545.39' '3551.49' '10741.25' '2080.04'

'526.82' '1791' '3928.2' '1029.92' '.19' '117.8' '1804' '1052.88'

'263.68' '295.02' '1363.56']]

Process finished with exit code 0

注意:numpy在读csv文件时,是将csv文件读作一个数组,数组要求其中的每一个元素都是相同的数据类型。因此,当有文字内容时,得读作str类型

import numpy as npgdp_pd=pd.read_csv('GDP.csv')print(gdp_pd)

运行结果:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 1: invalid start byte

import numpy as npgdp_pd=pd.read_csv('GDP.csv',encoding='gbk')print(gdp_pd)

运行结果:

Unnamed: 0 北京市 天津市 ... 青海省 宁夏回族自治区 新疆维吾尔自治区

0 30319.98 18809.64 ... 2865.23 3705.18 12199.08

1 28014.94 18549.19 ... 2624.83 3443.56 10881.96

2 25669.13 17885.39 ... 2572.49 3168.59 9649.70

3 23014.59 16538.19 ... 2417.05 2911.77 9324.80

4 21330.83 15726.93 ... 2303.32 2752.10 9273.46

5 19800.81 14442.01 ... 2122.06 2577.57 8443.84

6 17879.40 12893.88 ... 1893.54 2341.29 7505.31

7 16251.93 11307.28 ... 1670.44 2102.21 6610.05

8 14113.58 9224.46 ... 1350.43 1689.65 5437.47

9 12153.03 7521.85 ... 1081.27 1353.31 4277.05

10 11115.00 6719.01 ... 1018.62 1203.92 4183.21

11 9846.81 5252.76 ... 797.35 919.11 3523.16

12 8117.78 4462.74 ... 648.50 725.90 3045.26

13 6969.52 3905.64 ... 543.32 612.61 2604.19

14 6033.21 3110.97 ... 466.10 537.11 2209.09

15 5007.21 2578.03 ... 390.20 445.36 1886.35

16 2002年 4315.00 2150.76 ... 340.65 377.16 1612.65

17 2001年 3707.96 1919.09 ... 300.13 337.44 1491.60

18 2000年 3161.66 1701.88 ... 263.68 295.02 1363.56

[19 rows x 32 columns]

import numpy as npgdp_pd=pd.read_csv('GDP.csv',encoding='gbk')#print(gdp_pd)print(gdp_pd.index)

运行结果:

RangeIndex(start=0, stop=19, step=1)

默认情况下,会自动生成index,从0开始的数字

如果csv里有填好的index,可以通过index_col参数指定

import numpy as npgdp_pd=pd.read_csv('GDP.csv',encoding='gbk',index_col=0)print(gdp_pd)

运行结果:

北京市 天津市 河北省 ... 青海省 宁夏回族自治区 新疆维吾尔自治区

30319.98 18809.64 36010.27 ... 2865.23 3705.18 12199.08

28014.94 18549.19 34016.32 ... 2624.83 3443.56 10881.96

25669.13 17885.39 32070.45 ... 2572.49 3168.59 9649.70

23014.59 16538.19 29806.11 ... 2417.05 2911.77 9324.80

21330.83 15726.93 29421.15 ... 2303.32 2752.10 9273.46

19800.81 14442.01 28442.95 ... 2122.06 2577.57 8443.84

17879.40 12893.88 26575.01 ... 1893.54 2341.29 7505.31

16251.93 11307.28 24515.76 ... 1670.44 2102.21 6610.05

14113.58 9224.46 20394.26 ... 1350.43 1689.65 5437.47

12153.03 7521.85 17235.48 ... 1081.27 1353.31 4277.05

11115.00 6719.01 16011.97 ... 1018.62 1203.92 4183.21

9846.81 5252.76 13607.32 ... 797.35 919.11 3523.16

8117.78 4462.74 11467.60 ... 648.50 725.90 3045.26

6969.52 3905.64 10012.11 ... 543.32 612.61 2604.19

6033.21 3110.97 8477.63 ... 466.10 537.11 2209.09

5007.21 2578.03 6921.29 ... 390.20 445.36 1886.35

2002年 4315.00 2150.76 6018.28 ... 340.65 377.16 1612.65

2001年 3707.96 1919.09 5516.76 ... 300.13 337.44 1491.60

2000年 3161.66 1701.88 5043.96 ... 263.68 295.02 1363.56

[19 rows x 31 columns]

import pandas as pdgdp_pd=pd.read_csv('GDP.csv',encoding='gbk',index_col=0)#print(gdp_pd)print(gdp_pd.index)

运行结果:

Index(['', '', '', '', '', '', '', '',

'', '', '', '', '', '', '', '',

'2002年', '2001年', '2000年'],

dtype='object')

gdp_pd.to_csv('another_gdp.csv') #默认会保存行名和列名

注意:默认保存的编码是utf-8,但excel读取的默认编码是gbk,所以会出现乱码。可以在保存时指定编码解决。

gdp_pd.to_csv('another_gdp.csv',encoding='gbk') #默认会保存行名和列名

注意:只有Dataframe对象才有to_csv这个功能,ndarray不可以

import pandas as pdgdp_pd=pd.read_csv('GDP.csv',encoding='gbk',index_col=0)print(type(gdp_pd))

运行结果:

<class 'pandas.core.frame.DataFrame'>

pip update pandas用于更新pandas包

没权限的就用管理员身份运行cmd

1.2 读写excel:pd.read_excel 和 pd.DataFrame.to_excel

下面代码中的‘GDPandPopulation.xlsx’文件可私聊博主获取!!!

import pandas as pdgdpandpop_excel=pd.read_excel('GDPandPopulation.xlsx')print(gdpandpop_excel)

运行结果:

Missing optional dependency 'openpyxl'. Use pip or conda to install openpyxl.

我们发现:在使用pandas导入文件数据的时候,运行read_csv正常,运行的read_excel 出现错误,直接按照提示安装openpyxl。

解决方法:

左上角 File-->Settings--->project -->Python interpreter -->+号搜索安装openpyxl-->Install Package

再次运行如下:

Unnamed: 0 北京市 天津市 ... 青海省 宁夏回族自治区 新疆维吾尔自治区

0 30319.98 18809.64 ... 2865.23 3705.18 12199.08

1 28014.94 18549.19 ... 2624.83 3443.56 10881.96

2 25669.13 17885.39 ... 2572.49 3168.59 9649.70

3 23014.59 16538.19 ... 2417.05 2911.77 9324.80

4 21330.83 15726.93 ... 2303.32 2752.10 9273.46

5 19800.81 14442.01 ... 2122.06 2577.57 8443.84

6 17879.40 12893.88 ... 1893.54 2341.29 7505.31

7 16251.93 11307.28 ... 1670.44 2102.21 6610.05

8 14113.58 9224.46 ... 1350.43 1689.65 5437.47

9 12153.03 7521.85 ... 1081.27 1353.31 4277.05

10 11115.00 6719.01 ... 1018.62 1203.92 4183.21

11 9846.81 5252.76 ... 797.35 919.11 3523.16

12 8117.78 4462.74 ... 648.50 725.90 3045.26

13 6969.52 3905.64 ... 543.32 612.61 2604.19

14 6033.21 3110.97 ... 466.10 537.11 2209.09

15 5007.21 2578.03 ... 390.20 445.36 1886.35

16 2002年 4315.00 2150.76 ... 340.65 377.16 1612.65

17 2001年 3707.96 1919.09 ... 300.13 337.44 1491.60

18 2000年 3161.66 1701.88 ... 263.68 295.02 1363.56

[19 rows x 32 columns]

在不指明sheetname的时候,会读取Excel里排在第一个的sheet,要注意!

import pandas as pdpopulation=pd.read_excel('GDPandPopulation.xlsx',sheet_name='Population',index_col=0)print(population)

运行结果:

北京市 天津市 河北省 山西省 内蒙古自治区 ... 陕西省 甘肃省 青海省 宁夏回族自治区 新疆维吾尔自治区

2154 1560 7556 3718 2534 ... 3864 2637 603 688 2487

2171 1557 7520 3702 2529 ... 3835 2626 598 682 2445

2173 1562 7470 3682 2520 ... 3813 2610 593 675 2398

2171 1547 7425 3664 2511 ... 3793 2600 588 668 2360

2152 1517 7384 3648 2505 ... 3775 2591 583 662 2298

2115 1472 7333 3630 2498 ... 3764 2582 578 654 2264

2069 1413 7288 3611 2490 ... 3753 2578 573 647 2233

1355 7241 3593 2482 ... 3743 2564 568 639 2209

1962 1299 7194 3574 2472 ... 3735 2560 563 633 2185

1860 1228 7034 3427 2458 ... 3727 2555 557 625 2159

1771 1176 6989 3411 2444 ... 3718 2551 554 618 2131

1676 1115 6943 3393 2429 ... 3708 2548 552 610 2095

1601 1075 6898 3375 2415 ... 3699 2547 548 604 2050

1538 1043 6851 3355 2403 ... 3690 2545 543 596

1493 1024 6809 3335 2393 ... 3681 2541 539 588 1963

1456 1011 6769 3314 2386 ... 3672 2537 534 580 1934

2002年 1423 1007 6735 3294 2384 ... 3662 2531 529 572 1905

2001年 1385 1004 6699 3272 2381 ... 3653 2523 523 563 1876

2000年 1364 1001 6674 3247 2372 ... 3644 2515 517 554 1849

[19 rows x 31 columns]

import pandas as pdpopulation=pd.read_excel('GDPandPopulation.xlsx',sheet_name='Population',index_col=0)#print(population)print(type(population))

运行结果:

<class 'pandas.core.frame.DataFrame'>

import pandas as pdpopulation=pd.read_excel('GDPandPopulation.xlsx',sheet_name='Population',index_col=0)gdp=pd.read_excel('GDPandPopulation.xlsx',sheet_name='GDP',index_col=0)#要注意,使用to_excel时,文件名必须是xlsx或者xls等excel的专用文件名population.to_excel('p.xlsx',sheet_name='pop')#直接使用Dataframe.to_excel(),会把之前的excel文件替换掉,即使设了不同的sheet_name也不行gdp.to_excel('g.xlsx',sheet_name='gdp')

运行结果:

生成'p.xlsx'和'g.xlsx'两个数据文件,分别存放population和gdp数据

注意:想要将多个sheet保存到excel中,必须通过excelwriter对象来传递文件路径

import pandas as pdpopulation=pd.read_excel('GDPandPopulation.xlsx',sheet_name='Population',index_col=0)gdp=pd.read_excel('GDPandPopulation.xlsx',sheet_name='GDP',index_col=0)excel=pd.ExcelWriter('output.xlsx') #生成excel_writer对象gdp.to_excel(excel,sheet_name='gdp') #把之前输入文件名的位置替换成excel_writer对象population.to_excel(excel,sheet_name='pop')excel.save() #运行后才会生成output.xlsx文件

运行结果:

生成'output.xlsx'文件,存储两个sheet到Excel中。这两个sheet分别存放population和gdp数据

以上就是Pandas统计分析基础(2)Pandas之数据的读写,如果有改进的建议,欢迎在评论区留言交流~

持续更新中......原创不易,各位看官请随手点下Follow和Star,感谢!!!

本内容不代表本网观点和政治立场,如有侵犯你的权益请联系我们处理。
网友评论
网友评论仅供其表达个人看法,并不表明网站立场。