【Question title】: How to set a specific column to an int type with pandas
【Posted】: 2018-02-20 23:46:53
【Question】:

I have this script that writes some CSV files from a folder to Excel:

from pandas.io.excel import ExcelWriter
import pandas
import os

path = 'data/'
ordered_list = sorted(os.listdir(path), key = lambda x: int(x.split(".")[0]))


with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        pandas.read_csv(path + csv_file).to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')

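My problem is that all the columns (say G:H) come out in string format (for example '400 or '10), with a leading '. I think they arrive as strings because the CSV converts them to strings, and I need them to be int. How can I make G:H int? I am using Python 3. Thanks!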

PS (here is a sample CSV):

ANPIS,,,,,,,
AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
,,,,,,,
Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
1,2,3,4,5,6,7=5-6,8
1,CAZACU MIHAI,133121140,Aug 2016,84,84
2,NICOARA PETRU,143152638,"Aug 2014, Sept 2014",126,84
3,CERNEA NICOLAE DAN,143354723,Dec 2015,84,84
4,LUDWIG PETRU,144091376,Nov 2014,42,42
5,POPA REMUS,1440915363,Iun 2015,84,84
6,BOGDAN MARCEL,144154726,"Feb 2015, Apr 2015, Sept 2015, Oct 2015, Feb 2016",336,336
7,HENDRE AUGUSTIN,145054704,Feb 2015,42,42
8,COJOC VASILE,147050307,"Sept 2014, Oct 2014",84,84
9,RADULESCU VICTOR,147352628,"Sept 2014, Oct 2014, Nov 2014, Dec 2014",168,168
10,RADAU DUMITRU,148054764,"Feb 2017, Mar 2017",168,168
11,COVACIU PETRU,148054802,Iun 2016,84,84
12,BOT IOAN,14808634,"Aug 2014, Sept 2014, Oct 2014, Nov 2014",168,168

^^ The header is this:

ANPIS,,,,,,,
AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
,,,,,,,
Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
1,2,3,4,5,6,7=5-6,8

【Comments】:

Tags: python excel python-3.x pandas csv


【Answer 1】:

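You can read each file twice: first only the header rows, using the nrows parameter, and then the body, using skiprows.

You then need to write twice as well.

The solution is a bit complicated because pandas parses this data incorrectly: a MultiIndex with 8 levels is not supported. If the header rows are not handled separately, the header data is joined with the body and the output is a mess.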

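# Assumes ExcelWriter, pandas, path and ordered_list from the question's script.
with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        sheet = csv_file[:-4]  # file name without the ".csv" extension
        # The first 8 lines are the title/header block; the rest is the data body.
        df1 = pandas.read_csv(path + csv_file, nrows=8, header=None)
        df2 = pandas.read_csv(path + csv_file, skiprows=8, header=None)
        # Write the header block, then the body directly below it on the same sheet.
        df1.to_excel(ew, index=False, sheet_name=sheet, header=False)
        df2.to_excel(ew, index=False, sheet_name=sheet, startrow=len(df1.index), startcol=0, header=False)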
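The reason the split read helps can be sketched on a small in-memory sample. This is only an illustration under assumptions: the CSV text below is a shortened, hypothetical stand-in for the real files (2 header lines instead of 8), and `n_header` is a name introduced here. When the whole file is read at once, the header text shares a column with the numbers, so the column cannot be an integer column; when the body is read on its own, pandas infers integers.

```python
import io
import pandas as pd

# Hypothetical, shortened stand-in for one CSV file: 2 header lines, then data.
sample = (
    "ANPIS,,,,,\n"
    "Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR\n"
    "1,CAZACU MIHAI,133121140,Aug 2016,84,84\n"
    '2,NICOARA PETRU,143152638,"Aug 2014, Sept 2014",126,84\n'
)
n_header = 2  # assumption for this sample; the answer uses 8 for the real files

# Reading everything at once mixes header text and numbers in the same columns.
whole = pd.read_csv(io.StringIO(sample), header=None)

# Reading header block and body separately keeps the numeric columns numeric.
head = pd.read_csv(io.StringIO(sample), nrows=n_header, header=None)
body = pd.read_csv(io.StringIO(sample), skiprows=n_header, header=None)

print(whole.dtypes)
print(body.dtypes)
```

Columns 4 and 5 of `body` correspond to "Suma Contabilitate" and "Suma SAFIR" in this sample.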

Use apply to remove the ' with strip, then convert to int with astype:

# These must match the DataFrame's column labels (the CSV header names), not Excel's column letters.
cols = ['G','H']

with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        df = pandas.read_csv(path + csv_file)
        # Cast to str, strip the leading apostrophe, then convert to int.
        df[cols] = df[cols].astype(str).apply(lambda x: x.str.strip("'")).astype(int)
        print(df.head())
        df.to_excel(ew, index=False, sheet_name=csv_file[:-4])
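As a minimal sketch of the same strip-then-astype idea on a tiny DataFrame: the data here is made up, and the columns are picked by position because "G" and "H" are Excel letters rather than labels that pandas would see in the sample CSV. It converts one column at a time, which has the same intent as the one-line apply version.

```python
import pandas as pd

# Hypothetical data: the numbers carry a leading apostrophe, as described in the question.
df = pd.DataFrame({
    "Nume Prenume": ["CAZACU MIHAI", "NICOARA PETRU"],
    "Suma Contabilitate": ["'84", "'126"],
    "Suma SAFIR": ["'84", "'84"],
})

# Pick the target columns by position (the last two here) instead of by Excel letter.
cols = list(df.columns[-2:])

for col in cols:
    # Cast to str, drop the apostrophe, then convert the cleaned text to int.
    df[col] = df[col].astype(str).str.strip("'").astype(int)

print(df.dtypes)
```

Note that astype(int) raises a ValueError if any cell in the column is not a number after stripping, which is what happens when header text is still mixed into the data.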

Another solution is to use the converters parameter with a custom function:

# These must match the CSV's column names, not Excel's column letters.
cols = ['G','H']

def converter(x):
    # Each cell arrives as a string; drop the apostrophe and parse it as int.
    return int(x.strip("'"))

# Use the same converter for each listed column.
converters = {x: converter for x in cols}

with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_list:
        df = pandas.read_csv(path + csv_file, converters=converters)
        print(df.head())
        df.to_excel(ew, index=False, sheet_name=csv_file[:-4])
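The converters approach can be tried without any files by feeding read_csv an in-memory string. This is a sketch under assumptions: the CSV text, the column names "G" and "H", and the helper name `strip_quote_to_int` are all invented for the illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text; the apostrophe-prefixed numbers mimic the '400 and '10 values.
sample = (
    "name,G,H\n"
    "a,'400,'10\n"
    "b,'84,'126\n"
)

def strip_quote_to_int(value: str) -> int:
    """Drop surrounding apostrophes and parse the rest as an integer."""
    return int(value.strip("'"))

cols = ["G", "H"]

# read_csv calls the converter on every cell of the listed columns while parsing.
df = pd.read_csv(io.StringIO(sample), converters={c: strip_quote_to_int for c in cols})

print(df.dtypes)
```

Compared with converting after the read, this keeps the cleanup inside read_csv, but the converter still fails on any cell that is not a number, so header rows have to be skipped first.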

【Discussion】:

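  • With the first approach I get: IndexError: At least one sheet must be visible, and with the second one nothing happens.
  • I checked again and tested with your sample data. The first solution has been changed. The second solution works, and it is also simple. Can you check now? I added print (df.head()) to test df, and it returns a DataFrame with int columns G, H.
  • I think my header strings are messing up the format. When I remove the header from all the CSV data it works fine. Can I somehow skip the header so that it does not interfere?
  • If you do not need to read the column names, I think you can use header=None, skiprows=1 in read_csv.
  • Wow, good job sir, thank you very much! Much respect! Really appreciated!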