【Question title】: Remove columns and replace headers in different columns when merging CSV files
【Posted】: 2019-06-13 21:16:49
【Question】:

I merged two separate CSV files with pd.merge. The result looks like this:

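Basically, I want to drop the REGION column in each case and put the region name into the other column headers instead (e.g. QLD DEMAND, VIC RRP). The result I want looks like this: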

My current code for merging the CSV files and moving SETTLEMENTDATE to the leftmost column is:

import pandas as pd
data1 = pd.read_csv("QLD.csv")
data2 = pd.read_csv("VIC.csv")
# merge on the shared timestamp; columns present in both frames get _x / _y suffixes
result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP','SETTLEMENTDATE']], data2[['REGION', 'TOTALDEMAND', 'RRP','SETTLEMENTDATE']], on='SETTLEMENTDATE')
# move SETTLEMENTDATE to the first column
cols = result.columns.tolist()
cols.insert(0, cols.pop(cols.index('SETTLEMENTDATE')))
result = result.reindex(columns=cols)
result.to_csv("masterfile.csv", index=False)

My question is: how do I modify my code to get the result I want?
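For reference, here is a minimal sketch of the column layout the merge above produces. The two small DataFrames are hypothetical stand-ins for QLD.csv and VIC.csv (the real files are not shown). Because REGION, TOTALDEMAND and RRP exist in both inputs, pd.merge adds its default _x / _y suffixes, which is why the headers need renaming afterwards.

```python
import pandas as pd

# Hypothetical sample data standing in for QLD.csv and VIC.csv.
data1 = pd.DataFrame({
    "REGION": ["QLD", "QLD"],
    "TOTALDEMAND": [5500.0, 5600.0],
    "RRP": [60.1, 61.2],
    "SETTLEMENTDATE": ["2019/06/01 00:30", "2019/06/01 01:00"],
})
data2 = pd.DataFrame({
    "REGION": ["VIC", "VIC"],
    "TOTALDEMAND": [4800.0, 4700.0],
    "RRP": [70.3, 69.8],
    "SETTLEMENTDATE": ["2019/06/01 00:30", "2019/06/01 01:00"],
})

cols = ["REGION", "TOTALDEMAND", "RRP", "SETTLEMENTDATE"]
# Non-key columns that appear in both frames get the default "_x" / "_y" suffixes.
result = pd.merge(data1[cols], data2[cols], on="SETTLEMENTDATE")

# Move the key column to the front, as in the question's code.
order = result.columns.tolist()
order.insert(0, order.pop(order.index("SETTLEMENTDATE")))
result = result.reindex(columns=order)

print(result.columns.tolist())
# ['SETTLEMENTDATE', 'REGION_x', 'TOTALDEMAND_x', 'RRP_x', 'REGION_y', 'TOTALDEMAND_y', 'RRP_y']
```

With this order, positions 1 and 4 are REGION_x and REGION_y, which is what the positional drop in the answers relies on.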

Error (when running the code from Solution 1):

Traceback (most recent call last):
  File "/Users/george/Desktop/collate/merge pdf.py", line 9, in <module>
    result.columns=['SETTLEMENTDATE','QLD DEMAND','QLD RRP','VLC DEMAND','VLC RRP']
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 4389, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 646, in _set_axis
    self._data.set_axis(axis, labels)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 3323, in set_axis
    'values have {new} elements'.format(old=old_len, new=new_len))
ValueError: Length mismatch: Expected axis has 6 elements, new values have 5 elements

Edit 1 (trying the code from Solution 2):

import pandas as pd
data1 = pd.read_csv("QLD.csv") 
data2 = pd.read_csv("VIC.csv")
result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP','SETTLEMENTDATE']], data2[['REGION', 'TOTALDEMAND', 'RRP','SETTLEMENTDATE']], on='SETTLEMENTDATE')
cols = result.columns.tolist()
cols.insert(0, cols.pop(cols.index('SETTLEMENTDATE')))
result = result.reindex(columns= cols)
result = result.drop(result.columns[[1, 4]], axis=1)
result = result.rename(columns={'SETTLEMENTDATE': 'SETTLEMENTDATE', 'TOTALDEMAND_x': 
                    'QLD DEMAND','RRP_x':'QLD RRP','TOTALDEMAND_x':'VIC DEMAND','RRP_y':'VIC RRP'})
result.to_csv("masterfile.csv", index=False)
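The rename mapping in Edit 1 lists 'TOTALDEMAND_x' twice. In a Python dict literal a repeated key is allowed, but only its last value is kept, so TOTALDEMAND_x becomes 'VIC DEMAND' and TOTALDEMAND_y is never renamed. The sketch below demonstrates this on a hypothetical one-row frame that has the layout left after dropping the REGION columns.

```python
import pandas as pd

# Hypothetical frame with the columns that remain after dropping REGION_x / REGION_y.
result = pd.DataFrame(
    [["2019/06/01 00:30", 5500.0, 60.1, 4800.0, 70.3]],
    columns=["SETTLEMENTDATE", "TOTALDEMAND_x", "RRP_x", "TOTALDEMAND_y", "RRP_y"],
)

# Mapping as written in Edit 1: the duplicate key collapses into a single entry
# whose value is the last one given ('VIC DEMAND').
buggy = {"SETTLEMENTDATE": "SETTLEMENTDATE", "TOTALDEMAND_x": "QLD DEMAND",
         "RRP_x": "QLD RRP", "TOTALDEMAND_x": "VIC DEMAND", "RRP_y": "VIC RRP"}
print(len(buggy))  # 4
print(result.rename(columns=buggy).columns.tolist())
# ['SETTLEMENTDATE', 'VIC DEMAND', 'QLD RRP', 'TOTALDEMAND_y', 'VIC RRP']

# Corrected mapping: the VIC demand column is the one with the _y suffix.
fixed = {"TOTALDEMAND_x": "QLD DEMAND", "RRP_x": "QLD RRP",
         "TOTALDEMAND_y": "VIC DEMAND", "RRP_y": "VIC RRP"}
print(result.rename(columns=fixed).columns.tolist())
# ['SETTLEMENTDATE', 'QLD DEMAND', 'QLD RRP', 'VIC DEMAND', 'VIC RRP']
```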

Excel file:

Thanks!

【Question comments】:

    Tags: python pandas csv numpy


    【Solution 1】:

    You can drop the REGION columns like this, and add result.columns=['col1','col2',....] to your code to rename the columns.

        import pandas as pd
        data1 = pd.read_csv("QLD.csv")
        data2 = pd.read_csv("VIC.csv")
        result = pd.merge(data1[['REGION', 'TOTALDEMAND', 'RRP','SETTLEMENTDATE']], data2[['REGION', 'TOTALDEMAND', 'RRP','SETTLEMENTDATE']], on='SETTLEMENTDATE')
        # move SETTLEMENTDATE to the first column
        cols = result.columns.tolist()
        cols.insert(0, cols.pop(cols.index('SETTLEMENTDATE')))
        result = result.reindex(columns=cols)
        # drop both region columns; the labels must be passed as one list
        result = result[result.columns.drop(['REGION_x', 'REGION_y'])]
        # positional rename: exactly one name per remaining column (5)
        result.columns = ['SETTLEMENTDATE', 'QLD DEMAND', 'QLD RRP', 'VIC DEMAND', 'VIC RRP']
        result.to_csv("masterfile.csv", index=False)
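The "Length mismatch: Expected axis has 6 elements, new values have 5 elements" error comes from two facts: Index.drop takes the labels to remove as its first argument (its second parameter is errors, not another label), so drop('REGION_x', 'REGION_y') removes only REGION_x and leaves 6 columns; and assigning to result.columns is positional, so the new list must have exactly one name per column. A minimal sketch on a hypothetical one-row merged frame:

```python
import pandas as pd

# Hypothetical merged frame with the column order produced by the question's code.
result = pd.DataFrame(
    [["2019/06/01 00:30", "QLD", 5500.0, 60.1, "VIC", 4800.0, 70.3]],
    columns=["SETTLEMENTDATE", "REGION_x", "TOTALDEMAND_x", "RRP_x",
             "REGION_y", "TOTALDEMAND_y", "RRP_y"],
)

# Pass both labels in a single list so that both columns are removed.
result = result[result.columns.drop(["REGION_x", "REGION_y"])]

new_names = ["SETTLEMENTDATE", "QLD DEMAND", "QLD RRP", "VIC DEMAND", "VIC RRP"]
# Assigning .columns replaces labels by position; pandas raises
# "ValueError: Length mismatch" if the counts differ, so check first.
assert len(new_names) == result.shape[1]
result.columns = new_names

print(result.columns.tolist())
# ['SETTLEMENTDATE', 'QLD DEMAND', 'QLD RRP', 'VIC DEMAND', 'VIC RRP']
```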
    

    【Comments】:

    • When I run your code I get an error; I've added it to my original post (edited).
    【Solution 2】:

    After merging the DataFrames, you can use drop to remove the columns, then just use rename to rename them.

    result = result.drop(result.columns[[1, 4]], axis=1)  # df.columns is zero-based pd.Index 
    result = result.rename(columns={'SETTLEMENTDATE': 'SETTLEMENTDATE', 'TOTALDEMAND_x': 
                        'QLD DEMAND','RRP_x':'QLD RRP','TOTALDEMAND_y':'VIC DEMAND','RRP_y':'VIC RRP'})
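An alternative that avoids the _x / _y suffixes and positional drops altogether is to rename each region's columns before merging. This is a sketch under assumptions: tidy_region is a hypothetical helper, the sample data is invented, and the input files are assumed to have the REGION, TOTALDEMAND, RRP and SETTLEMENTDATE columns used in the question. String concatenation is used instead of f-strings so it also runs on the Python 2.7 shown in the traceback.

```python
import pandas as pd


def tidy_region(df, region):
    """Hypothetical helper: keep SETTLEMENTDATE, TOTALDEMAND and RRP (dropping
    REGION) and put the region name into the remaining headers."""
    out = df[["SETTLEMENTDATE", "TOTALDEMAND", "RRP"]]
    return out.rename(columns={"TOTALDEMAND": region + " DEMAND",
                               "RRP": region + " RRP"})


# Hypothetical sample data standing in for pd.read_csv("QLD.csv") / pd.read_csv("VIC.csv").
data1 = pd.DataFrame({"REGION": ["QLD"], "TOTALDEMAND": [5500.0], "RRP": [60.1],
                      "SETTLEMENTDATE": ["2019/06/01 00:30"]})
data2 = pd.DataFrame({"REGION": ["VIC"], "TOTALDEMAND": [4800.0], "RRP": [70.3],
                      "SETTLEMENTDATE": ["2019/06/01 00:30"]})

# No column names overlap except the key, so no suffixes are added and
# SETTLEMENTDATE is already the first column.
result = pd.merge(tidy_region(data1, "QLD"), tidy_region(data2, "VIC"),
                  on="SETTLEMENTDATE")
print(result.columns.tolist())
# ['SETTLEMENTDATE', 'QLD DEMAND', 'QLD RRP', 'VIC DEMAND', 'VIC RRP']
```

Renaming by label rather than by position also means the code keeps working if the CSV files ever list their columns in a different order.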
    

    【Comments】:

    • Your code looks correct, but something gets mixed up somewhere. I edited my post to show you.
    • @newtoR I edited my post. Sorry, I wrote TOTALDEMAND_x where it should have been TOTALDEMAND_y.