【Question Title】: pandas merge on date column issue
【Posted】: 2017-08-06 08:05:09
【Question Description】:

I am trying to merge two data frames on a date column (I tried both the object and datetime.date types), but the merge does not give the desired output:

import datetime

import pandas as pd
df1 =  pd.DataFrame({'amt': {0: 1549367.9496070854,
      1: 2175801.78219801,
      2: 1915613.1629125737,
      3: 1703063.8323954903,
      4: 1770040.7987461537},
     'month': {0: '2015-02-01',
      1: '2015-03-01',
      2: '2015-04-01',
      3: '2015-05-01',
      4: '2015-06-01'}})
print(df1)


        amt             month
    0   1.549368e+06    2015-02-01
    1   2.175802e+06    2015-03-01
    2   1.915613e+06    2015-04-01
    3   1.703064e+06    2015-05-01
    4   1.770041e+06    2015-06-01



df2 =  {'factor': {datetime.date(2015, 2, 1): 1.0,
      datetime.date(2015, 3, 1): 1.0,
      datetime.date(2015, 4, 1): 1.0,
      datetime.date(2015, 5, 1): 1.0,
      datetime.date(2015, 6, 1): 0.99889679025914435},
     'month': {datetime.date(2015, 2, 1): datetime.date(2015, 2, 1),
      datetime.date(2015, 3, 1): datetime.date(2015, 3, 1),
      datetime.date(2015, 4, 1): datetime.date(2015, 4, 1),
      datetime.date(2015, 5, 1): datetime.date(2015, 5, 1),
      datetime.date(2015, 6, 1): datetime.date(2015, 6, 1)}}
df2 = pd.DataFrame(df2)
print(df2)

                factor      month
    2015-02-01  1.000000    2015-02-01
    2015-03-01  1.000000    2015-03-01
    2015-04-01  1.000000    2015-04-01
    2015-05-01  1.000000    2015-05-01
    2015-06-01  0.998897    2015-06-01


pd.merge(df2, df1, how='outer', on='month')

        factor       month            amt
    0   1.000000     2015-02-01      NaN
    1   1.000000     2015-03-01      NaN
    2   1.000000     2015-04-01      NaN
    3   1.000000     2015-05-01      NaN
    4   0.998897     2015-06-01      NaN
    5   NaN           2015-02-01    1.549368e+06
    6   NaN           2015-03-01    2.175802e+06
    7   NaN           2015-04-01    1.915613e+06
    8   NaN           2015-05-01    1.703064e+06
    9   NaN           2015-06-01    1.770041e+06
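The indicator=True option of pd.merge gives a quick way to confirm that none of the keys matched. It adds a _merge column recording whether each row came from the left frame, the right frame, or both. The sketch below rebuilds a two-month subset of the question's data (the shortened values are an assumption made for brevity):

```python
import datetime

import pandas as pd

# Two-month subset of the question's data:
# df1 keys are strings, df2 keys are datetime.date objects
df1 = pd.DataFrame({"amt": [1549367.95, 2175801.78],
                    "month": ["2015-02-01", "2015-03-01"]})
df2 = pd.DataFrame({"factor": [1.0, 1.0],
                    "month": [datetime.date(2015, 2, 1), datetime.date(2015, 3, 1)]})

# indicator=True adds a "_merge" column: "left_only", "right_only" or "both"
merged = pd.merge(df2, df1, how="outer", on="month", indicator=True)
print(merged[["month", "_merge"]])

# A str key never equals a datetime.date key, so no row is labelled "both"
print((merged["_merge"] == "both").sum())
```

Every row showing up as left_only or right_only is the signature of keys that look identical when printed but are of different Python types.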

【Question Discussion】:

    Tags: python pandas merge data-manipulation


    【Solution 1】:

    I think you need to convert both columns with to_datetime first, because the merge key columns need the same dtype:

    df1.month = pd.to_datetime(df1.month)
    df2.month = pd.to_datetime(df2.month)
    
    print (pd.merge(df2, df1, how='outer', on='month'))
         factor      month           amt
    0  1.000000 2015-02-01  1.549368e+06
    1  1.000000 2015-03-01  2.175802e+06
    2  1.000000 2015-04-01  1.915613e+06
    3  1.000000 2015-05-01  1.703064e+06
    4  0.998897 2015-06-01  1.770041e+06
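For reference, here is a self-contained version of the to_datetime fix, again on a hypothetical two-month subset of the data. It also passes validate="one_to_one", a standard pd.merge option that raises a MergeError if a month is duplicated on either side. That check is a cheap guard when joining on dates:

```python
import datetime

import pandas as pd

df1 = pd.DataFrame({"amt": [1549367.95, 2175801.78],
                    "month": ["2015-02-01", "2015-03-01"]})
df2 = pd.DataFrame({"factor": [1.0, 0.998897],
                    "month": [datetime.date(2015, 2, 1), datetime.date(2015, 3, 1)]})

# Convert both key columns to datetime64 so equal dates compare as equal keys
df1["month"] = pd.to_datetime(df1["month"])
df2["month"] = pd.to_datetime(df2["month"])

# validate="one_to_one" raises pandas.errors.MergeError on duplicated keys
merged = pd.merge(df2, df1, how="outer", on="month", validate="one_to_one")
print(merged)
```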
    

    # Or, starting again from the original frames, convert df2's datetime.date column to str
    df2.month = df2.month.astype(str)
    
    print (pd.merge(df2, df1, how='outer', on='month'))
         factor       month           amt
    0  1.000000  2015-02-01  1.549368e+06
    1  1.000000  2015-03-01  2.175802e+06
    2  1.000000  2015-04-01  1.915613e+06
    3  1.000000  2015-05-01  1.703064e+06
    4  0.998897  2015-06-01  1.770041e+06
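The string route works only because str() on a datetime.date produces the same ISO 8601 text ("2015-02-01") that df1 already stores. Keys written differently, such as "2015-2-1", would still fail to match. A minimal sketch of this alternative, applied to the original (unconverted) frames and using the same hypothetical two-month subset:

```python
import datetime

import pandas as pd

df1 = pd.DataFrame({"amt": [1549367.95, 2175801.78],
                    "month": ["2015-02-01", "2015-03-01"]})
df2 = pd.DataFrame({"factor": [1.0, 0.998897],
                    "month": [datetime.date(2015, 2, 1), datetime.date(2015, 3, 1)]})

# str(datetime.date(2015, 2, 1)) is "2015-02-01", the same text df1 uses
df2["month"] = df2["month"].astype(str)

merged = pd.merge(df2, df1, how="outer", on="month")
print(merged)
```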
    

    【Discussion】:

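    • The problem is that a string column cannot be matched with a date column; both need the same data type.
    • But I get df1.month.dtype = dtype('O') and df2.month.dtype = dtype('O'), so if both are string types, why does it matter?
    • Yes, you are right, because date, list, set and str are all object; see here
    • Converting to str did not help; converting with pd.to_datetime() did.
    • This is a bit confusing: the dtype is object, but the type is not.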
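The confusion in these comments comes from object being a container dtype. It only says that each cell holds an arbitrary Python object, not what kind of object. In the pandas versions of the question's era both columns report dtype('O'), although newer releases may give string columns a dedicated string dtype. A sketch for inspecting the element types with pandas.api.types.infer_dtype and Series.map(type) (the one-row series are hypothetical):

```python
import datetime

import pandas as pd
from pandas.api.types import infer_dtype

s_str = pd.Series(["2015-02-01"])
s_date = pd.Series([datetime.date(2015, 2, 1)])

# The column dtype alone does not say what each element really is
print(s_str.dtype, s_date.dtype)

# infer_dtype and map(type) reveal the actual element types
print(infer_dtype(s_str, skipna=False), infer_dtype(s_date, skipna=False))
print(s_str.map(type).iloc[0], s_date.map(type).iloc[0])

# merge matches keys by equality, and a str never equals a date
print("2015-02-01" == datetime.date(2015, 2, 1))
```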