pandas: create a column from the indices of a second DataFrame

【Question title】: pandas: create column from the indices of a second dataframe
【Posted】: 2019-07-24 04:58:38
【Question】:

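I am comparing two DataFrames (Small_df and Big_df). Both have a time column. The Big_df time column is in chronological order with a 10-second time step, while the Small_df time column has no fixed time step. Some of the time values in Big_df also appear in Small_df, sometimes more than once.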

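What I want to do is create a new column in Small_df that contains the index of the row in Big_df with the matching time value. Here is the structure of the two DataFrames (note that the times are in timestamp format):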

Small_df:

print(Small_df['Date'].head())
0   2019-05-22 15:37:05
1   2019-05-22 15:40:25
2   2019-05-22 15:40:45
3   2019-05-22 15:40:45
4   2019-05-22 15:41:55

Big_df:

print(Big_df['Date'].head())
0    2019-05-22 15:20:25
1    2019-05-22 15:20:35
2    2019-05-22 15:20:45
3    2019-05-22 15:20:55
4    2019-05-22 15:21:05

The times shown in Small_df can be found at this position in Big_df:

print(Big_df['Date'].iloc[100:130])
100    2019-05-22 15:37:05
101    2019-05-22 15:37:15
102    2019-05-22 15:37:25
103    2019-05-22 15:37:35
104    2019-05-22 15:37:45
105    2019-05-22 15:37:55
106    2019-05-22 15:38:05
107    2019-05-22 15:38:15
108    2019-05-22 15:38:25
109    2019-05-22 15:38:35
110    2019-05-22 15:38:45
111    2019-05-22 15:38:55
112    2019-05-22 15:39:05
113    2019-05-22 15:39:15
114    2019-05-22 15:39:25
115    2019-05-22 15:39:35
116    2019-05-22 15:39:45
117    2019-05-22 15:39:55
118    2019-05-22 15:40:05
119    2019-05-22 15:40:15
120    2019-05-22 15:40:25
121    2019-05-22 15:40:35
122    2019-05-22 15:40:45
123    2019-05-22 15:40:55
124    2019-05-22 15:41:05
125    2019-05-22 15:41:15
126    2019-05-22 15:41:25
127    2019-05-22 15:41:35
128    2019-05-22 15:41:45
129    2019-05-22 15:41:55

The result I am looking for is something like this:

print(Small_df[['Date','Big_df_idx']].head())
0   2019-05-22 15:37:05   100
1   2019-05-22 15:40:25   120
2   2019-05-22 15:40:45   122
3   2019-05-22 15:40:45   122
4   2019-05-22 15:41:55   129

I can get the indices of the matching values by doing this:

Big_df_idx = Big_df[Big_df['Date'].isin(Small_df['Date'].astype(str).tolist())].index

print(Big_df_idx[0:10])
 Int64Index([100, 120, 122, 129, 153, 156, 159, 160, 177, 178], dtype='int64')

However, this returns each index only once, whereas I need something that accounts for the repeated values.

Thanks

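【Comments】:

Possible duplicate of Pandas: add dataframes to dataframe - match on index and column value
What will you do with the index values in the new column of Small_df?
@jeschwar I will add 1 to them to get the subsequent time step in Big_df, and then replace the Small_df time column with that updated time.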

Tags: python pandas dataframe

【Solution 1】:

To do this, run:

pd.merge(Small_df, Big_df.reset_index().rename(
    columns={'index': 'Big_df_idx'}), how='left')

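The key to success is to copy the index of Big_df into a regular column and rename it Big_df_idx.

This temporary DataFrame is then merged with Small_df in left mode, so the result contains only the dates from Small_df, each with its corresponding index from Big_df.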
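The merge described above can be sketched end to end as follows. This is a minimal sketch, not the poster's actual data: the two frames are hypothetical samples built to match the timestamps shown in the question, and `on="Date"` is spelled out explicitly on the assumption that `Date` is the only column the frames share.

```python
import pandas as pd

# Hypothetical sample data modelled on the question: Big_df has a regular
# 10-second time step, Small_df holds a few (partly repeated) timestamps.
Big_df = pd.DataFrame(
    {"Date": pd.date_range("2019-05-22 15:20:25", periods=130,
                           freq=pd.Timedelta(seconds=10))}
)
Small_df = pd.DataFrame(
    {"Date": pd.to_datetime([
        "2019-05-22 15:37:05",
        "2019-05-22 15:40:25",
        "2019-05-22 15:40:45",
        "2019-05-22 15:40:45",
        "2019-05-22 15:41:55",
    ])}
)

# Copy Big_df's index into an ordinary column so that it survives the merge.
lookup = Big_df.reset_index().rename(columns={"index": "Big_df_idx"})

# A left merge keeps every row of Small_df, repeated timestamps included.
result = pd.merge(Small_df, lookup, on="Date", how="left")
print(result)
```

Both `Date` columns need the same dtype for the keys to match, which is presumably why the comment below mentions adding `.astype(str)`. If Big_df itself contained repeated dates, a left merge would return one row per match, so Small_df rows could be multiplied.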

【Comments】:

Fascinating. I just had to add .astype(str) to Small_df and it worked. Thanks!

【Solution 2】:

On relatively small data, you can solve this with the map() function instead of creating a new DataFrame object:

Small_df['id'] = Small_df['Date'].map(dict(zip(Big_df['Date'], Big_df.index)))
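A self-contained sketch of the map() approach, again on hypothetical sample data modelled on the question. The column name `Big_df_idx` and the deliberately unmatched last timestamp are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical sample data modelled on the question.
Big_df = pd.DataFrame(
    {"Date": pd.date_range("2019-05-22 15:20:25", periods=130,
                           freq=pd.Timedelta(seconds=10))}
)
Small_df = pd.DataFrame(
    {"Date": pd.to_datetime([
        "2019-05-22 15:37:05",
        "2019-05-22 15:40:45",
        "2019-05-22 15:40:45",
        "2019-05-22 15:41:58",  # deliberately absent from Big_df
    ])}
)

# Build a timestamp -> row label lookup from Big_df.
date_to_idx = dict(zip(Big_df["Date"], Big_df.index))

# map() looks up every row of Small_df, so repeated timestamps
# receive the same Big_df label each time they occur.
Small_df["Big_df_idx"] = Small_df["Date"].map(date_to_idx)
print(Small_df)
```

Timestamps with no match in Big_df come back as NaN, which turns the new column into floats. If Big_df had repeated dates, the dictionary would keep only the last index for each date.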

【Comments】: