Python Pandas Dataframe: check column of lists and return ID from another Dataframe (answers)

【Question title】: Python Pandas Dataframe check column of lists and return ID from another Dataframe
【Posted】: 2017-09-11 18:32:51
【Question description】:

I have a pandas dataframe df1 that has an index and a column of lists, and looks like:

index   IDList
0   [1,3,5,7]
1   [2,4,5,8]
2   [6,8,9]
3   [1,2]

I have another pandas dataframe df2 that has NewID as its index and a column of lists, like this:

NewID   IDList
1       [3]
2       [4,5]
3       [1,7]
4       [2]
5       [9,3]
6       [8]
7       [6]

What I need to do is: if any item in df1.IDList exists in df2.IDList, return a list of the associated df2.NewID values.

So the returned df1 dataframe would look like:

index   IDList      NewID
0       [1,3,5,7]   [3,1,2,3,5]
1       [2,4,5,8]   [4,2,2,6]
2       [6,8,9]     [7,6,5]
3       [1,2]       [3,4]
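
To reproduce the problem, the two frames shown above can be built roughly as follows. This is a minimal sketch, assuming that NewID is the index of df2 (as described) and that the lists hold plain integers.

```python
import pandas as pd

# df1: default integer index, one column holding lists of IDs
df1 = pd.DataFrame({"IDList": [[1, 3, 5, 7], [2, 4, 5, 8], [6, 8, 9], [1, 2]]})

# df2: NewID as the index, one column holding lists of IDs
df2 = pd.DataFrame(
    {"IDList": [[3], [4, 5], [1, 7], [2], [9, 3], [8], [6]]},
    index=pd.Index(range(1, 8), name="NewID"),
)

print(df1)
print(df2)
```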

Edit: note that an ID in df2.IDList can appear in more than one row (for example, ID 3 appears in df2 rows 1 and 5).

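I was thinking of some kind of np.where statement with "any" and a list comprehension? But I'm not sure how to apply it to each IDList in df1 so that it looks through the whole of df2.IDList. Maybe some kind of .stack()? Or .melt()? This would be easy in a spreadsheet with a vlookup against df2...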
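
The "flatten, then vlookup" idea can be sketched with explode and merge. This is not the method from the answer below, only one possible reading of the stack/melt idea; it assumes pandas 0.25 or later (for Series.explode), NewID as the index of df2, and integer IDs. The column names "ID" and "row" are made up for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"IDList": [[1, 3, 5, 7], [2, 4, 5, 8], [6, 8, 9], [1, 2]]})
df2 = pd.DataFrame(
    {"IDList": [[3], [4, 5], [1, 7], [2], [9, 3], [8], [6]]},
    index=pd.Index(range(1, 8), name="NewID"),
)

# Lookup table: one row per (NewID, ID) pair
lookup = df2["IDList"].explode().astype("int64").rename("ID").reset_index()

# Long form of df1: one row per (df1 row label, ID) pair
long1 = (
    df1["IDList"].explode().astype("int64")
    .rename("ID").rename_axis("row").reset_index()
)

# The "vlookup" step: an ID found in several df2 rows matches several times
matched = long1.merge(lookup, on="ID", how="inner")

# Collect the matches back into one list per original df1 row
df1["NewID"] = matched.groupby("row")["NewID"].agg(list)
print(df1)
```

A df1 row with no match at all would end up with NaN rather than an empty list, and the order of the values inside each list may differ from the order shown in the question.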

Help appreciated...

【Question discussion】:

Tags: python pandas numpy list-comprehension melt

【Solution 1】:

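import numpy as np
import pandas as pd

# Expand df2 so every ID in IDList gets its own entry, mapped to its NewID.
# Assumes NewID is a regular column; if it is the index, call df2 = df2.reset_index() first.
flat_ids = pd.DataFrame({
    # np.repeat replaces pd.np.repeat (pd.np was removed in pandas 2.0); passing plain
    # arrays keeps np.repeat from dispatching to Series.repeat on older pandas versions
    "NewID": np.repeat(df2.NewID.values, df2.IDList.str.len().values),
    "IDList": [x for l in df2.IDList for x in l]
}).set_index("IDList").NewID

# Look up all IDs of each df1 row with loc; an ID that occurs in
# several df2 rows returns every matching NewID
df1['NewID'] = df1['IDList'].map(lambda x: flat_ids.loc[x].tolist())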
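
One caveat with the flat_ids.loc[x] lookup: as far as I know, recent pandas versions (1.0 and later) raise a KeyError when any label in the list is missing from the index, so it only works if every ID in df1 also occurs somewhere in df2. A plain dictionary avoids that. The sketch below is a swapped-in alternative, not the answerer's code; it assumes NewID is the index of df2, and the name id_to_new is made up for the illustration.

```python
from collections import defaultdict

import pandas as pd

df1 = pd.DataFrame({"IDList": [[1, 3, 5, 7], [2, 4, 5, 8], [6, 8, 9], [1, 2]]})
df2 = pd.DataFrame(
    {"IDList": [[3], [4, 5], [1, 7], [2], [9, 3], [8], [6]]},
    index=pd.Index(range(1, 8), name="NewID"),
)

# Map each ID to the list of every NewID whose IDList contains it
id_to_new = defaultdict(list)
for new_id, ids in df2["IDList"].items():
    for i in ids:
        id_to_new[i].append(new_id)

# .get() makes IDs that are absent from df2 contribute nothing
# (and avoids adding empty entries to the defaultdict)
df1["NewID"] = df1["IDList"].map(
    lambda ids: [n for i in ids for n in id_to_new.get(i, [])]
)
print(df1)
```

With the sample data, the first row comes out as [3, 1, 5, 2, 3]: the same values as in the question, grouped by the order of the IDs in df1.IDList.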

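【Discussion】:

Shoot, the IDList column of df2 might have duplicates. I'll edit.
OK, I was wrong. This should also work if there are duplicates in the IDList column.
Getting: TypeError: repeat() takes 2 positional arguments but 3 were given
Maybe you have some unbalanced parentheses? Or which versions of python and pandas are you using? On python 2.7.9 and pandas 0.19.2 it seems to work fine.
python 3.4.5, pandas 17.1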