【Question Title】: Matching rows between DataFrames in pandas (Python)
【Posted】: 2018-04-28 14:08:03
【Question】:

I have two DataFrames,

df1,

 Names
 one two three
 Sri is a good player
 Ravi is a mentor
 Kumar is a cricketer

df2,

 values
 sri
 NaN
 sri, is
 kumar,cricketer

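For each row of df2, I am trying to get the row of df1 that contains all of the comma-separated items in that df2 row.

My expected output is,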

 values             Names
 sri                Sri is a good player
 NaN
 sri, is            Sri is a good player
 kumar,cricketer    Kumar is a cricketer

I tried df1["Names"].str.contains("|".join(df2["values"].values.tolist()))

but I can't get the expected output because the values contain commas (","). Please help.
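
For reference, here is a minimal sketch that rebuilds the two frames from the question and shows where the attempt goes wrong. The data is copied from the post; the helper names `join_error`, `pattern` and `mask` are only illustrative. It assumes the NaN in df2 is a real missing value rather than the string "NaN".

```python
import numpy as np
import pandas as pd

# Data copied from the question.
df1 = pd.DataFrame({"Names": ["one two three",
                              "Sri is a good player",
                              "Ravi is a mentor",
                              "Kumar is a cricketer"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer"]})

# 1) The NaN entry is a float, so str.join refuses it.
try:
    "|".join(df2["values"].values.tolist())
    join_error = None
except TypeError as exc:
    join_error = exc

# 2) Even after dropping NaN, "|" means "any of", and each comma-separated
#    entry is searched as one literal string instead of as separate words.
pattern = "|".join(df2["values"].dropna())
mask = df1["Names"].str.contains(pattern, case=False)
```

With this data, `mask` is True only for the "Sri is a good player" row: "kumar,cricketer" never appears literally in any name, so the Kumar row is missed, and the result says nothing about which df2 row matched which name.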

【Question Comments】:

  • It should be a match on the words; their order doesn't matter.

Tags: python pandas dataframe data-analysis


【Solution 1】:

Use sets

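import numpy as np
import pandas as pd

# Turn each name into a lowercase set of words.
s1 = df1.Names.dropna()
s1.loc[:] = [set(x.lower().split()) for x in s1.values.tolist()]
a1 = s1.values

# Turn each comma-separated entry of df2 into a lowercase set of words (NaN rows are dropped).
s2 = df2['values'].dropna()
s2.loc[:] = [set(x.replace(' ', '').lower().split(',')) for x in s2.values.tolist()]
a2 = s2.values

# a1 >= a2[:, None] tests "name set is a superset of value set" for every pair.
# The extra all-True column makes argmax fall back to the appended NaN when no name matches.
i = np.column_stack([a1 >= a2[:, None], [True] * len(a2)]).argmax(1)

# Pick the first matching name per df2 row, aligned on the index of the non-NaN rows.
df2.assign(Names=pd.Series(
    np.append(df1.Names.values, np.nan)[i], s2.index
))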

            values                 Names
0              sri  Sri is a good player
1              NaN                   NaN
2          sri, is  Sri is a good player
3  kumar,cricketer  Kumar is a cricketer
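
The superset test used above can also be written as a plain loop, which may be easier to follow than the broadcast comparison. This is only a sketch of the same idea, not the answerer's code: the function name `first_superset_match` is made up for illustration, and it strips whitespace around each comma-separated word instead of removing every space.

```python
import numpy as np
import pandas as pd

def first_superset_match(names, values):
    """For each entry of `values`, return the first name whose words include all of its words."""
    # Pre-compute the lowercase word set of every name once.
    name_sets = [(n, set(n.lower().split())) for n in names.dropna()]
    result = []
    for v in values:
        if pd.isna(v):
            # Missing entries stay missing.
            result.append(np.nan)
            continue
        wanted = {w.strip().lower() for w in v.split(",")}
        # set >= set is the superset test; fall back to NaN when nothing matches.
        match = next((n for n, words in name_sets if words >= wanted), np.nan)
        result.append(match)
    return pd.Series(result, index=values.index)

df1 = pd.DataFrame({"Names": ["one two three",
                              "Sri is a good player",
                              "Ravi is a mentor",
                              "Kumar is a cricketer"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer"]})

# Write the result straight into df2 as a new column.
df2["Names"] = first_superset_match(df1["Names"], df2["values"])
```

The loop is slower than the vectorised comparison on large frames, but it makes the fallback for unmatched and missing rows explicit.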

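【Comments】:

  • I don't want a separate output df. I want to add it to my df2.
  • Then assign the result back to df2. Or assign directly to a new column instead of using assign, like df2.loc[:, 'Names'] = pd.Series(np.append(df1.Names.values, np.nan)[i], s2.index)
  • That worked. Can you recommend the best resource for learning pandas easily?
  • Nothing is easy!
  • 1. Start here to learn what pandas can do. 2. Give yourself a data analysis task and use pandas to solve it. Ask questions if you need to. 3. Answer other people's questions. Even if you don't post an answer, read the question and work out a solution, then read other people's answers to the question you just tried to answer. 4. Practice!
【Solution 2】: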
import pandas as pd

names = [
    'one two three',
    'Sri is a good player',
    'Ravi is a mentor',
    'Kumar is a cricketer'
]
values = [
    'sri',
    'NaN',  # note: this is the string 'NaN', not a missing value
    'sri, is',
    'kumar,cricketer',
]

names = pd.Series(names)
values = pd.DataFrame(values, columns=['values'])

def foo(words):
    # Start from all names and keep only those that contain every word.
    names_copy = names.copy()

    for word in words.split(','):
        # strip() drops the space after the comma in entries like 'sri, is'
        names_copy = names_copy[names_copy.str.contains(word.strip(), case=False)]

    return names_copy.values

values['names'] = values['values'].map(foo)
values


    values          names
0   sri             [Sri is a good player]
1   NaN             []
2   sri, is         [Sri is a good player]
3   kumar,cricketer [Kumar is a cricketer]
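
A possible variant of this answer, sketched under two assumptions that go beyond the original: the missing entry is a real NaN rather than the string 'NaN', and words should match as whole words rather than as substrings. The function name `matching_names` is hypothetical; `re.escape` keeps any regex metacharacters in the words from being interpreted.

```python
import re
import numpy as np
import pandas as pd

names = pd.Series([
    'one two three',
    'Sri is a good player',
    'Ravi is a mentor',
    'Kumar is a cricketer',
])
values = pd.DataFrame({'values': ['sri', np.nan, 'sri, is', 'kumar,cricketer']})

def matching_names(words):
    """Return every name that contains all comma-separated words as whole words."""
    if pd.isna(words):
        # A missing entry matches nothing.
        return []
    remaining = names
    for word in words.split(','):
        # \b anchors the word at word boundaries, so 'is' does not match 'this'.
        pattern = r'\b' + re.escape(word.strip()) + r'\b'
        remaining = remaining[remaining.str.contains(pattern, case=False, regex=True)]
    return remaining.tolist()

values['names'] = values['values'].map(matching_names)
```

Like the answer above, this keeps all matching names per row as a list, so a row can hold zero, one or several names.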

【Comments】:
