【Question title】: How to map rows of two different DataFrames based on a condition in pandas
【Posted】: 2018-08-07 22:29:08
【Question】:

I have two DataFrames,

df1,

 Names
 one two three
 Sri is a good player
 Ravi is a mentor
 Kumar is a cricketer player

df2,

 values
 sri
 NaN
 sri, is
 kumar,cricketer player

For each row of df2, I am trying to get the row of df1 that contains all of the items in that df2 row.

My expected output is,

 values                  Names
 sri                     Sri is a good player
 NaN
 sri, is                 Sri is a good player
 kumar,cricketer player  Kumar is a cricketer player

I tried df1["Names"].str.contains("|".join(df2["values"].values.tolist())), among other attempts,

but I cannot get the expected output because the values contain commas (","). Please help.
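
A small reproduction may help show why the attempt above falls short. The frames below are rebuilt by hand from the listings in the question (an assumption about the real data), and `pattern` and `mask` are illustrative names. Joining with "|" builds a regex that asks whether any one of the values occurs as a literal substring, so a value such as "sri, is" is searched for comma and all, and the result is a mask over df1 rows rather than a match for each df2 row.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt by hand from the question (an assumption about the real data)
df1 = pd.DataFrame({"Names": ["one two three", "Sri is a good player",
                              "Ravi is a mentor", "Kumar is a cricketer player"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer player"]})

# NaN has to be dropped first, otherwise "|".join raises a TypeError
pattern = "|".join(df2["values"].dropna())

# One boolean per df1 row: True if ANY alternative occurs as a literal substring.
# case=False is added here because the names are capitalized and the values are not.
mask = df1["Names"].str.contains(pattern, case=False)
print(pattern)
print(mask.tolist())
```

Only the plain "sri" alternative finds anything; "kumar,cricketer player" never appears verbatim in any name, which is why the words need to be compared individually.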

【Discussion】:

    Tags: python pandas dataframe data-analysis


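    【Solution 1】:

    Turn each row into a set of words, then use NumPy broadcasting to test every df2 row against every df1 row with the superset operator.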

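    import numpy as np
    import pandas as pd

    # Lowercase each cell, split on non-letters, and turn the words into a set
    d1 = df1['Names'].fillna('').str.lower().str.split('[^a-z]+').apply(set).values
    d2 = df2['values'].fillna('').str.lower().str.split('[^a-z]+').apply(set).values

    # Broadcast the superset test over every (df2 row i, df1 row j) pair
    i, j = np.where(d1 >= d2[:, None])

    df2.assign(Names=pd.Series(df1['Names'].values[j], df2['values'].index[i]))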
    
                       values                        Names
    0                     sri         Sri is a good player
    1                     NaN                          NaN
    2                 sri, is         Sri is a good player
    3  kumar,cricketer player  Kumar is a cricketer player
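
The broadcasting step can be looked at in isolation. The sketch below uses hypothetical word sets (`names` and `queries` are illustrative stand-ins, not the frames from the question) to show the grid that `>=` produces on object arrays of sets. It also shows a case the answer does not cover: when one query matches several names, `i` contains repeated values, so some de-duplication would probably be needed before aligning the result back onto df2. Keeping the first match is one possible choice, made here only for illustration.

```python
import numpy as np

# Hypothetical word sets standing in for the rows of df1 (names) and df2 (queries)
names = np.array([{"sri", "is", "a", "good", "player"},
                  {"ravi", "is", "a", "mentor"}], dtype=object)
queries = np.array([{"sri"}, {"sri", "is"}, {"is", "a"}], dtype=object)

# queries[:, None] has shape (3, 1) and names has shape (2,), so the comparison
# broadcasts to a (3, 2) grid: cell [i, j] is True when names[j] >= queries[i]
grid = names >= queries[:, None]
i, j = np.where(grid)

# The last query matches both names, so i repeats; keep the first match per query
# (an illustrative choice, not part of the answer above)
first_i, first_pos = np.unique(i, return_index=True)
first_j = j[first_pos]
print(grid)
print(i, j, first_j)
```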
    

    【Discussion】:

      【Solution 2】:

      Try this:

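      import pandas as pd

      # Note: this assumes the CSV column is called 'names' (the question shows 'Names')
      df1 = pd.read_csv('sample.csv')
      df2 = pd.read_csv('sample_2.csv')

      # Lowercase both columns so that matching ignores case
      df2['values'] = df2['values'].str.lower()
      df1['names'] = df1['names'].str.lower()

      # Replace punctuation with spaces, then collapse runs of whitespace.
      # regex=True is passed explicitly: since pandas 2.0, str.replace treats
      # the pattern as a literal string by default.
      df2['values'] = df2['values'].str.replace(r'[^\w\s]', ' ', regex=True)
      df2['values'] = df2['values'].replace(r'\s+', ' ', regex=True)

      df1['names'] = df1['names'].str.replace(r'[^\w\s]', ' ', regex=True)
      df1['names'] = df1['names'].replace(r'\s+', ' ', regex=True)

      # Split each cell into a list of words (NaN becomes the single token 'nan')
      df2['list_values'] = df2['values'].apply(lambda x: str(x).split())
      df1['list_names'] = df1['names'].apply(lambda x: str(x).split())

      list_names = df1['list_names'].tolist()

      def check_names(x, list_names):
          """Return the first name whose words include every word in x, else ''."""
          output = ''
          for list_name in list_names:
              if set(list_name) >= set(x):
                  output = ' '.join(list_name)
                  break
          return output

      df2['Names'] = df2['list_values'].apply(lambda x: check_names(x, list_names))
      print(df2)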
      

      Output

                         values                        Names
      0                     sri         sri is a good player
      1                     NaN                             
      2                  sri is         sri is a good player
      3  kumar cricketer player  kumar is a cricketer player
      

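      Explanation

      This is a fuzzy-matching problem, so these are the steps I applied:

      1. Remove punctuation and split to get the unique words in each DataFrame.
      2. Lowercase everything to normalize the matching.
      3. Convert each string into a list by splitting it.
      4. Finally, run the match through the check_names() function to get the desired output.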
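
The steps above can also be sketched without reading CSV files or overwriting the original columns. This is a minimal variant under stated assumptions, not the answer's own code: the frames are rebuilt by hand from the question, and `tokens` and `first_superset` are hypothetical helper names. It returns NaN rather than an empty string when nothing matches, and keeps the original capitalization of the names, which is closer to the expected output in the question.

```python
import re
import numpy as np
import pandas as pd

def tokens(text):
    """Lowercase a cell and return its set of words; NaN gives an empty set."""
    if pd.isna(text):
        return set()
    return set(re.findall(r"\w+", str(text).lower()))

def first_superset(value, names):
    """Return the first name whose words include every word of value, else NaN."""
    wanted = tokens(value)
    if not wanted:  # nothing to look for (NaN or punctuation only)
        return np.nan
    for name in names:
        if tokens(name) >= wanted:
            return name
    return np.nan

# Sample frames rebuilt by hand from the question (an assumption about the real data)
df1 = pd.DataFrame({"Names": ["one two three", "Sri is a good player",
                              "Ravi is a mentor", "Kumar is a cricketer player"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer player"]})

names = df1["Names"].tolist()
df2["Names"] = df2["values"].apply(lambda v: first_superset(v, names))
print(df2)
```

Like check_names(), this stops at the first matching name, so a value that fits several rows of df1 is paired only with the earliest one.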

      【Discussion】:
