【Question title】: Map a column from df2 based on whether a string column value of df1 matches any element of a list-type column of df2
【Posted】: 2021-08-07 14:54:19
【Question】:

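I have two dataframes, A and B. I want to create a new column 'suggested_Vendor' in dataframe B that holds the corresponding vendor from dataframe A, based on the following checks: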

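  1. Use the first 'vendor_name' from dataframe A whose 'preferred_fruits' list contains the fruit value from dataframe B.
  2. If there is no match, return 'suggested_Vendor' as 'None' in the output dataframe B.
  3. If that vendor's vendor_capacity has been exceeded, use the name of the second-best preferred vendor from dataframe A, and so on.
  4. There is no relationship between Id and userid in the two dataframes.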
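Rules 1 to 3 describe a greedy, first-fit allocation. Below is a minimal pandas sketch of it. It assumes that "first" and "second-best" preference means the row order of dataframe A, and that a vendor's capacity is shared across all of its fruits, which is how the comments below explain userid 4. The helper name `suggest_vendors` is hypothetical.

```python
import pandas as pd

def suggest_vendors(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Greedy first-fit: give each fruit in df_b to the first vendor (in df_a's
    row order, an assumption) that lists the fruit and still has capacity left."""
    # Remaining capacity per vendor, decremented as users are assigned
    remaining = dict(zip(df_a["vendor_name"], df_a["vendor_capacity"]))
    suggestions = []
    for fruit in df_b["fruit"]:
        chosen = None  # rule 2: stays None when no vendor matches
        for vendor, fruits in zip(df_a["vendor_name"], df_a["preferred_fruits"]):
            # rule 3: skip vendors whose capacity is used up
            if fruit in fruits and remaining[vendor] > 0:
                chosen = vendor
                remaining[vendor] -= 1
                break
        suggestions.append(chosen)
    out = df_b.copy()
    out["suggested_Vendor"] = suggestions
    return out

df_a = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "vendor_name": ["X", "Y", "Z", "W"],
    "preferred_fruits": [["apple", "orange", "banana"], ["kiwi"], ["banana", "orange"], ["apple"]],
    "vendor_capacity": [2, 1, 1, 1],
})
df_b = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5, 6],
    "fruit": ["apple", "orange", "apple", "banana", "kiwi", "strawberry"],
})
result = suggest_vendors(df_a, df_b)
print(result)
```

On the sample data this gives X, X, W, Z, Y, None, matching the expected output table. The inner loop scans A once per row of B, which is fine for small vendor tables.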

Dataframe A

| Id | vendor_name| preferred_fruits          |vendor_capacity|
| ---| -----------| --------------------------|---------------|
| 1  | X          |['apple','orange','banana']|2              |
| 2  | Y          |['kiwi']                   |1              |
| 3  | Z          |['banana','orange']        |1              | 
| 4  | W          |['apple']                  |1              |

Dataframe B

| userid | fruit      |
| ---    | -----------|
| 1      | apple      |
| 2      | orange     |
| 3      | apple      | 
| 4      | banana     |
| 5      | kiwi       |
| 6      | strawberry |

Output dataframe B

| userid | fruit      | suggested_Vendor|
| ---    | -----------|-----------------|
| 1      | apple      | X               |
| 2      | orange     | X               |
| 3      | apple      | W               |
| 4      | banana     | Z               |
| 5      | kiwi       | Y               | 
| 6      | strawberry | None            |  

Is there a pythonic way to do this? I'd appreciate some explanation of the code.

【Comments on the question】:

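  • You will get X for userid 4.
  • No: vendor X has a capacity of 2, which userids 1 and 2 already use up. That's why the second match (Z) is mapped here.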
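The capacity argument can be checked mechanically: count how many users each vendor receives in the expected output and compare that with vendor_capacity. A small sketch using only the standard library; the variable names are illustrative and the data is copied from the tables above.

```python
from collections import Counter

capacity = {"X": 2, "Y": 1, "Z": 1, "W": 1}
# Expected output rows from the question: (userid, fruit, suggested_Vendor)
expected = [(1, "apple", "X"), (2, "orange", "X"), (3, "apple", "W"),
            (4, "banana", "Z"), (5, "kiwi", "Y"), (6, "strawberry", None)]

# Number of users assigned to each vendor (unmatched rows are skipped)
load = Counter(vendor for _, _, vendor in expected if vendor is not None)
print(load)

# Every vendor stays within its capacity ...
within_capacity = all(load[v] <= capacity[v] for v in load)
# ... and X is already full before userid 4, so banana falls through to Z
x_full_before_user_4 = sum(1 for uid, _, v in expected if v == "X" and uid < 4) == capacity["X"]
print(within_capacity, x_full_before_user_4)
```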

Tags: python pandas dataframe


【Solution 1】:

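Please find the answer below; I have explained the steps in the comments. I restructured dfA by splitting each row's list of fruits into separate rows and dropping the original list rows, so dfA has several rows for the same vendor, each with a different fruit (which is also a better database design).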
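Splitting the list column into one row per fruit is exactly what `DataFrame.explode` does. A minimal sketch on the question's dataframe A, assuming pandas 1.1 or later (needed for the `ignore_index` argument):

```python
import pandas as pd

dfA = pd.DataFrame({
    "vendor_name": ["X", "Y", "Z", "W"],
    "fruits": [["apple", "orange", "banana"], ["kiwi"], ["banana", "orange"], ["apple"]],
    "cap": [2, 1, 1, 1],
})

# One row per (vendor, fruit) pair; vendor_name and cap are repeated for each fruit
long_a = dfA.explode("fruits", ignore_index=True)
print(long_a)

# In the long layout, "which vendors offer banana?" is a plain equality filter
banana_vendors = long_a.loc[long_a["fruits"] == "banana", "vendor_name"].tolist()
print(banana_vendors)
```

`explode` keeps the original row order, so the vendors' relative order from dataframe A is preserved in the long frame.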

```python
import pandas as pd

# Create Dataframes
dfA = pd.DataFrame()
dfA["vendor_name"] = ["X","Y","Z","W"]
dfA["fruits"] = [['apple','orange','banana'],['kiwi'],['banana','orange'],['apple']]
dfA["cap"] = [2,1,1,1]

dfB = pd.DataFrame()
dfB["userid"] = [1,2,3,4,5,6]
dfB["fruit"] = ["apple","orange","apple","banana","kiwi","strawberry"]

# Split the "fruits" lists so that each row in dfA holds a single fruit.
# explode() replaces appending new rows one by one with dfA.append(...),
# which was deprecated and then removed in pandas 2.0.
dfA = dfA.explode("fruits", ignore_index=True)

# Add current (remaining) capacity column in dfA
dfA["curr_cap"] = dfA["cap"].copy()

# Add vendor column in dfB
dfB["vendor"] = ""

# Loop over dfB to select vendor
for index, row in dfB.iterrows():
    # get fruit
    fruit = row["fruit"]
    # get available vendors: those offering this fruit with capacity left
    df = dfA[(dfA["fruits"] == fruit) & (dfA["curr_cap"] > 0)]
    # if vendors are available
    if len(df):
        if len(df) > 1:
            # if more than 1 vendor is available, sort (descending) by current capacity;
            # a stable sort keeps dfA's row order when capacities tie
            df = df.sort_values(by="curr_cap", ascending=False, kind="stable")
        # get vendor name
        selected_vendor = df.iloc[0]["vendor_name"]
        # reduce capacity of the vendor in all rows where the vendor exists
        dfA.loc[dfA["vendor_name"] == selected_vendor, "curr_cap"] -= 1
        # set selected vendor in dfB
        dfB.at[index, "vendor"] = selected_vendor
    # if no vendors are available
    else:
        dfB.at[index, "vendor"] = None


print(dfB)
```
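Rule 2 (no vendor at all) does not depend on remaining capacity, so those rows can also be flagged up front, without the loop, by using `Series.isin` against the exploded fruit column. A small sketch on the same sample data; `offered` and `no_vendor` are illustrative names:

```python
import pandas as pd

dfA = pd.DataFrame({
    "vendor_name": ["X", "Y", "Z", "W"],
    "fruits": [["apple", "orange", "banana"], ["kiwi"], ["banana", "orange"], ["apple"]],
})
dfB = pd.DataFrame({
    "userid": [1, 2, 3, 4, 5, 6],
    "fruit": ["apple", "orange", "apple", "banana", "kiwi", "strawberry"],
})

# Every fruit offered by at least one vendor, regardless of capacity
offered = dfA.explode("fruits")["fruits"]
# True where no vendor lists the fruit at all; these rows always end up as None
no_vendor = ~dfB["fruit"].isin(offered)
print(dfB.loc[no_vendor, "userid"].tolist())
```

Rows flagged this way can be set to None directly, leaving only the capacity-dependent rows for the loop.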
