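[Posted]: 2020-04-17 10:43:06
[Question]:
I have a dataframe that records the number and type of fruit owned by different people. I want to add a column indicating each person's top fruit. If a person has two or more fruits tied for first place, I want a list (or tuple) of them.
Input
For example, say my input is this dataframe: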
import numpy as np
import pandas as pd

# Create all the fruit data
data = [{'fruit0':'strawberry','fruit0_count':23,'fruit1':'orange','fruit1_count':4,'fruit2':'grape','fruit2_count':27},
        {'fruit0':'apple','fruit0_count':45,'fruit1':'mango','fruit1_count':45,'fruit2':'orange','fruit2_count':12},
        {'fruit0':'blueberry','fruit0_count':30,'fruit1':'grapefruit','fruit1_count':32,'fruit2':'cherry','fruit2_count':94},
        {'fruit0':'pineapple','fruit0_count':4,'fruit1':'grape','fruit1_count':4,'fruit2':'lemon','fruit2_count':67}]
# Add people's names as an index
df = pd.DataFrame(data, index=['Shawn', 'Monica','Jamal','Tracy'])
# Print the dataframe
df
. . . which creates the input dataframe:
            fruit0  fruit0_count      fruit1  fruit1_count  fruit2  fruit2_count
Shawn   strawberry            23      orange             4   grape            27
Monica       apple            45       mango            45  orange            12
Jamal    blueberry            30  grapefruit            32  cherry            94
Tracy    pineapple             4       grape             4   lemon            67
Target output
What I want is a new column that gives the name of each person's top fruit. If the person has two (or more) fruits tied for first place, I want a list or tuple of those fruits:
            fruit0  fruit0_count      fruit1  fruit1_count  fruit2  fruit2_count      top_fruit
Shawn   strawberry            23      orange             4   grape            27          grape
Monica       apple            45       mango            45  orange            12  (apple,mango)
Jamal    blueberry            30  grapefruit            32  cherry            94         cherry
Tracy    pineapple             4       grape             4   lemon            67          lemon
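My attempt so far
The closest I have gotten is based on https://stackoverflow.com/a/38955365/6480859.
Problems:
- If there is a tie for top fruit, it captures only one of them (Monica's top fruit comes out as just apple).
- It's really complicated. Not a problem as such, but if there is a more direct route, I'd like to learn it.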
# List the columns that contain count numbers
cols = ['fruit0_count', 'fruit1_count', 'fruit2_count']
# Make a new dataframe with just those columns.
only_counts_df=pd.DataFrame()
only_counts_df[cols]=df[cols].copy()
# Indicate how many results you want. Note: If you increase
# this from 1, it gives you the #2, #3, etc. ranking -- it
# doesn't represent tied results.
nlargest = 1
# The next two lines are suggested from
# https://stackoverflow.com/a/38955365/6480859. I don't totally
# follow along . . .
order = np.argsort(-only_counts_df.values, axis=1)[:, :nlargest]
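# Look the names up in a NumPy array of column labels, because
# 2-D indexing directly on a pandas Index is deprecated.
result = pd.DataFrame(only_counts_df.columns.to_numpy()[order],
                      columns=['top{}'.format(i) for i in range(1, nlargest+1)],
                      index=only_counts_df.index)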
# Join the results back to our original dataframe
result = df.join(result).copy()
# The dataframe now reports the name of the column that
# contains the top fruit. Convert this to the fruit name.
def id_fruit(row):
    if row['top1'] == 'fruit0_count':
        return row['fruit0']
    elif row['top1'] == 'fruit1_count':
        return row['fruit1']
    elif row['top1'] == 'fruit2_count':
        return row['fruit2']
    else:
        return "Failed"
result['top_fruit'] = result.apply(id_fruit,axis=1)
result = result.drop(['top1'], axis=1).copy()
result
. . . which outputs:
            fruit0  fruit0_count      fruit1  fruit1_count  fruit2  fruit2_count  top_fruit
Shawn   strawberry            23      orange             4   grape            27      grape
Monica       apple            45       mango            45  orange            12      apple
Jamal    blueberry            30  grapefruit            32  cherry            94     cherry
Tracy    pineapple             4       grape             4   lemon            67      lemon
Monica's top fruits should be both apple and mango.
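For what it's worth, here is a small sketch of what I think the `np.argsort` step in my attempt is doing, using only Shawn's and Monica's counts. Negating the counts makes the ascending sort behave like a descending one, and slicing off the first column keeps exactly one position per row, which would explain why a tie can never show up. Note that `kind='stable'` is my own addition so that the tie is resolved the same way on every run; the original line relies on the default sort.

```python
import numpy as np

# Counts for two people only (rows), one column per fruit slot.
counts = np.array([[23, 4, 27],    # Shawn:  fruit2 is the clear winner
                   [45, 45, 12]])  # Monica: fruit0 and fruit1 are tied

# argsort returns the positions that would sort each row ascending.
# Sorting the negated counts therefore ranks the largest count first.
# kind='stable' (an assumption, not in the original) keeps tied
# entries in their original left-to-right order.
order = np.argsort(-counts, axis=1, kind='stable')
print(order.tolist())   # [[2, 0, 1], [0, 1, 2]]

# Keeping only the first column yields one position per row,
# so Monica's tie collapses to position 0 (fruit0).
top1 = order[:, :1]
print(top1.tolist())    # [[2], [0]]
```

So the ranking itself is fine; the information about ties is lost when only the first position is kept.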
Any tips are welcome, thanks!
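To make the target concrete, below is a rough sketch of the behaviour I am after, not necessarily the best way to get it. It compares every count against its row maximum instead of ranking, so all tied fruits survive. It assumes the columns always come in `fruitN` / `fruitN_count` pairs; the names `name_cols`, `count_cols`, and `is_top` are just placeholders I made up for this example.

```python
import pandas as pd

data = [{'fruit0': 'strawberry', 'fruit0_count': 23, 'fruit1': 'orange', 'fruit1_count': 4, 'fruit2': 'grape', 'fruit2_count': 27},
        {'fruit0': 'apple', 'fruit0_count': 45, 'fruit1': 'mango', 'fruit1_count': 45, 'fruit2': 'orange', 'fruit2_count': 12},
        {'fruit0': 'blueberry', 'fruit0_count': 30, 'fruit1': 'grapefruit', 'fruit1_count': 32, 'fruit2': 'cherry', 'fruit2_count': 94},
        {'fruit0': 'pineapple', 'fruit0_count': 4, 'fruit1': 'grape', 'fruit1_count': 4, 'fruit2': 'lemon', 'fruit2_count': 67}]
df = pd.DataFrame(data, index=['Shawn', 'Monica', 'Jamal', 'Tracy'])

# Assumed layout: each name column has a matching "<name>_count" column.
name_cols = ['fruit0', 'fruit1', 'fruit2']
count_cols = [col + '_count' for col in name_cols]

names = df[name_cols].to_numpy()
counts = df[count_cols].to_numpy()

# True wherever a count equals the largest count in its own row,
# so every tied fruit is marked, not just the first one.
is_top = counts == counts.max(axis=1, keepdims=True)

top = []
for row_names, row_mask in zip(names, is_top):
    winners = row_names[row_mask].tolist()
    # A single winner stays a plain string; ties become a tuple.
    top.append(winners[0] if len(winners) == 1 else tuple(winners))

df['top_fruit'] = pd.Series(top, index=df.index, dtype=object)
print(df['top_fruit'])
```

With the sample data this should give a plain string for Shawn, Jamal, and Tracy, and the tuple `('apple', 'mango')` for Monica.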
[Comments]:
Tags: python pandas numpy dataframe