python pandas dataframe index match: answers

[Question title]: python pandas dataframe index match
[Posted]: 2016-05-30 23:26:36
[Question description]:

In a python pandas dataframe "df", I have the following three columns:

song_id | user_id | play_count

I have a rating table that I made up, based on play_count (the number of times a user listened to a song):

play_count | rating
1-33       | 1
34-66      | 2
67-99      | 3   
100-199    | 4
>=200      | 5

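I am trying to add a "rating" column to this table based on the play count. For example, if play_count=2, the rating would be 1.

The result would look like this: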

song_id | user_id | play_count | rating
X232    | u8347   | 2          | 1
X987    | u3701   | 50         | 2
X271    | u9327   | 10         | 1
X523    | u1398   | 175        | 4

In Excel I would use MATCH/INDEX to do this, but I don't know how to do it in python/pandas.

Would it be a combination of if/else loops and isin?

[Question comments]:

Tags: python if-statement pandas dataframe

[Solution 1]:

You need the endpoints of those ranges, just as you would in Excel:

import numpy as np
import pandas as pd
bins = [1, 33, 66, 99, 199, np.inf]

Then you can use pd.cut to find the corresponding rating:

pd.cut(df['play_count'], bins=bins, include_lowest=True, labels=[1, 2, 3, 4, 5]).astype(int)

I added astype(int) at the end because pd.cut returns a categorical Series, which you cannot do arithmetic on.
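For reference, here is a minimal self-contained sketch of the pd.cut approach, run on the four sample rows from the question. The second variant (`edges` with `right=False`) is not part of the answer; it is an assumed alternative whose bin edges read the same as the lower bounds in the rating table.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "song_id": ["X232", "X987", "X271", "X523"],
    "user_id": ["u8347", "u3701", "u9327", "u1398"],
    "play_count": [2, 50, 10, 175],
})

# Right-closed bins: [1, 33], (33, 66], (66, 99], (99, 199], (199, inf]
bins = [1, 33, 66, 99, 199, np.inf]
labels = [1, 2, 3, 4, 5]

df["rating"] = pd.cut(
    df["play_count"], bins=bins, include_lowest=True, labels=labels
).astype(int)

# Variant not in the answer: left-closed bins [1, 34), [34, 67), ... [200, inf)
edges = [1, 34, 67, 100, 200, np.inf]
alt = pd.cut(df["play_count"], bins=edges, right=False, labels=labels).astype(int)

print(df)
```

Note that a play_count outside every bin (0, for example) comes back from pd.cut as a missing value, and the cast to int would then fail, so it may be worth checking for missing values before calling astype(int).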

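[Comments]:

Great, I just did df['rating'] = pd.cut(df['play_count'], bins=bins, include_lowest=True, labels=[1, 2, 3, 4, 5]).astype(int) and it worked! Thank you very much for the quick reply, very helpful! Now I can create a content-filtering recommendation model.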

[Solution 2]:

I think it would work if you changed the play_count table to use min/max values, like this:

play_count:

min | max    | rating
1   | 33     | 1
34  | 66     | 2
67  | 99     | 3
100 | 199    | 4
200 | np.inf | 5

Of course, you need import numpy as np.

Then you can do this:

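# Broadcast each play_count against every (min, max) row; assumes every value falls inside some range
hit = (df['play_count'].values[:, None] >= play_count['min'].values) & (df['play_count'].values[:, None] <= play_count['max'].values)
df['rating'] = play_count['rating'].values[hit.argmax(axis=1)]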
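Here is a self-contained sketch of the min/max lookup, using the sample rows from the question. It compares through NumPy broadcasting rather than comparing the two Series directly, since pandas refuses to compare Series whose indexes differ. The names `counts`, `in_range`, `matched` and `ratings` are illustrative only, and leaving unmatched rows as missing values is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "song_id": ["X232", "X987", "X271", "X523"],
    "user_id": ["u8347", "u3701", "u9327", "u1398"],
    "play_count": [2, 50, 10, 175],
})

# Lookup table with inclusive lower and upper bounds
play_count = pd.DataFrame({
    "min": [1, 34, 67, 100, 200],
    "max": [33, 66, 99, 199, np.inf],
    "rating": [1, 2, 3, 4, 5],
})

# Shape (n_rows, 1) against shape (n_ranges,) broadcasts to (n_rows, n_ranges)
counts = df["play_count"].to_numpy()[:, None]
in_range = (counts >= play_count["min"].to_numpy()) & (counts <= play_count["max"].to_numpy())

# argmax picks the first matching range per row; rows with no match are
# masked to NaN so they do not silently receive the first rating
matched = in_range.any(axis=1)
ratings = play_count["rating"].to_numpy()[in_range.argmax(axis=1)]
df["rating"] = pd.Series(ratings, index=df.index).where(matched)

print(df)
```

The intermediate boolean matrix has one cell per (row, range) pair, which is fine for a five-row lookup table but could get large if the table had many ranges; pd.cut avoids that cost.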

[Comments]: