How to iterate over a data frame (answers)

【Question title】: How to iterate over a data frame
【Posted】: 2019-08-06 01:47:21
【Question description】:

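I have a dataset of users, books, and ratings. I want to find the users who rated a particular book highly, and for those users I want to find the other books they also liked.

My data looks like this: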

df.sample(5)

        User-ID        ISBN  Book-Rating
49064    102967  0449244741            8
60600    251150  0452264464            9
376698    52853  0373710720            7
454056   224764  0590416413            7
54148     25409  0312421273            9

So far I have done this:

df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
lotr = df_p.ix['0345339703'] # Lord of the Rings Part 1
like_lotr = lotr[lotr > 7].to_frame()
users = like_lotr['User-ID']

The last line fails with:

KeyError: 'User-ID'

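I want to get the users who rated LOTR above 7 and then, for those users, find the other books they also liked from the matrix.

Any help would be appreciated. Thanks.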

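【Question discussion】:

Check your df.columns. There may be some extra whitespace before or after the User-ID part. The column name has to match exactly, otherwise a KeyError is raised.
@rafaelc It just says Index(['0345339703'], dtype='object'), which is odd since I cast it to a frame.
Possibly related: stackoverflow.com/questions/16476924/…
What is the exact output of df.columns? Paste it here.
@serafeim Output: Index(['0345339703'], dtype='object')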

Tags: python pandas numpy

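【Solution 1】:

In your like_lotr dataframe, 'User-ID' is the name of the index, so you cannot select it like a regular column. That is why users = like_lotr['User-ID'] raises a KeyError: it is not a column. The only column is the one named after the ISBN, which is what your df.columns output shows.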
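A minimal sketch of why the lookup fails, rebuilt from the five sample rows in the question. Storing the ISBNs as strings is an assumption about how the data was loaded, and the names row and frame are only illustrative.

```python
import pandas as pd

# Five sample rows from the question; ISBNs kept as strings (an assumption).
df = pd.DataFrame({
    "User-ID": [102967, 251150, 52853, 224764, 25409],
    "ISBN": ["0449244741", "0452264464", "0373710720", "0590416413", "0312421273"],
    "Book-Rating": [8, 9, 7, 7, 9],
})

df_p = df.pivot_table(index="ISBN", columns="User-ID", values="Book-Rating").fillna(0)

# Selecting one ISBN row gives a Series whose index is the User-ID axis.
row = df_p.loc["0452264464"]
frame = row[row > 7].to_frame()

print(frame.index.name)            # name of the index
print(list(frame.columns))         # the single column is named after the ISBN
print("User-ID" in frame.columns)  # so frame['User-ID'] cannot succeed
```

The user ids live in frame.index, which is why a column lookup by 'User-ID' raises the KeyError.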

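Also, ix is deprecated (and was removed in pandas 1.0), so it is better to use loc in your case. The key you pass to loc has to match the dtype of the ISBN index: if the ISBNs were parsed as integers (as in my test with your sample), drop the quotes; if they are stored as strings, keep them.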
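Whether the key needs quotes depends on how the ISBN column was parsed when the data was loaded. The sketch below is only an illustration: the CSV text is a hypothetical two-row extract of the sample, and reading from a CSV at all is an assumption about how the data was obtained.

```python
import io
import pandas as pd

# Hypothetical CSV extract built from two of the sample rows.
csv_text = (
    "User-ID,ISBN,Book-Rating\n"
    "102967,0449244741,8\n"
    "251150,0452264464,9\n"
)

# Default parsing treats the ISBNs as numbers and drops the leading zero.
as_int = pd.read_csv(io.StringIO(csv_text))

# Forcing a string dtype keeps the ISBNs exactly as written.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"ISBN": str})

print(as_int["ISBN"].tolist())  # integer keys: use df_p.loc[452264464]
print(as_str["ISBN"].tolist())  # string keys: use df_p.loc['0452264464']
```

Checking df['ISBN'].dtype before indexing tells you which form of the key to use.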

Try it like this:

df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
lotr = df_p.loc[452264464] # used another number from your sample dataframe to test this code.
like_lotr = lotr[lotr > 7].to_frame()
users = like_lotr.index.tolist()

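users is now a list containing the ids you want.

With the small sample above and the number I used for testing, users is [251150].

Another solution is to use reset_index. The last two lines would then look like this: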

like_lotr = lotr[lotr > 7].to_frame().reset_index()
users = like_lotr['User-ID']

reset_index moves the index back into a regular column, so 'User-ID' can be selected by name again.
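As a quick check that both approaches return the same ids, here is a small sketch on the sample rows from the question. The ISBNs are kept as strings, which is an assumption, and the variable names users_from_index and users_from_column are illustrative.

```python
import pandas as pd

# Sample rows from the question; ISBNs kept as strings (an assumption).
df = pd.DataFrame({
    "User-ID": [102967, 251150, 52853, 224764, 25409],
    "ISBN": ["0449244741", "0452264464", "0373710720", "0590416413", "0312421273"],
    "Book-Rating": [8, 9, 7, 7, 9],
})

df_p = df.pivot_table(index="ISBN", columns="User-ID", values="Book-Rating").fillna(0)
row = df_p.loc["0452264464"]
liked = row[row > 7]

# Option 1: read the ids straight from the index.
users_from_index = liked.index.tolist()

# Option 2: turn the index into a column, then select it by name.
users_from_column = liked.to_frame().reset_index()["User-ID"].tolist()

print(users_from_index == users_from_column)
```

The first form is shorter when only the ids are needed; the second is convenient if you want to keep the ratings next to the ids in one frame.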

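【Discussion】:

Yes, that's it! But how do I now filter the dataset/matrix to keep only those users and find the books they rated highly?
That would be a separate question. But once you have the IDs in users, you can go back to the original dataframe and do something like: df.loc[df['User-ID'].isin(users)]. That selects all the rows belonging to those users, and from there you can pull out whatever other data you want.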
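A minimal sketch of that follow-up step, working on the long-format dataframe rather than the pivoted matrix. The ratings are invented for illustration, 'book_b' and 'book_c' are hypothetical ISBNs, and the threshold of 7 is taken from the question.

```python
import pandas as pd

# Invented ratings for illustration only; 'book_b' and 'book_c' are hypothetical.
df = pd.DataFrame({
    "User-ID": [1, 1, 2, 2, 2, 3, 3],
    "ISBN": ["0345339703", "book_b", "0345339703", "book_b", "book_c",
             "0345339703", "book_b"],
    "Book-Rating": [9, 8, 10, 8, 9, 5, 9],
})

target = "0345339703"  # the ISBN used for LOTR in the question
threshold = 7

# Users who rated the target book above the threshold.
is_fan = (df["ISBN"] == target) & (df["Book-Rating"] > threshold)
users = df.loc[is_fan, "User-ID"].unique()

# Every other rating made by those users.
others = df.loc[df["User-ID"].isin(users) & (df["ISBN"] != target)]

# Keep only the books they also rated highly, and count fans per book.
liked = others[others["Book-Rating"] > threshold]
counts = liked.groupby("ISBN")["User-ID"].nunique().sort_values(ascending=False)

print(counts)
```

Counting distinct users per book is just one possible way to rank the results; averaging the ratings per ISBN would be another.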