Proper way to set up user-item interaction data in lightfm

[Question title]: Proper way to setup user-item interactions data in lightfm
[Posted]: 2020-09-03 10:04:28
[Question description]:

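When I have additional implicit data about other items/products, what is the proper way to set up the data that is fed to a lightfm model? For example, I have interaction data for 100k users x 200 items, but in the real application I want the model to recommend only 50 of those 200 items. How should I set up the data? I am considering two cases, but I am not sure which one is correct: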

Case 1: Feed the whole matrix (100k users x 200 items) directly as the interactions argument in lightfm. This way the learning is more collaborative.

Case 2: Feed only the (100k users x 50 items) matrix to the interactions argument and use the (100k users x 150 items) matrix as user_features. This way the learning is more content-based.
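For illustration, the Case 2 split could be built roughly as below. This is only a sketch: `split_for_case_2` and `recommendable_item_ids` are hypothetical names, and the identity block is prepended on the assumption that lightfm does not add per-user identity features on its own when a user_features matrix is passed in directly.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack, identity


def split_for_case_2(interactions, recommendable_item_ids):
    """Split a full user x item matrix into Case 2 inputs.

    Returns (target_interactions, user_features), where the columns of
    target_interactions follow the order of recommendable_item_ids.
    """
    full = csr_matrix(interactions, dtype=np.float32)
    n_users, n_items = full.shape

    keep = np.asarray(recommendable_item_ids)
    other = np.setdiff1d(np.arange(n_items), keep)

    # Interactions with the recommendable items only (e.g. 100k x 50).
    target_interactions = full[:, keep].tocoo()

    # One identity column per user, followed by one column per
    # non-recommendable item (e.g. 100k x (100k + 150)).
    user_features = hstack(
        [identity(n_users, dtype=np.float32, format="csr"), full[:, other]],
        format="csr",
    )
    return target_interactions, user_features
```

Note that after the split the recommendable items are renumbered 0..49 in column order, so a mapping back to the original item ids has to be kept separately.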

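Which one is correct? Also, for Case 1, can the model evaluation utility functions (precision, recall, etc.) be restricted to the selected items? That is, the top-k recommendations should be drawn only from the 50 items, no other items should be recommended, and precision, recall, etc. should be computed from those.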

[Question discussion]:

Tags: python lightfm

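[Solution 1]:

You should go with Case 1. Train the model on the entire interaction data. When making predictions, you can pass the indices of the required (50) items as an argument to model.predict.

From the lightfm documentation, you can see that model.predict takes item ids as an argument (in this case, the ids of your 50 items).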

https://making.lyst.com/lightfm/docs/_modules/lightfm/lightfm.html#LightFM.predict

def predict(self, user_ids, item_ids, item_features=None, user_features=None, num_threads=1):
    """
    Compute the recommendation score for user-item pairs.

    Arguments
    ---------

    user_ids: integer or np.int32 array of shape [n_pairs,]
         single user id or an array containing the user ids for the
         user-item pairs for which a prediction is to be computed
    item_ids: np.int32 array of shape [n_pairs,]
         an array containing the item ids for the user-item pairs for which
         a prediction is to be computed
    user_features: np.float32 csr_matrix of shape [n_users, n_user_features], optional
         Each row contains that user's weights over features
    item_features: np.float32 csr_matrix of shape [n_items, n_item_features], optional
         Each row contains that item's weights over features
    num_threads: int, optional
         Number of parallel computation threads to use. Should
         not be higher than the number of physical cores.
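
Based on the signature above, restricting recommendations to the 50 items could look roughly like this. It is a minimal sketch, not lightfm's own utility: `top_k_from_candidates` and `candidate_item_ids` are hypothetical names, and `model` is assumed to be a LightFM model already fitted on the full 100k x 200 interaction matrix.

```python
import numpy as np


def top_k_from_candidates(model, user_id, candidate_item_ids, k=10):
    """Score only the candidate items for one user and return the best k.

    Returns (item_ids, scores), both sorted by descending score.
    """
    candidates = np.asarray(candidate_item_ids, dtype=np.int32)

    # predict accepts a single integer user id together with an array of
    # item ids, and returns one score per (user, item) pair.
    scores = model.predict(user_id, candidates)

    order = np.argsort(-scores)[:k]
    return candidates[order], scores[order]
```

With a fitted model, a call such as `top_k_from_candidates(model, 0, recommendable_items, k=10)` would rank only the recommendable items for user 0. The other 150 items still shape the learned embeddings during training but never appear in the output.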

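[Discussion]:

Yes, I get that, but I was wondering which approach is more effective. Since the 150 items are not among the products I recommend, is it better to include them in the interaction data or to supply them as user_features, and which would give better results? I guess I just have to try it and find out, but thanks anyway.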
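
One way to "try it and find out" is to score both setups with a precision@k that ranks only the recommendable items. The sketch below computes it by hand from model.predict. `precision_at_k_restricted` is a hypothetical helper, not part of lightfm, and it assumes `test_interactions` is a held-out user x item matrix using the same item indices the model was trained with.

```python
import numpy as np
from scipy.sparse import csr_matrix


def precision_at_k_restricted(model, test_interactions, candidate_item_ids, k=10):
    """Mean precision@k when only the candidate items may be recommended.

    Users with no held-out positives among the candidates are skipped.
    """
    test = csr_matrix(test_interactions)
    candidates = np.asarray(candidate_item_ids, dtype=np.int32)
    k = min(k, len(candidates))

    precisions = []
    for user_id in range(test.shape[0]):
        # Held-out positives for this user, limited to the candidate set.
        positives = np.intersect1d(test[user_id].indices, candidates)
        if positives.size == 0:
            continue

        # Rank the candidates only, then count hits in the top k.
        scores = model.predict(user_id, candidates)
        top_k = candidates[np.argsort(-scores)[:k]]
        precisions.append(np.isin(top_k, positives).mean())

    return float(np.mean(precisions)) if precisions else float("nan")
```

This sketch does not remove items a user already interacted with in the training data from the ranking, and it loops over users one at a time, so it is meant for comparing the two cases rather than for speed.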