【发布时间】:2025-11-23 21:10:01
【问题描述】:
我是数据科学的新手,在通过协同过滤创建图书推荐系统时遇到问题。有人可以就以下错误提出建议。
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
data = pd.read_csv('BX-Book-Ratings.csv',engine = 'python')
df = data.iloc[1:10000,:]
print(df)
print(df.dtypes)
df['isbn']= pd.to_numeric(df['isbn'], errors = 'coerce')
df = df[np.isfinite(df).all(1)]
df['isbn'] = df['isbn'].astype(np.int64)
from sklearn.model_selection import train_test_split
n_users = df.user_id.unique().shape[0]
n_book = df.isbn.unique().shape[0]
train_data, test_data = train_test_split(df, test_size=0.5)
print(n_users , n_book)
train_data_matrix = np.zeros((n_users, n_book))
for line in train_data.itertuples():
#[user_id index, book_id index] = given rating.
train_data_matrix[line[1] - 1, line[2] - 1] = line[3]
train_data_matrix
--------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-125-caa0bcd40167> in <module>
2 for line in train_data.itertuples():
3 #[user_id index, book_id index] = given rating.
----> 4 train_data_matrix[line[1] - 1, line[2] - 1] = line[3]
5 train_data_matrix
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
【问题讨论】:
-
据我从您的代码中了解到,您试图通过从每个字段中减去 1 作为训练数据部分的 isbn 字段作为索引。问题是 isbn 值不能反映我猜的实际索引,也可能不在数据框的范围内。
标签: python-3.x data-science collaborative-filtering