TypeError: int() argument must be a string, a bytes-like object or a number, not 'DataFrame'

【Question title】: TypeError: int() argument must be a string, a bytes-like object or a number, not 'DataFrame'
【Posted】: 2016-12-06 08:42:18
【Question】:

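I have a DataFrame, and I need to evaluate quality before using the nearest-neighbors method. I am using sklearn.cross_validation.KFold for this, but I don't know how to give it a DataFrame.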

quality = KFold(df, n_folds=5, shuffle=True, random_state=42)

But it raises:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'DataFrame'

How can I fix this?


Tags: python pandas scikit-learn

【Solution 1】:

The first argument must be the number of rows to split, not the DataFrame itself:

from sklearn.cross_validation import KFold  # module exists only in scikit-learn < 0.20
quality = KFold(len(df), n_folds=5, shuffle=True, random_state=42)

This passes the number of rows in df. Iterating over the result yields arrays of train and test indices, which you can then use to slice df:

for train_index, test_index in quality:
    # the indices are row positions, so slice with iloc
    train = df.iloc[train_index]
    test = df.iloc[test_index]
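
To see why the row count is all that is needed, here is a rough standard-library sketch of k-fold index generation. The function name `kfold_indices` and its parameters are made up for illustration, and this is not scikit-learn's actual implementation. It only assumes, as the error message suggests, that the first argument is converted with int().

```python
import random

def kfold_indices(n, n_folds=5, shuffle=True, seed=42):
    """Yield (train, test) lists of row positions for n rows.

    Illustrative only; not scikit-learn's real code.
    """
    n = int(n)  # passing a DataFrame here fails with the TypeError from the question
    positions = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(positions)
    # Spread any remainder over the first n % n_folds folds.
    base, extra = divmod(n, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test = positions[start:stop]
        train = positions[:start] + positions[stop:]
        yield train, test
        start = stop

folds = list(kfold_indices(10, n_folds=5))
```

Every row position lands in exactly one test fold, and the generator never looks at the data itself, only at how many rows there are.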

If df's index is an Int64Index that is monotonic and increases from 0 (so that each label equals its row position), you can use loc instead of iloc.
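
For readers on a newer scikit-learn: the sklearn.cross_validation module was deprecated in 0.18 and removed in 0.20. The sketch below shows what the same split might look like with sklearn.model_selection.KFold, which takes the number of folds in its constructor and the data in split(). The toy DataFrame and its column names are assumptions for illustration. Its index deliberately starts at 100 to show a case where only iloc is safe.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical toy data; the index does not start at 0 on purpose.
df = pd.DataFrame({"x": range(10), "y": [0, 1] * 5}, index=range(100, 110))

# The newer KFold takes the number of folds, not the number of rows.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes = []
for train_index, test_index in kf.split(df):
    # split() yields row positions, so iloc is the safe way to slice;
    # loc would look for labels 0..9, which this index does not contain.
    train = df.iloc[train_index]
    test = df.iloc[test_index]
    fold_sizes.append((len(train), len(test)))
```

Here split() accepts the DataFrame directly, so there is no need to pass len(df) by hand.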
