ValidationError in fuzzy-c-means: Answer

[Question title]: ValidationError in fuzzy-c-means
[Posted]: 2021-11-05 20:33:00
[Question]:

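I tried running this in both Jupyter Notebook and Colab, but I still get this error from fcmeans, even though the same code runs fine on a different laptop. This is the code I use to split the dataset: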

# Stratified sampling using scikit-learn's StratifiedShuffleSplit class
from sklearn.model_selection import StratifiedShuffleSplit

split = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)

# n_splits=1, so the loop body runs once; split.split() yields positional
# indices, so select rows with .iloc rather than .loc
for train_index, test_index in split.split(data1, data1["class"]):
    strat_train_set = data1.iloc[train_index]
    strat_test_set = data1.iloc[test_index]

train_set = strat_train_set.drop("class", axis=1)  # drop labels for training set
train_labels = strat_train_set["class"].copy()
test_set = strat_test_set.drop("class", axis=1)  # drop labels for testing set
test_labels = strat_test_set["class"].copy()
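To check that the split itself behaves as intended, here is a minimal, self-contained sketch. The `data1` below is a hypothetical stand-in (80 rows, two int64 features, a 75/25 imbalanced `class` column) and deliberately uses a non-default index, which is exactly the case where `.loc` with the positional indices from `split.split()` would raise a `KeyError`, while `.iloc` works.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in for data1: 60 rows of class 0 and 20 rows of class 1.
rng = np.random.default_rng(0)
data1 = pd.DataFrame(
    {
        "f1": rng.integers(0, 100, size=80),
        "f2": rng.integers(0, 100, size=80),
        "class": [0] * 60 + [1] * 20,
    },
    index=range(100, 180),  # non-default index: .loc[train_index] would raise KeyError
)

split = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
# With n_splits=1, next() takes the single (train, test) pair of positional indices.
train_index, test_index = next(split.split(data1, data1["class"]))

strat_train_set = data1.iloc[train_index]
strat_test_set = data1.iloc[test_index]

train_set = strat_train_set.drop("class", axis=1)  # features only
train_labels = strat_train_set["class"].copy()
test_set = strat_test_set.drop("class", axis=1)
test_labels = strat_test_set["class"].copy()

# Stratification keeps the class proportions the same in both subsets.
print(train_labels.value_counts(normalize=True).sort_index().to_dict())
print(test_labels.value_counts(normalize=True).sort_index().to_dict())
```

With these sizes both subsets keep the 75/25 class ratio, and `train_set`/`test_set` are still pandas DataFrames, which matters for the fcmeans call discussed below.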

So what am I missing?

[Question discussion]:

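Please don't post images; create a complete example instead. We don't know where tr_set comes from. We can guess its type, but it is better if we don't have to, since that makes answering easier. Also, it looks like you should check its data type to investigate further.
Thanks for the suggestion. The dtype is int64.
Ideally, share the Colab notebook so I can run it.
Here is the Drive link: drive.google.com/drive/folders/…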
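A dtype of int64 describes the elements, not the container, so it does not tell you whether the object is a NumPy array. A quick sketch of checking both, assuming a hypothetical all-int64 DataFrame in place of the real tr_set:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tr_set: an all-int64 DataFrame, as described in the comments.
tr_set = pd.DataFrame({"f1": [1, 2, 3], "f2": [4, 5, 6]}, dtype="int64")

print(type(tr_set))                    # container type: a pandas DataFrame
print(tr_set.dtypes.unique())          # element dtype: int64
print(isinstance(tr_set, np.ndarray))  # False: int64 elements, but not an ndarray
```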

Tags: python numpy machine-learning scikit-learn fuzzy-c-means

[Solution 1]:

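The problem here is that tr_set is not a numpy.ndarray. All you need to do is pass the DataFrame as a NumPy array instead.

In your case, it will work if you call the to_numpy method before passing the data to fit, e.g. fcm.fit(tr_set.to_numpy()).

The fcm documentation is quite clear about this.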
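A minimal sketch of the conversion, assuming a small hypothetical int64 DataFrame in place of the asker's tr_set. The fcmeans part is left as comments so the snippet runs without the package; the `from fcmeans import FCM` import path is an assumption based on the package's usual usage, while `FCM(n_clusters=...)` and `fcm.fit(...)` are the calls used in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for tr_set (an int64 DataFrame); not the asker's real data.
tr_set = pd.DataFrame(
    {"f1": [1, 2, 3, 10, 11, 12], "f2": [1, 1, 2, 9, 10, 10]}, dtype="int64"
)

# DataFrame.to_numpy() returns a plain numpy.ndarray with the same values.
X = tr_set.to_numpy()
print(type(X).__name__, X.dtype, X.shape)

# With the fuzzy-c-means package installed, the array would then be passed as
# (import path assumed):
#   from fcmeans import FCM
#   fcm = FCM(n_clusters=2)
#   fcm.fit(X)
```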

[Discussion]:

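I get an error again in Jupyter Notebook, although it runs fine on Colab:
ValueError Traceback (most recent call last) in
      5 clf2 = DecisionTreeClassifier()
      6 clf = GridSearchCV(clf2, parameters, score = 'balanced_accuracy', cv=10)
----> 7 res1=clf.fit(tr_fin, train_labels)
      8 res1.best_estimator_
@Aka001 If it is a different error, consider asking a new question and tagging me or sending me the link. Keep the question concise and point to the error directly with a Minimal, Complete, and Verifiable example.
Thanks for your reply. It was actually a fitting error caused by the maximum number of clusters chosen for FCM(n_clusters). For example, if the dataset has 5 columns, use FCM(n_clusters=4), or more generally keep n_clusters below roughly 70% of the original number of columns. Here, FCM(n_clusters=4) worked fine in Colab, but the scikit-learn classifier failed to fit in Jupyter Notebook. So I changed FCM(n_clusters=4) to FCM(n_clusters=2), and it finally worked! I don't know the actual reason: when I ran the same code in Colab with other datasets, it required n_clusters in FCM(n_clusters=...) to be exactly the original number of columns.
@Aka001 Then it is a different error from the one you originally got. In any case, if related problems come up in the future, please ask them as separate questions. You are much more likely to get an answer that way.
Yes, I'll keep that in mind.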
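One detail in the traceback quoted above: scikit-learn's GridSearchCV takes the metric through the `scoring` keyword, and `score=` would be rejected with a TypeError when the object is constructed. Since the traceback only fails at line 7, this may just be a transcription slip in the comment. The sketch below shows the call with `scoring`, using hypothetical synthetic data for `tr_fin`/`train_labels` and a hypothetical parameter grid, because the thread does not show the real ones.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data standing in for tr_fin / train_labels.
tr_fin, train_labels = make_classification(n_samples=200, n_features=5, random_state=0)

# Hypothetical parameter grid; the thread does not show the real one.
parameters = {"max_depth": [2, 3, 4], "min_samples_leaf": [1, 5]}

clf2 = DecisionTreeClassifier(random_state=0)
# The keyword is `scoring`; GridSearchCV(..., score=...) raises a TypeError.
clf = GridSearchCV(clf2, parameters, scoring="balanced_accuracy", cv=10)

res1 = clf.fit(tr_fin, train_labels)  # fit() returns the fitted GridSearchCV itself
print(res1.best_params_)
print(res1.best_estimator_)
```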