How do I get predictions from a trained random forest model?

【Question title】: How to get prediction from trained random forest model?
【Posted】: 2019-06-07 01:47:03
【Question】:

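I have a dataset with two columns: user posts (posts) and personality type (type). I need to predict the personality type of a post using this dataset, so I trained a random forest (a RandomForestClassifier) on it. Here is my code: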

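import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

df = pd.read_csv('personality_types.csv')

# Turn each post into a bag-of-words count vector
count_vectorizer = CountVectorizer(decode_error='ignore')
X = count_vectorizer.fit_transform(df['posts'])
y = df['type'].values

# Hold out a third of the rows for testing
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, y, test_size=0.33)

random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(Xtrain, Ytrain)
Y_prediction = random_forest.predict(Xtest)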

Accuracy:

random_forest.score(Xtrain, Ytrain)
acc_random_forest = round(random_forest.score(Xtrain, Ytrain) * 100, 2)
print(round(acc_random_forest,2,), "%")

100%

Now I want to get a prediction for custom text. How can I do that? How do I use this model to get the personality type of an individual post?

【Comments】:

Tags: python machine-learning scikit-learn random-forest

【Solution 1】:

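Create a new column in the same DataFrame df. Name it custom_text, user_text, or anything else. Store the input in that column, so that every row of the column contains the same value: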

# Store the input in a new column; pandas broadcasts the string to every row
df['custom_text'] = input("Enter Text")
custom_text = count_vectorizer.transform(df['custom_text'])
value_predicted = random_forest.predict(custom_text)
print(value_predicted[0])

Only the first element is printed, because every element of value_predicted holds the same value.
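Adding a column is not strictly required. As a minimal sketch, assuming the vectorizer and the forest were fitted as in the question, a single string can be wrapped in a list and passed to transform directly. The toy posts and labels below are hypothetical stand-ins for personality_types.csv, used only to keep the example self-contained.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for personality_types.csv
posts = [
    "I love quiet evenings reading alone",
    "reading alone at home recharges me",
    "parties with many people are great fun",
    "I enjoy meeting many new people at parties",
]
types = ["INTJ", "INTJ", "ENFP", "ENFP"]

count_vectorizer = CountVectorizer()
X = count_vectorizer.fit_transform(posts)

random_forest = RandomForestClassifier(n_estimators=100, random_state=0)
random_forest.fit(X, types)

# transform() expects an iterable of documents, so wrap the single string in a list
custom_text = "a quiet evening reading at home"
custom_features = count_vectorizer.transform([custom_text])

# predict() returns one label per input row; here there is exactly one row
value_predicted = random_forest.predict(custom_features)
print(value_predicted[0])
```

Passing the bare string instead of a one-element list makes CountVectorizer raise an error, since it expects an iterable of documents rather than a single document.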

【Comments】:

【Solution 2】:

If the custom text in df has the same format as posts, you can do the following:

custom_text = count_vectorizer.transform(df['custom_text'])
value_predicted = random_forest.predict(custom_text)

value_predicted contains the results. Of course, count_vectorizer and random_forest must be the fitted objects from your example, not freshly created ones.
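One way to keep the fitted vectorizer and the fitted forest together is scikit-learn's make_pipeline. This is a sketch of an alternative arrangement, not what the question's code does, and the toy posts and labels are again hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the 'posts' and 'type' columns
posts = [
    "I love quiet evenings reading alone",
    "reading alone at home recharges me",
    "parties with many people are great fun",
    "I enjoy meeting many new people at parties",
]
types = ["INTJ", "INTJ", "ENFP", "ENFP"]

# The pipeline fits the vectorizer and the classifier in one call
model = make_pipeline(
    CountVectorizer(decode_error="ignore"),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(posts, types)

# Raw strings go in; the pipeline applies the already fitted vocabulary itself
predictions = model.predict(["quiet reading at home", "fun parties with people"])
print(list(predictions))
```

Because the pipeline owns both steps, there is no separate vectorizer variable that could be refitted by accident between training and prediction.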

Also, there may be a typo in your example: you should check performance on the test set, not the training set:

random_forest.score(Xtest, Ytest)
acc_random_forest = round(random_forest.score(Xtest, Ytest) * 100, 2)
print(acc_random_forest, "%")
Out:
<Some score>

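A 100% accuracy score looks like overfitting. Measured on the training data, it says little about how the model will handle posts it has not seen.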
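The gap between the two scores can be illustrated on synthetic data. The sketch below assumes nothing about the personality dataset: it uses make_classification with some label noise (flip_y), and every name and parameter value in it is chosen for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with noisy labels, so a perfect test score is out of reach
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    X, y, test_size=0.33, random_state=0
)

random_forest = RandomForestClassifier(n_estimators=100, random_state=0)
random_forest.fit(Xtrain, Ytrain)

# Accuracy on rows the forest has already seen versus held-out rows
train_acc = random_forest.score(Xtrain, Ytrain)
test_acc = random_forest.score(Xtest, Ytest)
print(f"train accuracy: {train_acc:.2%}")
print(f"test accuracy:  {test_acc:.2%}")
```

A forest of fully grown trees can largely memorize its training rows, so the held-out score is the one worth reporting.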

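【Comments】:

Tried it! I got this ValueError: Number of features of the model must match the input. Model n_features is 14542 and input n_features is 286
It looks like count_vectorizer was changed somewhere. Could it have been accidentally replaced by a new one (for example, by calling .fit_transform on the new data)?
Also, when you run into a problem like this, you can edit your question and add it there. If you are working in a Jupyter notebook, try restarting it and carefully running the cells from top to bottom. Sometimes this error disappears, because it is often caused by stale state.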
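The feature-count mismatch described in these comments can be reproduced in a few lines. This is a sketch with hypothetical sentences: calling fit_transform on new text learns a new vocabulary, so the column count no longer matches what the model was trained on, while transform on the original vectorizer keeps it.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training and new texts
train_posts = ["the cat sat on the mat", "dogs chase cats in the park"]
new_posts = ["a cat in a hat"]

count_vectorizer = CountVectorizer()
X_train = count_vectorizer.fit_transform(train_posts)

# Correct: reuse the fitted vocabulary, so the column count matches training
X_new_ok = count_vectorizer.transform(new_posts)

# Incorrect: fitting again learns a vocabulary from the new text only
X_new_bad = CountVectorizer().fit_transform(new_posts)

print("training features:      ", X_train.shape[1])
print("transform features:     ", X_new_ok.shape[1])
print("fit_transform features: ", X_new_bad.shape[1])
```

Words in the new text that the fitted vocabulary does not contain are simply ignored by transform, which is why the matrix width stays the same.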