How to get original values after using factorize() in Python? Answers

【Question title】: How to get original values after using factorize() in Python?
【Posted】: 2018-02-18 10:13:53
【Question】:

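I'm a beginner trying to build a predictive model with a random forest in Python, using a training set and a test set. train["ALLOW/BLOCK"] can take one of 4 expected values (all strings). test["ALLOW/BLOCK"] is what needs to be predicted.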

y, _ = pd.factorize(train["ALLOW/BLOCK"])

y
Out[293]: array([0, 1, 0, ..., 1, 0, 2], dtype=int64)

I make the predictions with predict:

clf.predict(test[features])

clf.predict(test[features])[0:10]
Out[294]: array([0, 0, 0, 0, 0, 2, 2, 0, 0, 0], dtype=int64)

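How can I get the original values instead of the numeric codes? And does the code below actually compare the actual values with the predicted ones?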

z, _ = pd.factorize(test["AUDIT/BLOCK"])

z==clf.predict(test[features])
Out[296]: array([ True, False, False, ..., False, False, False], dtype=bool)

【Question discussion】:

Tags: python pandas random-forest prediction

【Solution 1】:

First, keep the labels (the unique values) that pd.factorize returns as its second output instead of discarding them:

y, label = pd.factorize(train["ALLOW/BLOCK"])

Then, once you have the numeric predictions, you can map them back to the corresponding labels with label[pred]:

pred = clf.predict(test[features])
pred_label = label[pred]

pred_label contains the predictions expressed as the original string values.
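A minimal, self-contained sketch of the round trip described above. The training values ("ALLOW", "BLOCK", "DROP", "RESET") are made up for illustration, and the hard-coded pred array is a hypothetical stand-in for clf.predict(test[features]), so no model is actually trained here. Because train["ALLOW/BLOCK"] is a Series, pd.factorize returns label as a pandas Index, and indexing it with an integer array returns the matching strings.

```python
import numpy as np
import pandas as pd

# Hypothetical training column; the real data holds 4 string classes.
train = pd.DataFrame({"ALLOW/BLOCK": ["ALLOW", "BLOCK", "ALLOW", "DROP", "RESET"]})

# Keep both outputs: integer codes for fitting, uniques for decoding.
y, label = pd.factorize(train["ALLOW/BLOCK"])

# Hypothetical stand-in for clf.predict(test[features]): integer class codes.
pred = np.array([0, 0, 2, 1, 3])

# Indexing the uniques with the codes maps each prediction back to its string.
pred_label = label[pred]
print(list(pred_label))
```

This only works if the classifier was fitted on y (the codes). As an alternative, scikit-learn classifiers such as RandomForestClassifier also accept string targets directly, in which case predict already returns the original strings and no decoding step is needed.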

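No. You should not factorize the test column separately, because its codes will most likely map to different labels than the training codes. Consider the following example: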

pd.factorize(['a', 'b', 'c'])
# (array([0, 1, 2]), array(['a', 'b', 'c'], dtype=object))

pd.factorize(['c', 'a', 'b'])
# (array([0, 1, 2]), array(['c', 'a', 'b'], dtype=object))

So the mapping between codes and labels depends on the order in which the values first appear.
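To compare predictions with the true test values, the test column has to use the same mapping as the training column. The sketch below shows two ways to do that, assuming made-up data and a hard-coded pred array as a hypothetical stand-in for clf.predict: decode the predictions and compare strings, or encode the test labels with the training uniques via pd.Categorical(..., categories=label).codes (values never seen in training get code -1). It also shows that re-factorizing the test column produces a different encoding.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the training order fixes the code-to-label mapping.
y, label = pd.factorize(pd.Series(["ALLOW", "BLOCK", "DROP"]))
test_true = pd.Series(["DROP", "ALLOW", "BLOCK", "ALLOW"])

# Hypothetical stand-in for clf.predict(test[features]).
pred = np.array([2, 0, 0, 0])

# Option 1: decode the predictions and compare in string space.
match_str = np.asarray(label[pred]) == test_true.to_numpy()

# Option 2: encode the test labels with the *training* uniques.
# Values that never appeared in training would get code -1.
z = pd.Categorical(test_true, categories=label).codes
match_codes = z == pred

# Re-factorizing the test column builds its own mapping, so its codes differ.
z_wrong, _ = pd.factorize(test_true)

print(match_str, match_codes, z, z_wrong)
```

Both options give the same element-wise comparison, whereas z_wrong encodes "DROP" as 0 instead of 2. Passing sort=True to pd.factorize makes the mapping independent of order, but only when the training and test columns contain exactly the same set of values.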

【Discussion】: