Posted: 2021-03-22 03:08:53
Question:
I load my data with:
X_train, X_test, y_train, y_test = load_data()
# y_train here contains string class names
Then I use sklearn's LabelEncoder to convert the labels to categorical (integer) codes:
from sklearn.preprocessing import LabelEncoder
Example data:
y_train = ["tomato","strawberry", "strawberry", "potato", "strawberry", "potato", "lemon"]
y_test = ["strawberry", "lemon", "lemon", "lemon"]
encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train)
print(y_train)
# [3 2 2 1 2 1 0]
y_test = encoder.transform(y_test)
print(y_test)
# [2 0 0 0]
I can call encoder.inverse_transform(y_test) to get the strings back.
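A minimal sketch of the encode/decode round trip on the example data above. LabelEncoder stores the classes it saw during fit in sorted order in `classes_`, and each class's integer code is its position in that array, which is why "lemon" becomes 0 and "tomato" becomes 3. The names `y_train_enc` and `y_test_enc` are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

y_train = ["tomato", "strawberry", "strawberry", "potato", "strawberry", "potato", "lemon"]
y_test = ["strawberry", "lemon", "lemon", "lemon"]

encoder = LabelEncoder()
# fit_transform learns the sorted class list from y_train and encodes it
y_train_enc = encoder.fit_transform(y_train)
# transform reuses the same mapping; unseen labels would raise a ValueError
y_test_enc = encoder.transform(y_test)

# The code of each class is its index in encoder.classes_ (sorted order)
print(encoder.classes_.tolist())

# inverse_transform maps integer codes back to the original strings
print(encoder.inverse_transform(y_test_enc).tolist())
```

Because the mapping lives in `classes_`, the encoded arrays can always be decoded again before building any per-class summary.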
How can I build a summary table like the following?
Label      | y_train_count | y_test_count
-----------+---------------+-------------
strawberry |             3 |            1
potato     |             2 |            0
...        |           ... |          ...
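One possible approach, sketched under the assumption that pandas is available: decode both splits, count each with `value_counts()`, and combine the two count Series into a DataFrame. Constructing a DataFrame from a dict of Series aligns them on their index, so a class missing from one split shows up as NaN and is filled with 0. Reindexing on `encoder.classes_` guarantees one row per known class. Column names follow the table above; everything else is illustrative, not the only way to do it.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

y_train = ["tomato", "strawberry", "strawberry", "potato", "strawberry", "potato", "lemon"]
y_test = ["strawberry", "lemon", "lemon", "lemon"]

encoder = LabelEncoder()
y_train_enc = encoder.fit_transform(y_train)
y_test_enc = encoder.transform(y_test)

# Decode the integer codes back to class names and count each split
train_counts = pd.Series(encoder.inverse_transform(y_train_enc)).value_counts()
test_counts = pd.Series(encoder.inverse_transform(y_test_enc)).value_counts()

summary = (
    # Series are aligned on their index (the class names)
    pd.DataFrame({"y_train_count": train_counts, "y_test_count": test_counts})
    .reindex(encoder.classes_)  # one row per class the encoder knows about
    .fillna(0)                  # classes absent from a split get a count of 0
    .astype(int)
    .rename_axis("Label")
    .sort_values("y_train_count", ascending=False)
)
print(summary)
```

If you prefer to stay on the encoded arrays, `np.bincount(y_train_enc, minlength=len(encoder.classes_))` returns the counts indexed by class code, which lines up position by position with `encoder.classes_`.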
Discussion:
Tags: python pandas numpy dataframe group-by