【Posted】: 2018-10-12 08:24:31
【Question】:
import pandas as pd
import numpy
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
fi = "df.csv"
# Read the CSV file into a DataFrame
data = pd.read_csv(fi, sep=",")
# Split the data into training and test data
train, test = train_test_split(data, test_size=0.6, random_state=0)
# Initialise Gaussian Naive Bayes
naive_b = GaussianNB()
# Columns 0-126 are the features, column 127 is the class label
train_features = train.iloc[:, 0:127]
train_label = train.iloc[:, 127]
test_features = test.iloc[:, 0:127]
test_label = test.iloc[:, 127]
naive_b.fit(train_features, train_label)
test_data = pd.concat([test_features, test_label], axis=1)
# predict_proba returns an array of shape (n_samples, n_classes);
# this single-column assignment is the step that does not give the desired output
test_data["p_malw"] = naive_b.predict_proba(test_features)
print("test_data\n", test_data["p_malw"])
print("Accuracy:", naive_b.score(test_features, test_label))
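I wrote this code to take input from a CSV file with 128 columns: the first 127 columns are features and the 128th column is the class label.
I want to predict the probability that each sample belongs to each class (there are 5 classes, labelled 1-5), print those probabilities as a matrix, and determine each sample's class from the prediction. predict_proba() is not giving the desired output. Please suggest the necessary changes.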
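For reference, here is a minimal sketch of one way to get the per-class probability matrix and the predicted class described above. It is not a confirmed fix for the original data: since df.csv is not available, the data below is synthetic, and the column names (`f0`…`f126`, `label`, `p_class_*`, `predicted`, `actual`) are made up for illustration. It relies on `predict_proba` returning an array of shape (n_samples, n_classes) whose columns follow the order of `classes_`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# ASSUMPTION: synthetic stand-in for df.csv with 127 feature columns
# and one label column holding classes 1-5.
rng = np.random.RandomState(0)
n_samples, n_features = 500, 127
labels = rng.randint(1, 6, size=n_samples)
features = rng.normal(size=(n_samples, n_features)) + labels[:, None]
data = pd.DataFrame(features, columns=["f%d" % i for i in range(n_features)])
data["label"] = labels

# Same split and column layout as in the question.
train, test = train_test_split(data, test_size=0.6, random_state=0)
X_train, y_train = train.iloc[:, :127], train.iloc[:, 127]
X_test, y_test = test.iloc[:, :127], test.iloc[:, 127]

naive_b = GaussianNB().fit(X_train, y_train)

# One row per test sample, one column per class (ordered as naive_b.classes_).
proba = naive_b.predict_proba(X_test)
proba_df = pd.DataFrame(
    proba,
    index=X_test.index,
    columns=["p_class_%d" % c for c in naive_b.classes_],
)

# The predicted class is the one with the highest probability in each row.
proba_df["predicted"] = naive_b.classes_[np.argmax(proba, axis=1)]
proba_df["actual"] = y_test

print(proba_df.head())
print("Accuracy:", naive_b.score(X_test, y_test))
```

Storing the probabilities in their own DataFrame, with one column per class, avoids trying to fit a five-column array into the single `p_malw` column.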
【Comments】:
-
@mr_mo Can you help?
Tags: python machine-learning scikit-learn naivebayes multiclass-classification