SVM - Detect malware traffic: answers

【Question title】: SVM - Detect malware traffic
【Posted】: 2021-07-08 11:45:25
【Question】:

Hello, I am new to machine learning.

I collected some data from a few pcap files. A sample is shown below:

Byte                 Count    ByteAvg        isMalware
[74, 74, 74, ...]     3570    188.298880           0
[66, 69, 90, ...]     915     157.691803           0
                            .....
[103, 103, 76 ...]    1075    127.526512           1 
[66, 69, 90, ...]     6877    140.671805           1

I tried this SVM example code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report

df = pd.read_csv("traffic.csv")
# Split features and class label
X = df.drop('isMalware', axis=1)
y = df['isMalware']

# Split train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
# Training
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

But it raises an error:

ValueError: could not convert string to float: '[74, 74, 74, 66, 66, 66, 280

How can I change the CSV format so that I can feed it into the SVM classifier?

【Question discussion】:

Tags: python pandas csv svm

【Answer 1】:

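The Byte column does not have the correct type: when pandas reads the CSV, each list in that column comes back as a plain string such as '[74, 74, 74, ...]', not as numbers.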
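A minimal reproduction, assuming the lists were written to the CSV as quoted text (the two rows below are hypothetical and shortened for illustration), shows that the value pandas reads back is a `str` and that converting it to a float fails the same way:

```python
import io

import pandas as pd

# Hypothetical two-row CSV in the question's layout; the lists are quoted text
csv_text = (
    'Byte,Count,ByteAvg,isMalware\n'
    '"[74, 74, 74]",3570,188.298880,0\n'
    '"[66, 69, 90]",915,157.691803,0\n'
)
df = pd.read_csv(io.StringIO(csv_text))

first = df.loc[0, "Byte"]
print(type(first), repr(first))  # the list came back as a string

error_message = ""
try:
    # Same kind of string-to-float conversion that fails inside SVC.fit
    float(first)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```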

SVM only accepts numeric (int, float) features; see https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
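Besides dropping or label-encoding the column, another option is to parse each string back into a list and summarize it with numeric features. The sketch below uses `ast.literal_eval` for the parsing; the helper `expand_byte_column` and the chosen features (length, mean, max, min) are hypothetical choices for illustration, not something from the question:

```python
import ast

import pandas as pd


def expand_byte_column(df: pd.DataFrame, col: str = "Byte") -> pd.DataFrame:
    """Hypothetical helper: replace a column of list-like strings with numeric features."""
    # ast.literal_eval safely turns the text "[74, 74, 74]" into the list [74, 74, 74]
    lists = df[col].apply(ast.literal_eval)
    out = df.drop(columns=[col]).copy()
    # Illustrative summary features; choose whatever is meaningful for your traffic
    out[col + "_len"] = lists.apply(len)
    out[col + "_mean"] = lists.apply(lambda v: sum(v) / len(v) if v else 0.0)
    out[col + "_max"] = lists.apply(lambda v: max(v) if v else 0)
    out[col + "_min"] = lists.apply(lambda v: min(v) if v else 0)
    return out


# Toy rows in the question's layout (Byte values shortened for illustration)
df = pd.DataFrame({
    "Byte": ["[74, 74, 74]", "[66, 69, 90]"],
    "Count": [3570, 915],
    "ByteAvg": [188.298880, 157.691803],
    "isMalware": [0, 0],
})
X = expand_byte_column(df).drop(columns=["isMalware"])
print(X)
```

Every column of the resulting `X` is numeric, so it can be passed to `train_test_split` and `SVC` as in the question.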

You can drop the Byte column, so that X becomes:

X = df.drop(columns=['isMalware', 'Byte'])

Or you can encode the Byte column with LabelEncoder from sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html

from sklearn import preprocessing

# Map each distinct string in the "Byte" column to an integer label
le = preprocessing.LabelEncoder()
le.fit(X["Byte"])
X["Byte"] = le.transform(X["Byte"])
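To see what LabelEncoder actually does with this column, here is a small sketch on toy rows (hypothetical values): every distinct string gets an integer code in sorted order, and identical strings share a code. A string that was not seen during `fit` makes `transform` raise a `ValueError`.

```python
import pandas as pd
from sklearn import preprocessing

# Toy feature frame; the Byte strings are shortened for illustration
X = pd.DataFrame({
    "Byte": ["[74, 74, 74]", "[66, 69, 90]", "[74, 74, 74]"],
    "Count": [3570, 915, 1075],
})

le = preprocessing.LabelEncoder()
X["Byte"] = le.fit_transform(X["Byte"])
print(list(le.classes_))   # distinct strings, sorted
print(X["Byte"].tolist())  # integer code per row

unseen_error = ""
try:
    # A byte sequence that never appeared during fit
    le.transform(["[90, 90, 90]"])
except ValueError as err:
    unseen_error = str(err)
    print("ValueError:", unseen_error)
```

Keep in mind that these codes are arbitrary (they follow the sorted order of the strings), yet a linear kernel will treat them as magnitudes.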

【Discussion】: