TextBlob Naive Bayes classification: answers

【Question title】: TextBlob Naive Bayes classification
【Posted】: 2018-07-29 08:22:40
【Question description】:

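I am using the textblob library for classification with Naive Bayes. I have a training set, and when I pass in a word I want the classifier to check it against the training data and classify it accordingly. If the word does not appear in the training data, it should not suggest any class.

Example: "kartik" is not in the training set, yet it is classified as "1", and the same happens for any other word that does not appear in the training set.

Is there a way to avoid getting "1" when I pass a word that is not in train?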

from textblob.classifiers import NaiveBayesClassifier

# Training set: '1' marks "working / resolved" messages, '-1' marks problem reports.
train = [
    ('System is working fine', '1'),
    ('Issue Resolved', '1'),
    ('Working Fine', '1'),
    ('running smoothly', '1'),
    ('server is working fine', '1'),
    ('software installed properly', '1'),
    ('Ticket resolved', '1'),
    ('Laptop is not working', '-1'),
    ('laptop issue', '-1'),
    ('upgrade laptop', '-1'),
    ('software not working', '-1'),
    ('fix the issue', '-1'),
    ('WIFI is not working', '-1'),
    ('server is down', '-1'),
    ('system is not working', '-1'),
]

c1 = NaiveBayesClassifier(train)
# "kartik" never appears in train, but a label is still returned.
print(c1.classify("kartik"))

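【Question discussion】:

Tags: python naivebayes textblob

【Solution 1】:

You can get the classification probabilities and then set a threshold, ignoring any class label whose probability falls below it.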

# c1 is the classifier trained in the question
prob_dist = c1.prob_classify("Lorem Ipsum dolor sit amet")
c1.classify("Lorem Ipsum dolor sit amet")  # the predicted label
print(round(prob_dist.prob("1"), 2))
print(round(prob_dist.prob("-1"), 2))

0.61
0.39

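I observed that every word that is absent from the training set gets a probability of 0.61 for class "1". You can use that as a starting point.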
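One likely reason every unseen word gets the same probability is that a bag-of-words extractor turns all of them into the same feature vector. The sketch below is a simplified stand-in and not TextBlob's own code: `contains_features` and the small `vocabulary` are hypothetical names, and the whitespace tokenization is an assumption made for illustration.

```python
def contains_features(text, vocabulary):
    """Simplified 'contains(word)' features: one boolean per known word.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    tokens = set(text.split())
    return {"contains({})".format(word): word in tokens for word in sorted(vocabulary)}


# A few words that occur in the question's training sentences.
vocabulary = {"is", "working", "fine", "laptop", "issue", "not", "server"}

features_a = contains_features("kartik", vocabulary)
features_b = contains_features("lorem ipsum", vocabulary)

# Two different unseen inputs produce identical features, all False.
print(features_a == features_b)
print(any(features_a.values()))
```

Because the model only sees these features, any input made up entirely of unknown words is scored the same way, which would explain a constant value such as 0.61.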

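However, make sure you test all the cases that should be classified correctly. Setting a threshold may hurt some classifications that are currently right.

In any case, increase the size of the training data. You will see better results, and that in turn can help you choose the threshold.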
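The threshold idea could be wrapped in a small helper, sketched below. `classify_with_threshold` is a hypothetical name and the 0.7 default is an arbitrary illustration, not a tuned value. The helper only assumes that the classifier's `prob_classify` returns a distribution offering `max()` and `prob(label)`, as the snippet above relies on.

```python
def classify_with_threshold(classifier, text, threshold=0.7):
    """Return the most likely label, or None when confidence is too low.

    `classifier` is any object whose prob_classify(text) returns a
    distribution with max() and prob(label). The threshold default is
    an assumption chosen for illustration only.
    """
    prob_dist = classifier.prob_classify(text)
    best_label = prob_dist.max()
    if prob_dist.prob(best_label) < threshold:
        return None  # not confident enough: suggest no class
    return best_label
```

With the classifier from the question, a call such as `classify_with_threshold(c1, "kartik", threshold=0.7)` should return `None` as long as the top probability stays at the 0.61 reported above. The threshold has to sit above that baseline yet below the probabilities of genuine matches, so check it against real examples before relying on it.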

【Discussion】: