【Question Title】: Scikit-Learn: Incorporate Naive Bayes Model Predictions into Logistic Regression?
【Posted】: 2023-04-09 07:48:01
【Question】:

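I have data on various customer attributes (a self-description and age), along with a binary outcome of whether each customer would buy a specific product: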

  {"would_buy": "No", 
  "self_description": "I'm a college student studying biology", 
  "Age": 19}, 

I want to use MultinomialNB on self_description to predict would_buy, and then incorporate those predictions into a logistic regression model for would_buy that also takes Age as a covariate.

Here is the code for the text model so far (I'm new to scikit-learn!), with a simplified dataset.

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Customer data: whether a customer would buy an item (what I'm interested in), their self-description, and their age.
data = [
  {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19}, 
  {"would_buy": "Yes", "self_description": "I'm a blue-collar worker", "Age": 20},
  {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56}, 
  {"would_buy": "No", "self_description": "I'm a college student studying economics", "Age": 20}, 
  {"would_buy": "Yes", "self_description": "I'm a UPS worker", "Age": 35}, 
  {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56}
  ]

def naive_bayes_model(customer_data):
  self_descriptions = [customer['self_description'] for customer in customer_data]
  decisions = [customer['would_buy'] for customer in customer_data]

  vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1,2))
  X = vectorizer.fit_transform(self_descriptions, decisions)
  naive_bayes = MultinomialNB(alpha=0.01)
  naive_bayes.fit(X, decisions)
  train(naive_bayes, X, decisions)

def train(classifier, X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)
    classifier.fit(X_train, y_train)

    # classification_report expects (y_true, y_pred), in that order
    print(classification_report(y_test, classifier.predict(X_test)))


def main():
  naive_bayes_model(data)



main()

【Comments】:

    Tags: python-3.x machine-learning scikit-learn nlp logistic-regression


    【Solution 1】:

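    The short answer is to use the predict_proba or predict_log_proba method of the trained naive_bayes model to create the inputs for the logistic regression model. These can be concatenated with the Age values to create the training and test sets for your LogisticRegression model.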

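    However, I do want to point out that the code you've written gives you no access to your naive_bayes model after training, so you will definitely need to restructure your code.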
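One way to restructure it, as a minimal sketch: have the training function return the fitted vectorizer and classifier so the caller can reuse them. The function name `fit_text_model` and the four-row sample are illustrative assumptions, not part of the original code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB


def fit_text_model(descriptions, labels):
    """Fit the vectorizer and Naive Bayes model, and return both (hypothetical helper)."""
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    X = vectorizer.fit_transform(descriptions)
    model = MultinomialNB(alpha=0.01)
    model.fit(X, labels)
    return vectorizer, model


# Small illustrative sample modeled on the question's data
descriptions = [
    "I'm a college student studying biology",
    "I'm a blue-collar worker",
    "I'm a Stack Overflow denizen",
    "I'm a UPS worker",
]
labels = ["No", "Yes", "No", "Yes"]

vectorizer, model = fit_text_model(descriptions, labels)

# The fitted objects are now available outside the function;
# columns of probs follow the order of model.classes_
probs = model.predict_proba(vectorizer.transform(["I'm a UPS worker"]))
```

Returning both objects matters because new text has to pass through the same fitted vectorizer before it reaches predict_proba.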

    That issue aside, this is how I would incorporate the output of naive_bayes into LogisticRegression:

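    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB

    # `data` is the list of customer dicts from the question
    descriptions = np.array([customer['self_description'] for customer in data])
    decisions = np.array([customer['would_buy'] for customer in data])
    ages = np.array([customer['Age'] for customer in data])

    vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1,2))
    desc_vec = vectorizer.fit_transform(descriptions, decisions)
    naive_bayes = MultinomialNB(alpha=0.01)
    # Split text features, ages, and labels together so the rows stay aligned
    desc_train, desc_test, age_train, age_test, dec_train, dec_test = train_test_split(desc_vec, ages, decisions, test_size=0.25, random_state=22)

    # Fit Naive Bayes on the text, then use its class probabilities as features
    naive_bayes.fit(desc_train, dec_train)
    nb_train_preds = naive_bayes.predict_proba(desc_train)
    lr = LogisticRegression()
    lr_X_train = np.hstack((nb_train_preds, age_train.reshape(-1, 1)))
    lr.fit(lr_X_train, dec_train)

    # Build the test features the same way and report accuracy
    lr_X_test = np.hstack((naive_bayes.predict_proba(desc_test), age_test.reshape(-1, 1)))
    lr.score(lr_X_test, dec_test)
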
    

    【Discussion】:

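    • Thanks for your comment; this is exactly what I was looking for. If I may ask, what would I do if the additional variable, instead of age, were actually categorical (for example, city)? Would you recommend one-hot encoding, or building two Naive Bayes classifiers and then multiplying the probabilities?
    • I've always been a fan of one-hot encoding myself.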
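A rough sketch of the one-hot approach mentioned in the comments, assuming a hypothetical `cities` column in place of Age. The probability values, city names, and labels below are made up for illustration; in practice `nb_yes_prob` would come from naive_bayes.predict_proba.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical inputs: Naive Bayes probability of "Yes" and a city per customer
nb_yes_prob = np.array([0.1, 0.9, 0.2, 0.8, 0.3, 0.7])
cities = np.array(["Boston", "Austin", "Boston", "Denver", "Austin", "Denver"])
would_buy = np.array(["No", "Yes", "No", "Yes", "No", "Yes"])

# One-hot encode the categorical column; unseen cities at predict time become all zeros
encoder = OneHotEncoder(handle_unknown="ignore")
city_onehot = encoder.fit_transform(cities.reshape(-1, 1)).toarray()

# Stack the Naive Bayes feature next to the one-hot columns
lr_X = np.hstack((nb_yes_prob.reshape(-1, 1), city_onehot))
lr = LogisticRegression()
lr.fit(lr_X, would_buy)
```

At prediction time, reuse the same fitted encoder with encoder.transform rather than fitting a new one, so the columns line up with what the model saw during training.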