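[Posted]: 2023-04-09 07:48:01
[Question]:
I have data on various customer attributes (a self-description and age), along with a binary outcome indicating whether each customer would buy a particular product: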
{"would_buy": "No",
"self_description": "I'm a college student studying biology",
"Age": 19},
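I want to use MultinomialNB on self_description to predict would_buy, and then feed those predictions into a logistic regression model for would_buy that also includes Age as a covariate.
Here is my code for the text model so far (I'm new to scikit-learn!), with a simplified dataset.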
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Customer data: whether a customer would buy an item (the outcome I'm interested in),
# their self-description, and their age.
data = [
    {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19},
    {"would_buy": "Yes", "self_description": "I'm a blue-collar worker", "Age": 20},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
    {"would_buy": "No", "self_description": "I'm a college student studying economics", "Age": 20},
    {"would_buy": "Yes", "self_description": "I'm a UPS worker", "Age": 35},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56}
]

def naive_bayes_model(customer_data):
    self_descriptions = [customer['self_description'] for customer in customer_data]
    decisions = [customer['would_buy'] for customer in customer_data]
    vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2))
    # TfidfVectorizer ignores y, so only the texts are passed here.
    X = vectorizer.fit_transform(self_descriptions)
    naive_bayes = MultinomialNB(alpha=0.01)
    # train() fits the classifier on the training split, so no separate fit on all rows is needed.
    train(naive_bayes, X, decisions)

def train(classifier, X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)
    classifier.fit(X_train, y_train)
    # classification_report expects the true labels first, then the predictions.
    print(classification_report(y_test, classifier.predict(X_test)))

def main():
    naive_bayes_model(data)

main()
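For the second step described above (feeding the Naive Bayes predictions into a logistic regression together with Age), one possible approach is sketched below. It is only an illustration under several assumptions that are not part of the code above: the Naive Bayes output is taken as the out-of-fold probability of "Yes" from `cross_val_predict`, so the logistic regression is not trained on predictions the text model made for rows it had already seen; Age is standardized before the regression; and the helper `predict_would_buy` is a hypothetical name. With only six rows the numbers themselves mean nothing.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = [
    {"would_buy": "No", "self_description": "I'm a college student studying biology", "Age": 19},
    {"would_buy": "Yes", "self_description": "I'm a blue-collar worker", "Age": 20},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
    {"would_buy": "No", "self_description": "I'm a college student studying economics", "Age": 20},
    {"would_buy": "Yes", "self_description": "I'm a UPS worker", "Age": 35},
    {"would_buy": "No", "self_description": "I'm a Stack Overflow denzien", "Age": 56},
]

texts = [c["self_description"] for c in data]
ages = np.array([c["Age"] for c in data], dtype=float)
y = np.array([1 if c["would_buy"] == "Yes" else 0 for c in data])

# Stage 1: text model. The vectorizer lives inside the pipeline so it is
# re-fitted on the training part of every fold.
text_model = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    MultinomialNB(alpha=0.01),
)

# Out-of-fold P(would_buy == "Yes") for every row (2 folds because the toy data is tiny).
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=22)
nb_proba = cross_val_predict(text_model, texts, y, cv=cv, method="predict_proba")[:, 1]

# Stage 2: logistic regression on [NB probability, Age].
stacked_X = np.column_stack([nb_proba, ages])
log_reg = make_pipeline(StandardScaler(), LogisticRegression())
log_reg.fit(stacked_X, y)

# Refit the text model on all rows so it can score new customers.
text_model.fit(texts, y)

def predict_would_buy(description, age):
    """Hypothetical helper: returns the estimated P(would_buy == "Yes")."""
    p_text = text_model.predict_proba([description])[:, 1]
    features = np.column_stack([p_text, [float(age)]])
    return log_reg.predict_proba(features)[:, 1]

print(predict_would_buy("I'm a UPS worker", 35))
```

scikit-learn also provides `StackingClassifier`, which wraps the same out-of-fold idea; the manual version is shown here only because it keeps the two stages visible.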
[Discussion]:
Tags: python-3.x machine-learning scikit-learn nlp logistic-regression