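【Posted】: 2022-01-19 14:02:04
【Question】:
I have a dataset with 9,583 rows, which I split using train_test_split. I would like to visualize the training data and the test data with a barplot, like this example: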
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

df = pd.read_excel("Data/data_clean_spacy_for_implementation.xlsx")

# Stratified 80/20 split so both parts keep the label proportions
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42)

# Fit TF-IDF on the training text only, then reuse it for the test text
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)
X_array = X_train.toarray()  # dense copy of the sparse training matrix

print(X_train.shape)  # output (7666, 12222)
print(X_test.shape)   # output (1917, 12222)
How can I do this?
My data is on GitHub
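One possible approach, as a minimal sketch: count how many rows of each label ended up in y_train and y_test, put the two counts side by side in a DataFrame, and draw a grouped bar chart from it. The data below is a hypothetical stand-in (100 made-up rows with two invented labels, "positive" and "negative"), because the real spreadsheet is not available here; the output file name is also just an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real spreadsheet: 100 rows, two made-up labels.
df = pd.DataFrame({
    "text": [f"document {i}" for i in range(100)],
    "label": ["positive"] * 60 + ["negative"] * 40,
})

# Same split settings as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42)

# One row per label, one column per split.
counts = pd.DataFrame({
    "train": y_train.value_counts(),
    "test": y_test.value_counts(),
}).sort_index()

# Grouped bars: for every label, one bar for train and one for test.
ax = counts.plot(kind="bar", rot=0)
ax.set_xlabel("label")
ax.set_ylabel("number of rows")
ax.set_title("Rows per label in the train and test splits")
plt.tight_layout()
plt.savefig("train_test_split.png")  # hypothetical output file name
```

With the real data, the stand-in DataFrame would be replaced by the pd.read_excel call from the question. If only the overall sizes are needed, plotting len(y_train) and len(y_test) as two bars would work as well. Since the question is tagged seaborn, seaborn's countplot with a hue column for the split is probably a reasonable alternative, though it needs the labels in long form first.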
【Discussion】:
Tags: python pandas matplotlib seaborn