【Question title】: TypeError during resampling
【Posted】: 2021-01-28 08:00:58
【Question description】:

I am trying to apply resampling to a dataset with imbalanced classes. What I did is as follows:

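import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.utils import resample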

y = df.Label

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['Text'].fillna(""))  # np.NaN no longer exists in NumPy 2.x

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, stratify=y)

# concatenate our training data back together
X = pd.concat([X_train, y_train], axis=1)

# separate minority and majority classes
not_df = X[X.Label==0]
df = X[X.Label==1]

# upsample minority
df_upsampled = resample(df,
                        replace=True,
                        n_samples=len(not_df),
                        random_state=27)

# combine majority and upsampled minority
upsampled = pd.concat([not_df, df_upsampled])
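As an aside, `sklearn.utils.resample` also accepts SciPy sparse matrices, so the upsampling step above can in principle be done on the sparse `X_train`/`y_train` directly, without building a DataFrame. The following is a minimal sketch under stated assumptions: the helper name `upsample_minority_sparse` and the toy data are hypothetical, there are exactly two classes, and the minority class is label 1 as in the code above.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.utils import resample


def upsample_minority_sparse(X, y, minority_label=1, random_state=27):
    """Hypothetical helper: upsample the minority-class rows of a sparse matrix.

    X is a SciPy sparse matrix (e.g. CountVectorizer output), y a 1-D array-like
    of labels. Assumes two classes; returns (X_balanced, y_balanced).
    """
    X = sp.csr_matrix(X)  # CSR supports efficient row selection
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    maj_idx = np.flatnonzero(y != minority_label)

    # Draw len(maj_idx) minority rows with replacement; resample accepts sparse input
    X_min_up, y_min_up = resample(
        X[min_idx], y[min_idx],
        replace=True,
        n_samples=len(maj_idx),
        random_state=random_state,
    )

    # Stack the majority rows and the upsampled minority rows into one sparse matrix
    X_bal = sp.vstack([X[maj_idx], X_min_up]).tocsr()
    y_bal = np.concatenate([y[maj_idx], y_min_up])
    return X_bal, y_bal


# Hypothetical toy data: 6 rows of label 0, 2 rows of label 1
X_toy = sp.csr_matrix(np.arange(16).reshape(8, 2))
y_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1])
X_bal, y_bal = upsample_minority_sparse(X_toy, y_toy)
print(X_bal.shape, np.bincount(y_bal))
```

With the question's variables this would be called as `upsample_minority_sparse(X_train, y_train)`. Keeping the matrix sparse avoids densifying a potentially large vocabulary.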

Unfortunately, I run into a problem at this step: X = pd.concat([X_train, y_train], axis=1), namely:

/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    279         verify_integrity=verify_integrity,
    280         copy=copy,
--> 281         sort=sort,
    282     )
    283 

/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    355                     "only Series and DataFrame objs are valid".format(typ=type(obj))
    356                 )
--> 357                 raise TypeError(msg)
    358 
    359             # consolidate

TypeError: cannot concatenate object of type '<class 'scipy.sparse.csr.csr_matrix'>'; only Series and DataFrame objs are valid
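The error does not come from the resampling itself: `pd.concat` only accepts Series and DataFrame objects, while `X_train`, as returned by `train_test_split` on `CountVectorizer` output, is a SciPy sparse matrix. A minimal sketch that reproduces the message, using hypothetical stand-ins for `X_train` and `y_train`:

```python
import pandas as pd
import scipy.sparse as sp

# Hypothetical stand-ins: a sparse feature matrix and a label Series
X_sparse = sp.csr_matrix([[1, 0], [0, 2], [3, 0]])
y_series = pd.Series([0, 1, 0], name="Label")

raised = False
message = ""
try:
    pd.concat([X_sparse, y_series], axis=1)
except TypeError as exc:
    # pd.concat rejects anything that is not a Series or DataFrame
    raised = True
    message = str(exc)

print(message)
```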

You can think of the Text column as something like:

Text
Have a non-programming question?
More helpful links
I am trying to apply...
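Running `CountVectorizer` on these three sample rows shows where the sparse matrix comes from: `fit_transform` returns a SciPy matrix in CSR format, not a DataFrame. A small check, assuming the three sample values above are the whole column:

```python
import pandas as pd
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer

# The three sample rows of the Text column from the question
df = pd.DataFrame({"Text": [
    "Have a non-programming question?",
    "More helpful links",
    "I am trying to apply...",
]})

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df["Text"].fillna(""))

print(sp.issparse(X), X.format)  # a sparse CSR matrix, not a DataFrame
print(X.shape)                   # (n_documents, vocabulary_size)
print(sorted(vectorizer.vocabulary_))
```

The default token pattern keeps only tokens of two or more word characters, so "a" and "I" do not appear in the vocabulary.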

I hope you can help me with this.

【Question discussion】:

    Tags: python scikit-learn countvectorizer


    【Solution 1】:

    You have to convert X_train to a DataFrame before you can use concat:

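    # X_train is a scipy sparse matrix: densify it and reuse y_train's index so the rows align
    X = pd.concat([pd.DataFrame(X_train.toarray(), index=y_train.index), y_train], axis=1)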
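One caveat when building that DataFrame: `train_test_split` shuffles the rows, so `y_train` keeps a shuffled subset of `df`'s index, whereas a DataFrame built from `X_train` gets a fresh 0..n-1 index. Concatenating along `axis=1` aligns on the index, so the two must share it or rows get mismatched and padded with NaN. Since `X_train` is sparse, it is densified with `.toarray()` here. A sketch of the whole pipeline on hypothetical toy data (the texts, labels and `random_state=0` are assumptions):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Hypothetical toy data: 8 texts of class 0 and 4 of class 1 (imbalanced)
df = pd.DataFrame({
    "Text": [f"common words doc {i}" for i in range(8)]
    + [f"rare topic doc {i}" for i in range(4)],
    "Label": [0] * 8 + [1] * 4,
})

X = CountVectorizer().fit_transform(df["Text"].fillna(""))
y = df["Label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Densify the sparse matrix and give it y_train's (shuffled) index,
# so that axis=1 concatenation pairs each feature row with its own label
train = pd.concat(
    [pd.DataFrame(X_train.toarray(), index=y_train.index), y_train],
    axis=1,
)

# Separate the classes and upsample the minority to the majority size
majority = train[train.Label == 0]
minority = train[train.Label == 1]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=27)
upsampled = pd.concat([majority, minority_up])

print(upsampled.Label.value_counts())
```

For a large vocabulary, `pd.DataFrame.sparse.from_spmatrix(X_train, index=y_train.index)` builds a sparse-backed DataFrame instead of densifying the whole matrix.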
    

