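【Posted】: 2019-06-06 00:40:05
【Question】:
I am trying to iterate over my columns and take a different action depending on whether a column has the category dtype or some other dtype.
The following check works for a series whose dtype is category, but raises an error when the series has the object dtype.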
if series.dtype == 'category':
    pass  # do something
This works for category, but if the dtype is object it raises:
Error:
Traceback (most recent call last):
File "", line 382, in trace_task
R = retval = fun(*args, **kwargs)
File "", line 54, in run_data_template_task
data_template.run(data_bundle, columns=columns)
File "", line 531, in run
self.to_parquet(data_bundle, columns=columns)
File "", line 195, in to_parquet
df = self.parse_df(df, columns=columns, overwrite_columns=overwrite_columns)
File "", line 378, in parse_df
df[col.name] = parse_series_with_nans(df[col.name], 'str')
File "", line 369, in parse_series_with_nans
if series.dtype == 'category':
TypeError: data type "category" not understood
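On the other hand, using:
if series.dtype is 'category':
    pass  # do something
returns False even when the dtype is category (which makes sense, since the dtype object is obviously not the same object as the string 'category').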
A reproducible example:
import pandas as pd
df = pd.DataFrame({'category_column': ['a', 'b', 'c'], 'other_column': [1, 2, 3]})
df['category_column'] = df['category_column'].astype('category')
df['category_column'].dtype is 'category'
Out[46]: False
df['category_column'].dtype == 'category'
Out[47]: True
df['other_column'].dtype == 'category'
Traceback (most recent call last):
File "", line 3296, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-48-c6cc61c458d0>", line 1, in <module>
d['other_column'].dtype == 'category'
TypeError: data type "category" not understood
【Comments】:
- It is probably best to use select_dtypes, but you can also check the name of the dtype.
- df['other_column'].dtype.name == 'category' returns False, as @user3483203 mentioned. See also: stackoverflow.com/questions/26924904/…
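The two suggestions in the comments (comparing the dtype's name, and using select_dtypes) can be sketched roughly as below. This is only an illustration built on the question's example frame, not a definitive fix: is_category is a hypothetical helper name, and the isinstance check against pd.api.types.CategoricalDtype is an extra alternative that the comments do not mention. All three avoid comparing a NumPy dtype directly with the string 'category', which is the comparison that raised the TypeError in the question.

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'category_column': ['a', 'b', 'c'], 'other_column': [1, 2, 3]})
df['category_column'] = df['category_column'].astype('category')


def is_category(series):
    """Hypothetical helper: compare the dtype's name, which is always a plain string."""
    return series.dtype.name == 'category'


# Option 1: check the dtype name column by column.
name_checks = {col: is_category(df[col]) for col in df.columns}

# Option 2 (not from the comments): test the dtype object's type.
isinstance_checks = {
    col: isinstance(df[col].dtype, pd.api.types.CategoricalDtype)
    for col in df.columns
}

# Option 3: let pandas pick out the categorical columns in one call.
category_columns = list(df.select_dtypes(include='category').columns)

print(name_checks)
print(isinstance_checks)
print(category_columns)
```

With this frame, both per-column checks report True for category_column and False for other_column, and select_dtypes returns only category_column. When the goal is to process all categorical columns as a group, select_dtypes avoids the explicit loop. The per-column checks fit better when each column is handled inside an existing loop, as in the question.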