【Posted】: 2021-07-31 19:08:11
【Question】:
def opinion_df_gen(preprocessor):
    op_files = {'POSITIVE': Path('clustering/resources/positive_words.txt').resolve(),
                'NEGATIVE': Path('clustering/resources/negative_words.txt').resolve()}
    df_ = pd.DataFrame()
    for i, (sentiment, filepath) in enumerate(op_files.items()):
        print(filepath)
        word_set = preprocessor.lemmatize_words(file_path=filepath)
        print(len(word_set))
        df_['tokens'] = list(word_set)
preprocessor is a custom class, and preprocessor.lemmatize_words returns a set of words/tokens. The problem is that df_['tokens'] = list(word_set) raises an error.
Traceback (most recent call last):
  File "E:/Anoop/cx-index-score/sentiment_analysis/main.py", line 184, in <module>
    main()
  File "E:/Anoop/cx-index-score/sentiment_analysis/main.py", line 155, in main
    to_label_df, token_col, target_col = opinion_df_gen(preprocessor)
  File "E:/Anoop/cx-index-score/sentiment_analysis/main.py", line 35, in opinion_df_gen
    df_['tokens'] = list(word_set)
  File "C:\ProgramData\Anaconda3\envs\cx_env_\lib\site-packages\pandas\core\frame.py", line 3044, in __setitem__
    self._set_item(key, value)
  File "C:\ProgramData\Anaconda3\envs\cx_env_\lib\site-packages\pandas\core\frame.py", line 3120, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\ProgramData\Anaconda3\envs\cx_env_\lib\site-packages\pandas\core\frame.py", line 3768, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "C:\ProgramData\Anaconda3\envs\cx_env_\lib\site-packages\pandas\core\internals\construction.py", line 748, in sanitize_index
    "Length of values "
ValueError: Length of values (4085) does not match length of index (1782)
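But if you look at the code, the DataFrame is empty, so I don't understand why it says the index has length 1782. As far as I can tell, this code should work. The pandas version is 1.1.5 and the Python version is 3.8.0.
【Comments】: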
-
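I don't think the problem is in the first iteration. You need to make sure all the sets have the same length, or handle the difference yourself.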
-
Oh, now I get it. I should put df_ = pd.DataFrame() inside the for loop.
-
That way you create a separate DataFrame for each set. Is that what you want?
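To make the point in the comments concrete, here is a minimal sketch of what seems to be happening. The first assignment to the empty DataFrame gives it an index as long as the first word set (1782 in the question), so the second, longer list (4085) no longer fits. The word sets below are small hypothetical stand-ins for the output of preprocessor.lemmatize_words, and the 'sentiment' column is an assumption about the labelled frame the function is meant to return, not something shown in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the two sets returned by
# preprocessor.lemmatize_words; only their differing lengths matter.
word_sets = {
    "POSITIVE": {"good", "great", "nice"},
    "NEGATIVE": {"bad", "poor", "awful", "worse", "worst"},
}

# Reproduce the error: the first assignment gives the empty frame a
# 3-row index, so the second, 5-item assignment no longer fits.
df_ = pd.DataFrame()
error_message = None
for sentiment, words in word_sets.items():
    try:
        df_["tokens"] = sorted(words)
    except ValueError as exc:
        error_message = str(exc)

print(error_message)

# One possible fix: build a separate frame per sentiment, then stack them.
# The 'sentiment' label column is an assumption made for illustration.
frames = [
    pd.DataFrame({"tokens": sorted(words), "sentiment": sentiment})
    for sentiment, words in word_sets.items()
]
labelled = pd.concat(frames, ignore_index=True)
print(labelled)
```

Building one frame per set and concatenating avoids the shared index entirely. Moving df_ = pd.DataFrame() inside the loop, as suggested above, also avoids the error, but each iteration then discards the previous frame unless it is stored somewhere.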
Tags: python pandas dataframe indexing