【Posted】: 2019-07-08 10:41:59
【Question】:
Given a DataFrame with a single column, Text:
Text
0 chest pain nstemi this 84-year old man present on 26/5 with
chest pain associate with profuse sweating and nausea
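I want to create two new columns containing the unigrams and bigrams generated from each row of the DataFrame above.
This is the method I use to generate the n-grams: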
import re

def generate_ngrams(self, s, n):
    # Convert to lowercase
    s = s.lower()
    # Replace all non-alphanumeric characters with spaces
    s = re.sub(r'[^a-zA-Z0-9\s]', ' ', s)
    # Break the sentence into tokens, dropping empty tokens
    tokens = [token for token in s.split(" ") if token != ""]
    # Use zip to line up n shifted copies of the token list,
    # then concatenate each tuple of tokens into one n-gram string
    ngrams = zip(*[tokens[i:] for i in range(n)])
    return [" ".join(ngram) for ngram in ngrams]
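For illustration, here is a standalone sketch of the same idea applied to the second line of the sample text. It drops `self` and uses `str.split()` with no argument (both are assumptions made so the snippet runs on its own, not the original method), so it also splits on tabs and newlines:

```python
import re

def generate_ngrams(s, n):
    # Standalone variant for illustration only (no `self`).
    s = s.lower()
    # Replace every non-alphanumeric, non-whitespace character with a space
    s = re.sub(r"[^a-zA-Z0-9\s]", " ", s)
    # split() with no argument splits on any whitespace and drops empty tokens
    tokens = s.split()
    # zip over n shifted copies of the token list yields consecutive n-tuples
    return [" ".join(gram) for gram in zip(*[tokens[i:] for i in range(n)])]

sample = "chest pain associate with profuse sweating and nausea"
unigrams = generate_ngrams(sample, 1)
bigrams = generate_ngrams(sample, 2)

print(unigrams[:2])  # ['chest', 'pain']
print(bigrams[:2])   # ['chest pain', 'pain associate']
```

Each call returns a plain Python list of strings, which is what later has to be stored in a single DataFrame cell.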
This is how I tried to populate my DataFrame:
for index, row in featuresDF.iterrows():
    featuresDF.at[index, '1-gram'] = generate_ngrams(infoDF.at[index, 'Text'], 1)
    featuresDF.at[index, '2-gram'] = generate_ngrams(infoDF.at[index, 'Text'], 2)
When I run it, I get the following error: ValueError: setting an array element with a sequence.
Here is the traceback:
Traceback (most recent call last):
  File "<ipython-input-64-e014e2e1c7e2>", line 3, in <module>
    featuresDF.at[index, '1-gram'] = featureExtraction.generate_ngrams(infoDF.at[index, 'Text'], 1)
  File "C:\Users\as\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexing.py", line 2287, in __setitem__
    self.obj._set_value(*key, takeable=self._takeable)
  File "C:\Users\as\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py", line 2815, in _set_value
    engine.set_value(series._values, index, value)
  File "pandas/_libs/index.pyx", line 95, in pandas._libs.index.IndexEngine.set_value
  File "pandas/_libs/index.pyx", line 106, in pandas._libs.index.IndexEngine.set_value
I assume the problem occurs when I assign the lists of unigrams and bigrams to the DataFrame cells, right? But I don't know how to fix it. Thanks!
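One possible workaround, offered as a sketch rather than a confirmed fix: the error most likely arises because `.at` writes one value into a column that is not of object dtype, so a list cannot be stored there. Building each column in one step with `Series.apply` avoids the per-cell assignment. The frame contents below are partly made up (the second row is hypothetical), and `generate_ngrams` is the standalone variant without `self`:

```python
import re
import pandas as pd

def generate_ngrams(s, n):
    # Standalone variant for illustration only (no `self`).
    s = re.sub(r"[^a-zA-Z0-9\s]", " ", s.lower())
    tokens = s.split()
    return [" ".join(gram) for gram in zip(*[tokens[i:] for i in range(n)])]

# Illustrative data: the first row comes from the question, the second is invented.
infoDF = pd.DataFrame({
    "Text": [
        "chest pain nstemi this 84-year old man present on 26/5 with",
        "no chest pain today",
    ]
})

# Assumed here: featuresDF shares its index with infoDF.
featuresDF = pd.DataFrame(index=infoDF.index)

# apply() returns an object-dtype Series whose cells are Python lists,
# so each whole column is assigned at once instead of cell by cell.
featuresDF["1-gram"] = infoDF["Text"].apply(lambda text: generate_ngrams(text, 1))
featuresDF["2-gram"] = infoDF["Text"].apply(lambda text: generate_ngrams(text, 2))

print(featuresDF.at[1, "2-gram"])  # ['no chest', 'chest pain', 'pain today']
```

If the loop has to stay, creating the columns with object dtype before writing to them is another option to try, though whether that works may depend on the pandas version.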
【Comments】:
Tags: python arrays pandas dataframe