【Posted】: 2016-10-17 23:29:15
【Question】:
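I have text data to classify. Using a for loop in which I spell out individual strings, I check whether a particular word or phrase is present in each row of another column. If it is, the loop appends a specific value to a new list. The new list is then added to the DataFrame. However, this approach is too unwieldy for my actual data, because I would need to spell out a large number of strings for a large number of tests.
Is there a way to group the individual strings into a single data structure that the loop can search within? Each test in the loop would then reference just one data structure, rather than individual strings spelled out inside the loop. Is this possible?
Below is a reproducible example of what I am currently doing, which highlights the problem.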
import pandas as pd

data = {
    'opinion': ['He said it was too expensive',
                'She said it was too costly',
                'He thought it was not fast enough',
                'They thought is was not right and too much money',
                'Her view was that it was too small and too slow',
                ]}
df = pd.DataFrame(data, columns=['opinion'])
df
This creates:
   opinion
0  He said it was too expensive
1  She said it was too costly
2  He thought it was not fast enough
3  They thought is was not right and too much money
4  Her view was that it was too small and too slow
This for loop then performs the following classification.
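new_col = []
for row in df['opinion']:
    # Each test spells out its search strings one by one
    if 'too expensive' in row or 'too costly' in row or 'too much money' in row:
        new_col.append('Too Expensive')
    elif 'not fast enough' in row or 'too slow' in row:
        new_col.append('Too Slow')
# Every row in this example matches one of the tests, so the lengths line up
df['reason'] = new_col
df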
   opinion                                           reason
0  He said it was too expensive                      Too Expensive
1  She said it was too costly                        Too Expensive
2  He thought it was not fast enough                 Too Slow
3  They thought is was not right and too much money  Too Expensive
4  Her view was that it was too small and too slow   Too Slow
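In my actual data, though, I cannot write out multiple separate strings inside the loop for each test, because there are far too many of them.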
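To make the kind of grouping being asked about concrete, here is a minimal sketch, not a confirmed solution. It assumes a dict that maps each label to its list of phrases; the names `REASONS` and `classify` are hypothetical, and the sketch relies on dicts preserving insertion order (Python 3.7+) so that the first matching label wins, as in the if/elif chain above.

```python
# Hypothetical mapping: each label points to the phrases that should trigger it.
# Insertion order matters: earlier labels take priority, like if/elif.
REASONS = {
    'Too Expensive': ['too expensive', 'too costly', 'too much money'],
    'Too Slow': ['not fast enough', 'too slow'],
}


def classify(text, reasons=REASONS, default=None):
    """Return the first label whose phrase list has a substring match in text."""
    for label, phrases in reasons.items():
        if any(phrase in text for phrase in phrases):
            return label
    # No phrase matched; fall back to the default so every row gets a value
    return default


opinions = [
    'He said it was too expensive',
    'She said it was too costly',
    'He thought it was not fast enough',
    'They thought is was not right and too much money',
    'Her view was that it was too small and too slow',
]
labels = [classify(text) for text in opinions]
print(labels)
```

With the DataFrame from the example, the same function could presumably be applied as `df['reason'] = df['opinion'].apply(classify)`. Adding a new test would then mean adding an entry to the dict rather than editing the loop.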
【Comments】:
Tags: python string pandas search dataframe