【Question title】:Remove words that appear fewer than 2 times in the text from a Pandas Series
【Posted】:2020-06-14 22:55:39
【Question】:

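I am trying to remove, from every scalar value in a Pandas Series, all words that appear fewer than 2 times across the whole text. What is the best way to do this? Here is my failed attempt: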

from collections import Counter
import pandas as pd
df = pd.DataFrame({'text':["The quick brown fox", "jumped over the lazy dog","jumped over the lazy dog"]})
d=''.join(df['text'][:])
m=d.split()
q=Counter(m)
print (q)
df['text'].str.split().map(lambda el: " ".join(Counter(el for el in q.elements() if q[el] >= 2)))

Output:
Counter({'over': 2, 'the': 2, 'lazy': 2, 'The': 1, 'quick': 1, 'brown': 1, 'foxjumped': 1, 'dogjumped': 1, 'dog': 1})
0    over the lazy
1    over the lazy
2    over the lazy
Name: text, dtype: object

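【Comments】:

  • You need a space in your ''.join(), i.e. ' '.join()
  • It shows the same: 0 jumped over the lazy dog 1 jumped over the lazy dog 2 jumped over the lazy dog Name: text, dtype: object
  • Row 0 should be empty, not: 0 jumped over the lazy dog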
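
To illustrate the first comment, here is a minimal sketch (plain Python, no pandas) of how the separator changes the word counts. The variable names `fused` and `separated` are made up for this illustration. Note that fixing the join only corrects the counts: as the follow-up comments show, the lambda in the question still ignores each row's own words, so every row comes out the same.

```python
from collections import Counter

texts = ["The quick brown fox", "jumped over the lazy dog", "jumped over the lazy dog"]

# Joining with an empty string glues the last word of one row
# to the first word of the next ("fox" + "jumped" -> "foxjumped").
fused = Counter("".join(texts).split())

# Joining with a single space keeps row boundaries as word boundaries.
separated = Counter(" ".join(texts).split())

print(fused["foxjumped"], fused["jumped"])          # 1 0
print(separated["foxjumped"], separated["jumped"])  # 0 2
```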

Tags: python dictionary counter


【Solution 1】:
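from collections import Counter
import pandas as pd

df = pd.DataFrame({'text':["The quick brown fox", "jumped over the lazy dog","jumped over the lazy dog"]})
# Count every word across all rows (explode yields one word per entry)
c = Counter(df.text.str.split().explode())
# For each row, keep only the words whose overall count is at least 2
print( df.text.apply(lambda x: ' '.join(w for w in x.split() if c[w] >= 2).strip()) )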

Prints:

0                            
1    jumped over the lazy dog
2    jumped over the lazy dog
Name: text, dtype: object
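
Row 0 comes out empty partly because the count is case-sensitive: "The" and "the" are tallied separately. The following is a sketch of the same count-then-filter idea as a reusable function, not the answer author's code. The function name `drop_rare_words` and the parameters `min_count` and `ignore_case` are hypothetical, and it works on any iterable of strings rather than on pandas objects directly.

```python
from collections import Counter

def drop_rare_words(texts, min_count=2, ignore_case=False):
    """Return a list of texts without words seen fewer than min_count times overall."""
    texts = list(texts)
    # Normalise words before counting only if case should be ignored.
    norm = str.lower if ignore_case else (lambda w: w)
    counts = Counter(norm(w) for t in texts for w in t.split())
    # Rebuild each text from the words that are frequent enough.
    return [" ".join(w for w in t.split() if counts[norm(w)] >= min_count)
            for t in texts]

texts = ["The quick brown fox", "jumped over the lazy dog", "jumped over the lazy dog"]

print(drop_rare_words(texts))
# ['', 'jumped over the lazy dog', 'jumped over the lazy dog']
print(drop_rare_words(texts, ignore_case=True))
# ['The', 'jumped over the lazy dog', 'jumped over the lazy dog']
```

With a DataFrame, the result can be assigned back as a column, for example `df['text'] = drop_rare_words(df['text'])`, since iterating over a Series yields its values in order.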

【Comments】:
