【Posted】: 2021-05-14 04:00:16
【Question】:
I have a data frame of several movies along with their plot synopses.
Title Synopsis
Movie1 Old Macdonald had a farm [Written by ABC rewrite]
Movie2 Wheels on the bus (Source: Melon)
Movie3 Tayo the bus [Produced by Wills Garage]
Movie4 James and Giant Apple (Source: Kismet)
I want to remove the trailing text that is not needed for NLP, so that I end up with the data frame below:
Title Synopsis
Movie1 Old Macdonald had a farm
Movie2 Wheels on the bus
Movie3 Tayo the bus
Movie4 James and Giant Apple
I tried the code below, but my Synopsis column ends up containing strings such as “0”Iodfosomhgooad,somh...\n1GaBauadFal...”. Is there a way to fix this? Any help is appreciated, thanks.
removelist = [('[Written by]', '') ,('(Source:)', '')]
for old, new in removelist:
    df['Synopsis'] = re.sub(old, new, str(df['Synopsis']))
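The garbled output is consistent with two things in the attempt above. First, `str(df['Synopsis'])` turns the whole column into its printed representation (index labels and newlines included) before `re.sub` runs. Second, `'[Written by]'` is read by the regex engine as a character class that matches any single one of those characters, rather than the literal text. Below is a minimal sketch of one possible fix. It assumes the unwanted note always sits at the very end of the synopsis and is wrapped in `[...]` or `(...)`; the names `TRAILING_NOTE` and `strip_trailing_note` are hypothetical and used only for illustration.

```python
import re

# Assumption: the unwanted note is always the last thing in the synopsis
# and is enclosed in square brackets or parentheses.
TRAILING_NOTE = re.compile(r"\s*(?:\[[^\]]*\]|\([^)]*\))\s*$")


def strip_trailing_note(text):
    """Remove one trailing [...] or (...) note, if present."""
    return TRAILING_NOTE.sub("", text)


# Sample synopses taken from the question.
synopses = [
    "Old Macdonald had a farm [Written by ABC rewrite]",
    "Wheels on the bus (Source: Melon)",
    "Tayo the bus [Produced by Wills Garage]",
    "James and Giant Apple (Source: Kismet)",
]

cleaned = [strip_trailing_note(s) for s in synopses]
for line in cleaned:
    print(line)
```

With a data frame, the same pattern could be applied per row rather than to the stringified column, for example `df['Synopsis'] = df['Synopsis'].str.replace(TRAILING_NOTE.pattern, '', regex=True)`. Note that this sketch would also strip any other trailing parenthetical, so the pattern may need narrowing (for instance to notes beginning with `Written by`, `Produced by`, or `Source:`) if legitimate synopsis text can end in brackets.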
【Comments】:
-
Is that unnecessary data present in every row?
-
@RishabhKumar, not necessarily; the unwanted data can appear in any row.
Tags: python regex pandas dataframe replace