【Question title】: Pandas DataFrame partial string match based on a list
【Posted】: 2020-08-20 04:28:00
【Question】:

I have a DataFrame like the one below.

import pandas as pd
import numpy as np

df = pd.DataFrame([['One person has died after two motorbikes crashed in the Bay of Plenty.', 'The crash occurred at 3.15pm on Bell Rd in Nukuhou south of Whakatāne police said.', 'Another person suffered minor injuries in the crash.', 'WorkSafe had been advised and the Serious Crash Unit was in attendance police said.',
 'One person has died and another has minor injuries following a serious crash in Nukuhou near Whakatāne in Eastern Bay of Plenty.', 'The crash involved two motorcycles and was reported around 3.15pm.', 'The Serious Crash Unit and WorkSafe are attending the scene.']]).T
df.columns = ['col1']

I want to find which of the strings in the following list appear in 'col1' of the DataFrame.

loc_list = ['Bay of Plenty', 'Bell Rd', 'Nukuhou']

Here is what I tried:

df['location_mapped_title'] = (df.col1.str
                             .findall('|'.join(loc_list))
                             .str[0])
    col1    location_mapped_title
0   One person has died after two motorbikes crash...   Bay of Plenty
1   The crash occurred at 3.15pm on Bell Rd in Nuk...   Bell Rd
2   Another person suffered minor injuries in the ...   NaN
3   WorkSafe had been advised and the Serious Cras...   NaN
4   One person has died and another has minor inju...   Nukuhou
5   The crash involved two motorcycles and was rep...   NaN
6   The Serious Crash Unit and WorkSafe are attend...   NaN

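But it does not return all of the matching substrings. For example, the row at index 4 also contains 'Bay of Plenty'. How can I find all the matches?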

【Comments】:

  • You have already found them all, but str[0] keeps only the first match (if there is one).
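Following that comment, a minimal sketch: drop `.str[0]` so each row keeps the whole list that `findall` returns. The DataFrame here is a three-row excerpt of the question's data, and the column name `location_mapped_all` and the `", "` separator are illustrative choices, not from the question.

```python
import pandas as pd

# Three rows taken from the question's data so the block runs on its own.
df = pd.DataFrame({'col1': [
    'One person has died after two motorbikes crashed in the Bay of Plenty.',
    'One person has died and another has minor injuries following a serious crash '
    'in Nukuhou near Whakatāne in Eastern Bay of Plenty.',
    'Another person suffered minor injuries in the crash.',
]})
loc_list = ['Bay of Plenty', 'Bell Rd', 'Nukuhou']

# findall returns a list of every non-overlapping match per row; no .str[0] here.
matches = df['col1'].str.findall('|'.join(loc_list))

# Optional: flatten each list into one comma-separated string ('' when nothing matched).
df['location_mapped_all'] = matches.str.join(', ')
print(df['location_mapped_all'].tolist())
```

Keeping the list column is usually more convenient for further processing; the joined string is mainly useful for display or export.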

Tags: python-3.x pandas list dataframe


【Solution 1】:

Try findall or extractall:

import re  # needed for re.IGNORECASE

# Returns a list of all matches for each row (no .str[0]).
df.col1.str.findall(f'({"|".join(loc_list)})', flags=re.IGNORECASE)
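A variation on the findall line above, sketched under two assumptions not stated in the question: the location names might contain regex metacharacters (such as `.` or `(`), so each one is passed through `re.escape`, and only whole-word occurrences should count, so the alternation is wrapped in `\b...\b`. A non-capturing group `(?:...)` makes `findall` return the full matched text.

```python
import re
import pandas as pd

df = pd.DataFrame({'col1': [
    'The crash occurred at 3.15pm on Bell Rd in Nukuhou south of Whakatāne police said.',
    'One person has died and another has minor injuries following a serious crash '
    'in Nukuhou near Whakatāne in Eastern Bay of Plenty.',
    'The Serious Crash Unit and WorkSafe are attending the scene.',
]})
loc_list = ['Bay of Plenty', 'Bell Rd', 'Nukuhou']

# Escape each entry so special characters are matched literally,
# and require word boundaries on both sides of the whole alternation.
pattern = r'\b(?:' + '|'.join(map(re.escape, loc_list)) + r')\b'

# One list of matches per row; rows without a hit get an empty list.
df['locations'] = df['col1'].str.findall(pattern, flags=re.IGNORECASE)
print(df['locations'].tolist())
```

With `re.IGNORECASE`, a lowercase occurrence such as "bell rd" would also be found and returned exactly as it is written in the text.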

df.col1.str.extractall(f'({"|".join(loc_list)})', flags=re.IGNORECASE)
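`extractall` returns one row per match, indexed by (original row label, match number), and omits rows with no match at all. A minimal sketch of folding that result back into one list per original row, using a three-row excerpt of the question's data; the `reindex` plus empty-list fill is one possible choice for rows without a match.

```python
import re
import pandas as pd

df = pd.DataFrame({'col1': [
    'One person has died after two motorbikes crashed in the Bay of Plenty.',
    'Another person suffered minor injuries in the crash.',
    'One person has died and another has minor injuries following a serious crash '
    'in Nukuhou near Whakatāne in Eastern Bay of Plenty.',
]})
loc_list = ['Bay of Plenty', 'Bell Rd', 'Nukuhou']

# extractall needs at least one capture group; entries are escaped so they match literally.
pattern = '(' + '|'.join(map(re.escape, loc_list)) + ')'
hits = df['col1'].str.extractall(pattern, flags=re.IGNORECASE)

# Column 0 holds the captured text; group on the original row label (index level 0).
per_row = hits[0].groupby(level=0).agg(list)

# Rows with no match are missing from per_row, so reindex and replace NaN with [].
df['locations'] = per_row.reindex(df.index).apply(lambda x: x if isinstance(x, list) else [])
print(df['locations'].tolist())
```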

【Discussion】:
