Retaining text based on phrases in a pandas dataframe and removing all other text

【Question title】：Retaining text based on phrases in a pandas dataframe & removing all other text
【Posted】：2022-01-11 21:59:53
【Question】：

I have a column in my dataframe that contains text like this:

Sunny, with a high near 82. Light and variable wind becoming northwest 5 to 7 mph in the afternoon.

But sometimes it contains text like this:

A 50 percent chance of showers.  Partly sunny, with a high near 61.

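I want to process this column so that only the temperature value (82 or 61 here) is kept and everything else is removed, leaving just "82" or "61". I can't slice at a fixed index, because the entries vary in length and the number itself can have a different number of digits, since it is a temperature.

I would like to use phrases such as "high near" and "low near" to parse the string and find the temperature value. Is there a clean way to do this?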

【Comments】：

Tags: python-3.x pandas string dataframe

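【Solution 1】：

You can use a regular expression with pandas. For example, the pattern near (\d+) matches the word "near" followed by a space and captures the digits that come right after it.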
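To make the pattern concrete, here is a minimal sketch using Python's built-in re module on the two sample strings from the question. The helper name first_temperature is made up for illustration and is not part of the original answer.

```python
import re

# The two sample forecast strings from the question.
texts = [
    "Sunny, with a high near 82. Light and variable wind becoming northwest 5 to 7 mph in the afternoon.",
    "A 50 percent chance of showers.  Partly sunny, with a high near 61.",
]

# Literal "near", one space, then a capture group of one or more digits.
PATTERN = re.compile(r"near (\d+)")


def first_temperature(text):
    """Return the digits after the first 'near ' as a string, or None if absent.

    Hypothetical helper, named here only for illustration.
    """
    match = PATTERN.search(text)
    return match.group(1) if match else None


temps = [first_temperature(t) for t in texts]
print(temps)
```

Note that "50" in "A 50 percent chance" is not picked up, because the pattern only captures digits that directly follow "near ".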

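【Discussion】：

Thanks for the tip. I'm new to regular expressions and string manipulation, so I'll look this up and dig into it further. I'm sure I'll need it again.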

【Solution 2】：

Try this:

# Raw string so that \d reaches the regex engine instead of being read as a string escape
df['temperature'] = df['text'].str.extract(r'(?:high|low) near (\d+)')[0]

Output:

>>> df
                                                text temperature
0  Sunny, with a high near 82. Light and variable...          82
1  A 50 percent chance of showers.  Partly sunny,...          61
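
As a follow-up sketch (not part of the original answer), the same extraction can be run end to end and the result converted to numbers. The third row below is a hypothetical entry added only to show what happens when neither "high near" nor "low near" appears: str.extract leaves a missing value there. The column names text and temperature follow the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "Sunny, with a high near 82. Light and variable wind becoming northwest 5 to 7 mph in the afternoon.",
        "A 50 percent chance of showers.  Partly sunny, with a high near 61.",
        "Mostly clear, with a low around 45.",  # hypothetical row: no "high near"/"low near"
    ]
})

# expand=False returns a Series for a single capture group,
# so no [0] column lookup is needed.
extracted = df["text"].str.extract(r"(?:high|low) near (\d+)", expand=False)

# The captured values are strings; convert them so they can be used in arithmetic.
# Rows without a match stay as NaN, which makes the column float.
df["temperature"] = pd.to_numeric(extracted)

print(df["temperature"])
```

If unmatched rows should be dropped instead of kept as NaN, df.dropna(subset=["temperature"]) is one option.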

【Discussion】：