Extract a substring between 2 strings in Python

【Question title】: Extract substring between 2 strings in python
【Posted】: 2018-07-21 15:43:49
【Question】:

I have a pandas DataFrame with a string column that I want to split into several columns.

Some rows of the DataFrame look like this:

COLUMN

ORDP//NAME/iwantthispart/REMI/MORE TEXT
/REMI/SOMEMORETEXT
/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS
/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT

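So basically I want everything after '/NAME/' up to the next '/'. However, not every row has the '/NAME/iwantthispart/' field, as the second row shows.

I tried using the split function, but the results were wrong.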

mt['COLUMN'].apply(lambda x: x.split('/NAME/')[-1])

This just gives me everything after the /NAME/ part, and when there is no /NAME/ it returns the complete string.
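For illustration, the behaviour described above can be reproduced in plain Python without pandas. This is a minimal sketch using two of the sample rows; the helper name `after_name_split` is made up here and is not part of the original post.

```python
# Two of the sample rows from the question: one with /NAME/, one without.
rows = [
    "ORDP//NAME/iwantthispart/REMI/MORE TEXT",
    "/REMI/SOMEMORETEXT",
]


def after_name_split(text):
    # str.split returns [text] unchanged when the separator is absent,
    # so taking [-1] falls back to the whole string for rows without /NAME/.
    return text.split("/NAME/")[-1]


split_results = [after_name_split(row) for row in rows]
for row, result in zip(rows, split_results):
    print(repr(row), "->", repr(result))
```

The first row keeps everything after `/NAME/`, including the trailing fields, and the second row comes back untouched, which matches the two problems described.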

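Does anyone have any tips or a solution? Thanks a lot for your help! (The bullet points are only there for readability; they are not actually in the data.)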

【Comments】:

Use a regular expression to match the pattern, as in stackoverflow.com/questions/47175817/…

Tags: python pandas substring

【Solution 1】:

You can use str.extract with a regular expression to extract the pattern of your choice:

# Generally, to match all word characters:
df.COLUMN.str.extract(r'NAME/(\w+)')

or

# More specifically, to match everything up to the next slash:
df.COLUMN.str.extract('NAME/([^/]*)')

Both return:

0    iwantthispart
1              NaN
2    iwantthispart
3    iwantthispart
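The two patterns agree on the sample rows but can differ once the value contains non-word characters. The sketch below uses the standard `re` module rather than pandas, on the assumption that `str.extract` takes the first regex match much as `re.search` does. The hyphenated sample value and the helper name `first_group` are invented for illustration.

```python
import re

# The two patterns from the answer, compiled once.
WORD_PATTERN = re.compile(r"NAME/(\w+)")           # letters, digits, underscore only
UNTIL_SLASH_PATTERN = re.compile(r"NAME/([^/]*)")  # anything up to the next slash


def first_group(pattern, text):
    # Return capture group 1 of the first match, or None when nothing matches.
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical value containing a hyphen and a space (not in the original data).
sample = "/ORDP//NAME/i-want this/REMI/MORE"
word_result = first_group(WORD_PATTERN, sample)
slash_result = first_group(UNTIL_SLASH_PATTERN, sample)

# A row without NAME yields no match for either pattern.
missing_result = first_group(UNTIL_SLASH_PATTERN, "/REMI/SOMEMORETEXT")

print(word_result, "|", slash_result, "|", missing_result)
```

If the values may contain spaces, hyphens, or other punctuation, the "up to the next slash" pattern is probably the safer choice.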

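【Comments】:

Thanks! This is exactly what I wanted! Problem solved :)

【Solution 2】:

Whether or not the first word is NAME, these two lines will give you the second word: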

mt["column"]=mt["column"].str.extract(r"(\w+/\w+/)")
mt["column"].str.extract(r"(\/\w+)")

This gives the following result as a column in the pandas DataFrame:

/iwantthispart
/SOMEMORETEXT
/iwantthispart
/iwantthispart

If you are only interested in rows that contain NAME, this will work well for you:

mt["column"]=mt["column"].str.extract(r"(NAME/\w+/)")
mt["column"].str.extract(r"(\/\w+)")

This gives the following result:

/iwantthispart
NaN
/iwantthispart
/iwantthispart

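【Comments】:

This looks nice and elegant. But it seems to need some adjustment: the OP asked for whatever comes after NAME, so the second result should be blank.
@JAponte Oh, if that's the case, we can simply put NAME in the first line instead of \w+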
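The adjustment suggested in the last comment could look like the sketch below, run against the four sample rows from the question. Passing `expand=False` so that each call returns a Series, and the intermediate names `fragment` and `second`, are assumptions added here and are not part of the original answer.

```python
import pandas as pd

# The four sample rows from the question, in a column named as in this answer.
mt = pd.DataFrame({"column": [
    "ORDP//NAME/iwantthispart/REMI/MORE TEXT",
    "/REMI/SOMEMORETEXT",
    "/ORDP//NAME/iwantthispart/ADDR/SOMEADRESS",
    "/BENM//NAME/iwantthispart/REMI/SOMEMORETEXT",
]})

# Step 1: keep only the "NAME/<value>/" fragment; rows without NAME become NaN.
fragment = mt["column"].str.extract(r"(NAME/\w+/)", expand=False)

# Step 2: pull "/<value>" out of that fragment; NaN rows stay NaN.
second = fragment.str.extract(r"(/\w+)", expand=False)

print(second)
```

With this change the row that has no NAME field comes out as missing rather than returning unrelated text. The single pattern from Solution 1 reaches the same values in one step and without the leading slash.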