Need to extract specific text from an Excel column using either Alteryx or Pandas

【Question title】: Need to extract specific text from a column in Excel using either Alteryx or Pandas
【Posted】: 2021-10-23 10:38:52
【Question description】:

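I have a column containing a specific set of text that I need to keep; the rest should be removed or moved to another column. Unfortunately, I can't use the ordinary Text to Columns approach because the arrangement of the text varies.

For example, I need to separate out the word Issue together with the id associated with it. I am struggling to find a way to do this across the variations in how the text is arranged.

If anyone can help me find a solution using Alteryx, it would be much appreciated; if not, Pandas would work too.

Thanks, everyone.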

【Comments】:

Tags: pandas numpy alteryx

【Solution 1】:

Use str.extract with a pattern to pull the specific text out of the DataFrame [Pandas]:

df['After'] = df['Before'].str.extract(pat=r'(ISSUE \d+|issue \d+)', expand=False)
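To try the one-liner above end to end, here is a minimal sketch. The sample rows are hypothetical, since the question does not show the real column contents, and the `Rest` column is an assumption about what "moved to another column" could look like. It also swaps the two-alternative pattern for a case-insensitive flag, which additionally matches spellings such as "Issue".

```python
import re

import pandas as pd

# Hypothetical sample data; the real column contents are not shown in the question.
df = pd.DataFrame({
    "Before": [
        "ISSUE 123, login page fails",
        "Timeout on save, issue 45, reported twice",
        "No identifier here",
    ]
})

# Capture the word "issue" (any casing) followed by its numeric id.
# Rows without a match get NaN.
df["After"] = df["Before"].str.extract(
    r"(issue \d+)", flags=re.IGNORECASE, expand=False
)

# Keep the remaining text in its own column by deleting the matched part,
# along with an optional trailing comma and whitespace.
df["Rest"] = (
    df["Before"]
    .str.replace(r"issue \d+,?\s*", "", flags=re.IGNORECASE, regex=True)
    .str.strip()
)

print(df)
```

Rows that contain no issue id keep their full text in `Rest` and have a missing value in `After`, so they are easy to filter out afterwards.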

【Discussion】:

【Solution 2】:

For an Alteryx-only solution, the easiest way is an Alteryx formula using REGEX_Replace:

REGEX_Replace([Before],".*(issue \d+).*","$1",1)
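For readers who want to check the pattern outside Alteryx, the same idea can be sketched with Python's `re.sub`: match the whole string, capture the issue id, and replace everything with the captured group. The helper name `keep_issue` is hypothetical, and this only illustrates the regular expression itself, not Alteryx's exact behaviour.

```python
import re


def keep_issue(text: str) -> str:
    """Replace the whole string with the captured 'issue <digits>' part."""
    # re.IGNORECASE is used here so that both "ISSUE" and "issue" match.
    return re.sub(r".*(issue \d+).*", r"\1", text, flags=re.IGNORECASE)


print(keep_issue("Timeout on save, issue 45, reported twice"))
print(keep_issue("No identifier here"))
```

In this Python version, a string with no match is returned unchanged, so rows without an issue id need separate handling if they should end up empty.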

If you don't like RegEx, basic string manipulation can do it too; basically, it is just a substring:

Substring([Before], *starting index*, *length*)

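The starting index is easy: it is just FindString([Before],"ISSUE")

The length isn't too hard either: it is the index of the first comma (again found with FindString) within the substring that starts at "ISSUE", that is, within SubString([Before],FindString([Before],"ISSUE"))

Putting all of that together and spreading it out a bit: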

Substring(
  [Before],
  FindString([Before],"ISSUE"),
  FindString(
    SubString(
      [Before],
      FindString([Before],"ISSUE")
    ),","
  )
)
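The same start-plus-length logic can be sketched in Python, which may help when checking the formula against a few sample values. The function name `extract_issue` and its default arguments are hypothetical. Unlike the formula above, the sketch handles the cases where the keyword or the comma is missing; the answer does not say how the Alteryx version behaves there.

```python
def extract_issue(text: str, keyword: str = "ISSUE", delimiter: str = ",") -> str:
    """Return the text from `keyword` up to (not including) the next delimiter."""
    # Starting index: position of the keyword, like FindString([Before], "ISSUE").
    start = text.find(keyword)
    if start == -1:
        return ""  # keyword absent: nothing to extract

    # Substring that begins at the keyword.
    tail = text[start:]

    # Length: index of the first delimiter inside that substring.
    length = tail.find(delimiter)
    if length == -1:
        return tail  # no delimiter after the keyword: keep the remainder

    return tail[:length]


print(extract_issue("Timeout on save, ISSUE 45, reported twice"))
```

Note that `str.find` is case-sensitive, so this sketch only finds the upper-case "ISSUE" unless a different `keyword` is passed.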

【Discussion】: