Regex to extract the address street from a string

【Question title】: Regex Expression to extract Address Street from a string
【Posted】: 2021-01-18 10:54:18
【Question】:

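Given the sample texts below, I want to extract the address street (the text between the asterisks). With the regex below I can extract the address street from most of the sentences, but it fails mainly on text4 and text5.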

regex = r"(^[0-9]+[\s\-0-9,A-Za-z]+)"
text1 = *9635 E COUNTY ROAD, 1000 N*.
text2 = *8032 LIBERTY RD S*.
text3 = *2915 PENNSYLVANIA AVENUE*  40 Other income (loss) 15 Alternative minimum tax (AMT) ilems
A 2,321
text4 = *2241 Western Ave*. 10 Other income loss 15 — Altemative minimum tax AMT itams
text5 = *450 7TH STREET, APT 2-M*
text6 = *9635 East County Road 1000 North*

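My code:
import re

# Compile the pattern once, outside the loop.
pattern1 = r"(^[0-9]+[\s\-0-9,A-Za-z]+)"
addressRegex = re.compile(pattern1)

# val is my dictionary of extracted fields; each value v is a list of strings.
for k, v in val.items():
    if k == "Shareholder Address Street":
        text = " ".join(v)
        match = addressRegex.search(text)
        if match is not None:
            # Store the matched street as a one-element list.
            val[k] = [match.group(0)]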

Can anyone suggest a change to the regex above so that it works for most documents?

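【Comments】:

You need to show us every form of address that can appear in your text. Otherwise, any answer given below may be invalidated as soon as you reveal some edge case.
For now, these 6 forms of address are all I have.
@RevolverRakk Can you share the code you are using?
@The fourth bird I have shared my code snippet, in which I apply the regex to extract the address street and store it in a dictionary.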

Tags: python regex

【Solution 1】:

Use

^\d+(?:[ \t][\w,-]+)*

See proof.

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [ \t]                    any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
    [\w,-]+                  any character of: word characters (a-z,
                             A-Z, 0-9, _), ',', '-' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
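
As a quick check, the pattern can be run in Python against the sample strings from the question. This is only a minimal sketch under some assumptions: the asterisks in the question are treated as markers and left out, text3 and text4 are shortened, text3 keeps the two spaces shown in the post, and the names `STREET_RE`, `extract_street`, and `samples` are made up for illustration.

```python
import re

# Pattern from the answer: leading digits, then tokens of word characters,
# commas, or hyphens, each preceded by exactly one space or tab.
STREET_RE = re.compile(r"^\d+(?:[ \t][\w,-]+)*")


def extract_street(text):
    """Return the leading street portion of text, or None if text does not start with digits."""
    match = STREET_RE.search(text)
    return match.group(0) if match else None


# Sample strings adapted from the question (asterisks removed, long ones shortened).
samples = [
    "9635 E COUNTY ROAD, 1000 N.",
    "8032 LIBERTY RD S.",
    "2915 PENNSYLVANIA AVENUE  40 Other income (loss)",
    "2241 Western Ave. 10 Other income loss",
    "450 7TH STREET, APT 2-M",
    "9635 East County Road 1000 North",
]

for sample in samples:
    print(extract_street(sample))
```

With these inputs the match stops at the first character that cannot continue a token: the period in text1, text2, and text4, and the second consecutive space in text3.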

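【Comments】:

Thanks for the answer. The regex works fine in the online tester, but it fails for text3, i.e. "2915 PENNSYLVANIA AVENUE 40 Other income (loss)...": when tested in my IDE it returns the whole string instead of "2915 PENNSYLVANIA AVENUE".
@RevolverRakk Sorry, I can't reproduce that; see regex101.com/r/OONAEX/2
Thanks for the guidance. I'll try to adapt it to my requirements.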
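
One possible explanation for the difference between the online tester and the IDE, which the thread itself does not confirm: the question's code builds the text with `" ".join(v)`, so the street and the form text after it may end up separated by a single space, whereas the sample as posted shows two. The pattern continues across any single space or tab, so the separator decides where the match stops. A small sketch, with both input strings assumed for illustration:

```python
import re

# Same pattern as in the answer.
street_re = re.compile(r"^\d+(?:[ \t][\w,-]+)*")

# Assumed inputs: identical except for the gap after "AVENUE".
two_spaces = "2915 PENNSYLVANIA AVENUE  40 Other income (loss)"
one_space = "2915 PENNSYLVANIA AVENUE 40 Other income (loss)"

# Two spaces: after the first space comes another space, which [\w,-]+
# cannot match, so the repeated group ends right after "AVENUE".
result_two = street_re.search(two_spaces).group(0)

# One space: "40", "Other", and "income" all look like street tokens,
# so the match only ends before " (loss)".
result_one = street_re.search(one_space).group(0)

print(repr(result_two))
print(repr(result_one))
```

If that is what happens in the IDE, the pattern has nothing to stop on, and the street would have to be separated before the fragments are joined or delimited in some other way.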