【Question title】: US dollar amount, thousands separated by commas
【Posted】: 2019-03-17 22:23:00
【Question】:

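I am new to Python. I am trying to use a regular expression to extract a dollar-denominated amount from a substring. It works in most cases, but I have run into a few problems I cannot solve.

The extracted amount is a string, and because of the commas it is not recognized as a number. The pattern also fails for small amounts below $1 (for example 0.89). There is no leading $. Any help is much appreciated.

Here is what I have: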

df['Amount']=df['description'].str.extract('(\d{1,3}?(\,\d{3})*\.\d{2})')

Here is a string that should be parsed:

000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265

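I am trying to extract the amount 1,107,844.38 into a separate column of the DataFrame. I do not have any strings that should be rejected.

【Question comments】:

  • Could you post the strings that should be parsed and the strings that should be rejected?
  • Sure, here is a string that should be parsed. I am trying to extract the amount 1,107,844.38 into a separate column of the DataFrame. I do not have any strings that should be rejected. Thanks! "000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265"
  • Could you update your question with that? Thanks!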

Tags: python regex currency


【Solution 1】:

Given your sample string:

"000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265"

Here is what I came up with:

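import re

subject = "000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265"
match = re.search(r"(?P<amount>\$?(?:\d+,)*\d+\.\d+)", subject)
if match:
    result = match.group("amount")  # result will be "1,107,844.38"
else:
    result = ""  # no amount found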

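This extracts the amount. It also handles small amounts such as 0.38, amounts without thousands-separator commas such as 123456789.38, and amounts that may or may not be preceded by a dollar sign $.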
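
The question also mentions that the extracted value is a string that is not recognized as a number because of the commas. A minimal sketch of one way to handle that, reusing the pattern above; the helper name `extract_amount` and the choice of `decimal.Decimal` are assumptions made for illustration, not part of the original answer:

```python
import re
from decimal import Decimal

# Same pattern as in the answer above.
AMOUNT_RE = re.compile(r"(?P<amount>\$?(?:\d+,)*\d+\.\d+)")


def extract_amount(text):
    """Return the first amount found in text as a Decimal, or None.

    Hypothetical helper, for illustration only.
    """
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    # Drop the optional "$" and the thousands separators before converting.
    cleaned = match.group("amount").replace("$", "").replace(",", "")
    return Decimal(cleaned)


sample = ("000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 "
          "07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265")
print(extract_amount(sample))  # prints 1107844.38
```

On a pandas column, the same clean-up could probably be done by removing the commas with `Series.str.replace(",", "", regex=False)` and then calling `pandas.to_numeric` on the result.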

Regex details

(?P<amount>\$?(?:\d+,)*\d+\.\d+)  Match the regular expression below and capture its match into a group named “amount”
\$?                               Match the character “$” literally
?                                 Between zero and one times, as many times as possible, giving back as needed (greedy)
(?:\d+,)*                         Match the regular expression below (non-capturing group)
*                                 Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\d+                               Match a single digit 0..9
+                                 Between one and unlimited times, as many times as possible, giving back as needed (greedy)
,                                 Match the character “,” literally
\d+                               Match a single digit 0..9
+                                 Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\.                                Match the character “.” literally
\d+                               Match a single digit 0..9
+                                 Between one and unlimited times, as many times as possible, giving back as needed (greedy)
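
The breakdown above can also be kept next to the pattern itself. The sketch below rewrites the same expression with `re.VERBOSE`, so each piece carries its own comment; the name `AMOUNT_VERBOSE` is an assumption introduced for illustration:

```python
import re

# The same pattern, written with re.VERBOSE so each piece has its own comment.
AMOUNT_VERBOSE = re.compile(
    r"""
    (?P<amount>     # named group: amount
        \$?         # optional literal dollar sign
        (?:\d+,)*   # zero or more runs of digits, each followed by a comma
        \d+         # one or more digits before the decimal point
        \.          # literal decimal point
        \d+         # one or more digits after the decimal point
    )
    """,
    re.VERBOSE,
)

match = AMOUNT_VERBOSE.search("AP5378 1,107,844.38 Ven000000000463")
print(match.group("amount"))  # prints 1,107,844.38
```

Note that `(?:\d+,)*` does not require exactly three digits between commas, so a string such as 12,3.4 would also match; that is a property of the pattern as written.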

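【Comments】:

  • This answer worked for me. I found the additional explanation in the "Regex details" section very insightful. Thank you!
【Solution 2】:

You could try a regular expression like this: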

rx = r"\b(?<!/)(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)\b(?!/)"
df['Amount']=df['description'].str.extract(rx)

regex demo

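Details

  • \b - a word boundary
  • (?<!/) - no / immediately to the left of the current position (to avoid matching date values)
  • \d{1,3} - 1 to 3 digits
  • (?:,\d{3})* - 0 or more repetitions of , followed by 3 digits
  • (?:\.\d{2})? - an optional . followed by 2 digits
  • \b - a word boundary
  • (?!/) - no / immediately to the right of the current position (to avoid matching date values)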
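
As a quick check of the pieces listed above, the sketch below runs the same pattern through plain `re.findall` (no pandas needed) on the sample string from the question and on a few short strings made up for illustration:

```python
import re

# Pattern from the answer above.
rx = r"\b(?<!/)(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)\b(?!/)"

sample = ("000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 "
          "07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265")

# The long ID numbers and the date are skipped; only the amount is returned.
print(re.findall(rx, sample))        # prints ['1,107,844.38']
# The lookarounds on "/" reject every part of a date.
print(re.findall(rx, "07/01/2013"))  # prints []
# Amounts below 1 are matched as well.
print(re.findall(rx, "total 0.89"))  # prints ['0.89']
```

Because the decimal part is optional, a standalone integer of up to three digits (for example the 42 in "item 42 ok") also matches. If only amounts with cents are wanted, one option is to drop the trailing `?` from `(?:\.\d{2})?`.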

【Comments】:

  • This answer also worked for me. Thank you for the details and for the link to the "regex demo" site. Very helpful.