Regex capturing monetary expressions: answers

【Question title】: Regex capturing monetary expressions
【Posted】: 2016-05-13 01:43:21
【Question】:

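I'm having trouble using regex in a simple Python program. I'm trying to capture every monetary expression that spells out a dollar amount (for example "five hundred dollars" or "three hundred thousand dollars and forty cents"), but I can't get it to work.

My program only returns empty strings. Some initial feedback I got was that my regex is "too greedy" and gets overridden, but I don't understand how or why that would produce empty strings, or how to fix it.

Here is my Python code: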

import re
import sys

file2 = open("test2.txt", "r")
input_txt2 = file2.read()
distjunct3 = r"(?:(?:(?:a|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)?(?:(thir|four|fif|six|seven|eight|nine)teen)?)(?:(?:twen|thir|four|fif|six|seven|eight|nine)ty)?(?:(?:one|two|three|four|five|six|seven|eight|nine|ten) (?:(?:hundred|thousand|)|(?:\w.llion)))?(?: \w+)? dollar(?:s)?(?: and [0-9]{1,2} cents)?)"
# Wraps a match in square brackets (defined, but not called below)
def repl(matchobj):
    return "[" + matchobj.group() + "]"
print(re.findall(distjunct3, input_txt2))
file2.close()

Here is my regex:

(?:(?:(?:a|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)?(?:(thir|four|fif|six|seven|eight|nine)teen)?)(?:(?:twen|thir|four|fif|six|seven|eight|nine)ty)?(?:(?:one|two|three|four|five|six|seven|eight|nine|ten) (?:(?:hundred|thousand|)|(?:\w.llion)))?(?: \w+)? dollar(?:s)?(?: and [0-9]{1,2} cents)?)

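I tested my code on http://regexr.com/ and it seems to work with this sample text: "exceed sixteen dollars y four dollars a head, but it is now reduced to one, and this charge they valuable andto three thousand dollars: a los hundred thousand dollars for twelve pounds for a dollar. Ths worth a dollar and n'tSix dollars--twelve skins, for a prime, dark and tuck--eight or ten dollars, according to only two dollars. "orth eight dollars; think of that! one, worth twenty dollars--that's your value dead, twenty dollars. ... seven dollars paid for the elements in trade, eight dollars for the coat."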

I'm confused, and any pointers would be greatly appreciated. Thanks!!
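A likely source of the empty strings, offered as a sketch: the pattern is not entirely non-capturing, because `(thir|four|fif|six|seven|eight|nine)teen` is a capturing group. When a pattern contains exactly one capturing group, `re.findall` returns that group's text for each match rather than the whole match, and an empty string when the group did not take part. The cut-down pattern and text below are hypothetical, built only to show this behavior and two ways around it.

```python
import re

# Hypothetical, cut-down pattern. Like the question's regex, its only
# capturing group is a "-teen" prefix: (thir|four)teen.
pattern = r"(?:(?:one|two|three)?(?:(thir|four)teen)?) dollars?"
text = "three dollars and fourteen dollars"

# One capturing group: findall returns group 1 for each match,
# or '' when that group did not participate in the match.
print(re.findall(pattern, text))  # ['', 'four']

# Fix 1: make the group non-capturing so findall returns whole matches.
fixed = r"(?:(?:one|two|three)?(?:(?:thir|four)teen)?) dollars?"
print(re.findall(fixed, text))  # ['three dollars', 'fourteen dollars']

# Fix 2: keep the pattern unchanged and read group(0) from each match object.
print([m.group(0) for m in re.finditer(pattern, text)])
# ['three dollars', 'fourteen dollars']
```

The `finditer` route is usually the safer habit, since adding a group to the pattern later cannot silently change what comes back.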

【Question discussion】:

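You only have non-capturing groups, so they won't capture anything.
This doesn't look like a good job for a regex to me.
IMO a better approach is to find the word dollar(s) in the string, then look backwards until you hit a word that isn't in your list of number words.
@Natecat: How do you go backwards with a regex? Also, I used non-capturing groups in another example and it returned results fine.
Do you need to reject nonsense like "thirteen dollars"?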

Tags: python regex

【Solution 1】:

It's actually a simpler pattern than that. In pseudo-regex it has the form "(number words)+ dollars (and (number words)+ cents)?" (this works on your input and more):

((?:(?:a|one|two|twen|thir|three|four|five|fif|six|seven|eight|nine|ten|eleven|twelve|hundred|thousand|million|billion)(?:y|ty|teen)?[\s-]?)+(?:[\s-]?dollars?(?: (?:and|&) (?:[0-9]{1,2}|no|(?:a|one|two|twen|thir|three|four|five|fif|six|seven|eight|nine|ten|eleven|twelve|hundred|thousand|million|billion)(?:y|ty|teen)?)+ cents)?))
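To check what Solution 1's pattern returns from Python rather than in an online tester, here is a small sketch. The regex text is Solution 1's, only split across string literals for readability; the first sample is the opening of the question's text and the second is made up for illustration. Because the pattern's single capturing group wraps the whole expression, `re.findall` returns full matches here.

```python
import re

# Solution 1's pattern, split into pieces; NUM is its repeated number-word part.
NUM = (r"(?:a|one|two|twen|thir|three|four|five|fif|six|seven|eight|nine|ten"
       r"|eleven|twelve|hundred|thousand|million|billion)(?:y|ty|teen)?")
SOLUTION1 = (r"((?:" + NUM + r"[\s-]?)+"
             r"(?:[\s-]?dollars?(?: (?:and|&) (?:[0-9]{1,2}|no|" + NUM + r")+ cents)?))")

# The outer group spans the whole match, so findall yields complete amounts.
sample = ("exceed sixteen dollars y four dollars a head, but it is now reduced "
          "to one, and this charge they valuable andto three thousand dollars")
print(re.findall(SOLUTION1, sample))
# ['sixteen dollars', 'four dollars', 'three thousand dollars']

print(re.findall(SOLUTION1, "it cost five hundred dollars and 40 cents"))
# ['five hundred dollars and 40 cents']
```

One thing such a check makes easy to see: tens are built from a stem plus a suffix (four + ty), so the standard spelling "forty" is not matched, and `re.findall(SOLUTION1, "forty dollars")` returns an empty list.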


【Discussion】:

【Solution 2】:

# Words allowed to appear before "dollar(s)" in an amount
numwords = ["and", "a", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
            "seventeen", "eighteen", "nineteen", "twenty", "thirty", "forty", "fifty", "sixty",
            "seventy", "eighty", "ninety", "hundred", "thousand", "million", "billion", "trillion"]
teststr = "exceed sixteen dollars y four dollars a head, but it is now reduced to one, and this charge they valuable andto three thousand dollars: a los hundred thousand dollars for twelve pounds for a dollar. Ths worth a dollar and n'tSix dollars--twelve skins, for a prime, dark and tuck--eight or ten dollars, according to only two dollars. \"orth eight dollars; think of that! one, worth twenty dollars--that's your value dead, twenty dollars"
splitstr = teststr.split()
dollarfound = []
for index, s in enumerate(splitstr):
    templist = []
    if s == "dollar" or s == "dollars":
        templist.append(splitstr[index])
        # Walk backwards while the previous token is a number word
        while (index-1 >= 0) and (splitstr[index-1] in numwords):
            templist.append(splitstr[index-1])
            index -= 1
        dollarfound.append(" ".join(reversed(templist)))
print(dollarfound)

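This code finds each instance of the word dollar(s) and walks backwards to collect all the number words that precede it. Your use case doesn't actually need a regex.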
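If a regex is required, the word-list idea can also be expressed as a single pattern. This is a sketch, not taken from either answer: it builds an alternation from a number-word list (adapted from the one above, with "and" treated as a separator), uses `\b` word boundaries so number words hidden inside other words are skipped, and reads whole matches via `finditer`. The names `NUMWORDS`, `WORD`, `RUN`, `AMOUNT` and `find_amounts` are hypothetical.

```python
import re

# Hypothetical number-word list adapted from Solution 2; "and" is handled
# as a separator in RUN instead of being a number word itself.
NUMWORDS = ["a", "one", "two", "three", "four", "five", "six", "seven", "eight",
            "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
            "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
            "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
            "hundred", "thousand", "million", "billion", "trillion"]

# One whole number word; longest alternatives first so "sixteen" is tried before "six".
WORD = r"(?:%s)\b" % "|".join(sorted(NUMWORDS, key=len, reverse=True))
# A run of number words joined by spaces, hyphens, or " and ".
RUN = r"%s(?:(?:\s+and\s+|[\s-]+)%s)*" % (WORD, WORD)
# The run, then "dollar(s)", then an optional "and <run> cent(s)".
AMOUNT = re.compile(r"\b%s\s+dollars?\b(?:\s+and\s+%s\s+cents?\b)?" % (RUN, RUN),
                    re.IGNORECASE)


def find_amounts(text):
    """Return each written-out dollar amount in text as a whole match."""
    return [m.group(0) for m in AMOUNT.finditer(text)]


print(find_amounts("three thousand dollars: a los hundred thousand dollars "
                   "for twelve pounds for a dollar."))
# ['three thousand dollars', 'hundred thousand dollars', 'a dollar']
print(find_amounts("It was three hundred thousand dollars and forty cents."))
# ['three hundred thousand dollars and forty cents']
```

Because `\b` treats punctuation as a boundary, amounts followed by "dollars:" or "dollars--" are found as well, which an exact token comparison such as `s == "dollars"` does not catch. The pattern does not check that the words form a sensible number, so a phrase like "ten six dollars" would still match.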

【Discussion】:

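Thanks!! But is there a way to do this with regex? It's a required feature of my program :( Edit: it's required by the assignment's parameters.
What is the required feature?
Unfortunately, it's required by the assignment!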