【Question Title】: RegEx for matching dates (Month Day, Year OR m/d/yy)
【Posted】: 2019-09-27 15:49:51
【Question】:

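I am trying to write a regular expression that can be used to find a date in a string, where the date may be preceded (or followed) by spaces, numbers, text, end of line, etc. The expression should handle the US date formats, which are either

1) Month Name Day, Year - e.g. January 10, 2019, or
2) mm/dd/yy - e.g. 11/30/19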

For Month Name Day, Year I found this:

(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}

(thanks to Veverke, Regex to match date like month name day comma and year)
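A quick way to check the month-name pattern on its own is a small sketch like the following. It is written in JavaScript (the language of the answer's demos) rather than C#; the `i` flag is an assumption that mirrors the `RegexOptions.IgnoreCase` used in the C# code further down, and `findMonthNameDate` is a hypothetical helper name.

```javascript
// Month-name pattern from the question. The "i" flag (case-insensitive) is an
// assumption, mirroring RegexOptions.IgnoreCase in the question's C# code.
const monthNameDate =
  /(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}/i;

// Hypothetical helper: returns the first month-name date in `text`, or null.
function findMonthNameDate(text) {
  const m = text.match(monthNameDate);
  return m ? m[0] : null;
}

console.log(findMonthNameDate("Paid on January 10, 2019 by check")); // prints: January 10, 2019
console.log(findMonthNameDate("Paid on Jan 1, 2019"));               // prints: Jan 1, 2019
console.log(findMonthNameDate("Paid on 1/1/2019"));                  // prints: null
```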

and this works for mm/dd/yy (and various combinations of m/d/y):

(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2} 

(thanks to Steven Levithan and Jan Goyvaerts: https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/ch04s04.html)
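The numeric pattern can be checked the same way. The JavaScript sketch below (where `/` must be escaped inside a regex literal; `findNumericDate` is a hypothetical helper) also shows a caveat relevant to dates "preceded or followed by numbers": the pattern has no boundaries, so it can match part of a longer run of digits. Wrapping it in `\b ... \b` is one possible guard, offered here as an assumption rather than the cookbook's recipe.

```javascript
// Numeric m/d/yy or m/d/yyyy pattern from the question. Inside a JavaScript
// regex literal the "/" separators must be escaped as "\/".
const numericDate = /(1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2}/;

// Hypothetical helper: returns the first numeric date in `text`, or null.
function findNumericDate(text) {
  const m = text.match(numericDate);
  return m ? m[0] : null;
}

console.log(findNumericDate("Due 11/30/19"));     // prints: 11/30/19
console.log(findNumericDate("Paid on 1/1/2019")); // prints: 1/1/2019
// Without boundaries the pattern matches inside an invalid date:
console.log(findNumericDate("13/1/2019"));        // prints: 3/1/2019

// Assumed guard: require word boundaries on both sides of the date.
const boundedNumericDate = new RegExp("\\b" + numericDate.source + "\\b");
console.log("13/1/2019".match(boundedNumericDate)); // prints: null
```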

I tried combining them like this:

((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})
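As a sanity check that the combined pattern recognizes both formats on its own (before any `on ` prefix is added), here is a JavaScript sketch. The `g` flag (required by `matchAll`), the `i` flag, and the `findAllDates` helper are assumptions for illustration.

```javascript
// The question's combined pattern: month-name date OR numeric date.
// "g" is required by matchAll; "i" is assumed, mirroring RegexOptions.IgnoreCase.
const combined =
  /((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})/gi;

// Hypothetical helper: every date found in `text`, in order of appearance.
function findAllDates(text) {
  return Array.from(text.matchAll(combined), (m) => m[0]);
}

console.log(findAllDates("Paid on Jan 1, 2019; refunded 11/30/19.").join(" | "));
// prints: Jan 1, 2019 | 11/30/19
```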

When I search the input string "Paid on 1/1/2019" for "on [regex above]", it does find the date, but the match does not include the word "on". If I use only

(1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2}

then the string is found.

Can anyone see what I am doing wrong?

EDIT

I am using the C# .NET code below:

    // Requires: using System.Text.RegularExpressions;
    string stringToSearch = "Paid on 1/1/2019";
    string searchPattern = @"on ((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})";
    var match = Regex.Match(stringToSearch, searchPattern, RegexOptions.IgnoreCase);

    string foundString = null;   // stays null when nothing matches
    if (match.Success)
        foundString = stringToSearch.Substring(match.Index, match.Length);   // same text as match.Value

For example:

string searchPattern = @"on ((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})|((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})";
stringToSearch = "Paid on Jan 1, 2019";
found = "on Jan 1, 2019" -- worked as expected, found the word "on" and the date

stringToSearch = "Paid on 1/1/2019";
found = "1/1/2019"  -- did not work as expected, found the date but did not include the word "on"

If I reverse the pattern:

string searchPattern = @"on ((1[0-2]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4})";

stringToSearch = "Paid on Jan 1, 2019";
found = "Jan 1, 2019" -- did not work as expected, found the date but did not include the word "on"

stringToSearch = "Paid on 1/1/2019";
found = "on 1/1/2019" -- worked as expected, found the word "on" and the date
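The asymmetry in these results is consistent with operator precedence: `|` binds more loosely than concatenation, so `on A|B` is read as `(?:on A)|B`, and the literal `on ` belongs only to whichever alternative comes first. A minimal JavaScript sketch with toy stand-ins (`x` for one date pattern, `y` for the other; `asWritten` and `explicit` are illustrative names) shows that both spellings behave identically.

```javascript
// Toy stand-ins: "x" plays the first date pattern, "y" the second.
const asWritten = /on x|y/;     // shaped like the question's "on A|B"
const explicit = /(?:on x)|y/;  // the grouping the engine actually applies

for (const input of ["Paid on x", "Paid on y"]) {
  const a = input.match(asWritten)[0];
  const b = input.match(explicit)[0];
  console.log(`${input} -> ${a} / ${b}`);
}
// prints:
// Paid on x -> on x / on x
// Paid on y -> y / y
```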

Thanks

【Discussion】:

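  • The regex is fine. If you are using Java, link me to your code.
  • Judging from the differences in the regex, you have to double all backslashes: \ > \\ (in a string literal, \\ is used to represent a single backslash). What programming language is this?
  • Sorry @sln, my question was not precise. The regex does find the date, but the word "on" is not part of the result. I am using C# .NET. I will edit my question to clarify. Thanks
  • Thanks @Emma for the suggestion to include the input and output. This is my first post (I have read many others), and I appreciate suggestions for improving/clarifying my question.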
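The point about doubling backslashes applies when a pattern is written inside an ordinary string literal; C# verbatim strings (`@"..."`, as in the question's code) keep backslashes as-is, so no doubling is needed there. JavaScript has both regex literals and string-built patterns, so it can illustrate the difference in a small sketch (the variable names are illustrative only).

```javascript
// In a regex literal, each backslash is written once.
const literal = /\d{1,2},\s+\d{4}/;

// In an ordinary string literal the string parser consumes one level of
// escaping first, so every regex backslash has to be written as "\\".
const fromString = new RegExp("\\d{1,2},\\s+\\d{4}");

// Forgetting to double: "\d" inside a string is just "d", so the regex changes.
const broken = new RegExp("\d{1,2}");

console.log(literal.source === fromString.source); // prints: true
console.log(broken.source);                        // prints: d{1,2}
console.log(fromString.test("Jan 1, 2019"));       // prints: true
```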

Tags: c# regex regex-lookarounds regex-group regex-greedy


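【Solution 1】:

Both of your expressions seem fine. If you want to capture anything before or after the target output, you can simply add two boundaries, one on the left and one on the right. For example, see this test: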

(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)

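For example, you can add two groups such as (.*), one on each side, and wrap your original expression in a group, and it should work. The wrapping group matters: | binds more loosely than anything else in a regex, so without the group, anything written before the alternation (such as "on ") attaches only to the first alternative.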
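A narrower variant of the same grouping idea, sketched in JavaScript: keep the question's two patterns unchanged and add only a non-capturing group `(?: ... )` around their alternation, so the `on ` prefix applies to both. `MONTH_NAME`, `NUMERIC`, `onDate` and `findOnDate` are illustrative names, and the `i` flag is assumed to mirror `RegexOptions.IgnoreCase`.

```javascript
// The question's two patterns, taken as source text from regex literals so the
// backslashes need no doubling.
const MONTH_NAME =
  /(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}/.source;
const NUMERIC =
  /(1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2}/.source;

// "on (?:A|B)": the group keeps "|" from splitting off the "on " prefix.
// The "i" flag is an assumption mirroring RegexOptions.IgnoreCase.
const onDate = new RegExp("on (?:" + MONTH_NAME + "|" + NUMERIC + ")", "i");

// Hypothetical helper: the "on <date>" text, or null when absent.
function findOnDate(text) {
  const m = text.match(onDate);
  return m ? m[0] : null;
}

console.log(findOnDate("Paid on Jan 1, 2019")); // prints: on Jan 1, 2019
console.log(findOnDate("Paid on 1/1/2019"));    // prints: on 1/1/2019
```

.NET's `Regex` accepts the same `(?:...)` syntax, so in the question's C# `searchPattern` the change would amount to inserting `(?:` right after `on ` and a closing `)` at the very end of the pattern.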

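Regex diagram (image not preserved): the original answer included a diagram visualizing how the expression works, along with a link to an online tool for testing other expressions.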

C# Test

using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string pattern = @"(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)";
        string input = @"Paid on Jan 1, 2019 And anything else that you wish to have after
Paid on 1/1/2019 And anything else that you wish to have after";
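        // Multiline only changes ^ and $, which this pattern does not use; each
        // line still yields its own match because . does not match \n.
        RegexOptions options = RegexOptions.Multiline;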

        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}

JavaScript Demo

This JavaScript demo shows that your expression works:

const regex = /(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)/gm;
const str = `Paid on Jan 1, 2019 And anything else that you wish to have after
Paid on 1/1/2019 And anything else that you wish to have after`;
const subst = `\nGroup 1: $1 \nGroup 2: $2 \nGroup 3: $3 \nGroup 4: $4 `;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

Basic Performance Test

This JavaScript snippet reports the runtime of a 1-million-iteration for loop, as a basic performance check.

const repeat = 1000000;
const start = Date.now();

for (var i = repeat; i > 0; i--) { // runs exactly `repeat` iterations
	const string = 'Paid on Jan 1, 2019';
	const regex = /(.*)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)/gm;
	var match = string.replace(regex, "\nGroup #1: $1\nGroup #2: $2 \n");
}

const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match");
console.log(end / 1000 + " seconds is the runtime of the " + repeat + "-iteration benchmark test.");

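Improvement

You may want to reduce the number of capturing groups around the month names; if you like, you can simply put them all into a single capturing group.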
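One way to apply that suggestion, sketched in JavaScript under the assumption that only the whole date needs to be captured: the month-name suffixes and the date parts become non-capturing `(?:...)` groups, and a single capturing group wraps the entire date. The `i` flag and the name `dateRe` are assumptions; .NET accepts the same `(?:...)` syntax.

```javascript
// Only one capturing group: (...) around the whole date. Everything inside
// uses non-capturing (?:...) groups, so m[1] is always the date.
// The "i" flag is an assumption mirroring RegexOptions.IgnoreCase.
const dateRe =
  /((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+\d{1,2},\s+\d{4}|(?:1[0-2]|0?[1-9])\/(?:3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})/i;

for (const input of ["Paid on Jan 1, 2019", "Paid on 1/1/2019"]) {
  const m = input.match(dateRe);
  // m has exactly two entries: the whole match and capturing group 1.
  console.log(m.length, m[1]);
}
// prints:
// 2 Jan 1, 2019
// 2 1/1/2019
```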

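【Discussion】:

  • Thank you very much - exactly what I needed; the examples and test code were very helpful.
  • How do you match and return only the date? This returns "adfadf January 1, 2021".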
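For the follow-up question about returning only the date: with the answer's pattern, the date sits in capturing group 2, because group 1 is the leading `(.*)`. The JavaScript sketch below (the `dateOnly` helper is hypothetical) reads that group. It also makes the leading group lazy, `(.*?)`: with the greedy `(.*)`, the engine backtracks from the end of the line and settles on the rightmost position where a date can start, so a line such as `Due 11/30/19` would put `1/30/19` in group 2.

```javascript
// The answer's pattern with the leading group made lazy: (.*?) instead of (.*).
// Group 1 = text before the date, group 2 = the date, last group = text after.
const lineWithDate =
  /(.*?)(((1[0-2]|0?[1-9])\/(3[01]|[12][0-9]|0?[1-9])\/(?:[0-9]{2})?[0-9]{2})|((Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+\d{1,2},\s+\d{4}))(.*)/;

// Hypothetical helper: just the date from a line, or null if there is none.
function dateOnly(line) {
  const m = line.match(lineWithDate);
  return m ? m[2] : null;
}

console.log(dateOnly("adfadf January 1, 2021")); // prints: January 1, 2021
console.log(dateOnly("Due 11/30/19"));           // prints: 11/30/19
```

If the surrounding text is not needed at all, dropping both `(.*)` groups and using the whole match of the question's combined pattern gives the same date.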