【Question title】: C#, extracting strings using regex or string splitting
【Posted】: 2016-12-11 23:33:47
【Question】:

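After reading the answers to this question: C# regex pattern to extract urls from given string - not full html urls but bare links as well, I would like to know which is the fastest way to extract URLs from a document: regex matching or string splitting.

So, you have a string containing an HTML document and want to extract the URLs.

The regex way would be: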

Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach(Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value); 

And the string-splitting way:

string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
    MessageBox.Show(s);
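
Before comparing speed, it may be worth checking whether the two snippets above return the same links, since they are not strictly equivalent: the regex uses `RegexOptions.IgnoreCase` and a trailing `\b`, while the split version does a case-sensitive `StartsWith` on whole whitespace-separated tokens. The following is a minimal sketch only. The class name `LinkExtraction`, the helpers `ExtractWithRegex` and `ExtractWithSplit`, and the sample inputs are assumptions made for illustration and are not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkExtraction
{
    // Same pattern and options as the regex snippet above.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Same separators as the split snippet above.
    private static readonly char[] Separators = { '\t', '\n', ' ' };

    // Hypothetical helper wrapping the regex approach.
    public static List<string> ExtractWithRegex(string text)
    {
        return LinkParser.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
    }

    // Hypothetical helper wrapping the split approach.
    public static List<string> ExtractWithSplit(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"))
            .ToList();
    }

    public static void Main()
    {
        // Made-up inputs chosen to expose differences between the two approaches.
        string[] samples =
        {
            "visit http://example.com/page, thanks", // trailing punctuation
            "(www.example.com)",                     // link wrapped in parentheses
            "HTTP://EXAMPLE.COM"                     // upper-case scheme
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("Input: " + sample);
            Console.WriteLine("  Regex: " + string.Join(" | ", ExtractWithRegex(sample)));
            Console.WriteLine("  Split: " + string.Join(" | ", ExtractWithSplit(sample)));
        }
    }
}
```

On these inputs the regex version drops the trailing comma and still finds the parenthesized and upper-case links, whereas the split version keeps the comma and skips the other two. For the sample string in the question, both approaches return the same three links.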

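Which way is the most efficient?

【Question comments】:

  • You could try both with a Stopwatch
  • I'm ashamed to admit that my first thought was "Stopwatch is some kind of benchmarking program"
  • I can't run a benchmark because I haven't had access to a PC for a few days.

Tags: c# regex string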


【Solution 1】:

Splitting is faster. Here is some code you can use to test it: dotnetfiddle link

using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        Stopwatch sw = new Stopwatch();
        int regexCount = 0;
        int splitCount = 0;

        sw.Start();

        for (int i = 0; i < 500; i++)
        {
            // The Regex is constructed on every iteration, so its construction cost is part of the measured time.
            Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
            string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
            // Reading Count forces the matches to be evaluated.
            regexCount += linkParser.Matches(rawString).Count;
        }

        sw.Stop();

        var test1Time = sw.ElapsedMilliseconds;

        sw.Reset();
        sw.Start();

        for (int i = 0; i < 500; i++)
        {
            string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
            var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
            // Where is lazy; Count() enumerates the sequence so the filter actually runs.
            splitCount += links.Count();
        }

        sw.Stop();

        var test2Time = sw.ElapsedMilliseconds;

        Console.WriteLine("Regex Test: " + test1Time + " ms (" + regexCount + " links)");
        Console.WriteLine("Split Test: " + test2Time + " ms (" + splitCount + " links)");
    }
}
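
In the benchmark above the `Regex` is constructed inside the timed loop, so the regex figure mixes construction cost with matching cost. If the parser is built once and reused, as in the question's snippet, it may be more informative to time the two separately. The sketch below is one possible way to do that, not a definitive benchmark. The class name `SeparateCostsBenchmark` and the helpers `TimeMs`, `CountRegexLinks` and `CountSplitLinks` are assumptions introduced for illustration, and the numbers will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class SeparateCostsBenchmark
{
    public const string RawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
    private const string Pattern = @"\b(?:https?://|www\.)\S+\b";
    private const int Iterations = 500; // same count as the loops above

    // Hypothetical helper: one untimed warm-up call, then a timed loop.
    public static long TimeMs(Action action)
    {
        action(); // warm-up, so one-time JIT cost is not measured
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Reading Count forces all matches to be evaluated.
    public static int CountRegexLinks(Regex parser, string text)
    {
        return parser.Matches(text).Count;
    }

    // Count(predicate) enumerates every token, so the filter actually runs.
    public static int CountSplitLinks(string text)
    {
        return text.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
            .Count(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
    }

    public static void Main()
    {
        Regex linkParser = new Regex(Pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);

        long constructionMs = TimeMs(() => new Regex(Pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase));
        long matchMs = TimeMs(() => CountRegexLinks(linkParser, RawString));
        long splitMs = TimeMs(() => CountSplitLinks(RawString));

        Console.WriteLine("Regex construction: " + constructionMs + " ms");
        Console.WriteLine("Regex matching:     " + matchMs + " ms");
        Console.WriteLine("Split and filter:   " + splitMs + " ms");
    }
}
```

With only 500 iterations on a short string, millisecond readings can be too coarse to separate the matching and splitting figures; raising `Iterations` or reading `Stopwatch.ElapsedTicks` is a reasonable adjustment.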

【Comments】:

  • Great. Thanks for your answer.
  • How about marking it as the answer?