Compare two strings in C# containing random URLs [closed]: answers

【Question title】: Compare two strings in c# containing random URL [closed]
【Posted】: 2015-12-22 22:24:35
【Question description】:

I need to compare the following strings. The problem is that the URL in the two strings will be different every time, for example:

www.google.com
http://www.google.com
google.co.uk

So Contains fails to match the strings, because the URLs do not match.

String1 = "This is my string http://www.google.co.uk and that was my url"
String2 = "this is my string google.gr and that was my url"

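So basically I want to compare the contents of the strings minus the URL. Each string can contain different text every time, so looking for the URL at the same position each time will not work.

I have searched extensively here for an answer to this problem but could not find a working solution.

Thanks in advance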

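【Question discussion】:

Can you elaborate on what you consider a match? Does http://www.google.co.uk "match" google.gr?
It counts as a match if all the text in string 1 matches the text in string 2. String1 = "This is my string google.co.uk and that was my url" String2 = "This is my string google.gr and that was my url"
Possible duplicate of Get just the domain name from a URL?
It would really help if you could explain why you "need" to do this, and what you will do with the strings after comparing them.

Tags: c# asp.net .net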

【Solution 1】:

Use a regular expression to remove the links:

        // Requires: using System.Text.RegularExpressions;
        String string1 = "This is my string http://www.google.co.uk and that was my url";
        String string2 = "this is my string http://google.gr and that was";

        // Matches "http://" followed by everything up to the next whitespace.
        Regex rxp = new Regex(@"http://\S*");
        String clean1 = rxp.Replace(string1, "");
        String clean2 = rxp.Replace(string2, "");

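You can now compare clean1 and clean2. Of course, the regex above is only an example: it removes only URLs that start with "http://". Depending on your real data, you may need something more elaborate.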
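A minimal sketch of the comparison step, under some assumptions that are not part of the answer: any whitespace-delimited token with a dot inside is treated as a URL (with or without "http://"), and a match ignores letter case and the extra whitespace left behind by the removal. The names `UrlInsensitiveComparer`, `StripUrls` and `EqualIgnoringUrls` are hypothetical, and the pattern will also remove non-URL tokens such as "3.14".

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlInsensitiveComparer
{
    // Assumption: a "URL" is any run of non-whitespace characters
    // that has at least one dot somewhere in the middle.
    private static readonly Regex UrlPattern = new Regex(@"\S+\.\S+");

    // Collapses any run of whitespace into a single space.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string StripUrls(string text)
    {
        string withoutUrls = UrlPattern.Replace(text, "");
        return Whitespace.Replace(withoutUrls, " ").Trim();
    }

    public static bool EqualIgnoringUrls(string a, string b)
    {
        // Case-insensitive, because the sample strings start with
        // "This" and "this" respectively.
        return string.Equals(StripUrls(a), StripUrls(b),
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

With the two strings from the question, `UrlInsensitiveComparer.EqualIgnoringUrls(String1, String2)` returns true, because both reduce to the same text once the URL token is gone.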

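【Discussion】:

Thanks for the reply. This does not work, because the URL can be "google.com" without the "http://", and it can also use any TLD.
You could try the pattern [^\s]+\.[^\s]+ , which should match every part of the string that has at least one dot inside and is delimited by whitespace on both sides. But you need to check it against your actual use case, because this time it may be too broad.
This answer does not meet the asker's requirements!!
@johnsmith6 It is exactly the same as the answer you accepted :) Besides, your requirements did not say "do my work for me and give me the exact regex I need".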

【Solution 2】:

Use a regular expression:

        Regex regex = new Regex(@"\s((?:\S+)\.(?:\S+))");

        string string1 = "This is my string http://www.google.co.uk and that was my url.";
        string string2 = "this is my string google.gr and that was my url.";

        var string1WithoutURI = regex.Replace(string1, ""); // Output: "This is my string and that was my url."
        var string2WithoutURI = regex.Replace(string2, ""); // Output: "this is my string and that was my url."

        // Regex.Replace(string1, @"\s((?:\S+)\.(?:\S+))", ""); // This can be used too to avoid having to declare the regex.

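        // The sample strings differ in case ("This" vs "this"), so compare case-insensitively (needs using System;).
        if (string.Equals(string1WithoutURI, string2WithoutURI, StringComparison.OrdinalIgnoreCase))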
        {
            // Do what you want with the two strings
        }

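Explanation of the regex \s((?:\S+)\.(?:\S+))

1. \s matches any single whitespace character

2. ((?:\S+)\.(?:\S+)) matches the URL up to the next whitespace character

2.1. (?:\S+) matches one or more non-whitespace characters; the ?: makes the group non-capturing

2.2. \. matches a literal ".", since one is always present in a URL

2.3. (?:\S+) again matches one or more non-whitespace characters in a non-capturing group, picking up everything after the dot.

This should do the trick...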
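The breakdown above implies two limits worth checking against real data: the pattern needs a whitespace character before the URL, so a URL at the very start of the string is left in place, and any token with a dot inside (for example a version number) is removed as if it were a URL. The sketch below illustrates both. `PatternLimits` and its method names are hypothetical, and the lookbehind variant is only one possible adjustment, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class PatternLimits
{
    // The pattern from the answer: the URL must be preceded by whitespace.
    public static string StripWithAnswerPattern(string text)
    {
        return Regex.Replace(text, @"\s((?:\S+)\.(?:\S+))", "");
    }

    // Hypothetical variant: the lookbehind (?<!\S) also accepts the start of
    // the string, and \s* removes the whitespace that follows the token.
    public static string StripWithLookbehind(string text)
    {
        return Regex.Replace(text, @"(?<!\S)\S+\.\S+\s*", "").TrimEnd();
    }
}
```

For example, `StripWithAnswerPattern("google.gr is my url")` returns the input unchanged, while `StripWithLookbehind` on the same input returns "is my url". `StripWithAnswerPattern("version 1.2 released")` returns "version released", which shows the over-matching.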

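【Discussion】:

Using [\s] is no different from using \s on its own, and the same goes for \. . In some cases (notably \b) that does not hold: [\b] matches a backspace character, not a word boundary. A bad habit is a bad habit.
@Corey Thanks for the heads-up, I have updated the answer.
Thank you very much! This code works for me.
@johnsmith6 Don't forget to mark it as the answer ;)
Done, thanks again Gabriel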
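Corey's remark about \b versus [\b] in the discussion above can be checked with a short sketch. `BackspaceVsBoundary` and its method names are hypothetical and exist only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class BackspaceVsBoundary
{
    // Outside a character class, \b is a zero-width word-boundary assertion.
    public static bool HasWordBoundary(string text)
    {
        return Regex.IsMatch(text, @"\b");
    }

    // Inside a character class, [\b] matches the backspace character U+0008.
    public static bool HasBackspace(string text)
    {
        return Regex.IsMatch(text, @"[\b]");
    }
}
```

`HasWordBoundary("url")` is true while `HasBackspace("url")` is false. `HasBackspace("a\bb")` is true, because the C# string escape \b inserts an actual backspace character.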