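【Question Title】: Notepad++ Regular Expression Remove Duplicate Emails
【Posted】: 2013-05-22 08:07:27
【Question】:
I've been trying to figure out how to use a regular expression (regex) to remove duplicate emails from a text file I have, but I can't get anything to work.
This is what the emails look like in the text file (sample):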
examp@asdas.com
kork@kruu.com
gexx@moxx.com
hey@hayhay.cu
examp@asdas.com
geexx@modxx.com
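I haven't found a way to remove all duplicates; I've only found a regex that removes duplicates that sit right next to each other.
Does anyone have any suggestions?
【Discussion】:
Tags: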
regex
email
notepad++
duplicate-removal
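【Solution 1】:
How about:
Search: ([^@]+@[^@]+)(.*?)\1
Replace with: $1$2
Because the two copies of an address sit on different lines, .*? has to be able to cross line breaks, so enable ". matches newline" in the Replace dialog. (The generated explanation below lists the default flags, under which . does not match \n.)
Regex explanation: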
The regular expression:
(?-imsx:([^@]+@[^@]+)(.*?)\1)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^@]+                    any character except: '@' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    @                        '@'
----------------------------------------------------------------------
    [^@]+                    any character except: '@' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \1                       what was matched by capture \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
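To see what this search and replace does outside the editor, here is a minimal Python sketch run against the sample list from the question. It assumes that Python's re.DOTALL stands in for Notepad++'s ". matches newline" option; the function names are made up for illustration. One caveat shows up quickly: the pattern is not anchored to line boundaries, so it can also match a fragment shared by two different addresses (for example exx@mo, which occurs in both gexx@moxx.com and geexx@modxx.com), and running the replacement a second time can damage the list. The second function is a plain line-based alternative, not part of the answer above, that avoids this.

```python
import re

# Pattern from the answer; DOTALL lets .*? span several lines
# (assumed equivalent of Notepad++'s ". matches newline" checkbox).
DUPLICATE = re.compile(r"([^@]+@[^@]+)(.*?)\1", re.DOTALL)


def remove_with_regex_once(text: str) -> str:
    """One 'Replace All' pass: keep \\1 and \\2, drop the repeated \\1."""
    return DUPLICATE.sub(r"\1\2", text)


def remove_by_line(text: str) -> str:
    """Alternative sketch: keep only the first occurrence of each line."""
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept) + "\n"


sample = (
    "examp@asdas.com\n"
    "kork@kruu.com\n"
    "gexx@moxx.com\n"
    "hey@hayhay.cu\n"
    "examp@asdas.com\n"
    "geexx@modxx.com\n"
)

first_pass = remove_with_regex_once(sample)
# A second pass is not harmless here: the unanchored pattern now finds a
# fragment common to gexx@moxx.com and geexx@modxx.com and edits the latter.
second_pass = remove_with_regex_once(first_pass)

print(first_pass)
print(remove_by_line(sample))
```

On this sample a single regex pass and the line-based version yield the same five addresses. For real lists the line-based approach is the safer choice, since it only ever compares whole lines.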