【Question Title】:Extracting Domains from E-mail Database List
【Posted】:2020-08-31 03:34:37
【Question Description】:

I need to extract the domain from each e-mail address in a dataset and find the 5 most common domains.

import re
from collections import Counter
with open("emails")
domain = re.search('@[\w.)]+, email')
 print(domain.group())

 jbutt@gmail.com  http://www.bentonjohnbjr.com
 josephine_darakjy@darakjy.org  http://www.chanayjeffreyaesq.com
 art@venere.org http://www.chemeljameslcpa.com
 lpaprocki@hotmail.com  http://www.feltzprintingservice.com
 donette.foller@cox.net http://www.printingdimensions.com
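In the sample rows above, each line starts with the address, followed by whitespace and a URL. Assuming every line follows that layout, the domain can also be taken without a regular expression: split off the first field and keep what follows its last `@`. A minimal sketch (the helper name `domain_of` is hypothetical):

```python
def domain_of(line):
    """Return the domain of the address at the start of `line`, or None.

    Assumes (hypothetically) that the address is the first
    whitespace-separated field on the line, as in the sample data.
    """
    fields = line.split()
    if not fields:
        return None  # blank line
    address = fields[0]
    # rpartition splits on the last "@"; sep is "" when there is no "@".
    _, sep, domain = address.rpartition("@")
    return domain.lower() if sep else None


print(domain_of(" jbutt@gmail.com  http://www.bentonjohnbjr.com"))  # prints: gmail.com
```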

【Question Discussion】:

    Tags: python computer-science google-colaboratory


    【Solution 1】:

    This reads the file line by line, extracts the domain that follows each `@`, and lists the top 5 domains:

    import re
    from collections import Counter

    domains = []
    with open("emails", "r") as email_file:
        for line in email_file:
            # Capture the domain characters after "@", stopping at whitespace,
            # so no trailing space ends up in the result.
            match = re.search(r"@([\w.-]+)", line)
            if match:  # skip lines that contain no address
                domains.append(match.group(1))
    occurrence_count = Counter(domains)
    print(occurrence_count.most_common(5))
    

    Output:

    [('gmail.com', 1), ('darakjy.org', 1), ('venere.org', 1), ('hotmail.com', 1), ('cox.net', 1)]
    

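    The output is the 5 most common domains, each paired with its count. In this sample every domain occurs once, so `most_common` lists the tied domains in the order they were first encountered.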
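The same counting logic can be wrapped in a function that accepts any iterable of lines (an open file, a list of strings), which makes it easy to test without a file on disk. This is a sketch, not the answer's original code: the names `top_domains` and `DOMAIN_RE` are hypothetical, the third sample line is made up to show a repeated domain, and it additionally lowercases domains because domain names are case-insensitive.

```python
import re
from collections import Counter

# Domain characters that follow "@"; assumes plain ASCII-style domains.
DOMAIN_RE = re.compile(r"@([\w.-]+)")


def top_domains(lines, n=5):
    """Count the domain of the first address on each line; return the n most common."""
    counts = Counter()
    for line in lines:
        match = DOMAIN_RE.search(line)
        if match:
            # Normalise case so "Gmail.com" and "gmail.com" are counted together.
            counts[match.group(1).lower()] += 1
    return counts.most_common(n)


sample = [
    "jbutt@gmail.com  http://www.bentonjohnbjr.com",
    "lpaprocki@hotmail.com  http://www.feltzprintingservice.com",
    "someone@Gmail.com http://example.com",  # hypothetical extra row
]
print(top_domains(sample, 2))  # prints: [('gmail.com', 2), ('hotmail.com', 1)]
```

With the real file, pass the open file object directly, e.g. `with open("emails") as f: print(top_domains(f))`.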
