【Question title】: Python data structure (dictionary)
【Posted】: 2021-01-12 16:56:27
【Question】:

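I want to extract the email addresses from a txt file and count how many times each address occurs. However, the output splits every address into its individual letters.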

The rest of the code is meant to count the occurrences.

name = input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)

di = {}
for line in handle:
    if line.startswith('From '): # I forgot the trailing space on my first try
        line = line.rstrip()
        words = line.split()
        email = words[1]
        # print(email)
        for em in email:
            di[em] = di.get(em, 0) + 1
        print(di)

prolific_em = None
largest = -1
for v,k in di:
    if v > largest :
        largest = v
        prolific_em = k
print(prolific_em, largest)

Unexpected output:
6, 'b': 1, 'k': 1, 'y': 1, 'j': 2, 'w': 5, 'g': 4, 'v': 3}
{'s': 5, 't': 4, 'e': 28, 'p': 6, 'h': 8, 'n': 10, '.': 20, 'm': 11, 'a': 15, 'r': 12, 'q': 5, 'u': 34, 'd': 15, '@': 15, 'c': 14, 'z': 5, 'l': 7, 'o': 3, 'i': 27, 'b': 1, 'k': 2, 'y': 1, 'j': 2, 'w': 5, 'g': 5, 'v': 3}
{'s': 6, 't': 4, 'e': 28, 'p': 7, 'h': 8, 'n': 10, '.': 22, 'm': 16, 'a': 20, 'r': 13, 'q': 5, 'u': 34, 'd': 15, '@': 16, 'c': 16, 'z': 5, 'l': 9, 'o': 7, 'i': 28, 'b': 1, 'k': 3, 'y': 2, 'j': 2, 'w': 5, 'g': 7, 'v': 3}
{'s': 6, 't': 6, 'e': 28, 'p': 7, 'h': 9, 'n': 10, '.': 25, 'm': 16, 'a': 23, 'r': 14, 'q': 5, 'u': 35, 'd': 17, '@': 17, 'c': 18, 'z': 7, 'l': 9, 'o': 8, 'i': 30, 'b': 1, 'k': 3, 'y': 2, 'j': 2, 'w': 6, 'g': 7, 'v': 4}
{'s': 6, 't': 8, 'e': 28, 'p': 7, 'h': 10, 'n': 10, '.': 28, 'm': 16, 'a': 26, 'r': 15, 'q': 5, 'u': 36, 'd': 19, '@': 18, 'c': 20, 'z': 9, 'l': 9, 'o': 9, 'i': 32, 'b': 1, 'k': 3, 'y': 2, 'j': 2, 'w': 7, 'g': 7, 'v': 5}
{'s': 6, 't': 10, 'e': 28, 'p': 7, 'h': 11, 'n': 10, '.': 31, 'm': 16, 'a': 29, 'r': 16, 'q': 5, 'u': 37, 'd': 21, '@': 19, 'c': 22, 'z': 11, 'l': 9, 'o': 10, 'i': 34, 'b': 1, 'k': 3, 'y': 2, 'j': 2, 'w': 8, 'g': 7, 'v': 6}
{'s': 6, 't': 12, 'e': 28, 'p': 7, 'h': 12, 'n': 10, '.': 34, 'm': 16, 'a': 32, 'r': 17, 'q': 5, 'u': 38, 'd': 23, '@': 20, 'c': 24, 'z': 13, 'l': 9, 'o': 11, 'i': 36, 'b': 1, 'k': 3, 'y': 2, 'j': 2, 'w': 9, 'g': 7, 'v': 7}
{'s': 7, 't': 14, 'e': 30, 'p': 8, 'h': 13, 'n': 11, '.': 37, 'm': 17, 'a': 36, 'r': 19, 'q': 6, 'u': 40, 'd': 24, '@': 21, 'c': 26, 'z': 14, 'l': 9, 'o': 11, 'i': 36, 'b': 1, 'k': 3, 'y': 2, 'j': 2, 'w': 9, 'g': 7, 'v': 7}
{'s': 8, 't': 14, 'e': 35, 'p': 8, 'h': 13, 'n': 11, '.': 39, 'm': 18, 'a': 37, 'r': 20, 'q': 6, 'u': 42, 'd': 26, '@': 22, 'c': 26, 'z': 14, 'l': 11, 'o': 12, 'i': 38, 'b': 2, 'k': 4, 'y': 3, 'j': 2, 'w': 9, 'g': 7, 'v': 7}
{'s': 9, 't': 14, 'e': 40, 'p': 8, 'h': 13, 'n': 11, '.': 41, 'm': 19, 'a': 38, 'r': 21, 'q': 6, 'u': 44, 'd': 28, '@': 23, 'c': 26, 'z': 14, 'l': 13, 'o': 13, 'i': 40, 'b': 3, 'k': 5, 'y': 4, 'j': 2, 'w': 9, 'g': 7, 'v': 7}
{'s': 9, 't': 14, 'e': 45, 'p': 8, 'h': 13, 'n': 11, '.': 43, 'm': 20, 'a': 40, 'r': 23, 'q': 6, 'u': 45, 'd': 30, '@': 24, 'c': 26, 'z': 14, 'l': 14, 'o': 13, 'i': 41, 'b': 4, 'k': 6, 'y': 6, 'j': 2, 'w': 9, 'g': 7, 'v': 7}
{'s': 9, 't': 14, 'e': 47, 'p': 9, 'h': 13, 'n': 12, '.': 44, 'm': 20, 'a': 40, 'r': 23, 'q': 6, 'u': 48, 'd': 31, '@': 25, 'c': 27, 'z': 14, 'l': 14, 'o': 13, 'i': 43, 'b': 4, 'k': 6, 'y': 6, 'j': 2, 'w': 10, 'g': 7, 'v': 7}
{'s': 9, 't': 14, 'e': 49, 'p': 10, 'h': 13, 'n': 13, '.': 45, 'm': 20, 'a': 40, 'r': 23, 'q': 6, 'u': 51, 'd': 32, '@': 26, 'c': 28, 'z': 14, 'l': 14, 'o': 13, 'i': 45, 'b': 4, 'k': 6, 'y': 6, 'j': 2, 'w': 11, 'g': 7, 'v': 7}
{'s': 9, 't': 14, 'e': 51, 'p': 11, 'h': 13, 'n': 14, '.': 46, 'm': 20, 'a': 40, 'r': 23, 'q': 6, 'u': 54, 'd': 33, '@': 27, 'c': 29, 'z': 14, 'l': 14, 'o': 13, 'i': 47, 'b': 4, 'k': 6, 'y': 6, 'j': 2, 'w': 12, 'g': 7, 'v': 7}

【Comments】:

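  • Please do not post images of code, data, or tracebacks. Copy and paste them as text, then format the text as code (select it and press ctrl-k)... Discourage screenshots of code and/or errors
  • If email is a string, for em in email iterates over the characters of that string. What did you want for em in email to iterate over?
  • @wwii I have changed the content. Thanks for the reminder.
  • @CharlesDuffy I did not know that until I had tried a few times.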
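
The point raised in the comments can be checked in isolation. The following is a minimal sketch with a made-up address (`alice@example.com` is an assumption, not data from the question): looping over a string visits one character at a time, while using the whole string as the dictionary key counts the address itself.

```python
# Hypothetical sample address, used only for illustration.
email = "alice@example.com"

# Iterating over a string visits its characters one by one.
chars = [em for em in email]

# Counting inside that loop produces per-character counts,
# which is the behaviour reported in the question.
per_char = {}
for em in email:
    per_char[em] = per_char.get(em, 0) + 1

# Using the whole string as the key counts the address itself.
per_email = {}
per_email[email] = per_email.get(email, 0) + 1

print(chars[:5])
print(per_email)
```

Here `per_char` ends up with one entry per distinct character, whereas `per_email` has a single entry for the full address.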

Tags: python python-3.x dictionary for-loop


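【Solution 1】:

Because email holds the address as a string, for em in email iterates over the individual characters of that address. Use the whole string as the dictionary key instead. It is also better to open files with a with statement so that they are closed properly.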

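di = {}
with open(name) as handle:  # the file is closed automatically when the block ends
    for line in handle:
        if line.startswith('From '):
            line = line.rstrip()
            words = line.split()
            email = words[1]
            # use the whole address as the key instead of looping over its characters
            di[email] = di.get(email, 0) + 1
            print(di)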

You can also use Counter to do the counting:

from collections import Counter
di = Counter()
with open(name) as handle:
    for line in handle:
        if line.startswith('From '):
            line = line.rstrip()
            words = line.split()
            email = words[1]
            di.update([email])
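
Once the counts are in a Counter, the most frequent sender can be read off with `most_common`, which could replace the manual "largest so far" loop from the question. This is only a sketch under assumptions: the `sample_lines` list stands in for the file, and its addresses and dates are invented.

```python
from collections import Counter

# Invented stand-in for the lines of the mbox file.
sample_lines = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "Subject: hello\n",
    "From bob@example.com Fri Jan  4 18:10:48 2008\n",
    "From: alice@example.com\n",
    "From alice@example.com Fri Jan  4 16:10:39 2008\n",
]

counts = Counter()
for line in sample_lines:
    # 'From ' (with the trailing space) skips the 'From:' header lines.
    if line.startswith("From "):
        counts[line.split()[1]] += 1

# most_common(1) returns a list holding the single (key, count) pair.
prolific_em, largest = counts.most_common(1)[0]
print(prolific_em, largest)
```

Note that `most_common(1)[0]` raises an IndexError when no line matched, so an empty Counter would need a separate check.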

【Comments】:

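    【Solution 2】:

    Mistakes

    I made several mistakes:

    1. I did not store the extracted data in a list, so the loop iterated over the characters of the string
    2. I swapped the positions of the key and the value when looping over the dictionary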
    name = input("Enter file:")
    if len(name) < 1 : name = "mbox-short.txt"
    handle = open(name)
    
    di = {}
    for line in handle:
        if line.startswith('From '): # I forgot the trailing space on my first try
            line = line.rstrip()
            words = line.split()
            email = list()
            # wrap the address in a list; otherwise the loop below iterates over the characters of the string
            email.append(words[1])
            for em in email:
                di[em] = di.get(em, 0) + 1
            # print(di)
    largest = -1
    prolific_em = None
    for k,v in di.items():
        if v > largest :
            largest = v
            prolific_em = k
    print(prolific_em, largest)
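
The second mistake concerns how a dictionary is iterated. A small sketch with invented counts (the `di` contents below are assumptions): iterating over the dictionary itself yields only its keys, `items()` yields `(key, value)` pairs in that order, and `max` with `key=di.get` is one compact alternative to the explicit loop.

```python
# Invented counts, standing in for the dictionary built above.
di = {"alice@example.com": 2, "bob@example.com": 1}

# Iterating over a dict yields its keys only, never the counts.
keys_only = list(di)

# items() yields (key, value) pairs, so unpack them as k, v.
largest = -1
prolific_em = None
for k, v in di.items():
    if v > largest:
        largest = v
        prolific_em = k

# Same result without the loop: pick the key whose value is largest.
best = max(di, key=di.get)
print(prolific_em, largest, best)
```

With the single-character keys from the question, `for v,k in di` tries to unpack each one-character key into two names, which fails with a ValueError rather than returning counts.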
    

    【Comments】:
