Python sum of the array elements [closed]: answers

【Question title】: Python sum of the array elements [closed]
【Posted】: 2013-11-15 16:43:26
【Question】:

I have a file with lines like the following:

date:ip num#1 num#2   

2013.09:142.134.35.17 10 12
2013.09:142.134.35.17 4 4
2013.09:63.151.172.31 52 13
2013.09:63.151.172.31 10 10
2013.09:63.151.172.31 16 32
2013.10:62.151.172.31 16 32

How can I sum the last two elements for each identical IP to get a result like this?

2013.09:142.134.35.17 14 16
2013.09:63.151.172.31 78 55
2013.10:62.151.172.31 16 32

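【Comments on the question】:

Q: What have you tried?
I've got this far: file = open("full_megalog.txt", "r") for line in file:
@JohnSmith That doesn't count as an attempt ;\
Are you sure you want to sum elements that merely share the same IP, or do they also need to fall on the same date, i.e. have the same date:ip value?

Tags: python arrays file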

【Solution 1】:

Try this:

from collections import Counter
with open('full_megalog.txt') as f:
    data = [d.split() for d in f]

sum1, sum2 = Counter(), Counter()

for d in data:
    sum1[d[0]] += int(d[1])
    sum2[d[0]] += int(d[2])

for date_ip in sum1:
    print(date_ip, sum1[date_ip], sum2[date_ip])

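【Discussion】:

Why Counter rather than defaultdict(int)? This isn't a good use case for Counter.
I'm afraid I disagree; this is a perfectly good use case for Counter.
It would be constructive to hear the reasoning from both sides.
@TankorSmash I've already given an explanation: it is a simple, working solution, and it is self-explanatory.
@1_CR -- Counter works much like defaultdict(int), except that it has a number of other useful methods you may want later... I think this is @ 987654328@, although it definitely doesn't show off all the neat things Counter can do :)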
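
To make the comparison in this thread concrete, here is a minimal sketch (Python 3) that sums the first numeric column of the question's sample rows with both containers. The variable names are illustrative only and do not come from any of the answers.

```python
from collections import Counter, defaultdict

# Sample rows copied from the question: (date:ip, num#1)
rows = [
    ("2013.09:142.134.35.17", 10),
    ("2013.09:142.134.35.17", 4),
    ("2013.09:63.151.172.31", 52),
    ("2013.09:63.151.172.31", 10),
    ("2013.09:63.151.172.31", 16),
    ("2013.10:62.151.172.31", 16),
]

counter_sums = Counter()
default_sums = defaultdict(int)
for key, value in rows:
    counter_sums[key] += value   # a missing key reads as 0
    default_sums[key] += value   # a missing key is created as int() == 0

# Both containers end up holding the same totals.
same_totals = dict(counter_sums) == dict(default_sums)

# One of the extra methods Counter offers: keys ranked by total.
top_key, top_total = counter_sums.most_common(1)[0]

# A behavioural difference: merely reading a missing key does not
# insert it into a Counter, but it does insert it into a defaultdict.
counter_sums["absent"]
default_sums["absent"]
counter_has_absent = "absent" in counter_sums
default_has_absent = "absent" in default_sums
```

For plain summation the two behave the same, so the choice here is mostly a matter of taste; the last lines show the one difference that can matter if the mapping is later iterated or written out.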

【Solution 2】:

You could do it like this:

addrs='''\
2013.09:142.134.35.17 10 12
2013.09:142.134.35.17 4 4
2013.09:63.151.172.31 52 13
2013.09:63.151.172.31 10 10
2013.09:63.151.172.31 16 32
2013.10:62.151.172.31 16 32'''

class Dicto(dict):
    def __missing__(self, key):
        self[key]=[0,0]
        return self[key]

r=Dicto()
for line in addrs.splitlines():
    ip,n1,n2=line.split()
    r[ip][0]+=int(n1)
    r[ip][1]+=int(n2)

print(r)
# {'2013.09:142.134.35.17': [14, 16],
#  '2013.09:63.151.172.31': [78, 55],
#  '2013.10:62.151.172.31': [16, 32]}

Or, if you prefer, use a defaultdict:

from collections import defaultdict
r=defaultdict(lambda: [0,0])
for line in addrs.splitlines():
    ip,n1,n2=line.split()
    r[ip][0]+=int(n1)
    r[ip][1]+=int(n2)

print(r)

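【Discussion】:

【Solution 3】:

This is an edit of @piokuc's answer, because the asker specifically asked about the IP, not date+IP. The splitting and summing are done on the IP only.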

from collections import Counter
import re
data=\
"""2012.09:142.134.35.17 10 12
2013.09:142.134.35.17 4 4
2013.09:63.151.172.31 52 13
2013.09:63.151.172.31 10 10
2013.09:63.151.172.31 16 32
2013.10:62.151.172.31 16 32"""


data = [re.split('[: ]',d) for d in data.split('\n')]
print(data)
sum1 = Counter()
sum2 = Counter()
for d in data:
    sum1[d[1]] += int(d[2])
    sum2[d[1]] += int(d[3])

for ip in sum1:
    print(ip, sum1[ip], sum2[ip])
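
The IP-only grouping described above does not strictly need a regular expression. As a minimal sketch (Python 3), the date prefix can be cut off with str.partition. The function name sum_by_ip is hypothetical, and the sample data is the one used in this answer, including its 2012.09 first row.

```python
from collections import defaultdict

def sum_by_ip(lines):
    """Sum the two numeric columns per IP, ignoring the date prefix."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        if not line.strip():
            continue                            # skip blank lines
        date_ip, n1, n2 = line.split()
        _date, _sep, ip = date_ip.partition(":")  # "2013.09:1.2.3.4" -> "1.2.3.4"
        totals[ip][0] += int(n1)
        totals[ip][1] += int(n2)
    return dict(totals)

sample = """2012.09:142.134.35.17 10 12
2013.09:142.134.35.17 4 4
2013.09:63.151.172.31 52 13
2013.09:63.151.172.31 10 10
2013.09:63.151.172.31 16 32
2013.10:62.151.172.31 16 32"""

by_ip = sum_by_ip(sample.splitlines())
for ip, (total1, total2) in by_ip.items():
    print(ip, total1, total2)
```

Because the first two rows share an IP but not a date, they are merged here, which is exactly what distinguishes this grouping from the date:ip grouping in the other answers.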

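【Discussion】:

【Solution 4】:

@piokuc's answer is fine; this is a simple solution that should be easy for a beginner to follow, without reaching into the standard library for Counter.

The result you are looking for consists of pairs of two (ordered) values, each pair associated with a unique label (the date:ip value). In Python, the basic data structure for this kind of task is the dict (dictionary).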

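When you open files, it is good practice to make sure they are closed once they are no longer needed. I'll use the with statement for that; if you're interested in how it works, this is a good resource, but if it doesn't make sense to you yet, just remember that the file you are working with is closed automatically as soon as the with block ends.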
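
The closing behaviour is easy to check for yourself. A minimal sketch (Python 3), which writes one sample line to a temporary file purely for illustration; the temporary path is an assumption and has nothing to do with full_megalog.txt:

```python
import os
import tempfile

# Create a small throwaway file to read back.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("2013.09:142.134.35.17 10 12\n")

with open(path) as f:
    first_line = f.readline()
    open_inside_block = not f.closed   # the file is still open here

closed_after_block = f.closed          # closed as soon as the block ends
os.remove(path)                        # clean up the throwaway file
```

The file object still exists after the block, but it has been closed, so any further read on it would raise a ValueError.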

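Here is the code. Keep in mind that everything you read from a file is text, which means you have to convert the numbers (here with int()) before doing any kind of math on them: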

result = {}                                        # Create your empty dict

with open('full_megalog.txt', 'r') as file:        # Open your input file

    for line in file:                              # In each line of the file:

        date_ip, num1, num2 = line.split()         # 1:  Get key and 2 values

        if date_ip in result:                      # 2:  Check if key exists

            result[date_ip][0] += int(num1)        # 3a: If yes, add num1, num2
            result[date_ip][1] += int(num2)        #     to current sum.

        else:                                      # 3b: If no, add the new key
            result[date_ip] = [int(num1), int(num2)] # and values to the dict (a list, so 3a can update it)

You now have a result dictionary that associates the sums of num1 and num2 with each corresponding date_ip. You can access the pair of sums with result[date_ip], and the individual values with result[date_ip][0] and result[date_ip][1].
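
One detail worth checking in code of this shape: the running totals are updated in place with +=, and that only works when the stored pair is a mutable list. A minimal sketch (Python 3) with made-up variable names shows what happens with a tuple instead:

```python
key = "2013.09:142.134.35.17"
totals_as_tuple = {key: (10, 12)}
totals_as_list = {key: [10, 12]}

# Tuples are immutable, so item assignment raises TypeError.
try:
    totals_as_tuple[key][0] += 4
    tuple_update_failed = False
except TypeError:
    tuple_update_failed = True

# Lists can be updated in place.
totals_as_list[key][0] += 4
totals_as_list[key][1] += 4
```

If you do want tuples in the final result, one option is to accumulate in lists and convert at the end with tuple(...).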

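If you want to write this out in the original format, you have to join each key and its two values with a space character; a verbose, easy-to-comment way might look like this: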

with open('condensed_log_file.txt', 'w') as out:       # open the output file;

    for date_ip in result:                             # loop through the keys;

        out.write(                                     # write to the logfile:

                  ' '.join(                            # joined by a space char,
                           (date_ip,                   # the key (date_ip);
                            str(result[date_ip][0]),   # the 1st value (num1);
                            str(result[date_ip][1]))   # & the 2nd value (num2).
                          ) + '\n'                     # end the record with a newline
                 )                                     # close out.write(
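
The same output step can also be written more compactly. This is only a sketch (Python 3), assuming result has the shape built above (each key mapped to two sums); it writes to a temporary file rather than condensed_log_file.txt so that it can be run without side effects.

```python
import os
import tempfile

# Totals in the shape produced by the code above.
result = {
    "2013.09:142.134.35.17": [14, 16],
    "2013.09:63.151.172.31": [78, 55],
    "2013.10:62.151.172.31": [16, 32],
}

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    for date_ip, (num1, num2) in result.items():
        # One record per line, fields separated by a single space.
        out.write("{} {} {}\n".format(date_ip, num1, num2))

# Read the file back to confirm what was written.
with open(path) as f:
    written = f.read().splitlines()
os.remove(path)
```

str.format converts the integers to text for you, so the explicit str() calls are not needed in this variant.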

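I was curious how piokuc's very neat and clean approach compares with my own naive one in terms of performance. This is with the print and output-file-writing statements removed: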

>>> from timeit import timeit
>>> a = open("airthomas.py", "r")
>>> a = a.read()
>>> p = open("piokuc.py", "r")
>>> p = p.read()
>>> timeit(p)
115.33428788593137
>>> timeit(a)
103.95908962552267

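So if you need to run this on a large number of small files, using Counter() may be a little slower (about 10% in this test). Of course, you may only need to run it on one or a few very large files, in which case, run the tests yourself! ;P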
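
If you want to repeat the comparison, here is a minimal sketch (Python 3). It is not the original benchmark: airthomas.py and piokuc.py are not shown in this thread, so the two functions below are reconstructions, and they work on in-memory lines instead of reading a file. Note that timeit runs the statement 1,000,000 times by default, which is why the session above reports totals of over 100 seconds.

```python
from collections import Counter
from timeit import timeit

# The six sample lines from the question, held in memory.
LINES = [
    "2013.09:142.134.35.17 10 12",
    "2013.09:142.134.35.17 4 4",
    "2013.09:63.151.172.31 52 13",
    "2013.09:63.151.172.31 10 10",
    "2013.09:63.151.172.31 16 32",
    "2013.10:62.151.172.31 16 32",
]

def with_counter(lines):
    """Two Counters, one per numeric column."""
    sum1, sum2 = Counter(), Counter()
    for line in lines:
        key, n1, n2 = line.split()
        sum1[key] += int(n1)
        sum2[key] += int(n2)
    return sum1, sum2

def with_dict(lines):
    """A plain dict holding a two-item list per key."""
    result = {}
    for line in lines:
        key, n1, n2 = line.split()
        if key in result:
            result[key][0] += int(n1)
            result[key][1] += int(n2)
        else:
            result[key] = [int(n1), int(n2)]
    return result

RUNS = 10000  # far fewer than timeit's default of 1,000,000
counter_seconds = timeit(lambda: with_counter(LINES), number=RUNS)
dict_seconds = timeit(lambda: with_dict(LINES), number=RUNS)
print("Counter: {:.4f}s  dict: {:.4f}s".format(counter_seconds, dict_seconds))
```

The absolute numbers depend on the machine and the Python version, so treat any difference as indicative only.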

【Discussion】:

【Solution 5】:

You can use a dictionary to solve this, as follows:

#assuming that your addresses are stored in a file:
with open('addresses.txt', 'r') as f:
    lines = f.readlines()
    ele = {}

    for line in lines:
        addr = line.split()
        s = [int(addr[1]), int(addr[2])]
        if addr[0] in ele:
            ele[addr[0]][0] += s[0]
            ele[addr[0]][1] += s[1]
        else:
            ele[addr[0]] = s

This gives you:

{'2013.09:142.134.35.17': [14, 16],
 '2013.09:63.151.172.31': [78, 55],
 '2013.10:62.151.172.31': [16, 32]}
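
If the real file also contains blank lines or a header row such as the date:ip num#1 num#2 line shown in the question (an assumption; the question does not say whether that line is part of the file), int() will fail on those rows. A minimal sketch (Python 3) of a more forgiving variant; the function name sum_rows is hypothetical:

```python
def sum_rows(lines):
    """Sum the two numeric columns per key, skipping rows that do not parse."""
    totals = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue                      # blank or malformed line
        key, first, second = parts
        try:
            n1, n2 = int(first), int(second)
        except ValueError:
            continue                      # non-numeric columns, e.g. a header row
        sums = totals.setdefault(key, [0, 0])
        sums[0] += n1
        sums[1] += n2
    return totals

sample = [
    "date:ip num#1 num#2",
    "",
    "2013.09:142.134.35.17 10 12",
    "2013.09:142.134.35.17 4 4",
    "2013.09:63.151.172.31 52 13",
    "2013.09:63.151.172.31 10 10",
    "2013.09:63.151.172.31 16 32",
    "2013.10:62.151.172.31 16 32",
]

totals = sum_rows(sample)
for key in sorted(totals):
    print(key, totals[key][0], totals[key][1])
```

dict.setdefault replaces the if/else branch: it returns the existing pair for a known key and stores a fresh [0, 0] for a new one.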

【Discussion】: