Count occurrences of values in one column based on the index where a condition on other columns is True

【Question title】: Count occurrences of values in 1 column based on index from where condition in different columns is True
【Posted】: 2014-08-26 21:55:35
【Question】:

I have data in 4 columns, like this:

day month year value
 1    1   1880  1
Etc. for each day in a month for each month for 1880-2013.
Values range from 1 to 8

The data is stored in a file.

I want to count how many times each "value" occurs for a given month and year. So something like this:

import numpy as np

data = np.loadtxt('/path/to/file', skiprows=1)

def magic_func(data, year, month):
    # Columns are: day, month, year, value
    for each in zip(data[:, 1], data[:, 2]):
        if each == (month, year):
            pass  # actual magic

Example output:

[(1,0), (2,30), (3,0), (4,0), (5,1), (6,0), (7,0), (8,0)]

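So I think I need a way to index the last column of data using the indices where the if above is True, and then count (np.bincount, maybe?) how many times each "value" occurs. I haven't got very far with the code, though...

Can anyone help?

EDIT: I'm not keen on posting any real data, since even 1 year is 365 data points! The data is stored in a tab-delimited .txt file, but I could also store it as a csv. Below is a (very small) sample. The tabs were not fully preserved when copying, but in the actual file each column sits under the appropriate name. What I originally did was copy the data from elsewhere and use Excel's text-to-columns to create my .txt.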

day     month   year    value
1   1   1956    3
2   1   1956    3
3   1   1956    8
4   1   1956    8
5   1   1956    8
6   1   1956    3
7   1   1956    1
8   1   1956    1
9   1   1956    3
10  1   1956    3
11  1   1956    3
12  1   1956    3
13  1   1956    1
14  1   1956    1
15  1   1956    3
16  1   1956    2
17  1   1956    3
18  1   1956    3
19  1   1956    3
20  1   1956    3
21  1   1956    3
22  1   1956    3
23  1   1956    3
24  1   1956    3
25  1   1956    1
26  1   1956    7
27  1   1956    4
28  1   1956    4
29  1   1956    4
30  1   1956    1
31  1   1956    1

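What I want is to take all the entries under "value" where "month" is, e.g., 1 in this case, and count how many times each unique "value" occurs. In this particular case, the output would be: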

[(3,16), (8,3), (1,7), (2,1), (7,1), (4,3)] # format is (value, count)
# or if displaying all possible values
[(1,7), (2,1), (3,16), (4,3), (5,0), (6,0), (7,1), (8,3)]

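Does this make sense?

Cheers!

【Comments】:

Are you sure that should be (4,1) and not (4,3)? There /are/ 3 occurrences of 4 in the first month of 1956
Ah yes, my bad, though it looks like you understand what I'm aiming for :).
In that case, my answer should get you there

Tags: python numpy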

【Solution 1】:

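import csv
import collections

def count(infilepath):
    # Nested mapping: answer[year][month][value] -> number of occurrences
    answer = collections.defaultdict(lambda: collections.defaultdict(lambda: collections.defaultdict(int)))
    with open(infilepath, newline='') as infile:
        infile.readline()  # skip the header row
        for line in csv.reader(infile, delimiter='\t'):
            # The last three fields are month, year, value; anything before them (the day) is ignored
            *_rest, month, year, value = [int(i) for i in line]
            answer[year][month][value] += 1
    return answer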

Usage:

counts = count('/path/to/input')
for year in sorted(counts):
    yeard = counts[year]
    for month in sorted(yeard):
        monthd = yeard[month]
        for value in sorted(monthd):
            occ = monthd[value]
            print("In year %d, in month %d, there were %d occurrences of value %d" % (year, month, occ, value))
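For comparison, the approach hinted at in the question (index the last column with a boolean mask, then use np.bincount) can be sketched roughly as below. This is only a sketch under a few assumptions: the array columns are in the order day, month, year, value; the values are small non-negative integers; and the names count_values and max_value are made up for illustration and do not come from the thread.

```python
import numpy as np

def count_values(data, year, month, max_value=8):
    """Return [(value, count), ...] for value = 1..max_value in the given month and year."""
    # np.loadtxt yields floats by default; np.bincount needs non-negative integers.
    data = np.asarray(data, dtype=int)
    # Boolean mask: True on rows whose month (column 1) and year (column 2) match.
    mask = (data[:, 1] == month) & (data[:, 2] == year)
    # minlength pads the result so values that never occur get a count of 0.
    counts = np.bincount(data[mask, 3], minlength=max_value + 1)
    return [(v, int(counts[v])) for v in range(1, max_value + 1)]

# Values for 1-31 January 1956, copied from the sample in the question.
values = [3, 3, 8, 8, 8, 3, 1, 1, 3, 3, 3, 3, 1, 1, 3, 2,
          3, 3, 3, 3, 3, 3, 3, 3, 1, 7, 4, 4, 4, 1, 1]
sample = np.array([(day, 1, 1956, v) for day, v in enumerate(values, start=1)])

result = count_values(sample, year=1956, month=1)
print(result)
# [(1, 7), (2, 1), (3, 16), (4, 3), (5, 0), (6, 0), (7, 1), (8, 3)]
```

To keep only the values that actually occur, the zero counts can be filtered out afterwards, e.g. [(v, c) for v, c in result if c].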

【Comments】:

I've added some sample data.