Python: no. of occurrences of each character in a string [duplicate]: answers

【Question title】: Python: no. of occurrences of each character in a string [duplicate]
【Posted】: 2012-10-01 13:24:06
【Question description】:

Possible duplicate:
how to get the number of occurrences of each character using python

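What is the best way to get the count of each character in a string and store it? (I am using a dictionary for this; does that choice make a big difference?) A couple of approaches I came up with: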

for character in string:
    if character in characterCountsDict:
        characterCountsDict[character] += 1
    else:
        characterCountsDict[character] = 1

character = 0
while character < 127:
    # chr() gives the one-character string for this code point
    characterCountsDict[chr(character)] = string.count(chr(character))
    character += 1

I think the second method is better... but is either of them any good? Is there a better way to do this?

【Question discussion】:

Tags: python string collections

【Solution 1】:

>>> from collections import Counter
>>> Counter("asdasdff")
Counter({'a': 2, 's': 2, 'd': 2, 'f': 2})

Note that you can use a Counter object just like a dictionary.
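A brief sketch of what "just like a dictionary" means in practice, using only documented `collections.Counter` behaviour. The sample string and the variable names are illustrative assumptions, not part of the answer:

```python
from collections import Counter

counts = Counter("asdasdff")

# Ordinary dict-style lookup works.
a_count = counts["a"]

# Unlike a plain dict, a missing key yields 0 instead of raising KeyError,
# and the lookup does not insert the key.
z_count = counts["z"]

# A Counter converts to a plain dict like any other mapping.
as_plain_dict = dict(counts)

# most_common(n) returns the n highest counts as (character, count) pairs.
top = counts.most_common(1)
```

Because `Counter` is a `dict` subclass, it can be passed to code that expects a dictionary; the zero default for missing keys is the main behavioural difference to keep in mind.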

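【Discussion】:

Oh, wait... for a 20 MB string, this actually took more time (30 seconds) than method 2 (12 seconds)?
You would generate an input string with something like "asdasd" * 200, right?
I read a file into a string: string = file.read() (the file is about 20 MB)
Hmm, interesting. Important question: what kind of data are you processing? Are you sure you only need the characters in 0-127?
Also, if you only need printable characters, just use {c: string.count(c) for c in printable}, where printable is imported from the string module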
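The `printable` suggestion in the last comment can be sketched as follows. The function name `count_printable`, the sample text, and the filtering of zero counts are assumptions added for illustration; the comment itself only gives the dict comprehension:

```python
from string import printable

def count_printable(text):
    # One str.count pass per candidate character in string.printable.
    counts = {c: text.count(c) for c in printable}
    # Drop characters that never occur so only seen characters remain.
    return {c: n for c, n in counts.items() if n > 0}

sample = "hello world"
result = count_printable(sample)
```

This approach scans the string once per candidate character, so it suits a small, fixed alphabet. Any character outside `printable` is silently ignored, which is why the question about the kind of data matters.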

【Solution 2】:

If you are interested in the most efficient way, it appears to be this:

from collections import defaultdict

def count_chars(s):
    res = defaultdict(int)
    for char in s:
        res[char] += 1
    return res

Timings:

from collections import Counter, defaultdict

def test_counter(s):
    return Counter(s)

def test_get(s):
    res = {}
    for char in s:
        res[char] = res.get(char, 0) + 1
    return res

def test_in(s):
    res = {}
    for char in s:
        if char in res:
            res[char] += 1
        else:
            res[char] = 1
    return res

def test_defaultdict(s):
    res = defaultdict(int)
    for char in s:
        res[char] += 1
    return res


s = open('/usr/share/dict/words').read()
#eof

import timeit

test = lambda f: timeit.timeit(f + '(s)', setup, number=10)
setup = open(__file__).read().split("#eof")[0]
results = ['%.4f %s' % (test(f), f) for f in dir() if f.startswith('test_')]
print('\n'.join(sorted(results)))

Results (seconds for 10 runs):

0.8053 test_defaultdict
1.3628 test_in
1.6773 test_get
2.3877 test_counter
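These figures come from the answerer's machine and Python version, so the ordering may differ elsewhere. A minimal sketch for re-running the comparison without the self-reading `__file__` trick is shown here; the function names, the synthetic input, and the `number=5` setting are all assumptions rather than the answer's setup:

```python
import timeit
from collections import Counter, defaultdict

def count_with_counter(s):
    return Counter(s)

def count_with_defaultdict(s):
    res = defaultdict(int)
    for char in s:
        res[char] += 1
    return res

# Synthetic input (an assumption; the answer read /usr/share/dict/words).
sample = "the quick brown fox jumps over the lazy dog\n" * 2000

def time_all(funcs, s, number=5):
    # Map each function name to the seconds taken by `number` calls.
    return {f.__name__: timeit.timeit(lambda: f(s), number=number)
            for f in funcs}

timings = time_all([count_with_counter, count_with_defaultdict], sample)
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print('%.4f %s' % (seconds, name))
```

Passing a callable to `timeit.timeit` avoids building a setup string, at the cost of a small per-call overhead that is the same for every candidate.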

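【Discussion】:

Thanks :) The answer is definitely overkill :)
@JayanthKoushik: ;) I had been wondering about this myself for a long time... that's why.
Wow, I can't believe Counter performs so badly, given that Counter is a dict
@wim: Counter uses self.get when updating itself, so its performance should be about the same as test_get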