How to count the length of words in a dictionary

【Question title】: How to count the length of words in a dictionary
【Posted】: 2021-11-03 00:24:15
【Question description】:

I have a list of dictionaries like this:

myList = [
    {
        'id':1,
        'text':[
            'I like cheese.', 
            'I love cheese.',
            'oh!'
        ],
        'text_2': [
            ('david', 'david', 'I do not like cheese.'),
            ('david', 'david', 'cheese is good.')
        ]
    },
    {
        'id':2,
        'text':[
            'I like strawberry.',
            'I love strawberry'
        ],
        'text_2':[
            ('alice', 'alice', 'strawberry is good.'),
            ('alice', 'alice', ' strawberry is so so.')
        ]
    }
]

For each 'id', I want to count the number of elements and the total number of words in 'text' and 'text_2'. The ideal output is:

myList = [
    {
        'id':1,
        'text':(3,7),
        'text_2': (2,8)   
    },
    {
        'id':2,
        'text':(2,6),
        'text_2':(2,7)    
    }
]

'text':(3,7) means: 3 elements ('I like cheese.', 'I love cheese.', 'oh!'); 7 words (I, like, cheese, I, love, cheese, oh)

'text_2':(2,8) means: 2 elements (('david', 'david', 'I do not like cheese.'), ('david', 'david', 'cheese is good.')); 8 words (I, do, not, like, cheese, cheese, is, good)

Any suggestions?

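【Question discussion】:

Could you post your code as well?
Can you solve the problem for a single list such as ['I like cheese.', 'I love cheese.', 'oh!']? If so, what, in your own estimation, prevents you from applying that technique to the whole data structure? If not, what exactly do you need to know in order to solve it? Please read How to Ask and meta.stackoverflow.com/questions/284236/…, and note that you are expected to try to solve the problem yourself first.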
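
The single-list case raised in the comment above can be sketched as follows. This is only a minimal illustration, not code from the thread: the helper name `count_single_list` is hypothetical, and "word" is assumed to mean a whitespace-separated token, so trailing punctuation stays attached to its word.

```python
def count_single_list(sentences):
    """Return (number of strings, total number of whitespace-separated words)."""
    n_items = len(sentences)
    # str.split() with no argument splits on any run of whitespace.
    n_words = sum(len(s.split()) for s in sentences)
    return n_items, n_words


print(count_single_list(['I like cheese.', 'I love cheese.', 'oh!']))  # (3, 7)
```

Applying the same function to each 'text' list, and to the last element of each tuple in 'text_2', is the step that extends it to the whole structure.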

Tags: python list dictionary word-count

【Solution 1】:

For example, like this:

from itertools import chain
from string import punctuation

def remove_punctuation(text):
    return "".join(filter(lambda x: x not in punctuation, text))

def count_items_and_words(items, label):
    items_cnt = len(items)
    
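    if label == "text":
        # Plain strings: join them into one text.
        total_text = " ".join(items)
    elif label == "text_2":
        # Tuples: only the elements from index 2 onward hold the sentence.
        total_text = " ".join(chain(*[it[2:] for it in items]))
    else:
        # Without this branch an unknown label would raise a NameError below.
        raise ValueError(f"unexpected label: {label!r}")
    total_text_clean = remove_punctuation(total_text)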
    
    words_cnt = len(total_text_clean.split())
    return (items_cnt, words_cnt)

def count_all(my_list):
    results = list()
    for it in my_list:
        if not isinstance(it, dict):
            continue
        res = {"id": it["id"]}
        for label in "text", "text_2":
            res[label] = count_items_and_words(it[label], label)
        results.append(res)
    return results

results = count_all(myList)
results

Output:

[{'id': 1, 'text': (3, 7), 'text_2': (2, 8)},
 {'id': 2, 'text': (2, 6), 'text_2': (2, 7)}]
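
The punctuation-stripping step in this answer could also be written with `str.translate`, which is a common alternative to filtering character by character. The sketch below is an assumption-labelled variant, not the answerer's code; `remove_punctuation_translate` and `count_words` are hypothetical names, and only ASCII punctuation from `string.punctuation` is handled.

```python
from string import punctuation

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", punctuation)


def remove_punctuation_translate(text):
    """Return text with all ASCII punctuation characters removed."""
    return text.translate(_STRIP_PUNCT)


def count_words(text):
    """Count whitespace-separated words after stripping punctuation."""
    return len(remove_punctuation_translate(text).split())


print(remove_punctuation_translate("I do not like cheese."))  # I do not like cheese
# A free-standing dash would count as a word with a plain split();
# after stripping punctuation it disappears.
print(len("cheese - is good.".split()))  # 4
print(count_words("cheese - is good."))  # 3
```

For the data in the question both approaches give the same counts; the difference only shows up when a punctuation mark stands alone between spaces.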

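【Discussion】:

【Solution 2】:

This answer may be hard to digest if you are new to Python, and it is deliberately terse because you did not show an attempt of your own, but I hope you find some combinations in it that will be useful in the future.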

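' '.join(my_list) builds a single string from the list elements, separated by spaces
my_string.split() cuts the string at whitespace and returns a list (so you can count its items)
set(my_list) removes repeated occurrences of an element
itertools.chain concatenates iterables; here it merges the tuples in the list into a single sequence
list comprehensions, e.g. [i for i in range(10) if i > 5]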
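
Each building block in the list above can be tried in isolation. The snippet below is a small illustration using the data from the question; the variable names are chosen here for the example only.

```python
from itertools import chain

texts = ['I like cheese.', 'I love cheese.', 'oh!']
tuples = [('david', 'david', 'I do not like cheese.'),
          ('david', 'david', 'cheese is good.')]

# ' '.join(...) glues the strings together with single spaces.
joined = ' '.join(texts)          # 'I like cheese. I love cheese. oh!'

# split() with no argument cuts at any run of whitespace.
words = joined.split()            # 7 items

# chain(*tuples) walks through every tuple in turn, yielding a flat sequence.
flat = list(chain(*tuples))       # 6 items, 'david' appears four times

# set(...) keeps one copy of each distinct element.
unique = set(flat)                # 3 distinct strings

# A list comprehension filters or transforms in one expression.
long_words = [w for w in words if len(w) > 4]

print(len(words), len(flat), len(unique), long_words)
# 7 6 3 ['cheese.', 'cheese.']
```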

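Since you did not specify any rule for handling multiple occurrences of the same element, I count each one only once (so 'david', 'david' is counted as 1).

My answer to your request for suggestions is divide and conquer: split the big problem into small ones, solve them, and glue the solutions together.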

import itertools as it

# myList is the list of dictionaries shown in the question

for d in myList:
    for k, v in d.items():
        if isinstance(v, list):
            if isinstance(v[0], str):
                # list of plain strings: count every word
                n_words = len(' '.join(v).split())
            else:
                # list of tuples: flatten, drop duplicates, then count words
                n_words = len(' '.join(set(it.chain(*v))).split())
            pair = len(v), n_words
            print(pair)
        else:
            print(k, v)

Output

id 1
(3, 7)
(2, 9)
id 2
(2, 6)
(2, 8)

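【Discussion】:

@Alina I forgot to mention that the program only prints to the screen. To store the output, you can proceed as in your previous question.
itertools is not mandatory; I just did not want to break the list comprehension up into a multi-line block of code.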
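
The two comments above (storing the result instead of printing it, and doing without itertools) can be combined in a short sketch. This is not the answerer's code: the function name `summarize` and the shortened sample data are assumptions made for illustration. It keeps this answer's rule that repeated tuple members are counted once, so names such as 'david' contribute one word.

```python
def summarize(records):
    """Hypothetical helper: return a new list of dicts instead of printing."""
    results = []
    for record in records:
        summary = {}
        for key, value in record.items():
            if not isinstance(value, list):
                summary[key] = value  # e.g. the 'id' entry is copied as is
                continue
            if all(isinstance(item, str) for item in value):
                # List of plain strings: every string is counted.
                parts = value
            else:
                # List of tuples: flatten and keep each distinct member once.
                parts = {member for item in value for member in item}
            n_words = sum(len(part.split()) for part in parts)
            summary[key] = (len(value), n_words)
        results.append(summary)
    return results


sample = [{
    'id': 1,
    'text': ['I like cheese.', 'oh!'],
    'text_2': [('david', 'david', 'cheese is good.')],
}]
print(summarize(sample))
# [{'id': 1, 'text': (2, 4), 'text_2': (1, 4)}]
```

Using `all(...)` rather than inspecting the first element means an empty list yields (0, 0) instead of raising an IndexError.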

【Solution 3】:

See below

lst = [
    {
        'id': 1,
        'text': [
            'I like cheese.',
            'I love cheese.',
            'oh!'
        ],
        'text_2': [
            ('david', 'david', 'I do not like cheese.'),
            ('david', 'david', 'cheese is good.')
        ]
    },
    {
        'id': 2,
        'text': [
            'I like strawberry.',
            'I love strawberry'
        ],
        'text_2': [
            ('alice', 'alice', 'strawberry is good.'),
            ('alice', 'alice', ' strawberry is so so.')
        ]
    }
]
out = []
for entry in lst:
    out.append({})
    for k, v in entry.items():
        if k == 'id':
            out[-1][k] = v
        elif k == 'text':
            out[-1][k] = (len(v), sum(len(x.split()) for x in v))
        else:
            # 'text_2' holds tuples; the sentence is the third element
            out[-1][k] = (len(v), sum(len(x[2].split()) for x in v))
print(out)

Output

[{'id': 1, 'text': (3, 7), 'text_2': (2, 8)}, {'id': 2, 'text': (2, 6), 'text_2': (2, 7)}]

【Discussion】: