Aggregation in function not working right: answers

【Question title】: Aggregation in function not working right
【Posted】: 2021-01-01 14:50:44
【Question】:

Hi, I have a Python function that runs, but not the way I expect, and I'm not sure where my code is off.

def preprocess(text):
    case = truecase.get_true_case(text)
    doc = nlp(case)
    return doc

def summarize_texts(texts):
    actions = {}
    entities = {}
    for item in texts:
        doc = preprocess(item)
        for token in doc:
            if token.pos_ == "VERB":
                actions[str.lower(token.text)] = actions.get(token.text, 0) +1
        for token in doc.ents:
            entities[token.label_] = [token.text]
            if token.text not in entities[token.label_]:
                entities[token.label_].append(token.text)
    return {
        'actions': actions,
        'entities': entities
    }

When I call the function on a list of sentences, this is the output I get:

docs = [
    "Play something by Billie Holiday, and play again",
    "Set a timer for five minutes",
    "Play it again, Sam"
]

summarize_texts(docs)

output: {'actions': {'play': 1, 'set': 1},
 'entities': {'PERSON': ['Sam'], 'TIME': ['five minutes']}}

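It finds the action and entity keys, but I'm running into two problems.

It isn't counting the actions correctly.
It only stores the last value for each entity.

The output should be: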

output: {'actions': {'play': 3, 'set': 1},
 'entities': {'PERSON': ['Billie','Sam'], 'TIME': ['five minutes']}}

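Any help would be great! I have a feeling it's something simple, but I'm too fried to see it.

【Comments】:

Tags: python aggregate-functions

【Answer 1】:

You are replacing the data structure on every iteration instead of just updating the value. You only want to create a new container when one doesn't exist yet for that key.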

For the actions:

if token.pos_ == "VERB":
    action_key = str.lower(token.text)

    if action_key not in actions:
        actions[action_key] = 0

    actions[action_key] += 1

For the entities:

for token in doc.ents:
    entity_key = token.label_
    entity_value = token.text

    if entity_key not in entities:
        entities[entity_key] = []

    if entity_value not in entities[entity_key]:
        entities[entity_key].append(entity_value)
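
To try the two fixes above without loading a spaCy model, here is a minimal sketch that swaps the spaCy objects for plain tuples. The `summarize` function and the `tokens` and `ents` sample data are hypothetical stand-ins made up for illustration; they are not part of the original question.

```python
# Hypothetical stand-ins for spaCy output (made-up sample data):
# (text, pos) pairs for tokens and (text, label) pairs for entities.
tokens = [("Play", "VERB"), ("something", "PRON"), ("play", "VERB"), ("Set", "VERB")]
ents = [("Billie Holiday", "PERSON"), ("five minutes", "TIME"), ("Sam", "PERSON")]


def summarize(tokens, ents):
    actions = {}
    entities = {}

    for text, pos in tokens:
        if pos == "VERB":
            action_key = text.lower()
            # Create the counter only the first time the key is seen.
            if action_key not in actions:
                actions[action_key] = 0
            actions[action_key] += 1

    for text, label in ents:
        # Create the list only the first time the label is seen,
        # so earlier values are not overwritten.
        if label not in entities:
            entities[label] = []
        if text not in entities[label]:
            entities[label].append(text)

    return {"actions": actions, "entities": entities}


result = summarize(tokens, ents)
print(result)
# {'actions': {'play': 2, 'set': 1},
#  'entities': {'PERSON': ['Billie Holiday', 'Sam'], 'TIME': ['five minutes']}}
```

Both "Play" and "play" land on the same key, and the second PERSON entity is appended rather than replacing the first.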

Note that you can simplify this logic with defaultdict. You can also use a set instead of checking the list for duplicates each time.

from collections import defaultdict

actions = defaultdict(int)
entities = defaultdict(set)
...

if token.pos_ == "VERB":
    actions[str.lower(token.text)] += 1
...

for token in doc.ents:
    entities[token.label_].add(token.text)
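
The defaultdict fragments above can be put together into something runnable. The sketch below again uses plain tuples in place of spaCy tokens; `summarize_pairs` and its sample input are hypothetical names and data chosen for illustration. One assumption worth flagging: a set does not keep insertion order, so the sketch sorts each set when converting it back to a list.

```python
from collections import defaultdict


def summarize_pairs(tokens, ents):
    """tokens: iterable of (text, pos); ents: iterable of (text, label)."""
    actions = defaultdict(int)    # missing keys start at 0
    entities = defaultdict(set)   # missing keys start as an empty set

    for text, pos in tokens:
        if pos == "VERB":
            actions[text.lower()] += 1

    for text, label in ents:
        entities[label].add(text)  # a set ignores duplicates

    # Convert back to plain dicts and sorted lists for stable output.
    return {
        "actions": dict(actions),
        "entities": {label: sorted(values) for label, values in entities.items()},
    }


# Made-up sample data standing in for spaCy output.
summary = summarize_pairs(
    [("Play", "VERB"), ("it", "PRON"), ("play", "VERB"), ("Set", "VERB")],
    [("Sam", "PERSON"), ("Billie Holiday", "PERSON"), ("five minutes", "TIME")],
)
print(summary)
# {'actions': {'play': 2, 'set': 1},
#  'entities': {'PERSON': ['Billie Holiday', 'Sam'], 'TIME': ['five minutes']}}
```

If first-seen order matters more than alphabetical order, the list-based version with an explicit membership check is the safer choice.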

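【Comments】:

Thanks @flakes! The only problem is that the output is wrong. I think the return is wrong. Output: {'actions': {'play': 14, 'set': 6}, 'entities': {'PERSON': ['Billie holiday', 'Sam'], 'TIME': ['five minutes']}} The counts for play and set are too high; they should be 'play': 3, 'set': 1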

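【Answer 2】:

You are inconsistent about lowercasing the token. You use the lowercase version when assigning to the dictionary, but the original case when calling actions.get(). So whenever a token contains uppercase letters, actions.get() returns the default value and the count is reset to 1.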

actions[token.text.lower()] = actions.get(token.text.lower(), 0) +1
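
The effect of the mismatched key can be reproduced without spaCy. This is a small sketch with a made-up word list (`words`, `buggy`, and `fixed` are illustrative names, not from the question): the first loop writes under the lowercase key but reads under the original casing, the second uses the same key for both.

```python
words = ["Play", "play", "Play"]  # made-up sample input

# Buggy: writes to the lowercase key but looks up the original casing.
buggy = {}
for word in words:
    buggy[word.lower()] = buggy.get(word, 0) + 1

# Fixed: normalize once and use the same key for the lookup and the write.
fixed = {}
for word in words:
    key = word.lower()
    fixed[key] = fixed.get(key, 0) + 1

print(buggy)  # {'play': 1}
print(fixed)  # {'play': 3}
```

In the buggy loop the key "Play" never exists in the dictionary, so each capitalized token overwrites the running count with 1.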

【Comments】: