在Python中，如何从多个列表中的单词创建和排列单词，并出现单词[重复]答案

【问题标题】：In Python, how do I create and array from words from multiple lists, with word occurrences [duplicate]在Python中，如何从多个列表中的单词创建和排列单词，并出现单词[重复]
【发布时间】：2019-06-27 23:41:40
【问题描述】：

我有一个 JSON 文件，其中包含多个带有文本字段的对象：

{
"messages": 
[
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}

我只对“agentText”字段感兴趣。

我基本上需要删除 agentText 字段中的每个单词并计算单词的出现次数。

所以我的python代码：

import json

with open('20190626-101200-text-messages.json') as f:
  data = json.load(f)

for message in data['messages']:
    splittext= message['agentText'].strip().replace('\n',' ').replace('\r',' ')
    if len(splittext)>0:
        splittext2 = splittext.split(' ')
        print(splittext2)

给我这个：

['That', 'customer', 'was', 'great']
['That', 'customer', 'was', 'stupid', 'I', 'hope', 'they', "don't", 'phone', 'back']
['Line', 'number', '3']

如何将每个单词添加到具有计数的数组中？太喜欢了；

That 2
customer 2
was 2
great 1
..

等等？

【问题讨论】：

标签： python json

【解决方案1】：

看看这个。

data = {
    "messages": 
        [
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid\nI hope they don't phone back"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
            {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
        ]
}

var = []

for row in data['messages']:
    new_row = row['agentText'].split()
    if new_row:
        var.append(new_row)

temp = dict()

for e in var:
    for j in e:
        if j in temp:
            temp[j] = temp[j] + 1
        else:
            temp[j] = 1

for key, value in temp.items():
    print(f'{key}: {value}')

【讨论】：

这种工作，但是，我的 3 条消息行没有用逗号（，）分隔 - 它们只是 for 循环中的一行接一行。如何将一行附加到另一行并用逗号分隔？
data.split() 它将帮助您用逗号分隔
看上面我已经对我的回复进行了更改。使用 split 以逗号分隔

【解决方案2】：

data = '''{"messages":
[
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid I hope they don't phone back"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}
'''

import json
from collections import Counter
from pprint import pprint

def words(data):
    for m in data['messages']:
        yield from m['agentText'].split()

c = Counter(words(json.loads(data)))
pprint(c.most_common())

打印：

[('That', 2),
 ('customer', 2),
 ('was', 2),
 ('great', 1),
 ('stupid', 1),
 ('I', 1),
 ('hope', 1),
 ('they', 1),
 ("don't", 1),
 ('phone', 1),
 ('back', 1),
 ('Line', 1),
 ('number', 1),
 ('3', 1)]

【讨论】：

似乎不喜欢：c = Counter(words(json.loads(data))) pprint(c.most_common()) 它的 pprint 带有红色下划线