【Question title】:Merge the JSON objects with the same key-value pair in a file using Python
【Posted】:2020-06-03 05:41:40
【Question description】:

I have a file containing objects, as shown below.

Example: Input.txt

1. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K11HE-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K11HE-D", "Pi": "CHAF2", "Gi": "RV1688668060"}

2. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K08JV-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K08JV-D", "Pi": "CHAF2", "Gi": "RV1714277379"}

3. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "ABCD", "Cs": "K20LT-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K20LT-D", "Pi": "CHAF2", "Gi": "RV1714278093"}

4. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K08OW-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K08OW-D", "Pi": "CHAF2", "Gi": "RV1714277380"}

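The file contains thousands of lines.

I want to group all the JSON objects in the file that have the same value for the key "Ti".

Here is an example to illustrate the requirement.

In the sample file above, three lines share the same value for the key "Ti": lines 1, 2, and 4 all have "Ti" set to "Q2".

I need a way to merge these JSON objects and write an output file like the following.

Example: Output.txt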

1. {"Cp": "[1000, 1000, 1000]", "Af": "['CBS', 'CBS', 'CBS']", "Bp": "[150, 150, 150]", "Vt": "['channel', 'channel', 'channel']", "Ti": "['Q2', 'Q2', 'Q2']", "Cs": "['K11HE-D', 'K08JV-D', 'K08OW-D']", "Tg": "['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD']", "Fd": "['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D']", "Pi": "['CHAF2', 'CHAF2', 'CHAF2']", "Gi": "['RV1688668060', 'RV1714277379', 'RV1714277380']"}

2. {"Cp": "[1000, 1000, 1000]", "Af": "['CBS', 'CBS', 'CBS']", "Bp": "[150, 150, 150]", "Vt": "['channel', 'channel', 'channel']", "Ti": "['Q2', 'Q2', 'Q2']", "Cs": "['K11HE-D', 'K08JV-D', 'K08OW-D']", "Tg": "['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD']", "Fd": "['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D']", "Pi": "['CHAF2', 'CHAF2', 'CHAF2']", "Gi": "['RV1688668060', 'RV1714277379', 'RV1714277380']"}

3. {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "ABCD", "Cs": "K20LT-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K20LT-D", "Pi": "CHAF2", "Gi": "RV1714278093"}

4. {"Cp": "[1000, 1000, 1000]", "Af": "['CBS', 'CBS', 'CBS']", "Bp": "[150, 150, 150]", "Vt": "['channel', 'channel', 'channel']", "Ti": "['Q2', 'Q2', 'Q2']", "Cs": "['K11HE-D', 'K08JV-D', 'K08OW-D']", "Tg": "['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD']", "Fd": "['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D']", "Pi": "['CHAF2', 'CHAF2', 'CHAF2']", "Gi": "['RV1688668060', 'RV1714277379', 'RV1714277380']"}

Please let me know how I can achieve this.

【Comments】:

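  • The simplest approach I can think of is to load the JSON into a dataframe, combine the rows that share the same "Ti" value, and then convert the dataframe back to JSON. That is easier than manipulating the JSON as-is. It would also help if you shared the raw JSON content rather than a formatted version, and described what you have tried so far.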
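
The dataframe route suggested in this comment could look roughly like the sketch below. It assumes pandas is installed and that the lines have already been parsed into dictionaries; `merge_with_pandas` and the shortened sample records are hypothetical names and data used only for illustration.

```python
import pandas as pd


def merge_with_pandas(records, key="Ti"):
    """Return one merged dict per input record, grouped by the value of `key`."""
    df = pd.DataFrame(records)
    # Each group is a sub-frame holding the rows that share one value of `key`;
    # to_dict(orient="list") turns it into {column: [values...]}.
    merged = {
        value: group.to_dict(orient="list")
        for value, group in df.groupby(key, sort=False)
    }
    # Emit the merged group once per original row, in the original order.
    return [merged[value] for value in df[key]]


sample = [
    {"Ti": "Q2", "Cs": "K11HE-D"},
    {"Ti": "Q2", "Cs": "K08JV-D"},
    {"Ti": "ABCD", "Cs": "K20LT-D"},
    {"Ti": "Q2", "Cs": "K08OW-D"},
]
result = merge_with_pandas(sample)
for row in result:
    print(row)
```

Rows 1, 2, and 4 of the result carry the same merged lists, while row 3 holds single-element lists, because it is the only record with "Ti" equal to "ABCD".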

Tags: python json merge


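【Solution 1】:

You need to:

  1. Convert each string into a dictionary
  2. Collect the Ti values
  3. Iterate over the dictionaries and collect the data grouped by Ti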
import json
import re

# read the non-empty lines of the input file
with open('test.txt', 'r') as raw_data:
    data_list = [line for line in raw_data.read().splitlines() if line.strip()]

# parse each line once: strip the leading "N. " line number, then decode the JSON
# (json.loads is used instead of eval so that file content is never executed)
rows = []
for item in data_list:
    row = re.sub(r'^\s*\d+\.\s*', '', item)
    rows.append(json.loads(row))

# create list of Ti values
ti_list = [row_dictionary.get('Ti') for row_dictionary in rows]

# collect data into new dictionary, one merged entry per input line
data = {}
for i, ti in enumerate(ti_list, start=1):
    raw = {}
    for row_dictionary in rows:
        if row_dictionary.get('Ti') == ti:
            for key, value in row_dictionary.items():
                raw.setdefault(key, []).append(value)
    data[str(i) + '.'] = raw

# print the merged rows
for number, raw in data.items():
    print(number, raw)


Output:

1. {'Cp': ['1000', '1000', '1000'], 'Af': ['CBS', 'CBS', 'CBS'], 'Bp': ['150', '150', '150'], 'Vt': ['channel', 'channel', 'channel'], 'Ti': ['Q2', 'Q2', 'Q2'], 'Cs': ['K11HE-D', 'K08JV-D', 'K08OW-D'], 'Tg': ['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D'], 'Pi': ['CHAF2', 'CHAF2', 'CHAF2'], 'Gi': ['RV1688668060', 'RV1714277379', 'RV1714277380']}
2. {'Cp': ['1000', '1000', '1000'], 'Af': ['CBS', 'CBS', 'CBS'], 'Bp': ['150', '150', '150'], 'Vt': ['channel', 'channel', 'channel'], 'Ti': ['Q2', 'Q2', 'Q2'], 'Cs': ['K11HE-D', 'K08JV-D', 'K08OW-D'], 'Tg': ['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D'], 'Pi': ['CHAF2', 'CHAF2', 'CHAF2'], 'Gi': ['RV1688668060', 'RV1714277379', 'RV1714277380']}
3. {'Cp': ['1000'], 'Af': ['CBS'], 'Bp': ['150'], 'Vt': ['channel'], 'Ti': ['ABCD'], 'Cs': ['K20LT-D'], 'Tg': ['BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K20LT-D'], 'Pi': ['CHAF2'], 'Gi': ['RV1714278093']}
4. {'Cp': ['1000', '1000', '1000'], 'Af': ['CBS', 'CBS', 'CBS'], 'Bp': ['150', '150', '150'], 'Vt': ['channel', 'channel', 'channel'], 'Ti': ['Q2', 'Q2', 'Q2'], 'Cs': ['K11HE-D', 'K08JV-D', 'K08OW-D'], 'Tg': ['BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD', 'BROADCAST<>LOCAL<>HD'], 'Fd': ['dish#K11HE-D', 'dish#K08JV-D', 'dish#K08OW-D'], 'Pi': ['CHAF2', 'CHAF2', 'CHAF2'], 'Gi': ['RV1688668060', 'RV1714277379', 'RV1714277380']}
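
Since the question mentions a file with thousands of lines, it may be worth building each "Ti" group only once rather than rescanning every line for every row. The following is a minimal sketch of that idea, not the answerer's code; `merge_by_key` and the shortened sample rows are hypothetical, and the rows are assumed to be already parsed into dictionaries.

```python
import json
from collections import defaultdict


def merge_by_key(rows, key="Ti"):
    """Return one merged dict per input row; rows sharing `key` get equal merged dicts."""
    # groups maps each key value to {field: [values from every row in that group]}
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        merged = groups[row.get(key)]
        for field, value in row.items():
            merged[field].append(value)
    # Second pass: look up the finished group for every original row.
    return [dict(groups[row.get(key)]) for row in rows]


rows = [
    {"Ti": "Q2", "Cs": "K11HE-D"},
    {"Ti": "Q2", "Cs": "K08JV-D"},
    {"Ti": "ABCD", "Cs": "K20LT-D"},
    {"Ti": "Q2", "Cs": "K08OW-D"},
]
for merged in merge_by_key(rows):
    # json.dumps writes each merged row as valid JSON with real lists
    print(json.dumps(merged))
```

This makes two passes over the data instead of one pass per row. Writing with `json.dumps` yields real JSON arrays rather than lists embedded in strings, which differs from the Output.txt shown in the question.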

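【Comments】:

  • Thanks for the quick reply.
  • @KiranDas That means your data lines do not contain the line number.
  • Your code works well. However, I see that you are also taking the line numbers into account. My file does not actually have line numbers; I only used them in the question for illustration. Could you suggest how to achieve this without relying on line numbers? Thank you very much for your help.
  • The following 3 lines show exactly how the lines appear in the file: {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K11HE-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K11HE-D", "Pi": "CHAF2", "Gi": "RV1688668060"} {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "Q2", "Cs": "K08JV-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K08JV-D", "Pi": "CHAF2", "Gi": "RV1714277379"} {"Cp": "1000", "Af": "CBS", "Bp": "150", "Vt": "channel", "Ti": "ABCD", "Cs": "K20LT-D", "Tg": "BROADCAST<>LOCAL<>HD", "Fd": "dish#K20LT-D", "Pi": "CHAF2", "Gi": "RV1714278093"}
  • That is correct. The data lines do not contain numbers. However, inserting line numbers into every line should solve the problem. Thank you very much for your help. :)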
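
For a file whose lines carry no numeric prefix, as described in the comments above, the parsing step can treat the prefix as optional so that no line numbers need to be inserted. A minimal sketch, assuming every line holds one valid JSON object; `parse_line` and `LINE_NUMBER` are hypothetical names introduced here.

```python
import json
import re

# Optional "N. " prefix at the start of a line, as used for illustration in the question
LINE_NUMBER = re.compile(r"^\s*\d+\.\s+")


def parse_line(line):
    """Parse one input line into a dict, ignoring a leading line number if present."""
    return json.loads(LINE_NUMBER.sub("", line, count=1))


with_number = '1. {"Ti": "Q2", "Cs": "K11HE-D"}'
without_number = '{"Ti": "Q2", "Cs": "K11HE-D"}'

print(parse_line(with_number))
print(parse_line(without_number))
```

Both calls return the same dictionary, so the grouping step can stay unchanged whichever form the file uses.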