【Question title】: Making a histogram from JSON data
【Posted】: 2021-11-28 20:16:12
【Question】:

I have data in a JSON-like format such as this:

{
   "ts": 1393631983,
   "visitor_uuid": "ade7e1f63bc83c66",
   "visitor_source": "external",
   "visitor_device": "browser",
   "visitor_useragent": "Opera/9.80 (Windows NT 6.1) Presto/2.12.388 Version/12.16",
   "visitor_ip": "b5af0ba608ab307c",
   "visitor_country": "BR",
   "visitor_referrer": "53c643c16e8253e7",
   "env_type": "reader",
   "env_doc_id": "140222143932-91796b01f94327ee809bd759fd0f6c76",
   "event_type": "pagereadtime",
   "event_readtime": 1010,
   "subject_type": "doc",
   "subject_doc_id": "140222143932-91796b01f94327ee809bd759fd0f6c76",
   "subject_page": 3
} {
    "ts": 1393631983,
    "visitor_uuid": "232eeca785873d35",
    "visitor_source": "internal",
    "visitor_device": "browser",
    "visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36",
    "visitor_ip": "fcf9c67037f993f0",
    "visitor_country": "MX",
    "visitor_referrer": "63765fcd2ff864fd",
    "env_type": "stream",
    "env_ranking": 10,
    "env_build": "1.7.118-b946",
    "env_name": "explore",
    "env_component": "editors_picks",
    "event_type": "impression",
    "subject_type": "doc",
    "subject_doc_id": "100713205147-2ee05a98f1794324952eea5ca678c026",
    "subject_page": 1
}

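My assignment requires me to find the records whose subject_doc_id matches the user's input, then show a histogram of the countries from which that document was viewed.

I can already read the data in my code, and I know how to draw a histogram, but I need help counting the countries and showing those counts in the histogram.

For example, the data above contains "visitor_country": "MX" and "visitor_country": "BR", so I want a count for each country.

Any ideas on how to achieve this?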

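【Comments】:

    Tags: python json histogram


    【Solution 1】:

    Your JSON file is not valid JSON. You need to add "[" at the start of the file and "]" at the end, and separate each "{}" object with a comma. Here is an example:

    data.json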

    [
        {
       "ts": 1393631983,
       "visitor_uuid": "ade7e1f63bc83c66",
       "visitor_source": "external",
       "visitor_device": "browser",
       "visitor_useragent": "Opera/9.80 (Windows NT 6.1) Presto/2.12.388 Version/12.16",
       "visitor_ip": "b5af0ba608ab307c",
       "visitor_country": "BR",
       "visitor_referrer": "53c643c16e8253e7",
       "env_type": "reader",
       "env_doc_id": "140222143932-91796b01f94327ee809bd759fd0f6c76",
       "event_type": "pagereadtime",
       "event_readtime": 1010,
       "subject_type": "doc",
       "subject_doc_id": "140222143932-91796b01f94327ee809bd759fd0f6c76",
       "subject_page": 3
    }, {
        "ts": 1393631983,
        "visitor_uuid": "232eeca785873d35",
        "visitor_source": "internal",
        "visitor_device": "browser",
        "visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36",
        "visitor_ip": "fcf9c67037f993f0",
        "visitor_country": "MX",
        "visitor_referrer": "63765fcd2ff864fd",
        "env_type": "stream",
        "env_ranking": 10,
        "env_build": "1.7.118-b946",
        "env_name": "explore",
        "env_component": "editors_picks",
        "event_type": "impression",
        "subject_type": "doc",
        "subject_doc_id": "100713205147-2ee05a98f1794324952eea5ca678c026",
        "subject_page": 1
    }, {
        "ts": 1393631983,
        "visitor_uuid": "232eeca785873d35",
        "visitor_source": "internal",
        "visitor_device": "browser",
        "visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36",
        "visitor_ip": "fcf9c67037f993f0",
        "visitor_country": "PL",
        "visitor_referrer": "63765fcd2ff864fd",
        "env_type": "stream",
        "env_ranking": 10,
        "env_build": "1.7.118-b946",
        "env_name": "explore",
        "env_component": "editors_picks",
        "event_type": "impression",
        "subject_type": "doc",
        "subject_doc_id": "100713205147-2ee05a98f1794324952eea5ca678c026",
        "subject_page": 1
    }
    , {
        "ts": 1393631983,
        "visitor_uuid": "232eeca785873d35",
        "visitor_source": "internal",
        "visitor_device": "browser",
        "visitor_useragent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36",
        "visitor_ip": "fcf9c67037f993f0",
        "visitor_country": "PL",
        "visitor_referrer": "63765fcd2ff864fd",
        "env_type": "stream",
        "env_ranking": 10,
        "env_build": "1.7.118-b946",
        "env_name": "explore",
        "env_component": "editors_picks",
        "event_type": "impression",
        "subject_type": "doc",
        "subject_doc_id": "100713205147-2ee05a98f1794324952eea5ca678c026",
        "subject_page": 1
    }
    ]
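If editing the file by hand is impractical, the back-to-back objects from the question can probably also be read as they are. The following is only a sketch of that alternative, not part of the answer's approach: it uses `json.JSONDecoder.raw_decode` from the standard library, and the helper name `iter_concatenated_json` and the shortened `sample` string are assumptions made for illustration.

```python
import json


def iter_concatenated_json(text):
    """Yield each top-level JSON value from a string of back-to-back values."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two objects with no enclosing list and no comma, as in the question.
sample = (
    '{"visitor_country": "BR", "subject_page": 3} '
    '{"visitor_country": "MX", "subject_page": 1}'
)
records = list(iter_concatenated_json(sample))
print([r["visitor_country"] for r in records])
```

For a real file, the same generator could be fed the result of `open(path).read()`; that loads the whole file into memory, which may not suit very large logs.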
    

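    After that, for each element in data.json I check whether it matches the input subject_doc_id. On a match, I append its country to a list of matches, which collects the data for the histogram. I then want one bin per unique country, so I build a list of the unique countries and take its length.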

    import json

    import matplotlib.pyplot as plt

    with open("data.json") as json_file:
        data = json.load(json_file)

    # subject_doc_id used for this demonstration:
    # 100713205147-2ee05a98f1794324952eea5ca678c026
    subject_id = input("subject_doc_id: ")

    # Collect the country of every record that matches the requested document.
    visitors = []
    for record in data:
        if record["subject_doc_id"] == subject_id:
            print("got a match from {}".format(record["visitor_country"]))
            visitors.append(record["visitor_country"])

    # One bin per distinct country.
    countries = set(visitors)

    if visitors:
        plt.hist(visitors, bins=len(countries))
        plt.show()
    else:
        print("No matches for given subject_doc_id")
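Since the question asks specifically for a count per country, one possible variation is to count explicitly and draw one bar per country. This is a sketch rather than the answer's method: it swaps `plt.hist` for `collections.Counter` plus `plt.bar`, the function names `count_countries` and `plot_counts` are made up for illustration, and the records are trimmed to the two fields that matter.

```python
from collections import Counter


def count_countries(records, subject_id):
    """Count visitor_country over records whose subject_doc_id matches."""
    return Counter(
        r["visitor_country"]
        for r in records
        if r.get("subject_doc_id") == subject_id and "visitor_country" in r
    )


def plot_counts(counts, title=""):
    """Draw one bar per country; matplotlib is imported only when called."""
    import matplotlib.pyplot as plt

    countries = sorted(counts)
    plt.bar(countries, [counts[c] for c in countries])
    plt.xlabel("visitor_country")
    plt.ylabel("views")
    plt.title(title)
    plt.show()


# Trimmed-down records mirroring the data.json example.
DOC_A = "140222143932-91796b01f94327ee809bd759fd0f6c76"
DOC_B = "100713205147-2ee05a98f1794324952eea5ca678c026"
records = [
    {"visitor_country": "BR", "subject_doc_id": DOC_A},
    {"visitor_country": "MX", "subject_doc_id": DOC_B},
    {"visitor_country": "PL", "subject_doc_id": DOC_B},
    {"visitor_country": "PL", "subject_doc_id": DOC_B},
]
counts = count_countries(records, DOC_B)
print(dict(counts))
```

Calling `plot_counts(counts)` when `counts` is non-empty would then show the chart; an empty `Counter` is a convenient signal that nothing matched the given id.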
    

    If you want to group by continent, you first need to know which country belongs to which continent. My example:

    continents = {
        "europe": ["PL", "DE"],
        "south_america": ["BR"],
        "north_america": ["MX"]
    }
    

    I'm new to Python, so apart from loops I don't know any fancy technique for grouping the earlier list.

    continent_data = []
    for continent in continents:
        for visitor_country in visitors:
            # Exact membership test against the continent's list of country codes.
            if visitor_country in continents[continent]:
                continent_data.append(continent)
    print(continent_data)
    

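    After that, you can reuse the earlier code to reduce the result to its unique values for the bins and build the histogram as in the example above.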
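The continent grouping can also be done without nested loops. The sketch below assumes two-letter country codes like those in the sample data; the tiny `continents` mapping, the `"unknown"` bucket, and the name `count_continents` are illustrative assumptions, and a real mapping would need to cover every code that occurs.

```python
from collections import Counter

# Hypothetical, deliberately tiny mapping; a real one would cover every code.
continents = {
    "europe": ["PL", "DE"],
    "south_america": ["BR"],
    "north_america": ["MX"],
}

# Invert to country -> continent so each lookup is a single dict access.
country_to_continent = {
    country: continent
    for continent, countries in continents.items()
    for country in countries
}


def count_continents(visitor_countries):
    """Count visits per continent; unmapped codes go under 'unknown'."""
    return Counter(
        country_to_continent.get(code, "unknown") for code in visitor_countries
    )


visitors = ["MX", "PL", "PL", "BR", "ZZ"]
continent_counts = count_continents(visitors)
print(dict(continent_counts))
```

Inverting the mapping once keeps the per-visitor work to a single dictionary lookup, and the explicit fallback makes unmapped countries visible instead of silently dropping them.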

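    【Comments】:

    • Thanks for the correction, much appreciated. If I want to group the countries by continent and show that in another histogram, do you know what I could do? Thanks again.
    • @user17534067 That sounds like a new question. Feel free to ask it separately, since comments are meant for clarification.
    【Solution 2】: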

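    I had to modify your file contents slightly to make them valid JSON, then saved the result as "jsonExample.json" in my working directory.

    The modified JSON data looks like this:

    {
    "visitor1": {[your data]},
    "visitor2": {[your data]}
    }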
    

    Then, using the json library (https://docs.python.org/3/library/json.html), you just collect each visitor's country into a list and count how many times each country appears:

    import json

    # Parse the file straight into a dict keyed by visitor name.
    with open("jsonExample.json", "r") as file:
        visitors = json.load(file)

    # Collect the country of every visitor who viewed the desired document.
    countryList = []
    for v in visitors:
        if visitors[v]['subject_doc_id'] == "desired_subject_doc_id":
            countryList.append(visitors[v]['visitor_country'])

    # Report how often each distinct country occurs.
    for country in set(countryList):
        print(f"Country {country} appears {countryList.count(country)} times")
    

    The if visitors[v]['subject_doc_id'] statement checks whether subject_doc_id matches the specified value; just replace the right-hand side with the desired id.
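To make the id easy to change, it can be held in a variable and passed to a small function. This is a sketch under assumptions: the `visitors` dictionary stands in for the parsed contents of jsonExample.json, the ids `doc-a` and `doc-b` are placeholders, and `countries_for_doc` is a hypothetical helper name.

```python
from collections import Counter


def countries_for_doc(visitors, doc_id):
    """Return a Counter of visitor_country for entries matching doc_id."""
    return Counter(
        entry["visitor_country"]
        for entry in visitors.values()
        if entry.get("subject_doc_id") == doc_id
    )


# Hypothetical stand-in for the parsed jsonExample.json contents.
visitors = {
    "visitor1": {"visitor_country": "BR", "subject_doc_id": "doc-a"},
    "visitor2": {"visitor_country": "MX", "subject_doc_id": "doc-b"},
    "visitor3": {"visitor_country": "MX", "subject_doc_id": "doc-b"},
}

doc_id = "doc-b"  # in practice: doc_id = input("subject_doc_id: ")
for country, n in countries_for_doc(visitors, doc_id).most_common():
    print(f"Country {country} appears {n} times")
```

Counting in a single pass with `Counter` avoids rescanning the list once per country, which `list.count` inside a loop would do.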

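    【Comments】:

    • What did you modify?
    • What if I only want to count the countries for a specific document id (subject_doc_id) that the user enters?
    • Edited to show how that can be done. Hard-coding works if you only ever want one id, but storing it in a variable makes it easier to change.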