【Question title】: Getting imported json data into a data frame
【Posted】: 2013-06-01 14:36:27
【Question】:

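I have a file containing over 1500 json objects that I want to work with in R. I've been able to import the data as a list, but I'm having trouble coercing it into a useful structure. I want to create a data frame with a row for each json object and a column for each key:value pair.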

I've recreated my situation with this small, fake data set:

[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
{"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
{"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
{"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
{"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
{"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
{"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]

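Some features of the data:

  • All objects contain the same number of key:value pairs, although some of the values are null
  • There are two non-numeric columns per object (name and group)
  • name is the unique identifier, and there are 10 or so groups
  • Many of the names and groups contain spaces, commas and other punctuation.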

Based on this question: R list(structure(list())) to data frame, I tried the following:

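library(plyr)  # provides rbind.fill
# fromJSON is presumably from RJSONIO or rjson; the question does not say which
json_file <- "test.json"
json_data <- fromJSON(json_file)
asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))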

With both my real data and this fake data, the last line gives me this error:

Error in data.frame(name = "Doe, John", group = "Red", `age (y)` = 24,  : 
  arguments imply differing number of rows: 1, 0
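
The "1, 0" in the message refers to field lengths: a JSON null is imported as NULL, which has length zero, while every other field has length one. A minimal base R sketch of that mismatch follows. Here `rec` is a hypothetical hand-built stand-in for one parsed object, an assumption about what the import returns rather than output copied from it:

```r
# Hypothetical stand-in for one parsed JSON object whose score is null.
rec <- list(name = "Doe, John", group = "Red", score = NULL)

# lengths() reports the length of each field; the NULL field has length 0.
field_lengths <- lengths(rec)
print(field_lengths)

# Names of the fields that would need a placeholder such as NA.
null_fields <- names(rec)[vapply(rec, is.null, logical(1))]
print(null_fields)
```

Running the same check over a few elements of the real list is a quick way to see which keys ever hold a null.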

【Comments】:

    Tags: json r import dataframe


    【Solution 1】:

    You just need to replace the NULLs with NA:

    require(RJSONIO)    
    
    json_file <-  '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
        {"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
        {"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
        {"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
        {"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
        {"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
        {"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]'
    
    
    json_file <- fromJSON(json_file)
    
    json_file <- lapply(json_file, function(x) {
      x[sapply(x, is.null)] <- NA
      unlist(x)
    })
    

    Once every element has a non-NULL value, you can call rbind without an error:

    do.call("rbind", json_file)
         name           group    age (y) height (cm) wieght (kg) score
    [1,] "Doe, John"    "Red"    "24"    "182"       "74.8"      NA   
    [2,] "Doe, Jane"    "Green"  "30"    "170"       "70.1"      "500"
    [3,] "Smith, Joan"  "Yellow" "41"    "169"       "60"        NA   
    [4,] "Brown, Sam"   "Green"  "22"    "183"       "75"        "865"
    [5,] "Jones, Larry" "Green"  "31"    "178"       "83.9"      "221"
    [6,] "Murray, Seth" "Red"    "35"    "172"       "76.2"      "413"
    [7,] "Doe, Jane"    "Yellow" "22"    "164"       "68"        "902"
    
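
The result above is a character matrix, so the numeric fields are stored as strings. One possible follow-up, sketched in base R, converts the matrix to a data frame and lets type.convert restore the column types. The two-record list is a hand-built stand-in for the parsed JSON, and the names `records`, `m` and `df` are illustrative:

```r
# Hand-built stand-in for the parsed JSON (two records, one null score).
records <- list(
  list(name = "Doe, John", group = "Red", `age (y)` = 24, score = NULL),
  list(name = "Doe, Jane", group = "Green", `age (y)` = 30, score = 500)
)

# Same step as in the answer: replace NULL with NA, then flatten each record.
records <- lapply(records, function(x) {
  x[sapply(x, is.null)] <- NA
  unlist(x)
})

# Bind into a character matrix, then convert to a data frame.
m <- do.call("rbind", records)
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Re-infer each column's type from its character values.
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Using as.data.frame on the matrix keeps column names such as "age (y)" unchanged, which data.frame would otherwise rewrite unless check.names = FALSE is set.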

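    【Comments】:

    • I'm surprised there is no better function to do this. (For XML there are functions like XMLtoDataFrame.) So a JSONtoDataFrame would be great.
    • @userJT - there is jsonlite::fromJSON, which handles NULL and simplifies to a data.frame. See my answer.
    • This converts json_file to a matrix, not a data frame. How do I get a data.frame?
    • @TSR: data.frame(do.call("rbind", json_file))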
    【Solution 2】:

    This is very simple if you use either library(jsonlite) or library(jsonify).

    Both handle null values by converting them to NA, and both preserve the data types.

    Data

    json_file <-  '[{"name":"Doe, John","group":"Red","age (y)":24,"height (cm)":182,"wieght (kg)":74.8,"score":null},
    {"name":"Doe, Jane","group":"Green","age (y)":30,"height (cm)":170,"wieght (kg)":70.1,"score":500},
    {"name":"Smith, Joan","group":"Yellow","age (y)":41,"height (cm)":169,"wieght (kg)":60,"score":null},
    {"name":"Brown, Sam","group":"Green","age (y)":22,"height (cm)":183,"wieght (kg)":75,"score":865},
    {"name":"Jones, Larry","group":"Green","age (y)":31,"height (cm)":178,"wieght (kg)":83.9,"score":221},
    {"name":"Murray, Seth","group":"Red","age (y)":35,"height (cm)":172,"wieght (kg)":76.2,"score":413},
    {"name":"Doe, Jane","group":"Yellow","age (y)":22,"height (cm)":164,"wieght (kg)":68,"score":902}]'
    

    jsonlite

    library(jsonlite)
    jsonlite::fromJSON( json_file )
    #           name  group age (y) height (cm) wieght (kg) score
    # 1    Doe, John    Red      24         182        74.8    NA
    # 2    Doe, Jane  Green      30         170        70.1   500
    # 3  Smith, Joan Yellow      41         169        60.0    NA
    # 4   Brown, Sam  Green      22         183        75.0   865
    # 5 Jones, Larry  Green      31         178        83.9   221
    # 6 Murray, Seth    Red      35         172        76.2   413
    # 7    Doe, Jane Yellow      22         164        68.0   902
    
    str( jsonlite::fromJSON( json_file ) )
    # 'data.frame': 7 obs. of  6 variables:
    # $ name       : chr  "Doe, John" "Doe, Jane" "Smith, Joan" "Brown, Sam" ...
    # $ group      : chr  "Red" "Green" "Yellow" "Green" ...
    # $ age (y)    : int  24 30 41 22 31 35 22
    # $ height (cm): int  182 170 169 183 178 172 164
    # $ wieght (kg): num  74.8 70.1 60 75 83.9 76.2 68
    # $ score      : int  NA 500 NA 865 221 413 902
    

    jsonify

    library(jsonify)
    jsonify::from_json( json_file )
    #           name  group age (y) height (cm) wieght (kg) score
    # 1    Doe, John    Red      24         182        74.8    NA
    # 2    Doe, Jane  Green      30         170        70.1   500
    # 3  Smith, Joan Yellow      41         169        60.0    NA
    # 4   Brown, Sam  Green      22         183        75.0   865
    # 5 Jones, Larry  Green      31         178        83.9   221
    # 6 Murray, Seth    Red      35         172        76.2   413
    # 7    Doe, Jane Yellow      22         164        68.0   902
    
    
    str( jsonify::from_json( json_file ) )
    # 'data.frame': 7 obs. of  6 variables:
    # $ name       : chr  "Doe, John" "Doe, Jane" "Smith, Joan" "Brown, Sam" ...
    # $ group      : chr  "Red" "Green" "Yellow" "Green" ...
    # $ age (y)    : int  24 30 41 22 31 35 22
    # $ height (cm): int  182 170 169 183 178 172 164
    # $ wieght (kg): num  74.8 70.1 60 75 83.9 76.2 68
    # $ score      : int  NA 500 NA 865 221 413 902
    

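    【Comments】:

    • I ran exactly the same code as you, but when I run fromJSON it returns a list, not a data frame. How did you get it to return a data frame?
    • @Alexander - I still get a data.frame. Make sure you are using jsonlite::fromJSON.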
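
A list instead of a data frame usually suggests that a different package's fromJSON is the one being called, since RJSONIO, rjson and jsonlite all export a function with that name and the most recently attached package masks the others. A small sketch for checking where a function comes from (`which_pkg` is a hypothetical helper, not part of any of these packages):

```r
# Hypothetical helper: report the namespace a function was defined in.
which_pkg <- function(f) environmentName(environment(f))

# Demonstrated on a function that ships with R.
print(which_pkg(stats::median))

# With a JSON package attached, the same check can be applied to fromJSON:
# which_pkg(fromJSON)
```

Calling the function with an explicit prefix, as in jsonlite::fromJSON(json_file), avoids the ambiguity regardless of load order.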
    【Solution 3】:

    To replace the null values, use the nullValue argument:

    json_data <- fromJSON(json_file, nullValue = NA)
    asFrame <- do.call("rbind.fill", lapply(json_data, as.data.frame))
    

    This way there won't be any unnecessary quotes in your output.

    【Comments】:

      【Solution 4】:
      library(rjson)
      Lines <- readLines("yelp_academic_dataset_business.json") 
      business <- as.data.frame(t(sapply(Lines, fromJSON)))
      

      You can try this to load JSON data into R.

      【Comments】:

        【Solution 5】:
        dplyr::bind_rows(fromJSON(file_name))
        

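        【Comments】:

        • Which fromJSON function are you using? If it's from jsonlite, then dplyr::bind_rows is redundant. If it's from rjson, then your solution errors on the data provided.
        • I don't remember; things must have changed.
        【Solution 6】:

        Changing the package from rjson to jsonlite fixed it for me.

        So instead of this: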

        fromAPIPlantsPages <- rjson::fromJSON(content(apiGetPlants,type="text",encoding = "UTF-8"))
        
        dfPlantenAPI <- as.data.frame(fromAPIPlantsPages)
        

        I changed it to this:

        fromAPIPlantsPages <- jsonlite::fromJSON(content(apiGetPlants,type="text",encoding = "UTF-8"))
        
        dfPlantenAPI <- as.data.frame(fromAPIPlantsPages)
        

        【Comments】:
