Question title: How to read a json file line by line in R?
Posted: 2019-08-02 20:08:25
Question:

Here is the raw JSON data:

json_file <-  '{"name":"Doe, John","group":"Red","age":{"v_0":24}}
    {"name":"Doe, Jane","group":"Green","age":{"v_0":31}}
    {"name":"Smith, Joan","group":"Yellow","age":{"v_0":22}}'

When I try to convert json_file to a data frame:

library(RJSONIO)
json_file <- fromJSON(json_file)

I get this error:

Error: parse error: trailing garbage
      :"Red","age":{"v_0":24}}     {"name":"Doe, Jane","group":"Gr
                 (right here) ------^

I know that if I change the raw data to the following, everything works:

json_file <-  '[{"name":"Doe, John","group":"Red","age":{"v_0":24}},
    {"name":"Doe, Jane","group":"Green","age":{"v_0":31}},
    {"name":"Smith, Joan","group":"Yellow","age":{"v_0":22}}]'

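But what I would really like to know is:

1) How can I get a data frame from the raw data without delimiting the objects with [ ] and ,?

2) If there is no way, how can I delimit the objects in a large JSON file by adding , to the end of every line except the last, and adding [ at the start of the first line and ] at the end of the last line?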

Question comments:

Tags: r json dataframe


Answer 1:

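Your raw JSON data is a sequence of separate objects, and taken as a whole it is not valid JSON. Fortunately, as you noticed, if you insert a , at the end of every line except the last and wrap everything in square brackets, you get an array of key/value objects. So the real question is: "How do I combine all the elements into a single data.frame?"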
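The point that each line is valid on its own while the whole string is not can be checked directly. This is a small sketch using jsonlite's validate(), a swapped-in package (the question itself uses RJSONIO); like the follow-up below, it assumes no object spans more than one line.

```r
library(jsonlite)

json_file <- '{"name":"Doe, John","group":"Red","age":{"v_0":24}}
    {"name":"Doe, Jane","group":"Green","age":{"v_0":31}}
    {"name":"Smith, Joan","group":"Yellow","age":{"v_0":22}}'

# The whole string is three JSON texts back to back, so the parser rejects it
whole_ok <- isTRUE(validate(json_file))

# Split on newlines and strip the indentation; each piece is one complete object
lines <- trimws(strsplit(json_file, "\n", fixed = TRUE)[[1]])
each_ok <- vapply(lines, function(l) isTRUE(validate(l)), logical(1), USE.NAMES = FALSE)

print(whole_ok)
print(each_ok)
```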

Solution (applied to the bracketed version of the data): dplyr::bind_rows(fromJSON(json_file))

# A tibble: 3 x 3
  name        group    age
  <chr>       <chr>  <dbl>
1 Doe, John   Red       24
2 Doe, Jane   Green     31
3 Smith, Joan Yellow    22

Follow-up:

Assuming the JSON objects themselves contain no newlines, you can do a simple search and replace:

json_file <- gsub('\n', ',', trimws(json_file), fixed=TRUE)

I added trimws() to strip a possible trailing newline (it also removes leading whitespace).

Next, wrap the result in square brackets:

json_file <- paste0('[', json_file, ']')

And you are back on track.
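Putting the two steps together, here is a minimal end-to-end sketch. It swaps in jsonlite rather than the RJSONIO plus dplyr::bind_rows() combination above, because jsonlite::fromJSON() can simplify an array of objects straight into a data frame, and flatten = TRUE expands the nested age object into a column named age.v_0. Like the follow-up, it assumes no object contains an embedded newline.

```r
library(jsonlite)

json_file <- '{"name":"Doe, John","group":"Red","age":{"v_0":24}}
    {"name":"Doe, Jane","group":"Green","age":{"v_0":31}}
    {"name":"Smith, Joan","group":"Yellow","age":{"v_0":22}}'

# Replace the newlines between objects with commas, then wrap in brackets
json_array <- paste0("[", gsub("\n", ",", trimws(json_file), fixed = TRUE), "]")

# Parse the array; flatten = TRUE turns {"v_0": ...} into an age.v_0 column
df <- fromJSON(json_array, flatten = TRUE)
print(df)
```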

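Comments:

  • So I have to insert , at the end of every line (except the last) and wrap the raw data in square brackets by hand? What if I have a large JSON file?
  • Define "large".
  • When there are many objects, say 10,000.
  • That should not be a problem in R.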
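For a file on disk, the same comma-and-bracket transformation can be done in code instead of by hand. The helper name read_ndjson_as_array() is hypothetical; the sketch assumes one complete object per line and reads the whole file into memory, which is usually acceptable for something on the order of 10,000 short records.

```r
library(jsonlite)

# Hypothetical helper: parse a file with one JSON object per line as a single array
read_ndjson_as_array <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]  # ignore blank lines, e.g. a trailing empty line
  fromJSON(paste0("[", paste(lines, collapse = ","), "]"), flatten = TRUE)
}

# Usage: write 10,000 records shaped like the question's data to a temp file
n <- 10000L
path <- tempfile(fileext = ".json")
writeLines(
  sprintf('{"name":"Person %d","group":"Red","age":{"v_0":%d}}',
          seq_len(n), seq_len(n) %% 80L),
  path
)

big <- read_ndjson_as_array(path)
str(big)
```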
Answer 2:

You need those square brackets. Save the following as "test.json":

{ 
   "ID":["1","2","3","4","5","6","7","8" ],
   "Name":["Rick","Dan","Michelle","Ryan","Gary","Nina","Simon","Guru" ],
   "Salary":["623.3","515.2","611","729","843.25","578","632.8","722.5" ],

   "StartDate":[ "1/1/2012","9/23/2013","11/15/2014","5/11/2014","3/27/2015","5/21/2013",
      "7/30/2013","6/17/2014"],
   "Dept":[ "IT","Operations","IT","HR","Finance","IT","Operations","Finance"]
}

Now load the required package and point it at the file you just saved:

# Load the package required to read JSON files.
library("rjson")

# Give the input file name to the function.
result <- fromJSON(file = "C:\\Users\\Excel\\Documents\\test.json")

# Print the result.
print(result)

Result:

print(result)
$ID
[1] "1" "2" "3" "4" "5" "6" "7" "8"

$Name
[1] "Rick"     "Dan"      "Michelle" "Ryan"     "Gary"     "Nina"     "Simon"    "Guru"    

$Salary
[1] "623.3"  "515.2"  "611"    "729"    "843.25" "578"    "632.8"  "722.5" 

$StartDate
[1] "1/1/2012"   "9/23/2013"  "11/15/2014" "5/11/2014"  "3/27/2015"  "5/21/2013"  "7/30/2013"  "6/17/2014" 

$Dept
[1] "IT"         "Operations" "IT"         "HR"         "Finance"    "IT"         "Operations" "Finance"
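Since the question is about getting a data frame, here is a sketch of that last step for this example. Because every field in test.json is an array of the same length, the list returned by rjson::fromJSON() maps onto columns via base R's as.data.frame(). The sketch writes the example to a temporary file (standing in for the Windows path above), and the type conversions assume the dates are written as month/day/year.

```r
library(rjson)

# Write the example JSON to a temporary file (stand-in for "test.json")
path <- tempfile(fileext = ".json")
writeLines('{
  "ID": ["1","2","3","4","5","6","7","8"],
  "Name": ["Rick","Dan","Michelle","Ryan","Gary","Nina","Simon","Guru"],
  "Salary": ["623.3","515.2","611","729","843.25","578","632.8","722.5"],
  "StartDate": ["1/1/2012","9/23/2013","11/15/2014","5/11/2014",
                "3/27/2015","5/21/2013","7/30/2013","6/17/2014"],
  "Dept": ["IT","Operations","IT","HR","Finance","IT","Operations","Finance"]
}', path)

result <- fromJSON(file = path)

# Each element is an equal-length character vector, so it becomes one column
emp <- as.data.frame(result, stringsAsFactors = FALSE)

# Every value was quoted in the JSON, so convert the non-text columns explicitly
emp$ID <- as.integer(emp$ID)
emp$Salary <- as.numeric(emp$Salary)
emp$StartDate <- as.Date(emp$StartDate, format = "%m/%d/%Y")
str(emp)
```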

Comments:

Answer 3:

There are several ways to do this without editing the file.

If you want a data.frame:

library(jsonlite)
# from a URL
zips <- stream_in(url("http://media.mongodb.org/zips.json"))
# from a local file
json_data <- stream_in(file("path/to/file.json"))

Or, if you want a list:

# one list element per line, each parsed with jsonlite::fromJSON
json_data_as_list <- lapply(readLines("path/to/file.json"), fromJSON)
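To see stream_in() working on the exact data from the question, here is a sketch that feeds the string through base R's textConnection() instead of a file or URL. stream_in() returns the nested age object as a data-frame column, so jsonlite's flatten() is applied afterwards to get a plain age.v_0 column.

```r
library(jsonlite)

json_file <- '{"name":"Doe, John","group":"Red","age":{"v_0":24}}
    {"name":"Doe, Jane","group":"Green","age":{"v_0":31}}
    {"name":"Smith, Joan","group":"Yellow","age":{"v_0":22}}'

# stream_in() parses one JSON object per line; no brackets or commas are needed
df <- stream_in(textConnection(json_file), verbose = FALSE)

# The nested "age" object comes back as a data-frame column; flatten() expands it
df <- flatten(df)
print(df)
```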
    

Comments:
