【Question title】: Flattening nested JSON in R
【Posted】: 2020-06-10 13:45:59
【Question】:

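Hi all: I've searched Stack Overflow and the rest of the internet for an answer to this, but none of the answers I found seem to work for me.

I have thousands of rows of JSON data containing image information from a camera-trap study, and I've had a lot of trouble unpacking it. I tried jsonlite::fromJSON to no avail, and the same goes for as.tbl_json from tidyjson.

My goal is to write some code that gives me a data frame with one column for each variable stored in the JSON. Can you help?

Here is the data vector I'm working with, although in practice the data sit in a single column of a larger .csv file. The first row is the column name.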

annotations<-c(annotations,
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""DEERWHITETAILED"",""answers"":{""HOWMANY"":""1"",""YOUNGPRESENT"":""NO"",""ANTLERSPRESENT"":""NO"",""WHATBEHAVIORSDOYOUSEE"":[""ALERT""],""ESTIMATEOFSNOWDEPTHSEETUTORIAL"":""NOSNOWBAREGROUND"",""ISITACTIVELYRAININGORSNOWINGINTHEPICTURE"":""NO""},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""FISHER"",""answers"":{""HOWMANY"":""1"",""YOUNGPRESENT"":""NO"",""WHATBEHAVIORSDOYOUSEE"":[""WALKINGRUNNING"",""ALERT""],""ESTIMATEOFSNOWDEPTHSEETUTORIAL"":""1020CM"",""ISITACTIVELYRAININGORSNOWINGINTHEPICTURE"":""NO""},""filters"":{}}]}]"
"[{""task"":""T0"",""value"":[{""choice"":""NOTHINGHERE"",""answers"":{},""filters"":{}}]}]")

If I run dput(annotations), I get the following:

structure(list(annotations = c("[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"DEERWHITETAILED\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"ANTLERSPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"NOSNOWBAREGROUND\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"FISHER\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"WALKINGRUNNING\",\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"1020CM\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]", 
"[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]"
)), class = "data.frame", row.names = c(NA, -10L))

【Discussion】:

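  • This syntax doesn't make sense. Is there supposed to be a comma in c(annotations "[{""task""...?
  • Glad you asked. I forgot to put a comma after c(annotations, ... but I've just fixed it. Thanks!
  • What do you mean fromJSON doesn't work? For example, if I try lapply(annotations$annotations, fromJSON), I get something back. Is that different from what you expect?
  • That works for me too, as far as getting a list. But I'm trying to get a flat data frame. When I add flatten = T, I still only get a list. I'm looking for an elegant way to pull all the elements of the list into one data frame with very little code.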
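For comparison with the lapply(..., fromJSON) route discussed in the comments, here is a minimal base-R sketch that stays with jsonlite. It assumes every string has the [{task, value: [{choice, answers, filters}]}] shape shown in the question. Each string is parsed with simplifyVector = FALSE so the nesting is predictable, each object in "value" becomes one named character vector, and keys that are missing in a given record are filled with NA. flatten_record is a hypothetical helper, and joining multi-valued answers such as WHATBEHAVIORSDOYOUSEE with ";" is an arbitrary choice, not something jsonlite does for you.

```r
library(jsonlite)

# Two records copied from the question; on the real data this would be
# annotations$annotations (the column read from the .csv file).
raw <- c(
  '[{"task":"T0","value":[{"choice":"NOTHINGHERE","answers":{},"filters":{}}]}]',
  '[{"task":"T0","value":[{"choice":"FISHER","answers":{"HOWMANY":"1","YOUNGPRESENT":"NO","WHATBEHAVIORSDOYOUSEE":["WALKINGRUNNING","ALERT"],"ESTIMATEOFSNOWDEPTHSEETUTORIAL":"1020CM","ISITACTIVELYRAININGORSNOWINGINTHEPICTURE":"NO"},"filters":{}}]}]'
)

# Hypothetical helper: turn one JSON string into a list of flat, named
# character vectors, one per object inside each "value" array.
flatten_record <- function(txt, record_id) {
  parsed <- fromJSON(txt, simplifyVector = FALSE)  # keep plain nested lists
  rows <- list()
  for (obj in parsed) {
    for (val in obj$value) {
      # JSON arrays (e.g. WHATBEHAVIORSDOYOUSEE) are collapsed into one string
      answers <- vapply(val$answers,
                        function(a) paste(unlist(a), collapse = ";"),
                        character(1))
      rows[[length(rows) + 1]] <- c(record_id = as.character(record_id),
                                    task = obj$task,
                                    choice = val$choice,
                                    answers)
    }
  }
  rows
}

# One list entry per parsed object, across all records
rows <- do.call(c, lapply(seq_along(raw), function(i) flatten_record(raw[[i]], i)))

# Union of all keys seen; a key absent from a record becomes NA
all_cols <- unique(unlist(lapply(rows, names)))
mat <- do.call(rbind, lapply(rows, function(r) unname(r[all_cols])))
flat <- as.data.frame(mat, stringsAsFactors = FALSE)
names(flat) <- all_cols

flat
```

All columns come back as character, because that is how the values are stored in the JSON; convert HOWMANY with as.integer() afterwards if needed.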

Tags: r json flatten


【Solution 1】:

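I'm not entirely clear on what output format you're looking for, and there are many different ways to do this. The arrays in the data structure also complicate things a bit: each one holds only a single object here, but in general they could hold more.

Anyway, thanks to spread_all(), tidyjson doesn't need much code. You can also use spread_values() to spread only specific values, enter_object(answers) to spread the answers, and so on. Hope this helps!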

library(tidyjson)
#> 
#> Attaching package: 'tidyjson'
#> The following object is masked from 'package:stats':
#> 
#>     filter
library(tibble)

annotations <- structure(list(annotations = c("[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"DEERWHITETAILED\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"ANTLERSPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"NOSNOWBAREGROUND\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"FISHER\",\"answers\":{\"HOWMANY\":\"1\",\"YOUNGPRESENT\":\"NO\",\"WHATBEHAVIORSDOYOUSEE\":[\"WALKINGRUNNING\",\"ALERT\"],\"ESTIMATEOFSNOWDEPTHSEETUTORIAL\":\"1020CM\",\"ISITACTIVELYRAININGORSNOWINGINTHEPICTURE\":\"NO\"},\"filters\":{}}]}]", 
                                              "[{\"task\":\"T0\",\"value\":[{\"choice\":\"NOTHINGHERE\",\"answers\":{},\"filters\":{}}]}]"
)), class = "data.frame", row.names = c(NA, -10L))

ant <- tibble(raw = annotations$annotations)

as.tbl_json(ant, json.column = "raw") %>%
  gather_array("object_id") %>% 
  spread_all() %>%
  enter_object("value") %>%
  gather_array("value_id") %>%
  spread_all() %>%
  as_tibble()
#> # A tibble: 10 x 9
#>    object_id task  value_id choice answers.HOWMANY answers.YOUNGPR…
#>        <int> <chr>    <int> <chr>  <chr>           <chr>           
#>  1         1 T0           1 NOTHI… <NA>            <NA>            
#>  2         1 T0           1 NOTHI… <NA>            <NA>            
#>  3         1 T0           1 DEERW… 1               NO              
#>  4         1 T0           1 NOTHI… <NA>            <NA>            
#>  5         1 T0           1 NOTHI… <NA>            <NA>            
#>  6         1 T0           1 NOTHI… <NA>            <NA>            
#>  7         1 T0           1 NOTHI… <NA>            <NA>            
#>  8         1 T0           1 NOTHI… <NA>            <NA>            
#>  9         1 T0           1 FISHER 1               NO              
#> 10         1 T0           1 NOTHI… <NA>            <NA>            
#> # … with 3 more variables: answers.ANTLERSPRESENT <chr>,
#> #   answers.ESTIMATEOFSNOWDEPTHSEETUTORIAL <chr>,
#> #   answers.ISITACTIVELYRAININGORSNOWINGINTHEPICTURE <chr>

Created on 2020-03-14 by the reprex package (v0.3.0)
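The spread_values() / enter_object() variant mentioned in the answer might look roughly like the sketch below. It assumes tidyjson's jstring() path helper and loads dplyr for %>%, tibble() and as_tibble(). The chosen keys and the output column names (how_many, snow_depth) are hypothetical picks for illustration. Unlike spread_all(), only the requested columns are created, and a key missing from an empty answers object should come back as NA.

```r
library(tidyjson)
library(dplyr)  # supplies %>%, tibble() and as_tibble()

# Two records from the question (one NOTHINGHERE, one FISHER)
raw <- c(
  '[{"task":"T0","value":[{"choice":"NOTHINGHERE","answers":{},"filters":{}}]}]',
  '[{"task":"T0","value":[{"choice":"FISHER","answers":{"HOWMANY":"1","YOUNGPRESENT":"NO","WHATBEHAVIORSDOYOUSEE":["WALKINGRUNNING","ALERT"],"ESTIMATEOFSNOWDEPTHSEETUTORIAL":"1020CM","ISITACTIVELYRAININGORSNOWINGINTHEPICTURE":"NO"},"filters":{}}]}]'
)

picked <- tibble(record_id = seq_along(raw), raw = raw) %>%
  as.tbl_json(json.column = "raw") %>%
  gather_array("object_id") %>%                 # outer [ ... ] array
  spread_values(task = jstring("task")) %>%
  enter_object("value") %>%                     # step into the "value" array
  gather_array("value_id") %>%
  spread_values(choice = jstring("choice")) %>%
  enter_object("answers") %>%                   # step into the answers object
  spread_values(
    how_many   = jstring("HOWMANY"),            # NA when the key is absent
    snow_depth = jstring("ESTIMATEOFSNOWDEPTHSEETUTORIAL")
  ) %>%
  as_tibble()

picked
```

Note that enter_object() drops rows where the key does not exist at all; here every record has an answers object (possibly empty), so both rows are kept.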

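【Discussion】:

  • Thanks, Cole. I don't have the reputation to do much besides thank you here. This worked. Thanks for your help!
  • My pleasure! Glad to hear it helped!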