[Posted]: 2020-08-29 06:47:51
[Question]:
How can I use this JSON API to create a dataset with proper column names in Python or R:
[Comments]:
Tags: python r json api dataframe
An R-based answer: you can use the jsonlite package:
library(jsonlite)
data <- fromJSON("./data/data.json", flatten = FALSE)
I saved the JSON from your question to ./data/data.json. Reading it yields a list of three data frames:
List of 3
$ cases_time_series:'data.frame': 104 obs. of 7 variables:
..$ dailyconfirmed: chr [1:104] "1" "0" "0" "1" ...
..$ dailydeceased : chr [1:104] "0" "0" "0" "0" ...
..$ dailyrecovered: chr [1:104] "0" "0" "0" "0" ...
..$ date : chr [1:104] "30 January " "31 January " "01 February " "02 February " ...
..$ totalconfirmed: chr [1:104] "1" "1" "1" "2" ...
..$ totaldeceased : chr [1:104] "0" "0" "0" "0" ...
..$ totalrecovered: chr [1:104] "0" "0" "0" "0" ...
$ statewise :'data.frame': 38 obs. of 11 variables:
..$ active : chr [1:38] "47598" "18381" "5121" "6523" ...
..$ confirmed : chr [1:38] "74925" "24427" "8904" "8718" ...
..$ deaths : chr [1:38] "2436" "921" "537" "61" ...
..$ deltaconfirmed : chr [1:38] "595" "0" "0" "0" ...
..$ deltadeaths : chr [1:38] "21" "0" "0" "0" ...
..$ deltarecovered : chr [1:38] "434" "0" "0" "0" ...
..$ lastupdatedtime: chr [1:38] "13/05/2020 11:54:23" "12/05/2020 22:13:24" "12/05/2020 20:16:23" "12/05/2020 22:48:24" ...
..$ recovered : chr [1:38] "24887" "5125" "3246" "2134" ...
..$ state : chr [1:38] "Total" "Maharashtra" "Gujarat" "Tamil Nadu" ...
..$ statecode : chr [1:38] "TT" "MH" "GJ" "TN" ...
..$ statenotes : chr [1:38] "" "[10-May]<br>\n- Total numbers are updated to the final figure reported for 10th May. <br>\n- 665 cases added by"| __truncated__ "" "" ...
$ tested :'data.frame': 65 obs. of 11 variables:
..$ individualstestedperconfirmedcase: chr [1:65] "75.64102564" "81.56666667" "73.96428571" "72.99450549" ...
..$ positivecasesfromsamplesreported : chr [1:65] "" "" "" "" ...
..$ samplereportedtoday : chr [1:65] "" "" "" "" ...
..$ source : chr [1:65] "Press_Release_ICMR_13March2020.pdf" "ICMR_website_update_18March_6PM_IST.pdf" "ICMR_website_update_19March_10AM_IST_V2.pdf" "ICMR_website_update_19March_6PM_IST.pdf" ...
..$ testpositivityrate : chr [1:65] "1.32%" "1.23%" "1.35%" "1.37%" ...
..$ testsconductedbyprivatelabs : chr [1:65] "" "" "" "" ...
..$ testsperconfirmedcase : chr [1:65] "83.33333333" "87.5" "79.26190476" "77.88461538" ...
..$ totalindividualstested : chr [1:65] "5900" "12235" "12426" "13285" ...
..$ totalpositivecases : chr [1:65] "78" "150" "168" "182" ...
..$ totalsamplestested : chr [1:65] "6500" "13125" "13316" "14175" ...
..$ updatetimestamp : chr [1:65] "13/03/2020 00:00:00" "18/03/2020 18:00:00" "19/03/2020 10:00:00" "19/03/2020 18:00:00" ...
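Since the question also asks about Python, here is a minimal sketch of the same loading step using pandas. It assumes the JSON has the three top-level keys shown in the structure above, each holding a list of records; the inline sample is a cut-down stand-in with a few values copied from that output, not the full API response.

```python
import json

import pandas as pd

# Cut-down stand-in for the saved file (a few values copied from the output
# above). With the real file you would instead do:
#     with open("./data/data.json") as fh:
#         payload = json.load(fh)
sample = """
{
  "cases_time_series": [
    {"dailyconfirmed": "1", "date": "30 January ", "totalconfirmed": "1"},
    {"dailyconfirmed": "0", "date": "31 January ", "totalconfirmed": "1"}
  ],
  "statewise": [
    {"state": "Total", "statecode": "TT", "confirmed": "74925"},
    {"state": "Maharashtra", "statecode": "MH", "confirmed": "24427"}
  ],
  "tested": [
    {"totalsamplestested": "6500", "updatetimestamp": "13/03/2020 00:00:00"}
  ]
}
"""
payload = json.loads(sample)

# One DataFrame per top-level key; column names come from the JSON field names.
frames = {name: pd.DataFrame(records) for name, records in payload.items()}

for name, df in frames.items():
    print(name, df.shape, list(df.columns))
```

As in the R result, every column arrives as text, so numeric and date columns still need converting before analysis.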
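You can turn this list into one or more data frames. Stacking the elements with the dplyr function bind_rows is not useful here, because the list elements are different tables with different columns and different numbers of rows; bind_rows would only pad the non-matching columns with NA. If the tables share a field, you can combine them with the join functions instead.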
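To illustrate the join idea in Python, the sketch below derives a shared calendar-day key from the two different date columns and merges on it. The `tested` values are copied from the output above; the `cases` rows for March are made up for illustration, and the year 2020 is an assumption because the `date` field carries no year. The column name `day` is also just an illustrative choice.

```python
import pandas as pd

# Hypothetical rows: the March values of cases_time_series are not shown in
# the output above, so these numbers are placeholders.
cases = pd.DataFrame({
    "date": ["12 March ", "13 March ", "14 March "],
    "dailyconfirmed": ["1", "2", "3"],
})

# Values copied from the 'tested' table shown above.
tested = pd.DataFrame({
    "updatetimestamp": ["13/03/2020 00:00:00", "18/03/2020 18:00:00"],
    "totalsamplestested": ["6500", "13125"],
})

# Build a common key. ASSUMPTION: all case dates fall in 2020.
# %B expects English month names, so this relies on an English/C locale.
cases["day"] = pd.to_datetime(cases["date"].str.strip() + " 2020",
                              format="%d %B %Y")
tested["day"] = pd.to_datetime(tested["updatetimestamp"],
                               format="%d/%m/%Y %H:%M:%S").dt.normalize()

# Left join: keep every case row, attach test counts where a day matches.
merged = cases.merge(tested, on="day", how="left")
print(merged[["day", "dailyconfirmed", "totalsamplestested"]])
```

A left join keeps the full time series and leaves the test columns empty on days without a testing update; an inner join would keep only the matching days.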
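To expand on this: the first list element, cases_time_series, can easily be pulled out and prepared for plotting:
library(jsonlite)
library(ggplot2)
library(dplyr)

data <- fromJSON("./data/data.json", flatten = FALSE)

# The dates carry no year and end with a space. Without a year, as.Date()
# assumes the current one (so "29 February" becomes NA in a non-leap year).
# Assumption: the series is from 2020, as the timestamps in the other tables
# suggest. %B matches month names in the current locale (English needed).
cases <- data[[1]] %>%
  mutate(date = as.Date(paste(trimws(date), "2020"), format = "%d %B %Y")) %>%
  mutate_if(is.character, as.numeric)  # remaining text columns hold counts

ggplot(data = cases, aes(x = date, y = dailyconfirmed)) +
  geom_line()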
This produces a line chart of daily confirmed cases over time.
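For the Python side of the question, the same preparation step might look like the sketch below. The rows are the first four observations shown in the structure output above; the year 2020 is again an assumption, since the `date` field does not include one.

```python
import pandas as pd

# First four observations of cases_time_series, as shown in the output above.
cases = pd.DataFrame({
    "dailyconfirmed": ["1", "0", "0", "1"],
    "date": ["30 January ", "31 January ", "01 February ", "02 February "],
    "totalconfirmed": ["1", "1", "1", "2"],
})

# Strip the trailing space and append the year before parsing.
# ASSUMPTION: the series is from 2020. %B expects English month names.
cases["date"] = pd.to_datetime(cases["date"].str.strip() + " 2020",
                               format="%d %B %Y")

# Convert every remaining column from text to numbers; values that cannot
# be parsed (for example empty strings) become NaN instead of raising.
numeric_cols = cases.columns.drop("date")
cases[numeric_cols] = cases[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(cases.dtypes)
```

With matplotlib installed, `cases.plot(x="date", y="dailyconfirmed")` should then draw a line chart comparable to the ggplot2 one.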
[Comments]: