【Question title】: Creating dataset in R or python using json api
【Posted】: 2020-08-29 06:47:51
【Question description】:

How can I use this JSON API to create a dataset with appropriate column names in Python or R?

https://api.covid19india.org/data.json

【Question comments】:

    Tags: python r json api dataframe


    【Solution 1】:

    An R-based answer: you can use the jsonlite package:

    library(jsonlite)
    data <- fromJSON("./data/data.json", flatten = FALSE)
    

    I saved the JSON file from your question to ./data/data.json. Reading it produces a list of three data frames:

    List of 3
     $ cases_time_series:'data.frame':  104 obs. of  7 variables:
      ..$ dailyconfirmed: chr [1:104] "1" "0" "0" "1" ...
      ..$ dailydeceased : chr [1:104] "0" "0" "0" "0" ...
      ..$ dailyrecovered: chr [1:104] "0" "0" "0" "0" ...
      ..$ date          : chr [1:104] "30 January " "31 January " "01 February " "02 February " ...
      ..$ totalconfirmed: chr [1:104] "1" "1" "1" "2" ...
      ..$ totaldeceased : chr [1:104] "0" "0" "0" "0" ...
      ..$ totalrecovered: chr [1:104] "0" "0" "0" "0" ...
     $ statewise        :'data.frame':  38 obs. of  11 variables:
      ..$ active         : chr [1:38] "47598" "18381" "5121" "6523" ...
      ..$ confirmed      : chr [1:38] "74925" "24427" "8904" "8718" ...
      ..$ deaths         : chr [1:38] "2436" "921" "537" "61" ...
      ..$ deltaconfirmed : chr [1:38] "595" "0" "0" "0" ...
      ..$ deltadeaths    : chr [1:38] "21" "0" "0" "0" ...
      ..$ deltarecovered : chr [1:38] "434" "0" "0" "0" ...
      ..$ lastupdatedtime: chr [1:38] "13/05/2020 11:54:23" "12/05/2020 22:13:24" "12/05/2020 20:16:23" "12/05/2020 22:48:24" ...
      ..$ recovered      : chr [1:38] "24887" "5125" "3246" "2134" ...
      ..$ state          : chr [1:38] "Total" "Maharashtra" "Gujarat" "Tamil Nadu" ...
      ..$ statecode      : chr [1:38] "TT" "MH" "GJ" "TN" ...
      ..$ statenotes     : chr [1:38] "" "[10-May]<br>\n- Total numbers are updated to the final figure reported for 10th May. <br>\n- 665 cases added by"| __truncated__ "" "" ...
     $ tested           :'data.frame':  65 obs. of  11 variables:
      ..$ individualstestedperconfirmedcase: chr [1:65] "75.64102564" "81.56666667" "73.96428571" "72.99450549" ...
      ..$ positivecasesfromsamplesreported : chr [1:65] "" "" "" "" ...
      ..$ samplereportedtoday              : chr [1:65] "" "" "" "" ...
      ..$ source                           : chr [1:65] "Press_Release_ICMR_13March2020.pdf" "ICMR_website_update_18March_6PM_IST.pdf" "ICMR_website_update_19March_10AM_IST_V2.pdf" "ICMR_website_update_19March_6PM_IST.pdf" ...
      ..$ testpositivityrate               : chr [1:65] "1.32%" "1.23%" "1.35%" "1.37%" ...
      ..$ testsconductedbyprivatelabs      : chr [1:65] "" "" "" "" ...
      ..$ testsperconfirmedcase            : chr [1:65] "83.33333333" "87.5" "79.26190476" "77.88461538" ...
      ..$ totalindividualstested           : chr [1:65] "5900" "12235" "12426" "13285" ...
      ..$ totalpositivecases               : chr [1:65] "78" "150" "168" "182" ...
      ..$ totalsamplestested               : chr [1:65] "6500" "13125" "13316" "14175" ...
      ..$ updatetimestamp                  : chr [1:65] "13/03/2020 00:00:00" "18/03/2020 18:00:00" "19/03/2020 10:00:00" "19/03/2020 18:00:00" ...
    

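    You can turn this list into one or more data frames. Stacking the elements with the dplyr function bind_rows is not appropriate here, because the list elements are all different: they have different columns and different numbers of rows. If they have fields in common, you can merge the data frames with a join function.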
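Since the question also asks about Python, the same idea (one table per top-level key, with column names taken from the record keys) can be sketched with the standard library alone. This is only an illustration under stated assumptions: `SAMPLE` is a small inline excerpt built from the values shown in the output above rather than the full file, and `load_tables` is a hypothetical helper name, not part of any library.

```python
import json

# Inline excerpt that mirrors the structure of data.json shown above.
# The real file has more columns, more rows, and a third "tested" table.
SAMPLE = """
{
  "cases_time_series": [
    {"dailyconfirmed": "1", "date": "30 January ", "totalconfirmed": "1"},
    {"dailyconfirmed": "0", "date": "31 January ", "totalconfirmed": "1"}
  ],
  "statewise": [
    {"state": "Total", "statecode": "TT", "confirmed": "74925"},
    {"state": "Maharashtra", "statecode": "MH", "confirmed": "24427"}
  ]
}
"""


def load_tables(text):
    """Return {table_name: (column_names, rows)} for each top-level key."""
    payload = json.loads(text)
    tables = {}
    for name, records in payload.items():
        # Column names are the keys of the first record, sorted for stability
        columns = sorted(records[0]) if records else []
        # Missing fields become empty strings, as in the source feed
        rows = [[rec.get(col, "") for col in columns] for rec in records]
        tables[name] = (columns, rows)
    return tables


tables = load_tables(SAMPLE)
for name, (columns, rows) in tables.items():
    print(name, columns, len(rows))
```

If pandas is installed, passing one of the record lists (for example `json.loads(SAMPLE)["statewise"]`) to `pandas.DataFrame` gives a data frame whose columns are named after the JSON keys.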

    Expanding on this: the first list element, cases_time_series, can easily be extracted, cleaned, and plotted:

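    library(jsonlite)
    library(ggplot2)
    library(dplyr)

    # Read the saved snapshot; the result is a list of three data frames
    data <- fromJSON("./data/data.json", flatten = FALSE)
    
    # The dates carry no year; this snapshot is from 2020, so add it explicitly
    cases <- data$cases_time_series %>% 
      mutate(date = as.Date(paste(trimws(date), "2020"), format = "%d %B %Y")) %>%
      # The remaining columns are counts stored as strings
      mutate_if(is.character, as.numeric)
    
    # Daily confirmed cases over time
    ggplot(data = cases, aes(x = date, y = dailyconfirmed)) +
      geom_line()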
    

    The result is a line plot of dailyconfirmed against date.
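For the Python side of the question, the same cleaning step (parse the date, convert the count columns from strings to numbers) might look like the sketch below. The records are the first rows of cases_time_series as shown in the output above, limited to three columns. `clean` is a hypothetical helper name, and `YEAR = 2020` is an assumption, since the feed's date strings carry no year.

```python
from datetime import datetime

# First rows of cases_time_series, taken from the str() output above
records = [
    {"date": "30 January ", "dailyconfirmed": "1", "totalconfirmed": "1"},
    {"date": "31 January ", "dailyconfirmed": "0", "totalconfirmed": "1"},
    {"date": "01 February ", "dailyconfirmed": "0", "totalconfirmed": "1"},
    {"date": "02 February ", "dailyconfirmed": "1", "totalconfirmed": "2"},
]

YEAR = 2020  # assumption: the date strings omit the year; the snapshot is from 2020


def clean(record):
    """Parse the date field and convert every other field to an int."""
    out = {}
    for key, value in record.items():
        if key == "date":
            # Strip the trailing space, then append the assumed year
            parsed = datetime.strptime(f"{value.strip()} {YEAR}", "%d %B %Y")
            out[key] = parsed.date()
        else:
            out[key] = int(value)
    return out


cases = [clean(r) for r in records]
print(cases[0])
```

Month names are parsed with `%B`, which depends on the active locale; the sketch assumes English month names, as in the feed.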

    【Comments】:
