Web scraping json files in R: answers

【Question title】: Web scraping json files in R
【Posted】: 2021-10-10 10:25:01
【Question description】:

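I am trying to scrape public data from the UNHCR website: https://data2.unhcr.org/en/situations/mediterranean#
I want to download all the data that has a json extension. The data links are highlighted in yellow, as you can see below.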

To do this I started with the code below, but I don't know how to proceed.

library(jsonlite)
url <- 'https://data2.unhcr.org/en/situations/mediterranean#'


# Removes last character, i.e. &
url <- substr(url, 1, nchar(url)-1)

# Encodes URL to avoid errors
url <- URLencode(url)

# Extracts JSON from URL
json_extract <- fromJSON(url)

# Converts relevant list into a data.frame
df <- data.frame(json_extract[['items']])

Can anyone help me with code to download this data into a table like the one below?

【Question comments】:

Tags: r json web-scraping

【Solution 1】:

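Right-click the page and select "Inspect", then go to the "Network" tab. Click one of the JSON buttons on the page and the request it triggers will appear in the list. That request URL, not the page URL, is the one you need to pass to fromJSON.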

library(jsonlite)

# Endpoint and query string copied from the request shown in the Network tab
url <- 'https://data2.unhcr.org/population/get/timeseries'
query <- '?widget_id=267293&sv_id=11&population_group=4797,4798&frequency=month&fromDate=2015-01-01'

# Extracts JSON from URL
json_extract <- fromJSON(paste0(url, query))

# Extracts data.frame from list (there might be other info in the list you want too)
df <- json_extract$data$timeseries

# Others
# https://data2.unhcr.org/population/?widget_id=267298&sv_id=11&population_group=4797,4798&year=latest # Total Arrivals
# https://data2.unhcr.org/population/?widget_id=267299&sv_id=11&population_group=4797&year=latest # Sea Arrivals
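The three request URLs above differ only in the endpoint and the parameter values, so they could also be assembled from named lists instead of being typed by hand. The following is a minimal sketch in base R; `build_url` is a hypothetical helper, not part of jsonlite or of the original answer, and the parameter values are simply the ones that appear in the URLs above. No URL encoding is applied, on the assumption that the values contain only characters that are safe in a query string.

```r
# Hypothetical helper: joins an endpoint and a named list of parameters
# into "endpoint?name1=value1&name2=value2".
build_url <- function(endpoint, params) {
  pairs <- paste0(names(params), "=", unlist(params))
  paste0(endpoint, "?", paste(pairs, collapse = "&"))
}

base <- "https://data2.unhcr.org/population/"

# Same request as the timeseries query used in this answer
timeseries_url <- build_url(
  paste0(base, "get/timeseries"),
  list(widget_id = "267293", sv_id = "11", population_group = "4797,4798",
       frequency = "month", fromDate = "2015-01-01")
)

# Same request as the "Total Arrivals" URL listed under "Others"
total_arrivals_url <- build_url(
  base,
  list(widget_id = "267298", sv_id = "11", population_group = "4797,4798",
       year = "latest")
)
```

Each resulting string can be passed to fromJSON in the same way as paste0(url, query). The structure of the responses for the other widgets is not shown in this answer, so inspect the returned list with str() before extracting anything from it.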

The timeseries request gives:

tail(df)
   month year unix_timestamp individuals
76     5 2021     1622160000        9401
77     6 2021     1624838400        9245
78     7 2021     1627430400       12565
79     8 2021     1630108800       15749
80     9 2021     1632787200       16052
81    10 2021     1635379200        1902
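The unix_timestamp column is easier to plot or filter once it is a Date. A small sketch of that conversion follows, assuming the values are seconds since 1970-01-01 UTC. To keep it runnable without a network call, it rebuilds the six rows printed above in a data frame called `df_tail` (a name introduced here for illustration); the same two lines of conversion should apply to the full df.

```r
# Rebuild the six rows printed above so the sketch runs offline
df_tail <- data.frame(
  month = c(5, 6, 7, 8, 9, 10),
  year = rep(2021, 6),
  unix_timestamp = c(1622160000, 1624838400, 1627430400,
                     1630108800, 1632787200, 1635379200),
  individuals = c(9401, 9245, 12565, 15749, 16052, 1902)
)

# Assumption: unix_timestamp counts seconds since 1970-01-01 UTC
stamp <- as.POSIXct(df_tail$unix_timestamp, origin = "1970-01-01", tz = "UTC")
df_tail$date <- as.Date(stamp, tz = "UTC")

# Cross-check the derived dates against the existing month and year columns
months_match <- as.integer(format(df_tail$date, "%m")) == df_tail$month
years_match <- as.integer(format(df_tail$date, "%Y")) == df_tail$year
```

If months_match or years_match contains FALSE on the full data, the seconds-since-epoch assumption does not hold for this endpoint and the conversion should be revisited.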

【Comments】: