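That season is already over, so the OP probably no longer needs this, but with enough examples out there, questions like this may eventually stop being asked.
That site does use JavaScript to load content asynchronously via XHR requests, but we absolutely do not need to bring a heavyweight dependency like Selenium into the picture to solve most problems like this.
You can see the URL it loads asynchronously via XHR here (start practicing with your browser's developer tools):
Now, that URL is also dynamically generated (yes, baffling, suboptimal web application design), so we need to be clever. I used curlconverter to see the format of that URL (as shown below, it is nasty). But it is nasty in a uniform way. So you should be able to pass any team stats URL (it has to look like the one you showed) as a parameter and get lovely JavaScript (JSON) data back. Here is a lightly annotated function: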
get_team_stats <- function(team_stats_url) {

  suppressPackageStartupMessages({
    library(httr, warn.conflicts = FALSE, quietly = TRUE)
    library(jsonlite, warn.conflicts = FALSE, quietly = TRUE)
  })

  res <- httr::GET(team_stats_url) # to prime cookies
  httr::stop_for_status(res) # and also validate the team URL

  # extract the team id: take the 6th "/"-separated piece of the URL
  # and strip the leading alphabetic prefix up to the first hyphen
  team_id <- gsub("^[[:alpha:]]+\\-", "", strsplit(team_stats_url, "/")[[1]][6])

  httr::GET(
    url = sprintf("https://gc.com/stats/team/%s/", team_id), # use the team id
    query = list(
      stats_requested = "[{\"category\":\"offense\",\"key\":\"GP\"},{\"category\":\"offense\",\"key\":\"PA\"},{\"category\":\"offense\",\"key\":\"AB\"},{\"category\":\"offense\",\"key\":\"H\"},{\"category\":\"offense\",\"key\":\"1B\"},{\"category\":\"offense\",\"key\":\"2B\"},{\"category\":\"offense\",\"key\":\"3B\"},{\"category\":\"offense\",\"key\":\"HR\"},{\"category\":\"offense\",\"key\":\"RBI\"},{\"category\":\"offense\",\"key\":\"R\"},{\"category\":\"offense\",\"key\":\"HBP\"},{\"category\":\"offense\",\"key\":\"ROE\"},{\"category\":\"offense\",\"key\":\"FC\"},{\"category\":\"offense\",\"key\":\"CI\"},{\"category\":\"offense\",\"key\":\"BB\"},{\"category\":\"offense\",\"key\":\"SO\"},{\"category\":\"offense\",\"key\":\"AVG\"},{\"category\":\"offense\",\"key\":\"OBP\"},{\"category\":\"offense\",\"key\":\"SLG\"},{\"category\":\"offense\",\"key\":\"OPS\"}]",
      qualifying_stat = "{\"key\":\"GP\",\"category\":\"offense\"}",
      game_filter = "All"
    )
  ) -> res

  httr::stop_for_status(res) # raise an error if the request failed

  # parse the JSON response body into R lists / data frames
  jsonlite::fromJSON(
    httr::content(res, as = "text", encoding = "UTF-8")
  )

}
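The team id extraction inside the function is the least obvious step, so here is a minimal standalone sketch of it in base R. It assumes the URL has the same shape as the example one in this answer (the team slug is the 6th "/"-separated piece); the variable names `parts` and `slug` are just for illustration.

```r
# Example team stats URL (the one used in this answer)
team_stats_url <- "https://gc.com/t/summer-2018/west-5b3ad51e396a0500018e8513/stats"

# Split on "/": pieces are "https:", "", "gc.com", "t", "summer-2018", <slug>, "stats"
parts <- strsplit(team_stats_url, "/", fixed = TRUE)[[1]]

# The 6th piece is the team slug (team name + "-" + id)
slug <- parts[6]

# Drop the leading alphabetic name and the hyphen, leaving only the id
team_id <- sub("^[[:alpha:]]+-", "", slug)

print(team_id)
```

If a team name ever contains digits or more than one hyphen, this pattern would need adjusting, so treat it as a starting point rather than a general parser.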
Let's give it a try:
team_stats <- get_team_stats("https://gc.com/t/summer-2018/west-5b3ad51e396a0500018e8513/stats")
str(team_stats, 1)
## List of 3
## $ glossary:'data.frame': 20 obs. of 5 variables:
## $ players :'data.frame': 14 obs. of 2 variables:
## $ totals :'data.frame': 1 obs. of 1 variable:
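As an aside, the hand-escaped `stats_requested` and `qualifying_stat` strings in the function could probably be generated instead of typed out. The following is only a sketch of one way to do that with `jsonlite::toJSON()`; the `offense_keys` vector is a name introduced here for illustration, and whether the server accepts other keys or categories is an assumption that has not been tested.

```r
library(jsonlite)

# The stat keys requested in the function above (hypothetical helper vector)
offense_keys <- c(
  "GP", "PA", "AB", "H", "1B", "2B", "3B", "HR", "RBI", "R",
  "HBP", "ROE", "FC", "CI", "BB", "SO", "AVG", "OBP", "SLG", "OPS"
)

# A data frame serializes row-wise to a JSON array of {category, key} objects
stats_requested <- as.character(toJSON(
  data.frame(category = "offense", key = offense_keys, stringsAsFactors = FALSE)
))

# auto_unbox = TRUE makes length-1 vectors plain strings rather than arrays
qualifying_stat <- as.character(toJSON(
  list(key = "GP", category = "offense"),
  auto_unbox = TRUE
))

cat(substr(stats_requested, 1, 70), "...\n")
cat(qualifying_stat, "\n")
```

These two strings could then be dropped into the `query = list(...)` argument of `httr::GET()` in place of the literals.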