Reading a table from a webpage into R: answers

【Question title】: Reading table from webpage to R
【Posted】: 2018-05-11 11:39:16
【Question description】:

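I am trying to read a table from a webpage (http://www.cmegroup.com/trading/energy/crude-oil/west-texas-intermediate-wti-crude-oil-calendar-swap-futures_quotes_settlements_futures.html). I am using the following code (which I found in another question on Stack Overflow), but unfortunately I don't get any values, just NULL. What is wrong? I also tried opening the link directly in Excel (Power Query), but the table does not show up there either. My ultimate goal is to have this data in Excel and update it automatically every day, but the VBA approach does not work and neither does the R script.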

library(XML)
library(RCurl)
library(rlist)
theurl <- getURL("http://www.cmegroup.com/trading/energy/crude-oil/west-texas-intermediate-wti-crude-oil-calendar-swap-futures_quotes_settlements_futures.html",.opts = list(ssl.verifypeer = FALSE) )
tables <- readHTMLTable(theurl)
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
tables[[which.max(n.rows)]]

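【Question discussion】:

Have you downloaded data from this site before? I tried several things with R and LibreOffice, but every attempt seems to return an empty table. The data may be protected in some way so that it cannot be downloaded.
It used to work... I don't think the data is protected, because if you open the webpage you can copy the data without any problem.
The table is generated dynamically by JavaScript, so the rows and columns in the HTML source are empty; maybe try stackoverflow.com/questions/34616350/…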
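
The diagnosis in the last comment can be checked on the static HTML alone, before any JavaScript runs. The following is a minimal base-R sketch of that check. The `static_html` fragment and the helper `count_matches` are made up for illustration and are not taken from the real CME page; in practice the string returned by `getURL()` would take the place of `static_html`.

```r
# Hypothetical static source: the table shell is present, but the body is
# empty because the rows would be filled in later by JavaScript.
static_html <- '<html><body>
<table id="settlements">
  <thead><tr><th>Month</th><th>Settle</th></tr></thead>
  <tbody></tbody>
</table>
<script>/* rows are inserted here at run time */</script>
</body></html>'

# Count occurrences of a fixed (non-regex) pattern in a single string.
count_matches <- function(pattern, text) {
  hits <- gregexpr(pattern, text, fixed = TRUE)[[1]]
  if (hits[1] == -1L) 0L else length(hits)
}

n_header_cells <- count_matches("<th>", static_html)
n_data_cells   <- count_matches("<td", static_html)

# Header cells without any data cells suggest the rows are rendered client-side.
looks_dynamic <- n_header_cells > 0L && n_data_cells == 0L
print(looks_dynamic)
```

If a check like this comes back positive for the downloaded source, a parser that only sees the static HTML has nothing to extract, which would be consistent with the NULL result in the question.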

Tags: r excel data-extraction

【Solution 1】:

OK, it works when used together with RSelenium:

library(XML)
library(RCurl)
library(rlist)
library(RSelenium)
# Connect to a running Selenium server (adjust address and port to your own setup)
remDr <- remoteDriver(remoteServerAddr = "192.168.99.100", port = 4445L)
remDr$open()
remDr$navigate("http://www.cmegroup.com/trading/energy/crude-oil/west-texas-intermediate-wti-crude-oil-calendar-swap-futures_quotes_settlements_futures.html")
remDr$getTitle()
# [[1]]
# [1] "WTI Financial Futures Settlements - CME Group"
# Take the page source from the browser, after JavaScript has filled in the table
s <- unlist(remDr$getPageSource())
tables <- readHTMLTable(s)
# Drop NULL entries, then keep the table with the most rows
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
tables[[which.max(n.rows)]]

Result of head(tables[[which.max(n.rows)]]):

   Month Open High Low Last Change Settle Estimated Volume Prior Day Open Interest
1 MAY 18    -    -   -    -   -.41  70.24              249                  12,547
2 JUN 18    -    -   -    -   -.62  70.61              948                  11,406
3 JLY 18    -    -   -    -   -.58  70.34               26                   9,203
4 AUG 18    -    -   -    -   -.52  69.94               26                   8,642
5 SEP 18    -    -   -    -   -.45  69.53               26                   8,627
6 OCT 18    -    -   -    -   -.42  69.11               38                   8,360
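
The values in this output are still text: the change column is written as `-.41`, open interest uses thousands separators, and empty cells appear as `-`. Since the stated goal is to get the data into Excel, a conversion step along the following lines may be useful. This is only a sketch: the data frame `settle` is typed in by hand from the rows above as a stand-in for `tables[[which.max(n.rows)]]`, the shortened column names and the helper `to_number` are illustrative, and writing a CSV is one assumed way of handing the data to Excel.

```r
# The six rows shown above, typed in by hand as character columns
# (an illustrative stand-in for the scraped table; some columns omitted).
settle <- data.frame(
  Month        = c("MAY 18", "JUN 18", "JLY 18", "AUG 18", "SEP 18", "OCT 18"),
  Open         = rep("-", 6),
  Change       = c("-.41", "-.62", "-.58", "-.52", "-.45", "-.42"),
  Settle       = c("70.24", "70.61", "70.34", "69.94", "69.53", "69.11"),
  Volume       = c("249", "948", "26", "26", "26", "38"),
  OpenInterest = c("12,547", "11,406", "9,203", "8,642", "8,627", "8,360"),
  stringsAsFactors = FALSE
)

# Convert scraped text to numbers: treat a lone "-" as missing and
# drop thousands separators. as.character() also covers factor columns.
to_number <- function(x) {
  x <- trimws(as.character(x))
  x[x == "-"] <- NA
  as.numeric(gsub(",", "", x, fixed = TRUE))
}

num_cols <- setdiff(names(settle), "Month")
settle[num_cols] <- lapply(settle[num_cols], to_number)

str(settle)

# Write a CSV that Excel can open; tempfile() is used here only so the
# sketch does not touch the working directory.
out_file <- tempfile(fileext = ".csv")
write.csv(settle, out_file, row.names = FALSE)
```

For a daily refresh, the script would have to be run on a schedule and `out_file` pointed at a fixed path that the workbook reads; both of those choices are outside what the answer describes.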

For a simple way to set up RSelenium, see https://ropensci.github.io/RSelenium/articles/RSelenium-docker.html

When installing RSelenium and its dependencies, use devtools, because some of the packages are not available on CRAN:

devtools::install_github("ropensci/RSelenium")
devtools::install_github("johndharrison/wdman")
devtools::install_github("johndharrison/binman")
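
After the install step it can help to confirm that the three packages are actually available before starting a browser session. The sketch below is one possible check, not part of the original answer. It uses only base R, and the variable names are illustrative.

```r
# Packages named in the install step above.
pkgs <- c("RSelenium", "wdman", "binman")

# requireNamespace() returns FALSE, without raising an error,
# when a package is not installed.
is_installed  <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]

if (length(not_installed) > 0) {
  message("Still to install: ", paste(not_installed, collapse = ", "))
} else {
  message("All required packages are available.")
}
```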
