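[Posted]: 2019-06-02 17:22:49
[Question]:
I am trying to implement exception handling in RSelenium and need help. Note that I have already checked, using the robotstxt package, that scraping this page is permitted.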
library(RSelenium)
library(XML)
library(janitor)
library(lubridate)
library(magrittr)
library(dplyr)
remDr <- remoteDriver(
  remoteServerAddr = "192.168.99.100",
  port = 4445L
)
remDr$open()
# Open TightVNC to follow along as RSelenium drives the browser
# navigate to the main page
remDr$navigate("https://docs.google.com/spreadsheets/d/1o1PlLIQS8v-XSuEz1eqZB80kcJk9xg5lsbueB7mTg1U/pub?output=html&widget=true#gid=690408156")
# look for table element
tableElem <- remDr$findElement(using = "id", "pageswitcher-content")
# switch to table
remDr$switchToFrame(tableElem)
# parse html for first table
doc <- htmlParse(remDr$getPageSource()[[1]])
table_tmp <- readHTMLTable(doc)
table_tmp <- table_tmp[[1]][-2, -1]
table_tmp <- table_tmp[-1, ]
colnames(table_tmp) <- c("team_name", "team_size", "start_time", "end_time", "total_time", "puzzels_solved")
table_tmp$city <- rep("Montreal", nrow(table_tmp))
table_tmp$date <- rep(Sys.Date() - 5, nrow(table_tmp))
# switch back to the main/outer frame
remDr$switchToFrame(NULL)
# I found the elements I want to manipulate with Inspector mode in a browser
webElems <- remDr$findElements(using = "css", ".switcherItem") # Month/Year tabs at the bottom
arrowElems <- remDr$findElements(using = "css", ".switcherArrows") # Arrows to scroll left and right at the bottom
# Create NULL object to be used in for loop
big_df <- NULL
for (i in seq(length(webElems))) {
  # choose the i'th Month/Year tab
  webElem <- webElems[[i]]
  webElem$clickElement()
  tableElem <- remDr$findElement(using = "id", "pageswitcher-content") # The inner table frame
  # switch to table frame
  remDr$switchToFrame(tableElem)
  Sys.sleep(3)
  # parse html with XML package
  doc <- htmlParse(remDr$getPageSource()[[1]])
  Sys.sleep(3)
  # Extract data from HTML table in HTML document
  table_tmp <- readHTMLTable(doc)
  Sys.sleep(3)
  # put this into a format you can use
  table <- table_tmp[[1]][-2, -1]
  table <- table[-1, ]
  # rename the columns
  colnames(table) <- c("team_name", "team_size", "start_time", "end_time", "total_time", "puzzels_solved")
  # add city name to a column
  table$city <- rep("Montreal", nrow(table))
  # add the Month/Year this table was extracted from
  today <- Sys.Date() %m-% months(i + 1)
  table$date <- today
  # concatenate each table together
  big_df <- dplyr::bind_rows(big_df, table)
  # Switch back to main frame
  remDr$switchToFrame(NULL)
  ################################################
  ### I should use exception handling here ###
  ################################################
}
When the browser reaches the January 2018 table, it can no longer reach the next webElems element and throws an error:
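Selenium message: Element is not currently visible and so may not be interacted with
Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03'
System info: host: '617e51cbea11', ip: '172.17.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.14.79-boot2docker', java.version: '1.8.0_91'
Driver info: driver.version: unknown
Error: Summary: ElementNotVisible
Detail: An element command could not be completed because the element is not visible on the page.
class: org.openqa.selenium.ElementNotVisibleException
Further Details: run errorDetails method
In addition: There were 50 or more warnings (use warnings() to see the first 50)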
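For reference, here is a minimal sketch of how an error like the one above could be told apart from other failures inside tryCatch(). It assumes the R condition message contains the "ElementNotVisible" summary text shown in the output; I have not confirmed this for every RSelenium version. The names is_not_visible, try_action and fake_click are hypothetical, and fake_click only simulates the failure so the snippet runs without a Selenium server.

```r
# Hypothetical helper: TRUE when a condition looks like Selenium's
# ElementNotVisible error, judged only by its message text (an assumption).
is_not_visible <- function(cond) {
  grepl("ElementNotVisible", conditionMessage(cond), fixed = TRUE)
}

# Hypothetical wrapper: run `action()` and return "ok", or "not_visible"
# when the error matches. Any other error is re-raised unchanged.
try_action <- function(action) {
  tryCatch(
    {
      action()
      "ok"
    },
    error = function(e) {
      if (is_not_visible(e)) "not_visible" else stop(e)
    }
  )
}

# Stand-in for webElem$clickElement() that fails like the error above
fake_click <- function() {
  stop("Summary: ElementNotVisible\n Detail: element is not visible on the page.")
}

result_hidden <- try_action(fake_click)
result_ok <- try_action(function() invisible(NULL))
```

Matching on the message is fragile, but it keeps unrelated errors (a lost session, for example) from being silently swallowed.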
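I have been handling this naively by including the code below at the end of the for loop. This is not a good idea for two reasons: 1) the scroll speed is hard to get right, and it will fail on other (longer) Google pages; 2) the for loop eventually fails when it tries to click the right arrow after it has already reached the end, so it does not download the last few tables.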
# click the right arrow to scroll right
arrowElem <- arrowElems[[1]]
# once you "click" the element it is "held down" - no way to "unclick" it to prevent it from scrolling too far
# I currently make sure it only scrolls a short distance - via Sys.sleep() before switching to outer frame
arrowElem$clickElement()
# give it "just enough time" to scroll right
Sys.sleep(0.3)
# switch back to outer frame to re-start the loop
remDr$switchToFrame(NULL)
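I would like to handle this exception by calling arrowElem$clickElement() whenever this error occurs. I believe tryCatch() is the usual tool for this; however, this is also my first time learning exception handling. I thought I could wrap the remDr$switchToFrame(tableElem) part of the for loop in it, but that does not work: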
tryCatch(
  {
    suppressMessages({
      remDr$switchToFrame(tableElem)
    })
  },
  error = function(e) {
    arrowElem <- arrowElems[[1]]
    arrowElem$clickElement()
    Sys.sleep(0.3)
    remDr$switchToFrame(NULL)
  }
)
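One possible reason the attempt above never fires, stated as an assumption: the ElementNotVisible error may be raised by webElem$clickElement() rather than by remDr$switchToFrame(tableElem), in which case the tryCatch() is wrapped around the wrong call. A rough sketch of wrapping the click itself and scrolling between a bounded number of retries follows. The helper click_with_scroll and the fake_* functions are hypothetical; the fakes only simulate a tab that becomes clickable after two scroll steps, so the snippet runs without a browser.

```r
# Hypothetical helper: call `click()`; on error call `scroll()` and retry,
# for at most `max_tries` attempts. Returns TRUE on success, FALSE otherwise.
click_with_scroll <- function(click, scroll, max_tries = 5, pause = 0) {
  for (attempt in seq_len(max_tries)) {
    clicked <- tryCatch(
      {
        click()
        TRUE
      },
      error = function(e) FALSE
    )
    if (clicked) {
      return(TRUE)
    }
    scroll()         # in a real session: arrowElems[[1]]$clickElement()
    Sys.sleep(pause) # give the tab bar time to move before retrying
  }
  FALSE
}

# Simulated tab bar: the tab is hidden until two scroll steps have happened
state <- new.env()
state$scrolled <- 0
fake_click <- function() {
  if (state$scrolled < 2) stop("ElementNotVisible")
  invisible(NULL)
}
fake_scroll <- function() {
  state$scrolled <- state$scrolled + 1
}

ok <- click_with_scroll(fake_click, fake_scroll)
```

In the loop this might replace the bare webElem$clickElement() with something like click_with_scroll(function() webElem$clickElement(), function() arrowElems[[1]]$clickElement(), pause = 0.3), breaking out of the loop when it returns FALSE. It does not address the arrow staying "held down" after a click, so the pause would still need tuning.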
[Discussion]:
Tags: r exception-handling try-catch rselenium