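【Posted】: 2017-11-20 20:43:46
【Question】:
I am trying to scrape some key stock statistics from Finviz. I applied the code from the original question: Web scraping of key stats in Yahoo! Finance with R. To collect statistics for as many stocks as possible, I created a list of stock symbols and descriptions, like this: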
Symbol Description
A Agilent Technologies
AAA Alcoa Corp
AAC Aac Holdings Inc
BABA Alibaba Group Holding Ltd
CRM Salesforce.Com Inc
...
I selected the first column, stored it in R as a character vector, and called it stocks. Then I applied this code:
library(XML)  # provides htmlTreeParse, getNodeSet, readHTMLTable

for (s in stocks) {
  url <- paste0("http://finviz.com/quote.ashx?t=", s)
  webpage <- readLines(url)
  html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
  tableNodes <- getNodeSet(html, "//table")

  # ASSIGN TO STOCK NAMED DFS
  assign(s, readHTMLTable(tableNodes[[9]],
                          header = c("data1", "data2", "data3", "data4", "data5", "data6",
                                     "data7", "data8", "data9", "data10", "data11", "data12")))

  # ADD COLUMN TO IDENTIFY STOCK
  df <- get(s)
  df['stock'] <- s
  assign(s, df)
}

# COMBINE ALL STOCK DATA
stockdatalist <- cbind(mget(stocks))
stockdata <- do.call(rbind, stockdatalist)

# MOVE STOCK ID TO FIRST COLUMN
stockdata <- stockdata[, c(ncol(stockdata), 1:(ncol(stockdata) - 1))]
However, Finviz has no page for some of these stocks, and I get an error message like the following:
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open URL 'http://finviz.com/quote.ashx?t=AGM.A': HTTP status was '404
Not Found'
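This happens for many stocks, so I cannot remove them from my list by hand. Is there a way to skip fetching the page for these stocks? Thanks in advance!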
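To make the request concrete, here is a minimal sketch of the kind of skipping being asked about. It assumes that wrapping the per-symbol fetch in base R's tryCatch is acceptable. The names safe_fetch and fake_fetch are hypothetical and not part of the original code. fake_fetch is only a stand-in that fails for one symbol so the example runs without network access; in the real loop it would be the readLines() and readHTMLTable() steps.

```r
# Hypothetical helper: run fetch_fun(symbol) and return NULL instead of
# stopping the whole loop when the fetch raises an error (e.g. a 404).
safe_fetch <- function(symbol, fetch_fun) {
  tryCatch(
    fetch_fun(symbol),
    error = function(e) {
      message("Skipping ", symbol, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in fetcher for illustration only (assumption): fails for "AGM.A",
# otherwise returns a one-row data frame tagged with the symbol.
fake_fetch <- function(symbol) {
  if (symbol == "AGM.A") stop("cannot open the connection")
  data.frame(data1 = "Index", data2 = "S&P 500", stock = symbol,
             stringsAsFactors = FALSE)
}

stocks <- c("A", "AGM.A", "CRM")

# Fetch every symbol; failed symbols come back as NULL
results <- lapply(stocks, safe_fetch, fetch_fun = fake_fetch)
names(results) <- stocks

# Drop the failed symbols, then stack the remaining data frames
results <- Filter(Negate(is.null), results)
stockdata <- do.call(rbind, results)
```

Collecting the results in a list rather than with assign()/mget() means a skipped symbol simply leaves no entry behind, so the later combine step never looks up a variable that was never created.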
【Comments】:
Tags: r web-scraping finance stock