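【Posted】: 2020-07-28 02:59:53
【Question】:
I can't work out how to loop over multiple URLs and save the results in a single data frame. The code below retrieves only one URL at a time and stores it in a data frame.
The only part of the URL that changes is the number at the end, which is the date. I am trying to scrape all the data from, for example, 20190901 through 20190915 and store it in the same data frame.
The code is as follows: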
library(rvest)
library(dplyr)
# Specifying URL
url <- 'https://classic.sportsbookreview.com/betting-odds/mlb-baseball/?date=20190901'
# Reading the HTML code from website
oddspage <- read_html(url)
# Using CSS selectors to scrape away teams
awayHtml <- html_nodes(oddspage,'.eventLine-value:nth-child(1) a')
# Using CSS selectors to scrape scores
awayScoreHtml <- html_nodes(oddspage,'.first.total')
awayScore <- html_text(awayScoreHtml)
awayScore <- as.numeric(awayScore)
homeScoreHtml <- html_nodes(oddspage, '.score-periods+ .score-periods .total')
homeScore <- html_text(homeScoreHtml)
homeScore <- as.numeric(homeScore)
# Converting away data to text
away <- html_text(awayHtml)
# Using CSS selectors to scrape home teams
homeHtml <- html_nodes(oddspage,'.eventLine-value+ .eventLine-value a')
# Converting home data to text
home <- html_text(homeHtml)
# Using CSS selectors to scrape Away Odds
awayPinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book .eventLine-book-value:nth-child(1) b')
awayBookmakerHtml <- html_nodes(oddspage,'.eventLine-book:nth-child(12) .eventLine-book-value:nth-child(1) b')
# Converting Away Odds to Text
awayPinnacle <- html_text(awayPinnacleHtml)
awayBookmaker <- html_text(awayBookmakerHtml)
# Converting Away Odds to numeric
awayPinnacle <- as.numeric(awayPinnacle)
awayBookmaker <- as.numeric(awayBookmaker)
# Using CSS selectors to scrape Pinnacle Home Odds
homePinnacleHtml <- html_nodes(oddspage,'.eventLine-consensus+ .eventLine-book .eventLine-book-value+ .eventLine-book-value b')
homeBookmakerHtml <- html_nodes(oddspage,'.eventLine-book:nth-child(12) .eventLine-book-value:nth-child(2) b')
# Converting Home Odds to Text
homePinnacle <- html_text(homePinnacleHtml)
homeBookmaker <- html_text(homeBookmakerHtml)
# Converting Home Odds to Numeric
homePinnacle <- as.numeric(homePinnacle)
homeBookmaker <- as.numeric(homeBookmaker)
# Create Data Frame
df <- data.frame(away,home,awayScore,homeScore,awayPinnacle,homePinnacle,awayBookmaker,homeBookmaker)
View(df)
I'm very new to coding, and I haven't managed to apply any of the techniques used in similar questions.
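【Discussion】:
- What have you tried? This is a straightforward for-loop problem. There are plenty of resources on using for loops in R that you should be able to look up and apply to your problem.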
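For what it's worth, here is a minimal sketch of how that for loop might look, assuming the page layout is the same for every date. The names `build_urls`, `scrape_day`, and `scrape_range` are hypothetical helpers made up for this illustration. The CSS selectors are copied from the question and have not been checked against the live site. Only teams and scores are scraped here to keep it short; the odds columns could be added to `scrape_day` in the same way.

```r
library(rvest)
library(dplyr)

# Base URL from the question; the date is appended as YYYYMMDD.
base_url <- "https://classic.sportsbookreview.com/betting-odds/mlb-baseball/?date="

# Build one URL per day. Using seq() on Date objects handles month
# boundaries, which an integer range like 20190925:20191005 would not.
build_urls <- function(from, to) {
  days <- seq(as.Date(from), as.Date(to), by = "day")
  paste0(base_url, format(days, "%Y%m%d"))
}

# Hypothetical helper: scrape one page into a data frame.
# Selectors are taken from the question and are an assumption here.
scrape_day <- function(url) {
  page <- read_html(url)
  text_of <- function(css) html_text(html_nodes(page, css))
  data.frame(
    date      = sub(".*date=", "", url),  # keep the date so rows stay identifiable
    away      = text_of(".eventLine-value:nth-child(1) a"),
    home      = text_of(".eventLine-value+ .eventLine-value a"),
    awayScore = as.numeric(text_of(".first.total")),
    homeScore = as.numeric(text_of(".score-periods+ .score-periods .total")),
    stringsAsFactors = FALSE
  )
}

# Loop over every URL, collect the per-day data frames in a list,
# and stack them at the end. A day that fails (network error, columns
# of unequal length, ...) is reported and skipped.
scrape_range <- function(from, to, scraper = scrape_day, pause = 1) {
  pages <- list()
  for (u in build_urls(from, to)) {
    day <- tryCatch(
      scraper(u),
      error = function(e) {
        message("Skipping ", u, ": ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(day)) pages[[length(pages) + 1]] <- day
    Sys.sleep(pause)  # short delay between requests
  }
  bind_rows(pages)
}
```

Calling `df <- scrape_range("2019-09-01", "2019-09-15")` should then give one data frame covering all fifteen days. Collecting the pieces in a list and binding once at the end tends to be simpler than growing a data frame inside the loop.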
Tags: r loops web-scraping