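【Posted】: 2019-11-16 06:36:17
【Question】:
I am trying to scrape the reviews for a product on Amazon and export the results as a CSV file. I first tried embedding the for loop inside the function, but it kept failing, so I separated the function from the for loop to inspect the results. Now I don't know how to combine the for loop's results for pages 1 through 10.
When I run the script, it prints the reviews page by page, but when I save the result as a CSV, the file contains only the reviews from page 10.
How can I combine the results of the for loop into a single CSV?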
#install.packages("tidyverse")
#install.packages("rvest")
#install.packages("xml2")
library(tidyverse)
library(rvest)
library(xml2)
#Product = LG OLED77C9PUB Alexa Built-in C9 Series 77" 4K Ultra HD Smart OLED TV (2019)
#ASIN = B07PQ98L9D
scrape_amazon <- function(ASIN, page_num){
  url_reviews <- paste0("https://www.amazon.com/LG-OLED77C9PUB-Alexa-Built-Ultra/product-reviews/",ASIN,"/?pageNumber=",page_num)
  doc <- read_html(url_reviews)
  #Review Date
  doc %>%
    html_nodes("[data-hook='review-date']") %>%
    html_text() -> review_data
  #Review Title
  doc %>%
    html_nodes("[class='a-size-base a-link-normal review-title a-color-base review-title-content a-text-bold']") %>%
    html_text() -> review_title
  #Review Text
  doc %>%
    html_nodes("[class='a-size-base review-text review-text-content']") %>%
    html_text() -> review_text
  #Number of Stars in Review
  doc %>%
    html_nodes("[data-hook='review-star-rating']") %>%
    html_text() -> review_star
  #Return a tibble
  tibble(review_data,
         review_title,
         review_text,
         review_star,
         page = page_num) %>%
    return()
}
for (i in 1:10){
  review_all <- scrape_amazon(ASIN = "B07PQ98L9D", page_num = i)
  print(review_all)
}
#Save as CSV
write.csv(review_all, file = "C:/Users/path/review.csv")
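For reference, the loop above assigns to `review_all` on every iteration, so each page replaces the previous one and only page 10 is left when the file is written. A minimal sketch of one way to keep all pages is to collect the per-page results in a list and stack them once at the end. To keep the sketch runnable without network access, `fake_scrape` is a hypothetical stand-in for `scrape_amazon` that returns made-up rows, and `tempfile()` stands in for the real output path; both are assumptions, not part of the original script.

```r
# Hypothetical stand-in for scrape_amazon(): returns three made-up rows per
# page so the example runs without contacting Amazon.
fake_scrape <- function(ASIN, page_num) {
  data.frame(
    review_title = paste("title", page_num, 1:3),
    review_text  = paste("text", page_num, 1:3),
    page         = page_num,
    stringsAsFactors = FALSE
  )
}

# Collect one data frame per page in a list instead of overwriting a variable.
pages <- lapply(1:10, function(i) fake_scrape(ASIN = "B07PQ98L9D", page_num = i))

# Stack the per-page data frames into a single data frame.
review_all <- do.call(rbind, pages)

# Write the combined result once, after the loop has finished.
# tempfile() is a placeholder for the real destination path.
out_file <- tempfile(fileext = ".csv")
write.csv(review_all, file = out_file, row.names = FALSE)
```

With the tidyverse already loaded, `dplyr::bind_rows(pages)` could replace `do.call(rbind, pages)`; it also tolerates pages whose columns differ. Note that `tibble()` will raise an error if the four scraped vectors on a page have different lengths, so a real run may need a check for that.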
【Comments】:
Tags: r for-loop web-scraping web-crawler tibble