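【Posted】: 2021-09-16 01:01:53
【Question】:
I'm having trouble getting the price. I thought I was using the html_attr() function correctly, but apparently I'm missing something.
From this HTML I get the marca and producto fields, but not the price: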
<li data-internet-price="2,899" class="jsx-3342506598 price-0">
<div data-variant="DESKTOP_LIST" class="jsx-3342506598 cmr-icon-container">
<span id="" class="copy10 primary high jsx-2612542277 normal ">
S/ 2,899 (Oferta) </span>
</div>
</li>
I need to capture the content of data-internet-price.
Code:
library(rvest)
library(purrr)
library(tidyverse)
urls <- list(
  "https://www.falabella.com.pe/falabella-pe/category/cat210477/TV-Televisores?page=1",
  "https://www.falabella.com.pe/falabella-pe/category/cat210477/TV-Televisores?page=2"
)
h <- urls %>% map(read_html) # scrape once, parse as necessary
df <- map_dfr(
  h %>% map(~ .x %>% html_nodes("div.search-results-list")),
  ~ data.frame(
    periodo = lubridate::year(Sys.Date()),
    fecha = Sys.Date(),
    ecommerce = "falabella",
    marca = .x %>% html_node(".pod-title") %>% html_text(),
    producto = .x %>% html_node(".pod-subTitle") %>% html_text(),
    precio.antes = .x %>% html_node('.prices') %>% html_attr("data-internet-price"),
    precio.actual = .x %>% html_node('.prices') %>% html_attr("data-normal-price")
  )
)
【Discussion】:
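One possible reading of the problem: html_attr() only returns attributes of the node that was actually selected, and in the snippet above data-internet-price sits on the <li> element. If the .prices selector matches some other element, html_attr() would return NA. The sketch below is not a confirmed fix for the live site. It parses an inline copy of the HTML from the question (wrapped in a <ul>, which is an assumption made only so the fragment is well-formed) and selects the node with a CSS attribute selector. It also assumes the attribute is present in the static HTML rather than being injected by JavaScript.

```r
library(rvest)

# Inline copy of the HTML from the question; the surrounding <ul> is an
# assumption added only so the fragment parses as a well-formed list.
html <- '<ul>
<li data-internet-price="2,899" class="jsx-3342506598 price-0">
<div data-variant="DESKTOP_LIST" class="jsx-3342506598 cmr-icon-container">
<span id="" class="copy10 primary high jsx-2612542277 normal ">
S/ 2,899 (Oferta) </span>
</div>
</li>
</ul>'

page <- read_html(html)

# Select the element that carries the attribute itself,
# then read the attribute from that same node.
price_raw <- page %>%
  html_node("li[data-internet-price]") %>%
  html_attr("data-internet-price")

# Drop the thousands separator before converting to a number.
price_num <- as.numeric(gsub(",", "", price_raw, fixed = TRUE))

print(price_raw)
print(price_num)
```

The same selector pattern, "li[data-normal-price]", would apply to the other attribute if it exists on the page. For several products at once, html_nodes() with the same selector returns one value per matching <li>.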
Tags: html r web-scraping purrr