【Question title】: NA when trying to loop through XPath nodes in R
【Posted】: 2018-03-21 17:22:16
【Question】:

I am trying to scrape data from this website, which lists real-estate ads from Rio de Janeiro:

https://www.zapimoveis.com.br/aluguel/imoveis/rj+rio-de-janeiro/?gclid=EAIaIQobChMIrLjc2u7m2QIVhYGRCh3w9g0GEAAYASAAEgJKdvD_BwE#{%22parametrosautosuggest%22:[{%22Bairro%22:%22%22,%22Zona%22:%22%22,%22Cidade%22:%22RIO%20DE%20JANEIRO%22,%22Agrupamento%22:%22%22,%22Estado%22:%22RJ%22}],%22pagina%22:%221%22,%22ordem%22:%22Relevancia%22,%22paginaOrigem%22:%22ResultadoBusca%22,%22semente%22:%222109841258%22,%22formato%22:%22Lista%22}

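My code works when I address the nodes one at a time, but when I try to loop over the XPath nodes, html_text() from the rvest package returns NA.

Here is the code I have written so far: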

library(rvest)
library(httr)



Url<-"https://www.zapimoveis.com.br/aluguel/imoveis/rj+rio-de-janeiro/?gclid=EAIaIQobChMIrLjc2u7m2QIVhYGRCh3w9g0GEAAYASAAEgJKdvD_BwE#{%22parametrosautosuggest%22:[{%22Bairro%22:%22%22,%22Zona%22:%22%22,%22Cidade%22:%22RIO%20DE%20JANEIRO%22,%22Agrupamento%22:%22%22,%22Estado%22:%22RJ%22}],%22pagina%22:%221%22,%22ordem%22:%22Relevancia%22,%22paginaOrigem%22:%22ResultadoBusca%22,%22semente%22:%222109841258%22,%22formato%22:%22Lista%22}"


website<- GET(Url)


#vectors that will store the data I want to collect
condominio<-vector()
Iptu<-vector()


#loop through nodes
for (i in 1:2){
  condominio[i] <- website %>%
    read_html() %>%
    html_node(xpath = "/html/body/div[3]/div[2]/section/div/article[i]/section[1]/a/div/span") %>%
    html_text()

  Iptu[i] <- website %>%
    read_html() %>%
    html_node(xpath = "/html/body/div[3]/div[2]/section/div/article[i]/section[1]/a/div/strong") %>%
    html_text()
}

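If I replace the variable i with a fixed number, for example 2, the code seems to work.

Can anyone help me find a way to extract data from more ads?

Thank you very much!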

【Question comments】:

    Tags: r xpath web-scraping rvest httr


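    【Solution 1】:

    I prefer specifying CSS selectors rather than XPath. html_nodes() returns every matching node, so no loop is needed. Try something like this.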

    library(rvest)
    library(httr)
    
    Url<-"https://www.zapimoveis.com.br/aluguel/imoveis/rj+rio-de-janeiro/?gclid=EAIaIQobChMIrLjc2u7m2QIVhYGRCh3w9g0GEAAYASAAEgJKdvD_BwE#{%22parametrosautosuggest%22:[{%22Bairro%22:%22%22,%22Zona%22:%22%22,%22Cidade%22:%22RIO%20DE%20JANEIRO%22,%22Agrupamento%22:%22%22,%22Estado%22:%22RJ%22}],%22pagina%22:%221%22,%22ordem%22:%22Relevancia%22,%22paginaOrigem%22:%22ResultadoBusca%22,%22semente%22:%222109841258%22,%22formato%22:%22Lista%22}"
    
    website<- GET(Url)
    
    # parse the response once and reuse the parsed document
    page <- read_html(website)

    # one element per matching node across all ads
    condominio <- page %>%
      html_nodes("article section a div span") %>%
      html_text()

    Iptu <- page %>%
      html_nodes("article section a div strong") %>%
      html_text()
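
If you would rather keep the XPath loop from the question, the NA comes from the `i` inside the quoted string: R does not substitute variables into string literals, so XPath sees `article[i]`, which means "an article that has a child element named i" and matches nothing. A minimal sketch of the fix follows, building the expression with sprintf() on each iteration. The inline HTML is a made-up stand-in for the real page (its markup and values are assumptions, not taken from the site), and the shortened `//article` path is only for illustration.

```r
library(rvest)

# Hypothetical markup standing in for the listing page, so the example runs offline
page <- read_html('
<html><body>
  <article><section><a><div><span>R$ 500</span><strong>R$ 80</strong></div></a></section></article>
  <article><section><a><div><span>R$ 650</span><strong>R$ 95</strong></div></a></section></article>
</body></html>')

# "article[i]" keeps i as literal text, so nothing matches and html_text() gives NA
literal <- page %>%
  html_node(xpath = "//article[i]/section[1]/a/div/span") %>%
  html_text()

# Build the XPath per iteration so the index is an actual number
n <- 2
condominio <- character(n)
for (i in seq_len(n)) {
  xp <- sprintf("//article[%d]/section[1]/a/div/span", i)
  condominio[i] <- page %>% html_node(xpath = xp) %>% html_text()
}
```

paste0() would work equally well in place of sprintf(). Either way, parsing the page once outside the loop avoids re-reading the HTML on every iteration.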
    

    【Comments】:
