【Title】: Problem scraping a webpage with R and rvest
【Posted】: 2021-10-02 19:27:22
【Question】:

I am using the code below to extract a table from a web page:

library(rvest)
library(dplyr)

#Link to site and then getting html code. 
link <- "https://www.stats.gov.sa/en/915"
page <- read_html(link)

#extract table from html
files <- page %>%
    html_nodes("table") %>%
    .[[1]] %>%
    html_table()

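However, the result I get differs from what the web page shows. The output is:

# A tibble: 1 × 4
  Name           `Report Period` Periodicity    Download
1 Please wait... Please wait...  Please wait... Please wait...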

Is there a way to get the table as it appears in a web browser without using RSelenium? I ask because RSelenium does not seem to work in RStudio online.
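
The "Please wait..." row suggests that the table is filled in by JavaScript after the page loads, while `read_html()` only sees the HTML the server sends first. The sketch below reproduces that behaviour offline. The `static_html` string is a hypothetical stand-in for the initial response, not the site's actual markup.

```r
library(rvest)

# Hypothetical stand-in for the HTML sent before any JavaScript runs
static_html <- '
<table>
  <tr><th>Name</th><th>Report Period</th><th>Periodicity</th><th>Download</th></tr>
  <tr><td>Please wait...</td><td>Please wait...</td><td>Please wait...</td><td>Please wait...</td></tr>
</table>'

# rvest parses exactly what it is given, so the placeholder row comes back
doc <- read_html(static_html)
placeholder <- html_table(html_element(doc, "table"))
print(placeholder)
```

If this is what is happening on the real page, any approach that does not execute the page's JavaScript will return the same placeholder row.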

【Comments】:

    Tags: r web-scraping rvest


    【Solution 1】:

    One possible solution is RSelenium.

    Here is a simple example:

    library(RSelenium)
    library(rvest)
    library(dplyr)
    # Your URL
    URL <- "https://www.stats.gov.sa/en/915"
    # Start a Selenium server and open a Firefox session
    rD <- RSelenium::rsDriver(browser = "firefox", port = 4544L, verbose = FALSE)
    remDr <- rD[["client"]]
    # Load the page in the browser so its JavaScript runs
    remDr$navigate(URL)
    # Parse the rendered page source and extract every table on the page
    remDr$getPageSource()[[1]] %>% 
      read_html() %>%
      html_table()
    
    
    [[1]]
    # A tibble: 13 x 4
       Name                           `Report Period` Periodicity Download
       <chr>                                    <int> <chr>       <lgl>   
     1 Ar-Riyad Region                           2017 Annual      NA      
     2 Makkah Al-Mokarramah Region               2017 Annual      NA      
     3 Al-Madinah Al-Monawarah Region            2017 Annual      NA      
     4 Al-Qaseem Region                          2017 Annual      NA      
     5 Eastern Region                            2017 Annual      NA      
     6 Aseer Region                              2017 Annual      NA      
     7 Tabouk Region                             2017 Annual      NA      
     8 Hail Region                               2017 Annual      NA      
     9 Northern Borders Region                   2017 Annual      NA      
    10 Jazan Region                              2017 Annual      NA      
    11 Najran Region                             2017 Annual      NA      
    12 Al-Baha Region                            2017 Annual      NA      
    13 Al-Jouf Region                            2017 Annual      NA 
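
If the page source is read before the browser has finished rendering, the table may still contain the placeholder text. One way to guard against that is to poll until the placeholder disappears. This is a minimal sketch, not part of the original answer: `has_placeholder()` and `wait_for_table()` are hypothetical helpers, and `fake_source()` stands in for `remDr$getPageSource()[[1]]` so the example runs without a browser.

```r
library(rvest)

# Hypothetical helper: TRUE while the first table is missing or still shows the marker text
has_placeholder <- function(html, marker = "Please wait") {
  tables <- html_table(read_html(html))
  length(tables) == 0 || any(grepl(marker, unlist(tables[[1]]), fixed = TRUE))
}

# Hypothetical helper: call get_source() until the table is filled in or attempts run out
wait_for_table <- function(get_source, attempts = 10, pause = 1) {
  for (i in seq_len(attempts)) {
    html <- get_source()
    if (!has_placeholder(html)) {
      return(html_table(read_html(html))[[1]])
    }
    Sys.sleep(pause)
  }
  stop("table still shows placeholder text after ", attempts, " attempts")
}

# Offline demonstration: a fake page source that "renders" on the third call
calls <- 0
fake_source <- function() {
  calls <<- calls + 1
  cell <- if (calls < 3) "Please wait..." else "Ar-Riyad Region"
  paste0("<table><tr><th>Name</th></tr><tr><td>", cell, "</td></tr></table>")
}

result <- wait_for_table(fake_source, attempts = 5, pause = 0)
print(result)
```

With a live RSelenium session the call would look like `wait_for_table(function() remDr$getPageSource()[[1]])`, assuming the placeholder text on the real page matches the `marker` argument.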
    
