【问题标题】:Web scraping table R网页抓取表 R
【发布时间】:2021-09-29 23:19:58
【问题描述】:

我正在尝试从该站点 https://www.ratingraph.com/tv-shows/one-piece-ratings-17673/ 的评级列中获取数据,但我遇到了“{xml_nodeset (0)}”的问题。

我的尝试:

library("rvest")
`%>%` <- magrittr::`%>%`

page <- read_html("https://www.ratingraph.com/tv-shows/one-piece-ratings-17673/")
table <- page %>% 
  html_nodes("table") 
df <- table[2] %>% 
  html_table()

这些是我想要的数据:

【问题讨论】:

    标签: html r web-scraping rvest


    【解决方案1】:

    通过检查页面并查看“网络”选项卡,您可以看到它创建表的调用。 响应采用 JSON 格式,很容易解析为 R 列表。 其中大部分对于您的目的可能是不必要的,因此您可以缩短它。 如果你想要超过 25 行,增加 length=25,或者去掉。

    page <- httr::GET(
      paste0("https://www.ratingraph.com/show-episodes-list/17673/?draw=1&columns%5B0%5D%5Bdata%5D=trend&",
             "columns%5B0%5D%5Bname%5D=&columns%5B0%5D%5Bsearchable%5D=false&columns%5B0%5D%5Borderable%5D=true&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=season&",
             "columns%5B1%5D%5Bname%5D=&columns%5B1%5D%5Bsearchable%5D=false&columns%5B1%5D%5Borderable%5D=true&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=episode&",
             "columns%5B2%5D%5Bname%5D=&columns%5B2%5D%5Bsearchable%5D=false&columns%5B2%5D%5Borderable%5D=true&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=name&",
             "columns%5B3%5D%5Bname%5D=&columns%5B3%5D%5Bsearchable%5D=false&columns%5B3%5D%5Borderable%5D=true&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B4%5D%5Bdata%5D=start&",
             "columns%5B4%5D%5Bname%5D=&columns%5B4%5D%5Bsearchable%5D=false&columns%5B4%5D%5Borderable%5D=true&columns%5B4%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B4%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B5%5D%5Bdata%5D=total_votes&",
             "columns%5B5%5D%5Bname%5D=&columns%5B5%5D%5Bsearchable%5D=false&columns%5B5%5D%5Borderable%5D=true&columns%5B5%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B5%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B6%5D%5Bdata%5D=average_rating&",
             "columns%5B6%5D%5Bname%5D=&columns%5B6%5D%5Bsearchable%5D=false&columns%5B6%5D%5Borderable%5D=true&columns%5B6%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B6%5D%5Bsearch%5D%5Bregex%5D=false&order%5B0%5D%5Bcolumn%5D=1&",
             "order%5B0%5D%5Bdir%5D=asc&order%5B1%5D%5Bcolumn%5D=2&order%5B1%5D%5Bdir%5D=asc&start=0&length=25&search%5Bvalue%5D=&search%5Bregex%5D=false&_=", Sys.time() %>% as.numeric() %>% paste0("000")))
    table <- page %>% httr::content(as = 'parsed')
    avg_ratings <- sapply(table$data, `[[`, 'average_rating') %>% as.numeric()
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2017-02-27
      • 1970-01-01
      • 1970-01-01
      • 1970-01-01
      • 2020-07-20
      • 2011-08-15
      • 1970-01-01
      • 1970-01-01
      相关资源
      最近更新 更多