[Posted]: 2025-12-15 12:15:02
[Question]:
I want to get the article names, grouped by category, from https://www.inquirer.net/article-index?d=2020-6-13
I tried to read the article names like this:
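library(rvest)

# Date of the article index to fetch
year <- 2020
month <- 6
day <- 13

# Build the index URL, e.g. https://www.inquirer.net/article-index?d=2020-6-13
url <- paste0("https://www.inquirer.net/article-index?d=", year, "-", month, "-", day)
pg <- read_html(url)

# "#index-wrap" matches a single wrapper node, so html_text() returns one string
test <- pg %>%
  html_nodes("#index-wrap") %>%
  html_text()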
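This returns only a single string containing all of the article names, and it is very messy.
Ultimately I would like a data frame that looks like this: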
Date        Category   Article Name
2020-06-13  News       ‘We can never let our guard down’ vs terrorism – Cayetano
2020-06-13  News       PNP spox says mañanita remark did not intend to put Sinas in bad light
2020-06-13  News       After stranded mom’s death, Pasay LGU helps over 400 stranded individuals
2020-06-13  World      4 dead after tanker truck explodes on highway in China
etc.        etc.       etc.
2020-06-13  Lifestyle  Book: Melania Trump delayed 2017 move to DC to get new prenup
Does anyone know what I might be missing? I am very new to this, thanks!
[Discussion]:
- Hi. Have you tried read_html(url) rather than just url?
- Yes, I tried that; it only returns the result as one long string.
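One possible way forward, offered as a sketch and not as a confirmed answer: the selector "#index-wrap" matches a single wrapper element, so html_text() returns one string with the text of everything inside it. Selecting the category headings and the article links separately keeps them apart. The example below runs on a small inline HTML sample instead of the live page. The markup (one h4 per category, followed by a ul of links) is an assumption about how the index page is laid out, and the second World headline is a placeholder, so inspect the real page and adjust the "h4, li a" selector if it differs. It uses html_elements() and html_text2(), which need rvest 1.0.0 or later.

```r
library(rvest)

# ASSUMED page structure (not verified against the live site):
# an h4 heading per category, each followed by a ul of article links.
html <- '
<div id="index-wrap">
  <h4>World</h4>
  <ul>
    <li><a href="#">4 dead after tanker truck explodes on highway in China</a></li>
    <li><a href="#">Second world headline (placeholder)</a></li>
  </ul>
  <h4>Lifestyle</h4>
  <ul>
    <li><a href="#">Book: Melania Trump delayed 2017 move to DC to get new prenup</a></li>
  </ul>
</div>'

pg <- read_html(html)  # for the live page: read_html(url)

# Select headings and links together so they come back in document order.
wrap  <- html_element(pg, "#index-wrap")
nodes <- html_elements(wrap, "h4, li a")

is_heading <- html_name(nodes) == "h4"
text       <- html_text2(nodes)

# Each link belongs to the most recent heading before it.
# This assumes the first matched node is a heading.
category <- text[is_heading][cumsum(is_heading)]

articles <- data.frame(
  Date     = as.Date("2020-06-13"),
  Category = category[!is_heading],
  Article  = text[!is_heading],
  stringsAsFactors = FALSE
)

print(articles)
```

The cumsum() over the heading flags is what assigns a category to every link, so the approach does not depend on how many articles each category has. If the live page nests its categories differently, only the selector and the heading test should need to change.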