【发布时间】:2019-01-17 23:24:51
【问题描述】:
我正在学习抓取数据,并且正在使用 transfermakt,但今天我遇到了两个问题。我用过选择器小工具。我的代码是这样的:
library(rvest)
url <- "https://www.transfermarkt.es/fc-granada/startseite/verein/16795"
webpage <- read_html(url)
players_html <- html_nodes(webpage,"#yw1 .tooltipstered")
players <- html_text(players_html)
players
valores_html <- html_nodes(webpage,'.rechts.hauptlink')
valores <- html_text(valores_html)
valores
valores <- gsub(" miles €","000", valores)
valores <- gsub(" mill. €","0000", valores)
valores
valores <- gsub(",","",valores)
valores <- gsub(" ","", valores)
valores
我在选择球员时遇到了第一个问题。这是输出。
> players_html <- html_nodes(webpage,"#yw1 .tooltipstered")
> players <- html_text(players_html)
> players
character(0)
我认为问题出在CSS选择器上,但它是在选择播放器时向我显示Selector Gadget的那个,所以我不知道如何解决这个问题。
另一个问题是选择它们的市场价值。 Gsub 不会删除一些最终的空格,以避免将字符作为数字。这是输出:
> valores_html <- html_nodes(webpage,'.rechts.hauptlink')
> valores <- html_text(valores_html)
> valores
[1] "700 miles € " "300 miles € " "800 miles € " "500 miles € "
"300 miles € "
[6] "300 miles € " "1,00 mill. € " "300 miles € " "1,20 mill. €
" "500 miles € "
[11] "1,70 mill. € " "1,50 mill. € " "1,00 mill. € " "800 miles €
" "800 miles € "
[16] "300 miles € " "2,00 mill. € " "800 miles € " "700 miles €
" "400 miles € "
[21] "700 miles € " "1,00 mill. € " "800 miles € "
> valores <- gsub(" miles €","000", valores)
> valores <- gsub(" mill. €","0000", valores)
> valores
[1] "700000 " "300000 " "800000 " "500000 " "300000 "
"300000 " "1,000000 "
[8] "300000 " "1,200000 " "500000 " "1,700000 " "1,500000 "
"1,000000 " "800000 "
[15] "800000 " "300000 " "2,000000 " "800000 " "700000 "
"400000 " "700000 "
[22] "1,000000 " "800000 "
> valores <- gsub(",","",valores)
> valores <- gsub(" ","", valores)
> valores
[1] "700000 " "300000 " "800000 " "500000 " "300000 "
"300000 " "1000000 " "300000 "
[9] "1200000 " "500000 " "1700000 " "1500000 " "1000000 "
"800000 " "800000 " "300000 "
[17] "2000000 " "800000 " "700000 " "400000 " "700000 "
"1000000 " "800000 "
基本上,用于删除最终空白的最后一个 gsub 在这种情况下没有任何作用。有人可以帮我解决这两个问题吗?
PS:我使用的是西班牙语的 transfermarkt。
【问题讨论】:
标签: r regex web-scraping gsub rvest