Question: Webscraping Vector error from sport website

【Question title】: Question: Webscraping Vector error from sport website
【Posted】: 2026-02-17 21:50:02
【Question description】:

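I am learning how to use web scraping for analysis. However, with the website used in the code below, I currently get an error when fetching the 2020 season.

If I fetch the 2019 season instead, there is no error.

The error I get is: Error in names(x)

What does this mean, and how can I fix this code so that the data frame gets created?

Load the data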

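# Packages used below: rvest (scraping), dplyr (filter, as_tibble), tidyr (gather)
library(rvest)
library(dplyr)
library(tidyr)
# Import/ingest the Formula 1 race results for season 2020 ----------------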
# Take a look at the data in the browser
browseURL('https://www.formel1.de/rennergebnisse/wm-stand/2020/fahrerwertung')
# Fetch the contents of the HTML-table into the variable f1
f1 <- read_html('https://www.formel1.de/rennergebnisse/wm-stand/2020/fahrerwertung') %>% 
  html_node('table') %>% 
  html_table()
# Display our data
f1

This works fine.

Transform the data

# Transform & tidy the data -----------------------------------------------
# Add missing column headers
colnames(f1) <- c('Pos', 'Driver', 'Total', sprintf('R%02d', 1:24))
# Convert to tibble data frame and filter on the top 10 drivers
f1 <- as_tibble(f1) %>% 
  filter(as.integer(Pos) <= 10)
# Make Driver a factorial variable, replace all '-' with zeros, convert to long format
f1$Driver <- as.factor(f1$Driver)
f1[, -2] <- apply(f1[, -2], 2, function(x) as.integer(gsub('-', '0', as.character(x))))
f1long <- gather(f1, Race, Points, R01:R21)
# That looks better
f1long

Error: Error in names(x)

Sources: https://www.formel1.de/rennergebnisse/wm-stand/2020/fahrerwertung (2020)

https://www.formel1.de/rennergebnisse/wm-stand/2019/fahrerwertung (2019)

【Question discussion】:

Tags: r rvest

【Solution 1】:

The problem is not the web scraping but the colnames() step.

The table f1 that you scraped contains 20 columns:

ncol(f1)
# [1] 20

But your colnames() call supplies 27 names (3 fixed names plus 24 race names), which only works if f1 has 27 columns.
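
The mismatch can be reproduced without touching the website. The following is a minimal sketch, assuming a plain data.frame with 20 columns as a stand-in for the scraped table; toy, new_names and result are illustrative names that do not appear in the question.

```r
# Stand-in for the scraped table: 20 columns, like the 2020 page
# (assumption: a plain data.frame; the real table holds race results)
toy <- as.data.frame(matrix(0L, nrow = 3, ncol = 20))

# The names from the question: 3 fixed columns + 24 race columns
new_names <- c('Pos', 'Driver', 'Total', sprintf('R%02d', 1:24))

length(new_names)  # 27
ncol(toy)          # 20

# Assigning 27 names to 20 columns fails; capture the message instead of stopping
result <- tryCatch({
  colnames(toy) <- new_names
  'ok'
}, error = function(e) conditionMessage(e))
result
```

On a data.frame the captured message should complain that the names attribute and the vector have different lengths. The exact wording may differ for a tibble.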

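So you need to make two changes to your code:

Change the first line to colnames(f1) <- c('Pos', 'Driver', 'Total', sprintf('R%02d', 1:17)) [note 17 instead of 24, since 20 columns minus the 3 fixed ones leaves 17 race columns].
Also change the gather() call to f1long <- gather(f1, Race, Points, R01:R17) [again note 17, here instead of 21].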
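
To avoid hard-coding the number of races for each season, the race column names can be derived from the width of the scraped table. This is a minimal sketch, assuming the first three columns are always position, driver and total and every remaining column is one race; the toy table and the names n_fixed, n_races, first_race and last_race are illustrative, not from the original code.

```r
# Toy stand-in for the scraped table
# (assumption: 3 fixed columns + one column per race)
f1 <- as.data.frame(matrix('-', nrow = 2, ncol = 20), stringsAsFactors = FALSE)

n_fixed <- 3                     # Pos, Driver, Total
n_races <- ncol(f1) - n_fixed    # 17 for a 20-column table

# Build exactly as many race names as there are race columns
colnames(f1) <- c('Pos', 'Driver', 'Total', sprintf('R%02d', seq_len(n_races)))

# First and last race column, usable as the range in the gather() call
first_race <- colnames(f1)[n_fixed + 1]   # "R01"
last_race  <- colnames(f1)[ncol(f1)]      # "R17"
```

With the names built this way, the same script should run for a season with a different number of races, provided the page layout matches the assumption above.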

(By the way, pivot_longer() is recommended over gather() for new code; see ?gather.)
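
As a sketch of that suggestion, the gather() call can be written with pivot_longer() from tidyr. This assumes tidyr 1.0.0 or later is installed; the small table below is made-up data, not scraped results.

```r
library(tidyr)

# Made-up stand-in for the cleaned table: 2 drivers, 3 races
f1 <- data.frame(
  Pos    = 1:2,
  Driver = c('A', 'B'),
  Total  = c(30L, 18L),
  R01    = c(25L, 18L),
  R02    = c(5L, 0L),
  R03    = c(0L, 0L)
)

# Same reshaping as gather(f1, Race, Points, R01:R03):
# every column except the three fixed ones becomes a (Race, Points) pair.
# Rows come out grouped by driver rather than by race.
f1long <- pivot_longer(
  f1,
  cols      = -c(Pos, Driver, Total),
  names_to  = 'Race',
  values_to = 'Points'
)
f1long
```

Selecting the columns by exclusion means the call does not need to change when the number of races does.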

【Discussion】: