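【Posted】: 2020-08-28 09:48:49
【Question】:
I figured out how to scrape this PDF, but I have a lot of these files to go through. My intent is to set this up as a function that imports the data from all of the PDF files (several years' worth, one PDF each), then use rbind() to build a single data table that I can write out as a CSV.
This works.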
library(tidyverse)
library(tabulizer)
#import the data
jan16s_raw <- extract_tables("https://www.nvsos.gov/sos/home/showdocument?id=4062")
#create data frame
cleanNvsen <- do.call(rbind, jan16s_raw)
cleanNvsen2 <-as.data.frame(cleanNvsen[3:nrow(cleanNvsen),])
#rename all of the columns
names(cleanNvsen2)[1] <- "District"
names(cleanNvsen2)[2] <- "Democrat"
names(cleanNvsen2)[3] <- "Independent American"
names(cleanNvsen2)[4] <- "Libertarian"
names(cleanNvsen2)[5] <- "Nonpartisan"
names(cleanNvsen2)[6] <- "Other"
names(cleanNvsen2)[7] <- "Republican"
names(cleanNvsen2)[8] <- "Total"
#check to see if it worked
head(cleanNvsen2)
But this produces a 1 x 1 data frame:
library(tidyverse)
library(tabulizer)
#load data
jan16s_raw <- extract_tables("https://www.nvsos.gov/sos/home/showdocument?id=4062")
#create function to create data frame and then rename
clean <- function(x) {
cleanNvsen <- do.call(rbind, x)
cleanNvsen2 <-as.data.frame(cleanNvsen[3:nrow(cleanNvsen),])
names(cleanNvsen2)[1] <- "District"
names(cleanNvsen2)[2] <- "Democrat"
names(cleanNvsen2)[3] <- "Independent American"
names(cleanNvsen2)[4] <- "Libertarian"
names(cleanNvsen2)[5] <- "Nonpartisan"
names(cleanNvsen2)[6] <- "Other"
names(cleanNvsen2)[7] <- "Republican"
names(cleanNvsen2)[8] <- "Total"
}
x2 <- clean(jan16s_raw)
head(x2)
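A likely cause, sketched here under assumptions: an R function returns the value of its last evaluated expression, and in `clean()` that expression is the assignment `names(cleanNvsen2)[8] <- "Total"`, whose value is the string "Total" rather than the data frame. Ending the function with the data frame itself should fix that. The sketch below uses `mock_tables`, a made-up stand-in for what `extract_tables()` is assumed to return (a list of character matrices with two header rows), so it runs without the PDF; the `drop = FALSE` is an added safeguard, not part of the original code.

```r
# Made-up stand-in for the output of extract_tables():
# one 4 x 8 character matrix, two header rows followed by two data rows.
mock_tables <- list(
  matrix(c("h1", "h2", "District 1", "District 2",
           rep(c("x", "x", "10", "20"), 7)),
         nrow = 4, ncol = 8)
)

clean <- function(x) {
  combined <- do.call(rbind, x)
  # Drop the first two rows, as in the original code;
  # drop = FALSE keeps a matrix even if only one row remains.
  out <- as.data.frame(combined[3:nrow(combined), , drop = FALSE])
  names(out) <- c("District", "Democrat", "Independent American",
                  "Libertarian", "Nonpartisan", "Other",
                  "Republican", "Total")
  out  # last expression: this is what the function returns
}

x2 <- clean(mock_tables)
dim(x2)
head(x2)
```

Writing `return(out)` on the last line would work the same way; the key point is that the final expression must be the data frame, not an assignment to its names.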
I would really like to get this working so that I can give R a URL and then run the clean function I created. I have dozens of files to process.
【Comments】:
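The workflow described above (extract each file, clean it, stack the results, write a CSV) could be sketched roughly as follows. This is only an illustration: `read_all` and `fake_extract` are hypothetical names, the file names are placeholders, and it assumes every PDF has the same layout (two header rows, eight columns). The stand-in extractor is there so the sketch runs without Java or network access.

```r
# Same cleaning steps as in the question, ending with the data frame itself.
clean <- function(x) {
  combined <- do.call(rbind, x)
  out <- as.data.frame(combined[3:nrow(combined), , drop = FALSE])
  names(out) <- c("District", "Democrat", "Independent American",
                  "Libertarian", "Nonpartisan", "Other",
                  "Republican", "Total")
  out
}

# Hypothetical helper: extract and clean each source, then stack the results.
read_all <- function(sources, extract) {
  per_file <- lapply(sources, function(s) clean(extract(s)))
  do.call(rbind, per_file)
}

# Stand-in extractor returning made-up data in the shape that
# extract_tables() is assumed to produce (a list of character matrices).
fake_extract <- function(source) {
  list(matrix(c("h1", "h2",
                paste(source, "District 1"), paste(source, "District 2"),
                rep(c("x", "x", "10", "20"), 7)),
              nrow = 4, ncol = 8))
}

all_data <- read_all(c("2016.pdf", "2017.pdf"), extract = fake_extract)

# Write the combined table to a CSV (a temporary file in this sketch).
out_file <- tempfile(fileext = ".csv")
write.csv(all_data, out_file, row.names = FALSE)

# With the real package the call would presumably look like:
# all_data <- read_all(urls, extract = tabulizer::extract_tables)
```

Passing the extractor as an argument keeps the cleaning logic testable on small mock inputs before pointing it at dozens of real URLs. If some PDFs have a different number of header rows or columns, the fixed `3:nrow(...)` slice and the eight column names would need adjusting per file.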