[Posted]: 2021-06-04 03:10:24
[Question]:
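I am using the RISmed package to scrape bibliographic data from PubMed, and I am having a lot of trouble with lapply. My overall goal is to go from a single search in RISmed to a data frame in R.
Problem 1: The maximum number of records returned is 99,999, which I hit very quickly. To get around this, I have broken the search up into smaller searches.
Problem 2: lapply fails. I think this happens when a search returns zero results, but I am not sure. This is the error: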
list_cpg <- lapply(search_cpg[1:length_issn], gen_df)
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'PMID': invalid class “EUtilsSummary” object: invalid object for slot "PMID" in class "EUtilsSummary": got class "list", should be or extend class "character"
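I have pared the code down to make it simpler (the real code runs about 1,800 searches, but you will hit the error within the 15 iterations below). I have tried using the paste function to cut down on repeated values, tried a bunch of for loops that failed, tried cutting out the searches that return zero (but which ones those are varies from study type to study type), and so on.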
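For reference, the repeated search strings could be generated from a vector of ISSNs rather than typed out. This is only a minimal sketch in base R: the helper name `build_queries` and its default arguments are hypothetical, chosen to match the search strings used in this question.

```r
# Hypothetical helper: build one PubMed query string per ISSN.
# sprintf() is vectorised, so a vector of ISSNs yields a vector of queries.
build_queries <- function(issn,
                          pub_type = "Guideline",
                          from = "1990/01/01",
                          to = "2020/12/31") {
  sprintf("(%s[Publication Type]) AND %s[PDat]:%s[PDat] AND %s[issn]",
          pub_type, from, to, issn)
}

issn <- c("1533-4406", "0028-4793", "1474-547X")
queries <- build_queries(issn)
print(queries)
```

Changing the publication type or the date range then only needs one argument to change, not every string.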
In short, here are the functions that use RISmed and will reliably reproduce the error:
install.packages("RISmed")
library(RISmed)
library(purrr)  # provides %>% and pmap_dfr() (pmap_dfr() also needs dplyr installed)
### Function to scrape Pubmed into S4
scrape_pubmed <- function(x) {
  x %>%
    EUtilsSummary(
      retmax = 99999,
      datetype = "ppdt") %>%
    EUtilsGet(type = "efetch", db = "pubmed")
}
### Build function to pull relevant data from S4 into list of lists
make_list <- function(x) {
  list('PMID' = PMID(x),
       'Authors' = Author(x),
       'Year' = YearPubmed(x),
       'Month' = MonthPubmed(x),
       'Day' = DayPubmed(x),
       'Journal' = Title(x),
       'ISSN' = ISSN(x),
       'PublicationType' = PublicationType(x))
}
### Generate dataframe from lists
extract_data <- function(x) {
  pmap_dfr(x, ~data.frame(
    .y,
    pmid = paste(.x, collapse = "-"),
    year = paste(..3, collapse = "-"),
    month = paste(..4, collapse = "-"),
    day = paste(..5, collapse = "-"),
    journal = paste(..6, collapse = "-"),
    ISSN = paste(..7, collapse = "-"),
    type = paste(..8, collapse = "-"),
    stringsAsFactors = FALSE))
}
### Combine scrape, list, and dataframe functions into one process
gen_df <- function(x) {
  Sys.sleep(5)
  x %>%
    scrape_pubmed() %>%
    make_list() %>%
    extract_data()
}
#################### ACQUIRE DATASET ####################
# Searches:
search_cpg <- c(
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1533-4406[issn]",
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0028-4793[issn]",
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1474-547X[issn]",
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0140-6736[issn]",
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0092-8674[issn]",
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1091-6490[issn]",
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0027-8424[issn]",
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1538-3598[issn]",
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0098-7484[issn]",
  "(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1527-7755[issn]")
# Run function, create lists using lapply
list_cpg <- lapply(search_cpg, gen_df)
# Convert list to dataframe
df_cpg <- data.table::rbindlist(list_cpg)
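One way to find out which searches trigger the error, without losing the results of the others, might be to wrap the per-query function in tryCatch. The sketch below is base R only and does not address the underlying cause. The names `safely_run` and `fake_gen_df` are hypothetical, and `fake_gen_df` is a stand-in for gen_df() so that the example runs without contacting PubMed.

```r
# Hypothetical wrapper: returns a function that yields NULL (and logs a
# message) instead of stopping when f(x) throws an error.
safely_run <- function(f) {
  function(x) {
    tryCatch(
      f(x),
      error = function(e) {
        message("Skipping query: ", x, "\n  reason: ", conditionMessage(e))
        NULL
      }
    )
  }
}

# Stand-in for gen_df() (an assumption, not the real scraper): it fails
# on one input, the way a problematic search would.
fake_gen_df <- function(x) {
  if (x == "bad") stop("invalid class \"EUtilsSummary\" object")
  data.frame(query = x, n = nchar(x), stringsAsFactors = FALSE)
}

results <- lapply(c("a", "bad", "ccc"), safely_run(fake_gen_df))
results <- Filter(Negate(is.null), results)  # drop the searches that failed
df <- do.call(rbind, results)
print(df)
```

With the real functions, the call would presumably look like `lapply(search_cpg, safely_run(gen_df))`. The logged messages then show exactly which search strings fail, which should make it easier to confirm or rule out the zero-result theory.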
[Comments]:
- Hint: what does 1:length(x) produce when x has length 0? That is why you should always use seq_len() and seq_along().
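The behaviour the hint refers to can be checked with a short base R sketch. The variable names here are illustrative only.

```r
x <- character(0)          # an empty vector, e.g. no search results

bad_idx  <- 1:length(x)    # 1:0 counts DOWN, giving c(1L, 0L)
good_idx <- seq_along(x)   # integer(0), so a loop over it never runs

# Count how many times each loop body is executed.
visited_bad <- 0
for (i in bad_idx) visited_bad <- visited_bad + 1

visited_good <- 0
for (i in good_idx) visited_good <- visited_good + 1

print(bad_idx)
print(visited_bad)
print(visited_good)
```

The loop over 1:length(x) runs twice on an empty input, using indices 1 and 0, while the loop over seq_along(x) is skipped entirely.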
Tags: r database for-loop lapply