【Title】: lapply Malfunction When Converting S4's to List to Dataframe
【Posted】: 2021-06-04 03:10:24
【Question】:

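I am using the RISmed package to scrape bibliographic data from PubMed, and I am having a lot of trouble with lapply. My overall goal is to go from individual searches in RISmed to a data frame in R.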

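Problem 1: A single search returns at most 99,999 records, and I hit that limit quickly. To work around it, I have broken the search up into smaller searches.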
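A minimal sketch of one way to split a large search into one query per journal, assuming the split is by ISSN as in the searches further down. The helper `build_queries` and its parameter names are hypothetical and not part of RISmed; it only builds the query strings and does not contact PubMed.

```r
# Hypothetical helper (not part of RISmed): build one PubMed query per ISSN.
# sprintf() is vectorised, so a vector of ISSNs yields a vector of queries.
build_queries <- function(issns,
                          from = "1990/01/01",
                          to = "2020/12/31",
                          pub_type = "Guideline") {
  sprintf("(%s[Publication Type]) AND %s[PDat]:%s[PDat] AND %s[issn]",
          pub_type, from, to, issns)
}

issns <- c("1533-4406", "0028-4793", "1474-547X")
queries <- build_queries(issns)
print(queries)
```

Building the strings from a vector of ISSNs keeps a list of roughly 1,800 searches manageable, since only the ISSN changes between queries.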

Problem 2: lapply fails. I think this happens when a search returns zero results, but I am not sure. Here is the error:

 list_cpg <- lapply(search_cpg[1:length_issn], gen_df)
 Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'object' in selecting a method for function 'PMID': invalid class “EUtilsSummary” object: invalid object for slot "PMID" in class "EUtilsSummary": got class "list", should be or extend class "character" 

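I have cut the code down to make it simpler (the actual code runs about 1,800 searches, but you hit the error within the 15 iterations below). I have tried using paste to reduce duplicate values, tried a bunch of for loops that failed, tried cutting out the searches that return zero results (but which ones do varies from study type to study type), and so on.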

In short, here are the functions, using RISmed, that will reliably reproduce the error:

install.packages("RISmed")
library(RISmed)
library(purrr)  # provides pmap_dfr() and re-exports %>%, both used below


### Function to scrape Pubmed into S4  

scrape_pubmed <- function (x) {
  x %>%
    EUtilsSummary(
      retmax=99999, 
      datetype = "ppdt") %>% 
    EUtilsGet(type = "efetch", db = "pubmed")
}


### Build function to pull relevant data from S4 into list of lists 

make_list <- function (x) {
  list('PMID' = PMID(x),
       'Authors' = Author(x), 
       'Year' = YearPubmed(x), 
       'Month' = MonthPubmed(x),
       'Day' = DayPubmed(x), 
       'Journal' = Title(x),
       'ISSN' = ISSN(x),
       'PublicationType' = PublicationType(x)) 
}

### Generate dataframe from lists 

extract_data <- function (x) {
  pmap_dfr(x, ~data.frame(
    .y,
    pmid = paste(.x,  collapse = "-"), 
    year = paste(..3, collapse = "-"), 
    month = paste(..4, collapse = "-"), 
    day = paste(..5, collapse = "-"), 
    journal = paste(..6, collapse = "-"), 
    ISSN = paste(..7, collapse = "-"), 
    type = paste(..8, collapse = "-"),
    stringsAsFactors = FALSE))
}

### Combine scrape, list, and dataframe functions into one process 

gen_df <- function(x) {
  Sys.sleep(5)
  x %>% 
    scrape_pubmed() %>% 
    make_list() %>% 
    extract_data()
}


####################  ACQUIRE DATASET  #################### 

# Searches: 
search_cpg <- c(
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1533-4406[issn]", 
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0028-4793[issn]", 
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1474-547X[issn]", 
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0140-6736[issn]",
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0092-8674[issn]", 
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1091-6490[issn]",
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0027-8424[issn]",
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1538-3598[issn]",
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 0098-7484[issn]",
"(Guideline[Publication Type]) AND 1990/01/01[PDat]:2020/12/31[PDat] AND 1527-7755[issn]")

# Run function, create lists using lapply 
list_cpg <- lapply(search_cpg, gen_df)

# Convert list to dataframe
df_cpg <- data.table::rbindlist(list_cpg)

【Comments】:

  • Hint: what does 1:length(x) produce when x has length 0? That is why you should always use seq_len() or seq_along().
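The hint can be checked directly in base R. This is a small illustration only; the variable names are made up for the example.

```r
x <- character(0)            # an empty vector, e.g. no searches to run

bad  <- 1:length(x)          # 1:0 counts DOWN, giving the two values 1 and 0
good <- seq_along(x)         # an empty integer vector

# A loop over 1:length(x) therefore runs twice on empty input,
# while a loop over seq_along(x) does not run at all.
bad_visits <- 0
for (i in bad) bad_visits <- bad_visits + 1

good_visits <- 0
for (i in good) good_visits <- good_visits + 1

print(c(bad_visits = bad_visits, good_visits = good_visits))
```

The same applies to subsetting: `search_cpg[1:n]` with `n` equal to 0 still selects the first element, whereas `search_cpg[seq_len(n)]` selects nothing.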

Tags: r database for-loop lapply


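【Solution 1】:

OK, so a friend gave me the obvious answer. I needed to wrap the function in tryCatch before running the loop, so that a search that errors returns NULL instead of stopping lapply. This code fixed everything; posting it in case it helps someone else: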

gen_df_clean <- function(x) {
  Sys.sleep(5)
  x %>% 
    scrape_pubmed() %>% 
    make_list() %>% 
    extract_data()
}

### Add tryCatch to make scripts more resilient

gen_df <- function (x) {
  return(tryCatch(gen_df_clean(x), error=function(e) NULL))
}
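Returning NULL silently makes it hard to tell which searches failed. A minimal sketch of the same pattern that also reports the failures is shown here; `fake_fetch` is a hypothetical stand-in for `gen_df_clean` so the example runs without contacting PubMed, and the query strings are made up.

```r
# Hypothetical stand-in for gen_df_clean(): errors on some inputs.
fake_fetch <- function(q) {
  if (grepl("empty", q)) stop("no records for: ", q)
  data.frame(query = q, n = nchar(q), stringsAsFactors = FALSE)
}

# Same idea as gen_df() above, but the handler reports the error first.
safe_fetch <- function(q) {
  tryCatch(
    fake_fetch(q),
    error = function(e) {
      message("Skipping '", q, "': ", conditionMessage(e))
      NULL
    }
  )
}

queries <- c("a", "empty-1", "bcd")
results <- lapply(queries, safe_fetch)

# Record which queries failed, then bind only the successful results.
failed   <- queries[vapply(results, is.null, logical(1))]
combined <- do.call(rbind, Filter(Negate(is.null), results))

print(failed)
print(combined)
```

Keeping the `failed` vector makes it possible to re-run or inspect only the problem searches afterwards.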

【Comments】:
