【Question title】: Loop over a list of data frames applying various functions in R
【Posted】: 2023-01-24 20:46:19
【Question description】:

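I'm struggling to apply various functions to a list of 60 data frames. Mainly I want to use select and pivot_longer, but I also need to convert some variables to numeric. For some reason, the solutions I've found don't work. Basically, I need to do three things: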

First, read in all the Excel sheets (there are obviously more than these two):

df1 <- readxl::read_xlsx("C:/Users/.../df_list.xlsx", skip = 3, col_names = T, sheet = "df_1")
df2 <- readxl::read_xlsx("C:/Users/.../df_list.xlsx", skip = 3, col_names = T, sheet = "df_2")

Second, I want to pivot longer and deselect some columns:

library(dplyr)
library(tidyr)

df1 <- df1  %>%  
  pivot_longer(!c("country", "type", "company", "sector", "name"), names_to = "year", values_to = "df1") %>%
  select(!name)

# the value column is named after its sheet, so df2 gets "df2"
df2 <- df2  %>%  
  pivot_longer(!c("country", "type", "company", "sector", "name"), names_to = "year", values_to = "df2") %>%
  select(!name)
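Since the same pivot is repeated for every sheet, it can be wrapped in a function once. This is only a sketch: `pivot_sheet` and the `toy` data are hypothetical names, and it assumes every column other than country, type, company, sector and name is a year column. It also converts those year columns to character first, because pivot_longer() cannot stack numeric and character columns together.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: pivot one sheet to long format and drop 'name'.
# 'value_name' becomes the name of the value column (e.g. "df1", "df2").
pivot_sheet <- function(df, value_name) {
  df %>%
    # give all year columns one class so pivot_longer() can stack them
    mutate(across(-c(country, type, company, sector, name), as.character)) %>%
    pivot_longer(-c(country, type, company, sector, name),
                 names_to = "year", values_to = value_name) %>%
    select(-name)
}

# Usage on a tiny made-up sheet (check.names = FALSE keeps "2010" as a name,
# which is what read_excel() produces)
toy <- data.frame(type = "679821", company = "A", sector = "AA",
                  name = "A - var1", country = "US",
                  `2010` = NA, `2011` = "Y", check.names = FALSE)
long <- pivot_sheet(toy, "df1")
```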

Third, I want to merge everything into one data frame:

# 'name' was already dropped by select(!name), so it cannot be a join key
df <- df1 %>% 
  left_join(df2,
            by = c("country", "type", "company", "sector", "year"))
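With many sheets, the pairwise join can be folded over a list with base R's Reduce(), so each long data frame is joined onto the running result in turn. A minimal sketch: `long1`, `long2` and `keys` are made-up stand-ins for the pivoted sheets and their shared key columns.

```r
library(dplyr)

# Key columns shared by every long data frame ('name' was dropped beforehand)
keys <- c("country", "type", "company", "sector", "year")

# Two made-up long data frames standing in for df1 and df2 after pivoting
long1 <- data.frame(country = "US", type = "679821", company = "A",
                    sector = "AA", year = c("2011", "2012"), df1 = c("Y", "Y"))
long2 <- data.frame(country = "US", type = "679821", company = "A",
                    sector = "AA", year = c("2011", "2012"), df2 = c("N", "Y"))

# Reduce() computes left_join(left_join(long1, long2), long3), ... for any list length
merged <- Reduce(function(x, y) left_join(x, y, by = keys), list(long1, long2))
```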

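Since there are not two but many more Excel sheets containing different variables, I want to put them in a list and apply the same functions to all of them in a loop.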

I managed the first step with the following:

library(readxl)

mysheets_fromexcel <- list()
mysheetlist <- excel_sheets(path = "C:/Users/.../df_list.xlsx")
for (i in seq_along(mysheetlist)) {
  tempdf <- read_excel(path = "C:/Users/.../df_list.xlsx", sheet = mysheetlist[i], skip = 3, col_names = TRUE)
  tempdf$sheetname <- mysheetlist[i]   # remember which sheet each row came from
  mysheets_fromexcel[[i]] <- tempdf
}
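The reading loop can also be written with lapply(), which avoids the index bookkeeping and names each list element after its sheet (handy later for naming the value columns). A sketch only: `read_all_sheets` is a hypothetical helper, demonstrated on the example workbook bundled with readxl; for the real file it would be called as `read_all_sheets("C:/Users/.../df_list.xlsx", skip = 3)`.

```r
library(readxl)

# Hypothetical helper: read every sheet of one workbook into a named list.
# 'skip' is passed through to read_excel() (the question uses skip = 3).
read_all_sheets <- function(path, skip = 0) {
  sheets <- excel_sheets(path)
  setNames(lapply(sheets, function(s) read_excel(path, sheet = s, skip = skip)),
           sheets)
}

# Usage with the example workbook that ships with readxl
path <- readxl_example("datasets.xlsx")
all_sheets <- read_all_sheets(path)
names(all_sheets)
```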

Now I have something like this:

df1 <- data.frame("type" = c("679821", "2800K7", "31938W", "749352", "15437R"),
                  "company" = c("A", "B", "C", "D", "E"),
                  "sector" = c("AA", "BB", "BB", "CC", "DD"),
                  "name" = c("A - var1", "B - var1", "C - var1", "D - var1" ,"E - var1"),
                  "country" = c("US", "US", "UK", "UK", "DE"),
                  "2010" = c(NA, 9999, 9999, NA, NA),
                  "2011" = c("Y", "9999", NA, "N", "9999"),
                  "2012" = c("Y", "9999", "N", "N", "9999"))

df2 <- data.frame("type" = c("679821", "2800K7", "31938W", "749352", "15437R"),
                  "company" = c("A", "B", "C", "D", "E"),
                  "sector" = c("AA", "BB", "BB", "CC", "DD"),
                  "name" = c("A - var2", "B - var2", "C - var2", "D - var2" ,"E - var2"),
                  "country" = c("US", "US", "UK", "UK", "DE"),
                  "2010" = c(NA, 9999, NA, NA, NA),
                  "2011" = c("N", "N", NA, "9999", "9999"),
                  "2012" = c("Y", "9999", "Y", "Y", "9999"))

mylist <- list(A = df1, B = df2)  

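Not all values in the columns "2010", "2011" and "2012" are of the same class: some are numeric, some are character. To pivot, they need to share a class (pivot_longer() cannot combine a <double> column with a <character> one). Ideally, I would first recode them; for a single data frame that would look like: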

df1 <- df1 %>% 
  mutate(y2010 = case_when(y2010 == "Y" ~ 1,
                           y2010 == "N" ~ 0,
                           y2010 == 9999 ~ NA_real_),
         y2011 = case_when(y2011 == "Y" ~ 1,
                           y2011 == "N" ~ 0,
                           y2011 == 9999 ~ NA_real_),
         y2012 = case_when(y2012 == "Y" ~ 1,
                           y2012 == "N" ~ 0,
                           y2012 == 9999 ~ NA_real_))

But ideally this should be done for the whole set of year variables, and for every data frame in the list.
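One way to do the recode for all year columns at once is across() with a column selector, wrapped in a function that can then be applied to every list element with `lapply(mylist, recode_years)`. This is a sketch under an assumption: `recode_years` is a hypothetical name, and year columns are taken to be those whose names end in four digits (this matches both "2010" and the "X2010" that data.frame() produces by default). Comparing via as.character() lets numeric and character columns go through the same case_when().

```r
library(dplyr)

# Hypothetical helper: "Y" -> 1, "N" -> 0, anything else (9999, NA) -> NA,
# applied to every column whose name ends in four digits.
recode_years <- function(df) {
  df %>%
    mutate(across(matches("\\d{4}$"),
                  ~ case_when(as.character(.x) == "Y" ~ 1,
                              as.character(.x) == "N" ~ 0,
                              TRUE ~ NA_real_)))
}

# Usage on a made-up sheet with mixed column classes
toy <- data.frame(type = "679821", `2010` = 9999, `2011` = "Y",
                  `2012` = "N", check.names = FALSE)
recoded <- recode_years(toy)
```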

Then, for the pivot, I tried:

lapply(mylist, function(x) x %>% pivot_longer(!c("country", "type", "company", "sector", "name"), names_to = "year", values_to = mylist[i]))

It doesn't work.
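A likely reason: inside lapply() there is no `i` for the current element (any `i` left over from the reading loop is just its last value), and `mylist[i]` is a one-element list, not the single string that `values_to` expects. Iterating over the data frames and their names together avoids both problems. A minimal sketch with base R's Map(), using made-up sheets `sheet_a`/`sheet_b`; the answer below does the same with purrr::imap().

```r
library(dplyr)
library(tidyr)

# Two made-up sheets standing in for the elements of mylist
sheet_a <- data.frame(type = "679821", company = "A", sector = "AA",
                      name = "A - var1", country = "US",
                      `2011` = "Y", `2012` = "N", check.names = FALSE)
sheet_b <- sheet_a
sheet_b$name <- "A - var2"
mylist_toy <- list(A = sheet_a, B = sheet_b)

# Map() walks the data frames and their names in parallel, so each one gets
# its own list name as the value column (values_to must be a single string).
long_list <- Map(function(df, nm) {
  df %>%
    mutate(across(-c(country, type, company, sector, name), as.character)) %>%
    pivot_longer(-c(country, type, company, sector, name),
                 names_to = "year", values_to = nm) %>%
    select(-name)
}, mylist_toy, names(mylist_toy))
```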

In the end, it should look like this:

type   company sector country year    df1   df2
<chr>  <chr>   <chr>  <chr>   <chr> <dbl> <dbl>
679821 A       AA     US      y2010    NA    NA
679821 A       AA     US      y2011     1     0
679821 A       AA     US      y2012     1     1
2800K7 B       BB     US      y2010    NA    NA
2800K7 B       BB     US      y2011    NA     0
2800K7 B       BB     US      y2012    NA    NA
31938W C       BB     UK      y2010    NA    NA
31938W C       BB     UK      y2011    NA    NA
31938W C       BB     UK      y2012     0     1
749352 D       CC     UK      y2010    NA    NA
749352 D       CC     UK      y2011     0    NA
749352 D       CC     UK      y2012     0     1
15437R E       DD     DE      y2010    NA    NA
15437R E       DD     DE      y2011    NA    NA
15437R E       DD     DE      y2012    NA    NA

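Sorry for the long question. There are several steps, but the bottom line is that I need to iterate over a large list of data frames, and I don't know exactly how to do it.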

【Question discussion】:

    Tags: r list dataframe pivot


    【Solution 1】:

    We could use:

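    library(dplyr)
    library(purrr)
    library(tidyr)

    imap(mylist, ~ .x %>%
           # give all year columns one class so they can be stacked
           mutate(across(matches("\\d{4}$"), as.character)) %>%
           # 'name' differs between sheets ("A - var1" vs "A - var2"), so drop it before joining
           select(-name) %>%
           pivot_longer(cols = -c("country", "type", "company", "sector"),
                        names_to = "year", values_to = .y)) %>%
      # join the long data frames on the shared key columns
      reduce(~ left_join(.x, .y, by = c("country", "type", "company", "sector", "year"))) %>%
      # "Y" -> 1, "N" -> 0; anything else (e.g. 9999) becomes NA
      mutate(across(all_of(names(mylist)), ~ case_when(.x == "Y" ~ 1, .x == "N" ~ 0)))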
    

    【Discussion】:

    • Hmm... nice! Long time no see, btw