【Question title】: Assigning different names to columns in list using tidyr::pivot_longer, and combining them
【Posted】: 2019-10-31 15:48:02
【Question】:

I am extracting "wide" data that I intend to tidy with tidyr::pivot_longer().

library(tidyverse)

df1 <-
  data.frame(
    M = words[1:10],
    N = rnorm(10, 3, 3),
    O = rnorm(10, 3, 3),
    P = rnorm(10, 3, 3)
  )

df2 <-
  data.frame(
    M = words[1:10],
    N = rnorm(10, 3, 3),
    O = rnorm(10, 3, 3),
    P = rnorm(10, 3, 3)
  )

df3 <-
  data.frame(
    M = words[1:10],
    N = rnorm(10, 3, 3),
    O = rnorm(10, 3, 3),
    P = rnorm(10, 3, 3)
  )

df4 <-
  data.frame(
    M = words[1:10],
    N = rnorm(10, 3, 3),
    O = rnorm(10, 3, 3),
    P = rnorm(10, 3, 3)
  )

lst <- list(df1, df2, df3, df4)

colname <-
  c("ticker", "2017", "2018", "2019")
header <- list("Leverage", "Gearing", "Capex.to.sales", "FCFex")

lst <- lst %>% 
  lapply(setNames, colname) %>% 
  lapply(pivot_longer, -ticker, names_to = "Period", values_to = header)

Using values_to = header gives me this error:

Error in `[[<-.data.frame`(`*tmp*`, ".value", value = list("Leverage", : replacement has 4 rows, data has 3

Instead, I had to use the default values_to = "value" and then rename my columns with this code:

lst <- lst %>% 
  lapply(setNames, colname) %>% 
  lapply(pivot_longer, -ticker, names_to = "Period", values_to = "value")

lst <- map(seq_along(lst), function(i){
  x <- lst[[i]]
  colnames(x)[3] <- header[[i]]
  x
})

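My output looks like this (with the columns renamed), but I am wondering whether there is a way to feed a vector into values_to instead of using map (since that would pipe better)? Or is there a more efficient way to solve this problem?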

> lst
[[1]]
# A tibble: 30 x 3
   ticker   Period Leverage
   <fct>    <chr>     <dbl>
 1 a        2017      6.01 
 2 a        2018      4.82 
 3 a        2019      1.58 
 4 able     2017      8.64 
 5 able     2018      6.70 
 6 able     2019      0.831
 7 about    2017     -0.187
 8 about    2018      0.549
 9 about    2019      0.829
10 absolute 2017      1.26 
# ... with 20 more rows

[[2]]
# A tibble: 30 x 3
   ticker   Period Gearing
   <fct>    <chr>    <dbl>
 1 a        2017    2.37  
 2 a        2018    3.58  
 3 a        2019    5.63  
 4 able     2017    0.311 
 5 able     2018    0.708 
 6 able     2019   -0.0651
 7 about    2017    2.89  
 8 about    2018    6.25  
 9 about    2019   10.1   
10 absolute 2017    6.48  
# ... with 20 more rows

[[3]]
# A tibble: 30 x 3
   ticker   Period Capex.to.sales
   <fct>    <chr>           <dbl>
 1 a        2017            5.22 
 2 a        2018            1.88 
 3 a        2019            0.746
 4 able     2017           -3.90 
 5 able     2018            3.06 
 6 able     2019            1.91 
 7 about    2017            1.35 
 8 about    2018            4.12 
 9 about    2019           11.1  
10 absolute 2017            1.76 
# ... with 20 more rows

[[4]]
# A tibble: 30 x 3
   ticker   Period  FCFex
   <fct>    <chr>   <dbl>
 1 a        2017    1.76 
 2 a        2018    2.85 
 3 a        2019    1.86 
 4 able     2017   -3.38 
 5 able     2018   -3.02 
 6 able     2019   -1.52 
 7 about    2017    6.46 
 8 about    2018    5.39 
 9 about    2019    0.810
10 absolute 2017    8.08 
# ... with 20 more rows

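For the second part of my question, I intend to use bind_cols() to combine all four data frames into one, but the two common columns get duplicated (as shown below).

How do I tell R to bind only the rightmost, renamed column, i.e. to exclude the first two columns of the last three data frames? Thanks.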

Metrics <- bind_cols(lst)

> Metrics
# A tibble: 30 x 12
   ticker Period Leverage ticker1 Period1 Gearing ticker2 Period2
   <fct>  <chr>     <dbl> <fct>   <chr>     <dbl> <fct>   <chr>  
 1 a      2017      6.01  a       2017     2.37   a       2017   
 2 a      2018      4.82  a       2018     3.58   a       2018   
 3 a      2019      1.58  a       2019     5.63   a       2019   
 4 able   2017      8.64  able    2017     0.311  able    2017   
 5 able   2018      6.70  able    2018     0.708  able    2018   
 6 able   2019      0.831 able    2019    -0.0651 able    2019   
 7 about  2017     -0.187 about   2017     2.89   about   2017   
 8 about  2018      0.549 about   2018     6.25   about   2018   
 9 about  2019      0.829 about   2019    10.1    about   2019   
10 absol~ 2017      1.26  absolu~ 2017     6.48   absolu~ 2017   
# ... with 20 more rows, and 4 more variables: Capex.to.sales <dbl>,
#   ticker3 <fct>, Period3 <chr>, FCFex <dbl>
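
One possible way to avoid the duplicated key columns is to keep the first tibble whole and drop ticker and Period from the others before binding. The sketch below uses small made-up tibbles (toy_lst and keys are hypothetical stand-ins for the real lst), and it assumes every tibble has its rows in the same ticker/Period order, since bind_cols() matches rows purely by position.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for the long tibbles in `lst` (values are made up).
keys <- tibble(
  ticker = rep(c("a", "able"), each = 2),
  Period = rep(c("2017", "2018"), times = 2)
)
toy_lst <- list(
  mutate(keys, Leverage = c(1, 2, 3, 4)),
  mutate(keys, Gearing  = c(5, 6, 7, 8)),
  mutate(keys, FCFex    = c(9, 10, 11, 12))
)

# From every tibble except the first, keep only the non-key columns.
value_only <- map(toy_lst[-1], ~ select(.x, -ticker, -Period))

# Bind the first (complete) tibble with the value-only tibbles.
metrics <- bind_cols(c(toy_lst[1], value_only))
```

Dropping the keys by name rather than by column index keeps the code working if more value columns are added later.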

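【Comments on the question】:

  • Could you provide a sample of words to make this reproducible? It would really help to understand what you are trying to do.
  • Sorry, I forgot to include the required library (tidyverse). The example should now be reproducible. It is the values_to argument that I want to rename.

Tags: r list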


【Answer 1】:

You can use purrr:

library(purrr)

lst <- map(lst, setNames, colname)

map2_dfc(lst, header, ~ pivot_longer(
  .x, -ticker, names_to = "Period", values_to = .y)) %>% 
  select(c(1:3, 6, 9, 12))

Output:

   ticker   Period Leverage Gearing Capex.to.sales FCFex
   <fct>    <chr>     <dbl>   <dbl>          <dbl> <dbl>
 1 a        2017      6.20     3.43          7.87   7.52
 2 a        2018      1.63     3.30          0.126  1.52
 3 a        2019      2.32     1.49         -0.286  6.95
 4 able     2017      6.38     3.42          7.34   2.60
 5 able     2018      0.763    1.68         -0.648 -2.85
 6 able     2019      5.56     2.35         -0.572  3.21
 7 about    2017     -0.762    1.49          3.12   2.43
 8 about    2018      9.07    -1.22          0.821  4.00
 9 about    2019      1.37     8.27         -0.700 -1.05
10 absolute 2017      1.39     2.49          0.390  2.40
# … with 20 more rows
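
As a rough illustration of why the map2 variant works where lapply() did not: lapply() forwards extra arguments unchanged to every call, so each pivot_longer() call received the whole four-element header list (presumably where the 4 in the error message comes from), whereas map2() pairs the two lists element by element. The helper n_names and the placeholder inputs below are hypothetical and only count what each call receives.

```r
library(purrr)

header <- list("Leverage", "Gearing", "Capex.to.sales", "FCFex")
inputs <- list("df1", "df2", "df3", "df4")  # placeholders for the four data frames

# Hypothetical helper: reports how many names a single call receives.
n_names <- function(x, values_to) length(values_to)

# lapply() passes `header` as-is, so every call sees all four names.
via_lapply <- unlist(lapply(inputs, n_names, values_to = header))

# map2() walks both lists in parallel, so every call sees exactly one name.
via_map2 <- map2_int(inputs, header, ~ n_names(.x, values_to = .y))
```

Base R's Map() pairs arguments in the same way, if avoiding purrr is preferred.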

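【Comments】:

  • Thank you! This is indeed a very elegant solution.
  • Well, purrr::map2_dfc() is neat here because it iterates over two lists (lst and header) in parallel, which solves the problem you hit when passing header directly, and it also binds the results by column in the same step. Picking the columns by hand with select(), as I did at the end, is far from elegant. There is probably a better way to deal with the duplicated columns, perhaps left_join() instead of bind_cols(), but it is not super easy to use in this case.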
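
The left_join() idea mentioned in the comment above could look roughly like the following. This is a sketch on made-up data, not the answerer's code: wide and toy_header are hypothetical stand-ins for lst (after setNames) and header. Joining on ticker and Period removes the need to pick columns by position and does not depend on row order.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy wide inputs (made-up numbers), already named like `colname` in the question.
wide <- list(
  tibble(ticker = c("a", "able"), `2017` = c(1, 2), `2018` = c(3, 4)),
  tibble(ticker = c("a", "able"), `2017` = c(5, 6), `2018` = c(7, 8))
)
toy_header <- list("Leverage", "Gearing")

# Pivot each data frame with its own value-column name,
# then fold the list into one tibble by joining on the shared keys.
metrics <- map2(wide, toy_header, ~ pivot_longer(
  .x, -ticker, names_to = "Period", values_to = .y)) %>%
  reduce(left_join, by = c("ticker", "Period"))
```

With left_join(), a ticker/Period pair that is missing from a later data frame yields NA in that column; full_join() would be the choice if pairs missing from the first data frame should be kept as well.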