【Question Title】: Replicate columns using a dynamic name series based on the current year
【Posted】: 2018-06-07 13:04:51
【Question】:

I am looking to either:

  • write a function, or
  • use data.table, or
  • use dplyr mutate_cond, or
  • use the purrr map functions

to replicate this logic:

If year = current
    columns(7,8,9) = column(6)
Else
    If year = current + 1
        columns(8,9,10) = column(7)
    Else
        If year = current + 2
            columns(9,10,11) = column(8)
        Else
            If year = current + 3
                columns(10,11,12) = column(9)
            End If
        End If
    End If
End If

So far I have been able to create a static solution with the following untidy code:

tbl.scholar1<-tbl.scholar1%>%mutate_cond(cohort == currentAY, ay_1819=ay_1718, ay_1920=ay_1718, ay_2021=ay_1718)
tbl.scholar1<-tbl.scholar1%>%mutate_cond(cohort == currentAY+1, ay_1920=ay_1819, ay_2021=ay_1819, ay_2122=ay_1819)
tbl.scholar1<-tbl.scholar1%>%mutate_cond(cohort == currentAY+2, ay_2021=ay_1920, ay_2122=ay_1920, ay_2223=ay_1920)
tbl.scholar1<-tbl.scholar1%>%mutate_cond(cohort == currentAY+3, ay_2122=ay_2021, ay_2223=ay_2021, ay_2324=ay_2021)
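
Note that `mutate_cond` is not exported by dplyr, so the calls above presumably rely on a user-defined helper. A minimal sketch of such a helper, assuming it behaves like the commonly shared version that applies `mutate()` only to the rows matching a condition (the toy data and the value of `currentAY` are made up for illustration):

```r
library(dplyr)

# Assumed definition: apply mutate() only to the rows where `condition` is TRUE.
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

# Toy data, not taken from the question
toy <- data.frame(cohort  = c(1718L, 1819L),
                  ay_1718 = c(1L, 0L),
                  ay_1819 = c(0L, 1L))
currentAY <- 1718L

# Only the row with cohort == 1718 has ay_1819 overwritten by ay_1718
res <- toy %>% mutate_cond(cohort == currentAY, ay_1819 = ay_1718)
print(res)
```
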

After some tinkering, I wrote a function that takes the current year and the column names as inputs:

tbl.scholar1<-dup.DF(tbl.scholar1, currentYR, "ay_1718", "ay_2324")

The function code looks like this:

dup.DF <- function(df1, currAY, name1, name2) {

  df1%>%mutate_cond(cohort == currAY, UQ(rlang::sym(name2)) :=  UQ(rlang::sym(name1)))              #This works!!!!

}

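So I suspect there is a more elegant solution, using data.table, purrr::map, or dplyr, that takes the dynamic variable names as a vector or list, so that I don't have to repeat my function for n iterations with a for loop.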

The input looks like this....
    SYSDATE     ID           name           cohort fundCode ay_1718 ay_1819 ay_1920  ay_2021  ay_2122  ay_2223  ay_2324  ay_2425
0005-11-20  000000000   "last0, first"       1718    316001    1         0     0         0        0         0        0       0
0005-11-20  000000000   "last0, first"       1718    316001    0         1     0         0        0         0        0       0
0005-11-20  000000000   "last0, first"       1718    316001    0         0     1         0        0         0        0       0
0005-11-20  000000000   "last0, first"       1718    316001    0         0     0         1        0         0        0       0

My expected output is:

    SYSDATE     ID           name           cohort fundCode ay_1718 ay_1819 ay_1920  ay_2021  ay_2122  ay_2223  ay_2324  ay_2425
0005-11-20  000000000   "last0, first"       1718    316001    1         1     1         1        0         0        0       0
0005-11-20  000000000   "last0, first"       1718    316001    0         1     1         1        1         0        0       0
0005-11-20  000000000   "last0, first"       1718    316001    0         0     1         1        1         1        0       0
0005-11-20  000000000   "last0, first"       1718    316001    0         0     0         1        1         1        1       0

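【Comments】:

  • Please show a small reproducible example and the expected output.
  • The input looks like this...
  • I have also added the expected output. I am currently running the mutate_cond code four times to produce the output, but I think there is a better way to do this.
  • Your example data doesn't seem to contain anything related to your code. Did you mislabel the columns?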

Tags: r function dplyr data.table


【Solution 1】:

Updated answer - after the OP clarified the requirements, I had to change the approach.

st_pos <- 6                         #concerned column's start position in the given dataframe
df_bkp <- df                        #data backup

#rename concerned columns as "ay_1718", "ay_1819" etc
names(df)[st_pos:ncol(df)] <- paste("ay", paste0(as.numeric(substr(min(df$year), 1, 2)) + 0:(ncol(df) - st_pos),
                                                 as.numeric(substr(min(df$year), 3, 4)) + 0:(ncol(df) - st_pos)), 
                                    sep="_")

#copy "year" column's value to the ensuing three columns
cols <- names(df)[st_pos:ncol(df)]  #renamed columns
mapply(function(x, y) 
  df[df$year == x & df$ID == y, which(grepl(x, cols)) + (st_pos-1):(st_pos+2)] <<- 
    df[df$year == x & df$ID == y, which(grepl(x, cols)) + (st_pos-1)],
  df$year, df$ID)

which gives

> df
     SYSDATE ID         name year fundCode ay_1718 ay_1819 ay_1920 ay_2021 ay_2122 ay_2223 ay_2324 ay_2425
1 0005-11-20  0 last0, first 1718   316001     700     700     700     700       0       0       0       0
2 0005-11-20  1 last1, first 1819   316002       0      60      60      60      60       0       0       0
3 0005-11-20  2 last2, first 1920   316003       0       0      50      50      50      50       0       0
4 0005-11-20  3 last3, first 2021   316004       0       0       0     400     400     400     400       0
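
The `mapply()` call above relies on `<<-` to modify `df` as a side effect. As an alternative, here is a minimal sketch of the same copying step written as a plain function that returns a new data frame. It assumes the columns are already named `ay_<year>`; the helper name `fill_ahead` and its parameters are hypothetical, not part of the original answer.

```r
# Hypothetical helper: for each row, copy the value in the column matching
# that row's year into the next `n_ahead` ay_ columns.
fill_ahead <- function(df, year_col = "year", prefix = "ay_", n_ahead = 3) {
  ay_cols <- grep(paste0("^", prefix), names(df), value = TRUE)
  for (i in seq_len(nrow(df))) {
    start <- match(paste0(prefix, df[[year_col]][i]), ay_cols)
    if (is.na(start)) next                       # no column for this year
    end <- min(start + n_ahead, length(ay_cols)) # do not run past the last column
    if (end <= start) next                       # nothing to fill
    df[i, ay_cols[(start + 1):end]] <- df[i, ay_cols[start]]
  }
  df
}

# Reduced version of the sample data, with the columns already renamed
df_demo <- data.frame(
  ID      = 0:3,
  year    = c(1718L, 1819L, 1920L, 2021L),
  ay_1718 = c(700L, 0L, 0L, 0L),
  ay_1819 = c(0L, 60L, 0L, 0L),
  ay_1920 = c(0L, 0L, 50L, 0L),
  ay_2021 = c(0L, 0L, 0L, 400L),
  ay_2122 = 0L, ay_2223 = 0L, ay_2324 = 0L, ay_2425 = 0L
)

out <- fill_ahead(df_demo)
print(out)
```

Because the function returns the modified copy, the input data frame is left untouched and no separate backup is needed.
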


Sample data: (Note: for illustration purposes, I changed the values of Y1, Y2 etc. from 1 to other values.)

df <- structure(list(SYSDATE = c("0005-11-20", "0005-11-20", "0005-11-20", 
"0005-11-20"), ID = 0:3, name = c("last0, first", "last1, first", 
"last2, first", "last3, first"), year = c(1718L, 1819L, 1920L, 
2021L), fundCode = 316001:316004, Y1 = c(700L, 0L, 0L, 0L), Y2 = c(0L, 
60L, 0L, 0L), Y3 = c(0L, 0L, 50L, 0L), Y4 = c(0L, 0L, 0L, 400L
), Y5 = c(0L, 0L, 0L, 0L), Y6 = c(0L, 0L, 0L, 0L), Y7 = c(0L, 
0L, 0L, 0L), Y8 = c(0L, 0L, 0L, 0L)), .Names = c("SYSDATE", "ID", 
"name", "year", "fundCode", "Y1", "Y2", "Y3", "Y4", "Y5", "Y6", 
"Y7", "Y8"), class = "data.frame", row.names = c(NA, -4L))

#     SYSDATE ID         name year fundCode  Y1 Y2 Y3  Y4 Y5 Y6 Y7 Y8
#1 0005-11-20  0 last0, first 1718   316001 700  0  0   0  0  0  0  0
#2 0005-11-20  1 last1, first 1819   316002   0 60  0   0  0  0  0  0
#3 0005-11-20  2 last2, first 1920   316003   0  0 50   0  0  0  0  0
#4 0005-11-20  3 last3, first 2021   316004   0  0  0 400  0  0  0  0

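【Comments】:

  • Thanks, Prem, for taking the time. It is close, but it appends new "AY_##" columns rather than replacing the existing column values with the first column's value. For example, if cohort = 1718 and the value in "ay_1718" = 700, then "ay_1819", "ay_1920" and "ay_2021" should also equal 700.
  • You may have gathered that I am a relative R newbie, so I am still stumbling a bit. Anyway, I did figure out that the columns were being appended because R is case-sensitive. Replacing the statement mutate(cohort_plus3 = paste(paste("AY" with mutate(cohort_plus3 = paste(paste("ay" fixes the appending problem. Now I need to figure out how to propagate the initial value to the subsequent three columns.
  • Thanks, Prem! A brilliantly elegant answer. Hats off to you, and many thanks for your effort. I clicked upvote, but since I don't have enough reputation it does not show. UT80
  • Glad it helped :)
【Solution 2】: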

Here are two alternative approaches, both of which reshape the data from wide to long format and back.

1. melt() / dcast()

library(data.table)
long <- melt(setDT(inp)[, rn := .I], measure.vars = patterns("ay_"))
long[order(rn, variable), value := replace(value, which(value == 1L)[1L] + 1:3, 1L), by = rn]
dcast(long, rn + ... ~ variable)
   rn    SYSDATE ID         name cohort fundCode ay_1718 ay_1819 ay_1920 ay_2021 ay_2122 ay_2223 ay_2324 ay_2425
1:  1 0005-11-20  0 last0, first   1718   316001       1       1       1       1       0       0       0       0
2:  2 0005-11-20  0 last0, first   1718   316001       0       1       1       1       1       0       0       0
3:  3 0005-11-20  0 last0, first   1718   316001       0       0       1       1       1       1       0       0
4:  4 0005-11-20  0 last0, first   1718   316001       0       0       0       1       1       1       1       0

2. gather() / spread()

library(tidyr)
library(dplyr)
inp %>% 
  group_by(rn = row_number()) %>% 
  gather(, , starts_with("ay_")) %>% 
  mutate(value = replace(value, which(value == 1L)[1L] + 1:3, 1L)) %>% 
  spread(key, value)
# A tibble: 4 x 14
# Groups:   rn [4]
  SYSDATE       ID name         cohort fundCode    rn ay_1718 ay_1819 ay_1920 ay_2021 ay_2122 ay_2223 ay_2324 ay_2425
  <chr>      <int> <chr>         <int>    <int> <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
1 0005-11-20     0 last0, first   1718   316001     1       1       1       1       1       0       0       0       0
2 0005-11-20     0 last0, first   1718   316001     2       0       1       1       1       1       0       0       0
3 0005-11-20     0 last0, first   1718   316001     3       0       0       1       1       1       1       0       0
4 0005-11-20     0 last0, first   1718   316001     4       0       0       0       1       1       1       1       0
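
Both approaches above hinge on the same `replace()` call, applied to one row's `ay_` values at a time. As a minimal illustration on a single made-up vector (the variable names are hypothetical), the call finds the first 1 and sets the next three positions to 1. The extra guard line is an addition, not part of the answer's code: without it, base R would lengthen the vector if the first 1 sat within the last three positions.

```r
value <- c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)  # one row's ay_ values, in column order

first_one <- which(value == 1L)[1L]          # position of the first 1
idx <- first_one + 1:3                       # the three positions after it
idx <- idx[idx <= length(value)]             # added guard: never index past the end

filled <- replace(value, idx, 1L)
print(filled)
```
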

Data

inp <- fread('
SYSDATE     ID           name           cohort fundCode ay_1718 ay_1819 ay_1920  ay_2021  ay_2122  ay_2223  ay_2324  ay_2425
0005-11-20  000000000   "last0, first"       1718    316001    1         0     0         0        0         0        0       0
0005-11-20  000000000   "last0, first"       1718    316001    0         1     0         0        0         0        0       0
0005-11-20  000000000   "last0, first"       1718    316001    0         0     1         0        0         0        0       0
0005-11-20  000000000   "last0, first"       1718    316001    0         0     0         1        0         0        0       0')

【Comments】:
