【Question title】: pass other column inside spread with duplicate identifiers
【Posted】: 2023-08-03 03:04:01
【Question description】:

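I have the data frame below. I am trying to spread FEATURE_CODE while passing ACTV_AMT as the value, so that each feature code column holds its corresponding ACTV_AMT. I tried setting count_FEATURE = ACTV_AMT; the values are passed through, but the rows are not merged.

For reference, I previously asked a related question: take unique count and sum each unique values in R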

Input type 1:
ST_DATE ND_DATE LO_NO   ACTV_CODE   ACTV_AMT    AB_NO   FEATURE_CODE    L_NU    
7/27/16 7/27/16 265       O          15          1      INTEREST        855          
7/27/16 7/27/16 265       O          14          1      INTEREST        855 

Getting output:
ST_DATE ND_DATE LO_NO   ACTV_CODE   ACTV_AMT    AB_NO   FEATURE_INTEREST     L_NU   
7/27/16 7/27/16 265      O           29          1             2             855

Expected output:
ST_DATE ND_DATE LO_NO   ACTV_CODE   ACTV_AMT    AB_NO   FEATURE_INTEREST     L_NU   
7/27/16 7/27/16 265      O           29          1             29             855

Input type 2:

Input
ST_DATE ND_DATE LO_NO   ACTV_CODE   ACTV_AMT    AB_NO   FEATURE_CODE    L_NU    
7/27/16 7/27/16 265            O          15       1     INTEREST        855          
7/27/16 7/27/16 265            O          14       1     INSTALLMENT   855    

Getting output:
ST_DATE ND_DATE LO_NO   ACTV_CODE   ACTV_AMT    AB_NO   INTEREST INSTALLMENT     L_NU   
7/27/16 7/27/16 265          O           29           1      1          1           855 

Expected output:
ST_DATE ND_DATE LO_NO   ACTV_CODE   ACTV_AMT    AB_NO   INTEREST INSTALLMENT     L_NU   
7/27/16 7/27/16 265        O           29           1      15         14           855 

Code I implemented:

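library(dplyr)
library(tidyr)

dt %>%
  # Total amount and overall date range per AB_NO / LO_NO / L_NU group
  group_by(AB_NO, LO_NO, L_NU) %>%
  mutate(ACTV_AMT = sum(ACTV_AMT),
         ST_DATE = min(ST_DATE),
         ND_DATE = max(ND_DATE)) %>%
  ungroup() %>%
  # Row id keeps spread() from failing on duplicate identifiers
  mutate(id = row_number(),
         FEATURE_CODE = paste0("FTR_", FEATURE_CODE),
         ACTV_CODE = paste0("ACTV_", ACTV_CODE),
         count_FEATURE = 1,
         count_ACTV = 1) %>%
  spread(FEATURE_CODE, count_FEATURE) %>%
  spread(ACTV_CODE, count_ACTV) %>%
  select(-id) %>%
  # Collapse to one row per group; summing the 1s yields counts, not amounts
  group_by(ST_DATE, ND_DATE, LO_NO, ACTV_AMT, AB_NO, L_NU) %>%
  summarise_all(sum, na.rm = TRUE) %>%
  ungroup()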

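Can anyone help me get the expected output?

【Question comments】:

  • @Hardikgupta What does dput mean?
  • @Hardikgupta I shared my data above; you can also get it easily from the linked question.

Tags: r dplyr tidyverse spread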


【Solution 1】:

You can try it like this:

library(reshape2)

df <- read.table(text = "ST_DATE ND_DATE LO_NO   ACTV_CODE   ACTV_AMT    AB_NO   FEATURE_CODE    L_NU    
7/27/16 7/27/16 265       O          15          1      INTEREST        855          
7/27/16 7/27/16 265       O          14          1      INTEREST        855", header = T)

dcast(df, ST_DATE+ND_DATE+LO_NO+ACTV_CODE+AB_NO+L_NU~FEATURE_CODE, value.var = "ACTV_AMT", fun.aggregate = sum)

output:
-------
  ST_DATE ND_DATE LO_NO ACTV_CODE AB_NO L_NU INTEREST
1 7/27/16 7/27/16   265         O     1  855       29

input2:
-------
df <- read.table(text = "ST_DATE ND_DATE LO_NO   ACTV_CODE   ACTV_AMT    AB_NO   FEATURE_CODE    L_NU    
7/27/16 7/27/16 265            O          15       1     INTEREST        855          
7/27/16 7/27/16 265            O          14       1     INSTALLMENT   855", header = T)

dcast(df, ST_DATE+ND_DATE+LO_NO+ACTV_CODE+AB_NO+L_NU~FEATURE_CODE, value.var = "ACTV_AMT", fun.aggregate = sum)

output:
-------
  ST_DATE ND_DATE LO_NO ACTV_CODE AB_NO L_NU INSTALLMENT INTEREST
1 7/27/16 7/27/16   265         O     1  855          14       15
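
For readers who prefer to stay in the tidyverse, the same reshaping can probably be sketched with tidyr::pivot_wider, which accepts an aggregation function for duplicate identifiers. This is only a sketch: it assumes tidyr 1.1.0 or later (where values_fn can be a single function), and the object name wide is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as input 2 above
df <- read.table(text = "ST_DATE ND_DATE LO_NO ACTV_CODE ACTV_AMT AB_NO FEATURE_CODE L_NU
7/27/16 7/27/16 265 O 15 1 INTEREST 855
7/27/16 7/27/16 265 O 14 1 INSTALLMENT 855", header = TRUE)

# One column per FEATURE_CODE, filled with the summed ACTV_AMT;
# values_fn = sum merges rows that share the same identifiers
wide <- df %>%
  pivot_wider(
    id_cols = c(ST_DATE, ND_DATE, LO_NO, ACTV_CODE, AB_NO, L_NU),
    names_from = FEATURE_CODE,
    values_from = ACTV_AMT,
    values_fn = sum
  )

print(wide)
```

With input 1 (two INTEREST rows) the same call should give a single INTEREST column holding the sum of both amounts, since values_fn is applied per identifier and feature combination.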

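【Comments】:

  • But I need to take the minimum and maximum dates into account.
  • I am using the minimum and maximum of the dates; how can I account for that in the code above?
  • I did not understand what you are saying.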
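
One way the min/max date requirement from the comments might be combined with the dcast answer is to compute the group-wise dates (and the total amount) first and reshape afterwards. The sketch below rests on several assumptions: the grouping keys AB_NO, LO_NO, L_NU are taken from the question's code, the dates are assumed to be in month/day/two-digit-year form, TOTAL_AMT is a made-up column name, and the sample dates were changed from the original data purely so that the minimum and maximum differ.

```r
library(dplyr)
library(reshape2)

# Hypothetical data: same rows as input 2, but with dates altered for illustration
df <- read.table(text = "ST_DATE ND_DATE LO_NO ACTV_CODE ACTV_AMT AB_NO FEATURE_CODE L_NU
7/26/16 7/27/16 265 O 15 1 INTEREST 855
7/27/16 7/28/16 265 O 14 1 INSTALLMENT 855", header = TRUE, stringsAsFactors = FALSE)

prepared <- df %>%
  # Convert to Date first; min/max on the raw strings would compare text, not dates
  mutate(ST_DATE = as.Date(ST_DATE, format = "%m/%d/%y"),
         ND_DATE = as.Date(ND_DATE, format = "%m/%d/%y")) %>%
  group_by(AB_NO, LO_NO, L_NU) %>%
  mutate(ST_DATE = min(ST_DATE),        # earliest start date in the group
         ND_DATE = max(ND_DATE),        # latest end date in the group
         TOTAL_AMT = sum(ACTV_AMT)) %>% # group total, kept next to the per-feature sums
  ungroup() %>%
  as.data.frame()

# Every row in a group now shares the same dates, so dcast collapses them into one row
result <- dcast(prepared,
                ST_DATE + ND_DATE + LO_NO + ACTV_CODE + TOTAL_AMT + AB_NO + L_NU ~ FEATURE_CODE,
                value.var = "ACTV_AMT", fun.aggregate = sum)

print(result)
```

Keeping the per-row ACTV_AMT as the value variable and storing the group total under a separate name avoids overwriting the amounts that dcast needs to fill the feature columns.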