【Question title】: Difference between two comma-separated strings
【Posted】: 2018-12-24 07:14:00
【Question】:

I have the following data frame, yy:

    fundId  Year    Qtr   StockCurrentQtr    StockNextQtr
    1       2015    1     1,2,3,4,5         2,3,4,51
    1       2015    2     2,3,4,51          7,8,9,4,2
    1       2015    3     7,8,9,4,2         NA
    2       2015    1     10,11,14          14,16,19
    2       2015    2     14,16,19          20,21,45
    2       2015    3     20,21,45          NA

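I want to find the difference between StockNextQtr and StockCurrentQtr for each row, grouped by fundId, or alternatively the difference between consecutive rows of the StockCurrentQtr column, grouped by fundId. This is what I tried: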

yy <- yy %>% 
       group_by(fundId) %>% 
       mutate(StockDiff = apply(yy,2,function(x){
                    paste(setdiff(unlist(strsplit(x[5], split = ",")), unlist(strsplit(x[4], 
                                                            split = ","))),collapse = ",")}))

I get the following error:

Column `StockDiff` must be length 3 (the group size) or one, not 5

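【Comments】:

  • Could you update your post with your expected output?
  • Don't store your data in a non-normalized, CSV-like format like this. It will usually cause you trouble.

Tags: r dataframe dplyr tidyverse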


【Solution 1】:

You don't need apply here. Just use rowwise, i.e.

library(dplyr)

df %>% 
 mutate_at(vars(4:5), funs(strsplit(., ','))) %>% 
 rowwise() %>% 
 mutate(new = toString(setdiff(StockCurrentQtr, StockNextQtr)))

which gives:

Source: local data frame [6 x 6]
Groups: <by row>

# A tibble: 6 x 6
  fundId  Year   Qtr StockCurrentQtr StockNextQtr new          
   <int> <int> <int> <list>          <list>       <chr>        
1      1  2015     1 <chr [5]>       <chr [4]>    1, 5         
2      1  2015     2 <chr [4]>       <chr [5]>    3, 51        
3      1  2015     3 <chr [5]>       <chr [1]>    7, 8, 9, 4, 2
4      2  2015     1 <chr [3]>       <chr [3]>    10, 11       
5      2  2015     2 <chr [3]>       <chr [3]>    14, 16, 19   
6      2  2015     3 <chr [3]>       <chr [1]>    20, 21, 45
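
The mutate_at() and funs() idiom above comes from older dplyr releases. As a rough sketch only, assuming dplyr 1.0 or later is installed, the same row-wise set difference can be written with across(). The data frame below is retyped from the question, and the name res is made up for this illustration.

```r
library(dplyr)

# Example data retyped from the question
df <- data.frame(
  fundId = c(1L, 1L, 1L, 2L, 2L, 2L),
  Year = 2015L,
  Qtr = c(1L, 2L, 3L, 1L, 2L, 3L),
  StockCurrentQtr = c("1,2,3,4,5", "2,3,4,51", "7,8,9,4,2",
                      "10,11,14", "14,16,19", "20,21,45"),
  StockNextQtr = c("2,3,4,51", "7,8,9,4,2", NA,
                   "14,16,19", "20,21,45", NA),
  stringsAsFactors = FALSE
)

res <- df %>%
  # split both comma-separated columns into list-columns of character vectors
  mutate(across(c(StockCurrentQtr, StockNextQtr), ~ strsplit(.x, ","))) %>%
  # process one row at a time; list-column cells are unwrapped per row
  rowwise() %>%
  # stocks held in the current quarter that are absent from the next one
  mutate(new = toString(setdiff(StockCurrentQtr, StockNextQtr))) %>%
  ungroup()

res$new
```

Selecting the two columns by name rather than by position (4:5) keeps the call working if the column order changes.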

The base R equivalent:

mapply(function(x, y) toString(setdiff(x, y)), strsplit(df$StockCurrentQtr, ','), 
                                               strsplit(df$StockNextQtr, ','))

#[1] "1, 5"          "3, 51"         "7, 8, 9, 4, 2" "10, 11"        "14, 16, 19"    "20, 21, 45"
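
As for the error in the question, a likely explanation is that apply(yy, 2, ...) walks over the five columns of the whole data frame yy (not over the rows of the current group), so it returns five values where mutate() expects three (the group size) or one. The base R sketch below illustrates this; the helper name stock_diff is made up here, and the data is retyped from the question.

```r
# Example data retyped from the question
yy <- data.frame(
  fundId = c(1L, 1L, 1L, 2L, 2L, 2L),
  Year = 2015L,
  Qtr = c(1L, 2L, 3L, 1L, 2L, 3L),
  StockCurrentQtr = c("1,2,3,4,5", "2,3,4,51", "7,8,9,4,2",
                      "10,11,14", "14,16,19", "20,21,45"),
  StockNextQtr = c("2,3,4,51", "7,8,9,4,2", NA,
                   "14,16,19", "20,21,45", NA),
  stringsAsFactors = FALSE
)

# The function body used in the question: items of x[5] that are not in x[4]
stock_diff <- function(x) {
  paste(setdiff(unlist(strsplit(x[5], split = ",")),
                unlist(strsplit(x[4], split = ","))),
        collapse = ",")
}

by_column <- apply(yy, 2, stock_diff)  # MARGIN = 2: one call per column
by_row    <- apply(yy, 1, stock_diff)  # MARGIN = 1: one call per row

length(by_column)  # 5, one value per column
length(by_row)     # 6, one value per row
```

With MARGIN = 1 the lengths line up, but apply() first converts the data frame to a character matrix, which is one reason the row-wise approaches above are easier to reason about.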

If StockNextQtr is missing, we can create it first and then proceed as before, i.e.

df %>% 
 group_by(fundId) %>% 
 mutate(StockNextQtr = lead(StockCurrentQtr)) %>% 
 mutate_at(vars(4:5), funs(strsplit(., ','))) %>% 
 rowwise() %>% 
 mutate(new = toString(setdiff(StockCurrentQtr, StockNextQtr)))
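
To make the lead() step concrete, here is a small sketch that starts from the question's data without the StockNextQtr column and rebuilds it per fund. It assumes that rows should be ordered by Year and Qtr within each fundId, which is why an arrange() call is added; the name with_next is made up for this illustration.

```r
library(dplyr)

# Question data without the StockNextQtr column
df <- data.frame(
  fundId = c(1L, 1L, 1L, 2L, 2L, 2L),
  Year = 2015L,
  Qtr = c(1L, 2L, 3L, 1L, 2L, 3L),
  StockCurrentQtr = c("1,2,3,4,5", "2,3,4,51", "7,8,9,4,2",
                      "10,11,14", "14,16,19", "20,21,45"),
  stringsAsFactors = FALSE
)

with_next <- df %>%
  # lead() depends on row order, so sort quarters within each fund first
  arrange(fundId, Year, Qtr) %>%
  group_by(fundId) %>%
  # the next row's holdings within the same fund; NA for a fund's last quarter
  mutate(StockNextQtr = lead(StockCurrentQtr)) %>%
  ungroup()

with_next$StockNextQtr
```

Because lead() runs inside group_by(fundId), the last quarter of fund 1 gets NA instead of picking up the first quarter of fund 2.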

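【Comments】:

  • Thank you very much for your answer. If the StockNextQtr column does not exist, how can I find the difference between consecutive rows, using group_by on fundId?

【Solution 2】: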

I found another way:

yy <- yy %>% 
  group_by(fundId, Year, Qtr) %>% 
  mutate(new = paste(setdiff(unlist(strsplit(StockCurrentQtr, split = ",")), 
                             unlist(strsplit(StockNextQtr, split = ","))), 
                     collapse = ","))
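
One detail worth checking against the intended result: setdiff() is not symmetric. The solutions here return what is in StockCurrentQtr but not in StockNextQtr, whereas the attempt in the question passed the columns in the opposite order. A minimal base R sketch using the first row of the question's data (the variable names are made up for this illustration):

```r
# First row of the question's data, split into character vectors
current  <- unlist(strsplit("1,2,3,4,5", ","))
next_qtr <- unlist(strsplit("2,3,4,51", ","))

dropped <- setdiff(current, next_qtr)  # in the current quarter only
added   <- setdiff(next_qtr, current)  # in the next quarter only

toString(dropped)  # "1, 5"
toString(added)    # "51"
```

Swapping the two arguments is enough to switch between the two results.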

【Comments】:
