【Question title】: Delete rows from a data table depending on the values in the previous row
【Posted】: 2021-04-29 18:38:43
【Question】:

I have a data frame similar to the following, ordered by date:

| Symbol | Date       | volume | price |
|--------|------------|--------|-------|
| A      | 2014-01-01 | 1      | 5     |
| A      | 2014-01-02 | 3      | 8     |
| A      | 2014-01-03 | 7      | 4     |
| A      | 2014-01-05 | 0      | 4     |
| A      | 2014-01-06 | 0      | 4     |
| A      | 2014-01-07 | 3      | 6     |
| A      | 2014-01-08 | 34     | 7     |
| A      | 2014-01-09 | 45     | 34    |
| A      | 2014-01-10 | 4      | 5     |
| A      | 2014-01-11 | 9      | 7     |
| A      | 2014-01-12 | 0      | 7     |
| A      | 2014-01-13 | 0      | 7     |
| A      | 2014-01-14 | 8      | 6     |
| A      | 2014-01-15 | 4      | 4     |
| A      | 2014-01-16 | 0      | 7     |
| A      | 2014-01-17 | 4      | 7     |

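I need to delete every row where volume = 0 and the value in the price column is identical to the value in the previous row, so that I get the following data frame: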

| Symbol | Date       | volume | price |
|--------|------------|--------|-------|
| A      | 2014-01-01 | 1      | 5     |
| A      | 2014-01-02 | 3      | 8     |
| A      | 2014-01-03 | 7      | 4     |
| A      | 2014-01-07 | 3      | 6     |
| A      | 2014-01-08 | 34     | 7     |
| A      | 2014-01-09 | 45     | 34    |
| A      | 2014-01-10 | 4      | 5     |
| A      | 2014-01-11 | 9      | 7     |
| A      | 2014-01-14 | 8      | 6     |
| A      | 2014-01-15 | 4      | 4     |
| A      | 2014-01-16 | 0      | 7     |
| A      | 2014-01-17 | 4      | 7     |

I guess this should be done with a for loop, but I really don't know how to do it. I am still very new to R. I hope you can help me.

【Comments】:

    Tags: r dataframe for-loop data-cleaning delete-row


    【Answer 1】:

    In base R (this preserves the original row numbers):

    # Keep a row if Volume is non-zero or Price differs from the previous row
    df[df$Volume != 0 | c(FALSE, diff(df$Price) != 0), ]
    
       Symbol      Date Volume Price
    1       A  1/1/2014      1     5
    2       A  1/2/2014      3     8
    3       A  1/3/2014      7     4
    6       A  1/7/2014      3     6
    7       A  1/8/2014     34     7
    8       A  1/9/2014     45    34
    9       A 1/10/2014      4     5
    10      A 1/11/2014      9     7
    13      A 1/14/2014      8     6
    14      A 1/15/2014      4     4
    15      A 1/16/2014      0     7
    16      A 1/17/2014      4     7
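
To see why the one-liner works, it can help to build the logical mask in separate steps. The following is a minimal sketch, not the answerer's code: it uses the lowercase column names from the question's data, the intermediate names `same_as_prev`, `drop_row`, and `result` are illustrative, and it assumes the first row should always be kept because it has no previous row to compare against.

```r
# Data from the question (lowercase column names, as in the dput output).
df <- data.frame(
  Symbol = "A",
  Date = c("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-05",
           "2014-01-06", "2014-01-07", "2014-01-08", "2014-01-09",
           "2014-01-10", "2014-01-11", "2014-01-12", "2014-01-13",
           "2014-01-14", "2014-01-15", "2014-01-16", "2014-01-17"),
  volume = c(1L, 3L, 7L, 0L, 0L, 3L, 34L, 45L, 4L, 9L, 0L, 0L, 8L, 4L, 0L, 4L),
  price  = c(5L, 8L, 4L, 4L, 4L, 6L, 7L, 34L, 5L, 7L, 7L, 7L, 6L, 4L, 7L, 7L)
)

# diff() returns price[i] - price[i - 1] for i = 2..n, so it is one element
# shorter than the column. Prepending FALSE realigns it with the rows and
# marks row 1 as "not equal to the previous row" (assumption: keep row 1).
same_as_prev <- c(FALSE, diff(df$price) == 0)

# A row is dropped only when BOTH conditions hold.
drop_row <- df$volume == 0 & same_as_prev

# Logical indexing keeps the original row names, as noted above.
result <- df[!drop_row, ]
print(result)
```

Because the comparison is always made against the original column, a run of several zero-volume rows with an unchanged price (rows 4 and 5, or 11 and 12) is removed as a whole.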
    

    With the dplyr library, you can use filter and lag:

    library(dplyr)
    
    dplyr::filter(df, Volume != 0 | Price != dplyr::lag(Price))
    
       Symbol      Date Volume Price
    1       A  1/1/2014      1     5
    2       A  1/2/2014      3     8
    3       A  1/3/2014      7     4
    4       A  1/7/2014      3     6
    5       A  1/8/2014     34     7
    6       A  1/9/2014     45    34
    7       A 1/10/2014      4     5
    8       A 1/11/2014      9     7
    9       A 1/14/2014      8     6
    10      A 1/15/2014      4     4
    11      A 1/16/2014      0     7
    12      A 1/17/2014      4     7
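
One edge case is worth knowing about: `dplyr::lag` returns `NA` for the first element by default, and `filter` drops rows whose condition evaluates to `NA`. A first row with zero volume would therefore be removed. The sketch below reproduces that logic in base R on a small made-up vector (the values and the names `prev_price`, `cond`, and `kept` are illustrative, not from the question), with `c(NA, head(price, -1))` standing in for `lag(price)`.

```r
# Toy data: the first row has zero volume (hypothetical values).
volume <- c(0L, 3L, 7L, 0L)
price  <- c(5L, 8L, 4L, 4L)

# Base R stand-in for dplyr::lag(price): shift down by one, pad with NA.
prev_price <- c(NA, head(price, -1))

# Same condition as in the filter() call above.
cond <- volume != 0 | price != prev_price
print(cond)
# NA TRUE TRUE FALSE

# which() ignores NA, mirroring how filter() drops rows with an NA condition.
kept <- which(cond)
print(kept)
# 2 3
```

In the question's data the first row has a non-zero volume, so `TRUE | NA` evaluates to `TRUE` and the row is kept.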
    

    【Comments】:

      【Answer 2】:

      A base R option using subset

      > subset(df,!(volume==0 & c(TRUE,diff(price)==0)))
         Symbol       Date volume price
      1       A 2014-01-01      1     5
      2       A 2014-01-02      3     8
      3       A 2014-01-03      7     4
      6       A 2014-01-07      3     6
      7       A 2014-01-08     34     7
      8       A 2014-01-09     45    34
      9       A 2014-01-10      4     5
      10      A 2014-01-11      9     7
      13      A 2014-01-14      8     6
      14      A 2014-01-15      4     4
      15      A 2014-01-16      0     7
      16      A 2014-01-17      4     7
      

      A data.table option

      > library(data.table)
      > setDT(df)[!(volume==0 & c(TRUE,diff(price)==0))]
          Symbol       Date volume price
       1:      A 2014-01-01      1     5
       2:      A 2014-01-02      3     8
       3:      A 2014-01-03      7     4
       4:      A 2014-01-07      3     6
       5:      A 2014-01-08     34     7
       6:      A 2014-01-09     45    34
       7:      A 2014-01-10      4     5
       8:      A 2014-01-11      9     7
       9:      A 2014-01-14      8     6
      10:      A 2014-01-15      4     4
      11:      A 2014-01-16      0     7
      12:      A 2014-01-17      4     7
      

      Data

      > dput(df)
      structure(list(Symbol = c("A", "A", "A", "A", "A", "A", "A",
      "A", "A", "A", "A", "A", "A", "A", "A", "A"), Date = c("2014-01-01",
      "2014-01-02", "2014-01-03", "2014-01-05", "2014-01-06", "2014-01-07",
      "2014-01-08", "2014-01-09", "2014-01-10", "2014-01-11", "2014-01-12",
      "2014-01-13", "2014-01-14", "2014-01-15", "2014-01-16", "2014-01-17"
      ), volume = c(1L, 3L, 7L, 0L, 0L, 3L, 34L, 45L, 4L, 9L, 0L, 0L,
      8L, 4L, 0L, 4L), price = c(5L, 8L, 4L, 4L, 4L, 6L, 7L, 34L, 5L,
      7L, 7L, 7L, 6L, 4L, 7L, 7L)), class = "data.frame", row.names = c(NA,
      -16L))
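
Since the question asked about a for loop, here is a sketch of what an explicit loop could look like next to the vectorized form, using the `volume` and `price` columns from the data above. The names `keep`, `loop_result`, and `vec_result` are illustrative, and both versions assume the first row is always kept.

```r
# Columns copied from the dput output above.
volume <- c(1L, 3L, 7L, 0L, 0L, 3L, 34L, 45L, 4L, 9L, 0L, 0L, 8L, 4L, 0L, 4L)
price  <- c(5L, 8L, 4L, 4L, 4L, 6L, 7L, 34L, 5L, 7L, 7L, 7L, 6L, 4L, 7L, 7L)
df <- data.frame(volume = volume, price = price)

# Loop version: compare each row with the row before it in the ORIGINAL data,
# recording the decision in a logical vector instead of deleting as we go.
keep <- rep(TRUE, nrow(df))
for (i in seq_len(nrow(df))[-1]) {
  if (df$volume[i] == 0 && df$price[i] == df$price[i - 1]) {
    keep[i] <- FALSE
  }
}
loop_result <- df[keep, ]

# Vectorized version of the same rule.
vec_result <- df[!(df$volume == 0 & c(FALSE, diff(df$price) == 0)), ]

# Both approaches select the same rows.
print(identical(loop_result, vec_result))
```

Collecting the decisions in `keep` and subsetting once avoids the index shifting that happens when rows are deleted inside the loop. The vectorized form is usually shorter and faster in R.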
      

      【Comments】:

      • Thank you! This is exactly what I wanted.