【Question title】: Sequencing R, Time Series Data
【Posted】: 2025-12-07 05:20:05
【Question】:

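I would like to add a new column to my current data frame that assigns a sequence number based on the series of events in a football match.

This is my current data frame: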

 head(test_P)
 index        team.name      possession_team.name  minute second period possession     type.name
1      5      Cardiff City         Cardiff City      0      0      1          2          Pass
2      6      Cardiff City         Cardiff City      0      2      1          2 Ball Receipt*
3      7      Cardiff City         Cardiff City      0      2      1          2         Carry
4      8      Cardiff City         Cardiff City      0      3      1          2          Pass
5      9      Cardiff City         Cardiff City      0      6      1          2 Ball Receipt*
6     10 Preston North End         Cardiff City      0      6      1          2          Duel
7     11 Preston North End         Cardiff City      0      6      1          2          Pass
8     12 Preston North End         Cardiff City      0      8      1          2 Miscontrol
9     13      Cardiff City         Cardiff City      0      8      1          2          Pass
10    14      Cardiff City         Cardiff City      0      9      1          2 Ball Receipt*
11    15      Cardiff City         Cardiff City      0      9      1          2         Cross
12    16 Preston North End         Cardiff City      0     10      1          2 Clearance
13    17      Cardiff City         Cardiff City      0     11      1          2          Pass
14    18      Cardiff City         Cardiff City      0     13      1          2 Ball Receipt*
15    19 Preston North End    Preston North End      0     13      1          3 Ball Recovery
16    20 Preston North End    Preston North End      0     13      1          3         Carry
17    21 Preston North End    Preston North End      0     21      1          3          Pass
18    22 Preston North End    Preston North End      0     22      1          3 Ball Receipt*

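However, I would like to add an additional column called sequence, placed after possession, that labels the sequence number within each possession.

Every new possession should start at sequence value 1.

However, if the opposition breaks the sequence with one or more events while the possession value stays the same, then the next time the team in possession touches the ball it should get a new sequence number, e.g. 2, or 3, 4, etc. if the sequence is broken several times.

The opposition's events should take the same sequence number as the events they broke.

For example, see the data below: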

   index         team.name possession_team.name minute second period possession     type.name sequence
1      5      Cardiff City         Cardiff City      0      0      1          2          Pass        1
2      6      Cardiff City         Cardiff City      0      2      1          2 Ball Receipt*        1
3      7      Cardiff City         Cardiff City      0      2      1          2         Carry        1
4      8      Cardiff City         Cardiff City      0      3      1          2          Pass        1
5      9      Cardiff City         Cardiff City      0      6      1          2 Ball Receipt*        1
6     10 Preston North End         Cardiff City      0      6      1          2          Duel        1
7     11 Preston North End         Cardiff City      0      6      1          2          Pass        1
8     12 Preston North End         Cardiff City      0      8      1          2    Miscontrol        1
9     13      Cardiff City         Cardiff City      0      8      1          2          Pass        2
10    14      Cardiff City         Cardiff City      0      9      1          2 Ball Receipt*        2
11    15      Cardiff City         Cardiff City      0      9      1          2         Cross        2
12    16 Preston North End         Cardiff City      0     10      1          2     Clearance        2
13    17      Cardiff City         Cardiff City      0     11      1          2          Pass        3
14    18      Cardiff City         Cardiff City      0     13      1          2 Ball Receipt*        3
15    19 Preston North End    Preston North End      0     13      1          3 Ball Recovery        1
16    20 Preston North End    Preston North End      0     13      1          3         Carry        1
17    21 Preston North End    Preston North End      0     21      1          3          Pass        1
18    22 Preston North End    Preston North End      0     22      1          3 Ball Receipt*        1

I have tried the lead and lag functions combined with ifelse statements, but I can't seem to get this to work on the data:

     test <- test  %>% mutate(P = ifelse(dplyr::lag(team.id)!=team.id & dplyr::lag(possession) == possession, dplyr::lag(seq_id) + 1,
                                                      ifelse(dplyr::lead(team.id)!=team.id & dplyr::lead(possession)!=possession , seq_id, 1))) 

Any help would be greatly appreciated, and apologies for the untidiness of this question.

【Comments】:

Tags: r time-series lag dplyr lead


【Solution 1】:

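The following feels hacky, but it may do the job.

The logic is as follows:

  • Generate a flip variable that is 1/2 every time team.name "flips" (differs from the previous row), and 0 otherwise.
  • Generate cum_sum_flip, the cumulative sum of flip within each possession. Add 1 so that it starts at 1 rather than 0.
  • Generate sequence by taking floor() of cum_sum_flip, so that the sequence increases on every second flip, i.e. when the ball comes back to the original team.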
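
To see why counting in halves works, here is a minimal base R sketch of the three steps on a made-up vector. The vector `team` and the labels "A" and "B" are invented for illustration and are not part of the original data.

```r
# Toy event stream within a single possession:
# "A" is the team in possession, "B" interrupts it.
team <- c("A", "A", "B", "B", "A", "B", "A")

# TRUE wherever the acting team differs from the previous row.
# The first row has no predecessor, so it counts as "no change".
changed <- c(FALSE, team[-1] != team[-length(team)])

flip <- changed / 2                # 0.5 at each change of team, 0 otherwise
cum_sum_flip <- cumsum(flip) + 1   # running total, shifted to start at 1
sequence <- floor(cum_sum_flip)    # only every second flip reaches a new integer

print(data.frame(team, flip, cum_sum_flip, sequence))
```

The flip from "A" to "B" only lifts the running total to a half value, so the interrupting events keep the sequence number of the events they broke. The flip back to "A" completes the step to the next integer.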

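Notes:

  • I left the intermediate variables in for ease of understanding; you can consolidate them.
  • Depending on how your data is structured, you may need to group by match or similar, so that the count restarts when a new match begins.
  • This solution is not very robust and makes some assumptions about the structure of the data. Please check the edge cases.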
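
library(dplyr)
library(tidyr)  # replace_na() comes from tidyr, not dplyr

test_P %>% 
  # flip is 0.5 on every row where the acting team differs from the previous row;
  # the first row has no predecessor (NA), which replace_na() turns into FALSE
  mutate(flip = (lag(team.name) != team.name) %>% replace_na(FALSE) * 1/2,
         .after = possession
  ) %>% group_by(possession) %>% 
  # running total within each possession, shifted to start at 1;
  # floor() then advances the sequence on every second flip
  mutate(cum_sum_flip = cumsum(flip)+1, 
         sequence = floor(cum_sum_flip),
         .after = possession
  ) 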

Result:

# A tibble: 18 x 11
# Groups:   possession [2]
   index team.name         possession_team.name minute second period possession cum_sum_flip sequence  flip type.name    
   <dbl> <chr>             <chr>                 <dbl>  <dbl>  <dbl>      <dbl>        <dbl>    <dbl> <dbl> <chr>        
 1     5 Cardiff City      Cardiff City              0      0      1          2          1          1   0   Pass         
 2     6 Cardiff City      Cardiff City              0      2      1          2          1          1   0   Ball Receipt*
 3     7 Cardiff City      Cardiff City              0      2      1          2          1          1   0   Carry        
 4     8 Cardiff City      Cardiff City              0      3      1          2          1          1   0   Pass         
 5     9 Cardiff City      Cardiff City              0      6      1          2          1          1   0   Ball Receipt*
 6    10 Preston North End Cardiff City              0      6      1          2          1.5        1   0.5 Duel         
 7    11 Preston North End Cardiff City              0      6      1          2          1.5        1   0   Pass         
 8    12 Preston North End Cardiff City              0      8      1          2          1.5        1   0   Miscontrol   
 9    13 Cardiff City      Cardiff City              0      8      1          2          2          2   0.5 Pass         
10    14 Cardiff City      Cardiff City              0      9      1          2          2          2   0   Ball Receipt*
11    15 Cardiff City      Cardiff City              0      9      1          2          2          2   0   Cross        
12    16 Preston North End Cardiff City              0     10      1          2          2.5        2   0.5 Clearance    
13    17 Cardiff City      Cardiff City              0     11      1          2          3          3   0.5 Pass         
14    18 Cardiff City      Cardiff City              0     13      1          2          3          3   0   Ball Receipt*
15    19 Preston North End Preston North End         0     13      1          3          1.5        1   0.5 Ball Recovery
16    20 Preston North End Preston North End         0     13      1          3          1.5        1   0   Carry        
17    21 Preston North End Preston North End         0     21      1          3          1.5        1   0   Pass         
18    22 Preston North End Preston North End         0     22      1          3          1.5        1   0   Ball Receipt*
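
Note that in the result above, possession 3 starts with cum_sum_flip at 1.5 rather than 1, because flip is computed before grouping and the change of team at the possession boundary carries into the new group. The sequence is still 1 here, but an opposition event later in that possession would be pushed to 2. As one possible way to tighten this up, the sketch below is an alternative, not the method used above: it counts runs of events by the team in possession inside each group. The column `match_id`, the helper columns `own` and `run_start`, and the sample rows are all hypothetical and only serve the illustration.

```r
library(dplyr)

# Small made-up sample; match_id is a hypothetical column that is not in the original data.
events <- data.frame(
  match_id = 1,
  possession = c(2, 2, 2, 2, 2, 2, 3, 3, 3),
  team.name = c("Cardiff City", "Cardiff City", "Preston North End",
                "Cardiff City", "Preston North End", "Cardiff City",
                "Preston North End", "Cardiff City", "Preston North End"),
  possession_team.name = c(rep("Cardiff City", 6), rep("Preston North End", 3))
)

result <- events %>%
  group_by(match_id, possession) %>%   # restart the count for every match and possession
  mutate(
    # TRUE when the acting team is the team in possession
    own = team.name == possession_team.name,
    # TRUE on the first row of each run of own-team events within the group
    run_start = own & !lag(own, default = FALSE),
    # number of own-team runs started so far; opposition events that come
    # before the first own-team event are assigned to sequence 1
    sequence = pmax(cumsum(run_start), 1)
  ) %>%
  ungroup()

print(result)
```

Because lag() is evaluated inside each group here, nothing from the previous possession leaks into the count, and opposition events always inherit the number of the run they interrupted. Whether opposition events at the very start of a possession should belong to sequence 1 is an assumption of this sketch.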

Data

test_P <- tribble(
~index, ~team.name, ~possession_team.name, ~minute, ~second, ~period, ~possession, ~type.name, 
5 ,      "Cardiff City",  "Cardiff City",       0,        0,       1,           2,  "Pass",
6 ,      "Cardiff City",  "Cardiff City",       0,        2,       1,           2,  "Ball Receipt*",
7 ,      "Cardiff City",  "Cardiff City",       0,        2,       1,           2,  "Carry",
8 ,      "Cardiff City",  "Cardiff City",       0,        3,       1,           2,  "Pass",
9 ,      "Cardiff City",  "Cardiff City",       0,        6,       1,           2,  "Ball Receipt*",
10,  "Preston North End", "Cardiff City",       0,        6,       1,           2,  "Duel",
11,  "Preston North End", "Cardiff City",       0,        6,       1,           2,  "Pass",
12,  "Preston North End", "Cardiff City",       0,        8,       1,           2,  "Miscontrol",
13,       "Cardiff City", "Cardiff City",       0,        8,       1,           2,  "Pass",
14,       "Cardiff City", "Cardiff City",       0,        9,       1,           2,  "Ball Receipt*",
15,       "Cardiff City", "Cardiff City",       0,        9,       1,           2,  "Cross",
16,  "Preston North End", "Cardiff City",       0,       10,       1,           2,  "Clearance",
17,       "Cardiff City", "Cardiff City",       0,       11,       1,           2,  "Pass",
18,       "Cardiff City", "Cardiff City",       0,       13,       1,           2,  "Ball Receipt*",
19,  "Preston North End", "Preston North End",  0,       13,       1,           3,  "Ball Recovery",
20,  "Preston North End", "Preston North End",  0,       13,       1,           3,  "Carry",
21,  "Preston North End", "Preston North End",  0,       21,       1,           3,  "Pass",
22,  "Preston North End", "Preston North End",  0,       22,       1,           3,  "Ball Receipt*")

【Comments】:

  • Thanks Marcelo, this solved the problem I was having. I'll check for errors, but so far I haven't found any.