【Question title】: How to remove the first row of some groups' elements?
【Posted】: 2019-07-28 22:27:43
【Question】:

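Hi, I have four columns: the household index, the index of each person within the household, each person's trip number, and the location of the trip. I want the location of the first trip of every person in every household to be home. Here is an example: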

  Household  person  trip     location
      1         1     1          home
      1         1     2          work
      1         1     3          home
      1         2     1          other
      1         2     2          home
      1         2     3          work
      2         1     1          school
      2         1     2          home
      2         1     3          shopping
      2         1     4          home

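The first trip of the second person in the first household is "other", so I want to delete that row and renumber the trip column so that it starts from 1 again. The second household has one member whose first trip is "school", so I also want to delete that row and renumber the trip column. The output I want is: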

  Household  person  trip     location
      1         1     1          home
      1         1     2          work
      1         1     3          home
      1         2     1          home
      1         2     2          work
      2         1     1          home
      2         1     2          shopping
      2         1     3          home

【Comments】:

  • Could you add the code you have already tried that failed?

Tags: r dataframe


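【Solution 1】:

One way using dplyr is to group_by Household and person, then slice the rows from the first row where location is "home" through the end of the group. We can then use row_number to assign new trip numbers within each group. This assumes that every group has at least one "home" value.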

library(dplyr)

df %>%
  group_by(Household, person) %>%
  slice(which.max(location == "home") : n()) %>%
  mutate(trip = row_number())

#  Household person  trip location
#      <int>  <int> <int> <fct>   
#1         1      1     1 home    
#2         1      1     2 work    
#3         1      1     3 home    
#4         1      2     1 home    
#5         1      2     2 work    
#6         2      1     1 home    
#7         2      1     2 shopping
#8         2      1     3 home    
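
The "at least one home" assumption matters because which.max() returns 1 when no element is TRUE, so a group that never reaches "home" would be kept whole rather than dropped. A minimal base R sketch of that edge case, using a hypothetical person (not in the original data) who never goes home:

```r
# Hypothetical group with no "home" trip (an assumption for illustration only)
loc <- c("school", "shopping")

# No element is TRUE, so which.max() falls back to the first index
first_idx <- which.max(loc == "home")
first_idx
# 1

# A cumulative-sum test never becomes positive, so it would keep no rows
keeps_any <- any(cumsum(loc == "home") > 0)
keeps_any
# FALSE
```

If such groups can occur, a cumsum-based filter drops them entirely, whereas the slice above leaves them unchanged; which behaviour is wanted depends on the data.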

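【Comments】:

  • What if I want to delete each person's last trip when it is not at home? In that case we don't need to change the trip numbers!
  • @sherek_66 In that case, you could do something like df %>% group_by(Household, person) %>% slice(if(location[n()] != "home") 1:(n() - 1) else 1:n())
  • I find you are the smartest person when it comes to data frames. Do you have a solution for this question? stackoverflow.com/questions/57245666/…
  • Could you tell me whether this problem is impossible to do in R? stackoverflow.com/questions/57259022/…
【Solution 2】: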

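We can use data.table methods. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'Household' and 'person', take the cumulative sum of the logical expression location == "home", and subset the Subset of Data.table (.SD) to the rows where that sum is greater than 0. Then create the 'trip' column as the rowid of 'Household' and 'person'.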

library(data.table)
setDT(df)[, .SD[cumsum(location == "home")> 0], .(Household, person)
         ][, trip := rowid(Household, person)]
#  Household person trip location
#1:         1      1    1     home
#2:         1      1    2     work
#3:         1      1    3     home
#4:         1      2    1     home
#5:         1      2    2     work
#6:         2      1    1     home
#7:         2      1    2 shopping
#8:         2      1    3     home

Or the same with tidyverse

library(dplyr)
df %>%
    group_by(Household, person) %>% 
    filter(cumsum(location == "home") > 0) %>%
    mutate(trip = row_number())
# A tibble: 8 x 4
# Groups:   Household, person [3]
#  Household person  trip location
#      <int>  <int> <int> <chr>   
#1         1      1     1 home    
#2         1      1     2 work    
#3         1      1     3 home    
#4         1      2     1 home    
#5         1      2     2 work    
#6         2      1     1 home    
#7         2      1     2 shopping
#8         2      1     3 home    

If we want to remove the last trip when it is not 'home'

df %>%
    group_by(Household, person) %>%
    filter(row_number() != n()| last(location) == "home") 
# A tibble: 9 x 4
# Groups:   Household, person [3]
#  Household person  trip location
#      <int>  <int> <int> <chr>   
#1         1      1     1 home    
#2         1      1     2 work    
#3         1      1     3 home    
#4         1      2     1 other   
#5         1      2     2 home    
#6         2      1     1 school  
#7         2      1     2 home    
#8         2      1     3 shopping
#9         2      1     4 home 

Data

df <- structure(list(Household = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L), person = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), 
    trip = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 4L), location = c("home", 
    "work", "home", "other", "home", "work", "school", "home", 
    "shopping", "home")), class = "data.frame", row.names = c(NA, 
-10L))
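
For readers who prefer to avoid extra packages, the same cumulative-sum idea can be sketched in base R with ave(). This is only an illustrative alternative, not part of the original answer; the names is_home, seen_home and out are made up for the example, and the data frame is rebuilt here so the block runs on its own.

```r
# Same example data as above, rebuilt so this block is self-contained
df <- data.frame(
  Household = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  person    = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
  trip      = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
  location  = c("home", "work", "home", "other", "home", "work",
                "school", "home", "shopping", "home"),
  stringsAsFactors = FALSE
)

# 1 where the trip is at home, 0 otherwise
is_home <- as.integer(df$location == "home")

# Per (Household, person): TRUE from the first "home" row onwards
seen_home <- ave(is_home, df$Household, df$person, FUN = cumsum) > 0

# Keep those rows, then renumber the trips within each person
out <- df[seen_home, ]
out$trip <- ave(out$trip, out$Household, out$person, FUN = seq_along)
rownames(out) <- NULL
out
```

As with the cumsum-based filter, a person with no "home" trip at all would be dropped completely by this sketch.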
