How to create a new column from existing column values in R (answers)

【Question title】: How to create new column from existing column value in R
【Posted】: 2021-05-01 00:04:51
【Question】:

This is what the sample looks like: 

                     vehicle_id     time trip_id location_lat location_lon
                          <chr>    <chr>   <int>        <dbl>        <dbl>
 1: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:05:24       1     13.67530     100.6345
 2: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:06:14       1     13.67534     100.6359
 3: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:08:14       1     13.67805     100.6307
 4: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:09:14       1     13.67829     100.6239
 5: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:15:14       1     13.66856     100.6324
 6: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:18:14       1     13.66252     100.6599
 7: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:20:14       1     13.65382     100.6756
 8: /+bx80f3gOoPMoFBsS+3xX6jpi8 22:37:30   44498     13.91795     100.6832
 9: /+bx80f3gOoPMoFBsS+3xX6jpi8 22:38:30   44498     13.91173     100.6766
10: /+bx80f3gOoPMoFBsS+3xX6jpi8 22:40:30   44498     13.90366     100.6679

 library(data.table)
 my.df <- data.table(structure(list(vehicle_id = c("Zz3yE90z++QmTX2QO5dHI78IK/Q","Zz3yE90z++QmTX2QO5dHI78IK/Q", "Zz3yE90z++QmTX2QO5dHI78IK/Q","Zz3yE90z++QmTX2QO5dHI78IK/Q", "Zz3yE90z++QmTX2QO5dHI78IK/Q","Zz3yE90z++QmTX2QO5dHI78IK/Q", "Zz3yE90z++QmTX2QO5dHI78IK/Q","/+bx80f3gOoPMoFBsS+3xX6jpi8", "/+bx80f3gOoPMoFBsS+3xX6jpi8","/+bx80f3gOoPMoFBsS+3xX6jpi8"), time = c("00:05:24", "00:06:14","00:08:14", "00:09:14", "00:15:14", "00:18:14", "00:20:14", "22:37:30","22:38:30", "22:40:30"), trip_id = c(1L, 1L, 1L, 1L, 1L, 1L,1L, 44498L, 44498L, 44498L), location_lat = c(13.6753, 13.67534,13.67805, 13.67829, 13.66856, 13.66252, 13.65382, 13.91795, 13.91173,13.90366), location_lon = c(100.63453, 100.63586, 100.63067,100.62387, 100.63235, 100.65986, 100.67562, 100.68322, 100.67663,100.66788)), row.names = c(NA, -10L), class = c("data.table","data.frame")))

So I want to create a new result that contains the first and last row of each trip_id. My result should look like this:

# A tibble: 2 x 8
   trip_id start_location_Long start_location_Lat end_location_Long end_location_Lat start_time end_time from_vehicle_id
     <int>               <dbl>              <dbl>             <dbl>            <dbl> <chr>      <chr>    <chr>
        1                101.               13.7              101.             13.7 00:05:24   00:20:14 Zz3yE90z++QmTX2QO5dHI78IK/Q
    44498                101.               13.9              101.             13.9 22:37:30   22:40:30 /+bx80f3gOoPMoFBsS+3xX6jpi8

Something like that. Thanks in advance.
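Picking the first and last row of each trip_id does not strictly require a package. As a base-R sketch (an alternative, not taken from any of the answers), sort by trip and time, then let duplicated() flag each trip's first row and, with fromLast = TRUE, its last row. It assumes every trip stays within one day, so the zero-padded time strings sort chronologically:

```r
# Sample data from the question, rebuilt as a plain data.frame
my.df <- data.frame(
  vehicle_id   = rep(c("Zz3yE90z++QmTX2QO5dHI78IK/Q", "/+bx80f3gOoPMoFBsS+3xX6jpi8"), c(7, 3)),
  time         = c("00:05:24", "00:06:14", "00:08:14", "00:09:14", "00:15:14",
                   "00:18:14", "00:20:14", "22:37:30", "22:38:30", "22:40:30"),
  trip_id      = rep(c(1L, 44498L), c(7, 3)),
  location_lat = c(13.6753, 13.67534, 13.67805, 13.67829, 13.66856,
                   13.66252, 13.65382, 13.91795, 13.91173, 13.90366),
  location_lon = c(100.63453, 100.63586, 100.63067, 100.62387, 100.63235,
                   100.65986, 100.67562, 100.68322, 100.67663, 100.66788),
  stringsAsFactors = FALSE
)

# Sort so that "first" and "last" mean earliest and latest within each trip
my.df <- my.df[order(my.df$trip_id, my.df$time), ]

# First row of each trip: the first occurrence of its trip_id
first_rows <- my.df[!duplicated(my.df$trip_id), ]
# Last row of each trip: the first occurrence when scanning from the end
last_rows <- my.df[!duplicated(my.df$trip_id, fromLast = TRUE), ]

# Both tables are ordered by trip_id, so they can be combined column by column
result <- data.frame(
  trip_id             = first_rows$trip_id,
  start_location_Long = first_rows$location_lon,
  start_location_Lat  = first_rows$location_lat,
  end_location_Long   = last_rows$location_lon,
  end_location_Lat    = last_rows$location_lat,
  start_time          = first_rows$time,
  end_time            = last_rows$time,
  from_vehicle_id     = first_rows$vehicle_id,
  stringsAsFactors = FALSE
)
print(result)
```

For a trip with a single row, the same row is both first and last, so start and end values coincide.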

【Question comments】:

Tags: r dplyr

【Answer 1】:

Another approach, using dplyr and tidyr:

library(dplyr)
library(tidyr)

my.df %>%
  group_by(trip_id, vehicle_id) %>%
  arrange(time) %>%                   # zero-padded HH:MM:SS strings sort chronologically
  slice(c(1, n())) %>%                # first and last row of each group
  mutate(row = c("start", "end")[row_number()]) %>%
  pivot_wider(names_from = row, values_from = c(time, location_lat, location_lon)) %>%
  ungroup()

# A tibble: 2 x 8
  vehicle_id                  trip_id time_start time_end location_lat_start location_lat_end location_lon_start location_lon_end
  <chr>                         <int> <chr>      <chr>                 <dbl>            <dbl>              <dbl>            <dbl>
1 Zz3yE90z++QmTX2QO5dHI78IK/Q       1 00:05:24   00:20:14               13.7             13.7               101.             101.
2 /+bx80f3gOoPMoFBsS+3xX6jpi8   44498 22:37:30   22:40:30               13.9             13.9               101.             101.
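This answer tags each kept row as start or end and then spreads the table to wide format. Without the tidyverse, base R's stats::reshape() can do the same spreading. The following is a minimal sketch under the same assumptions as the question (every trip within one day), not the answerer's code; it rebuilds the sample as a plain data.frame and uses duplicated() to keep each trip's earliest and latest row:

```r
# Sample data from the question, rebuilt as a plain data.frame
my.df <- data.frame(
  vehicle_id   = rep(c("Zz3yE90z++QmTX2QO5dHI78IK/Q", "/+bx80f3gOoPMoFBsS+3xX6jpi8"), c(7, 3)),
  time         = c("00:05:24", "00:06:14", "00:08:14", "00:09:14", "00:15:14",
                   "00:18:14", "00:20:14", "22:37:30", "22:38:30", "22:40:30"),
  trip_id      = rep(c(1L, 44498L), c(7, 3)),
  location_lat = c(13.6753, 13.67534, 13.67805, 13.67829, 13.66856,
                   13.66252, 13.65382, 13.91795, 13.91173, 13.90366),
  location_lon = c(100.63453, 100.63586, 100.63067, 100.62387, 100.63235,
                   100.65986, 100.67562, 100.68322, 100.67663, 100.66788),
  stringsAsFactors = FALSE
)

# Keep the earliest and latest row of each trip
my.df <- my.df[order(my.df$trip_id, my.df$time), ]
first_rows <- my.df[!duplicated(my.df$trip_id), ]
last_rows  <- my.df[!duplicated(my.df$trip_id, fromLast = TRUE), ]

# Tag the kept rows, playing the role of mutate(row = c("start", "end")[row_number()])
long <- rbind(transform(first_rows, row = "start"),
              transform(last_rows,  row = "end"))

# Spread to one row per trip; value columns become time_start, time_end, and so on
wide <- reshape(long, direction = "wide",
                idvar = c("trip_id", "vehicle_id"), timevar = "row",
                v.names = c("time", "location_lat", "location_lon"), sep = "_")
print(wide)
```

reshape() orders the new columns by time value (all _start columns, then all _end columns), so the column order differs from pivot_wider()'s output even though the content matches.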

【Comments】:

【Answer 2】:

We can also do this in data.table: first take the first and last row of each trip_id, then reshape:

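library(data.table)
# .I[c(1L, .N)]: original row numbers of each trip's first and last observation, in time order
new_df <- my.df[ my.df[order(trip_id, time), .I[c(1L, .N)], by = trip_id]$V1 ]
# One row per trip: element 1 of each column is the start, element 2 the end
new_df <- new_df[, list(start_location_Long = location_lon[1], start_location_Lat = location_lat[1],
                        end_location_Long = location_lon[2], end_location_Lat = location_lat[2],
                        start_time = time[1], end_time = time[2], from_vehicle_id = vehicle_id[1]),
                 by = trip_id]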

> new_df
   trip_id start_location_Long start_location_Lat end_location_Long end_location_Lat start_time end_time             from_vehicle_id
1:       1            100.6345           13.67530          100.6756         13.65382   00:05:24 00:20:14 Zz3yE90z++QmTX2QO5dHI78IK/Q
2:   44498            100.6832           13.91795          100.6679         13.90366   22:37:30 22:40:30 /+bx80f3gOoPMoFBsS+3xX6jpi8
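The key step here is .I[c(1L, .N)]: within each trip_id group, .I holds the group's row numbers in the original table, and c(1L, .N) picks the first and last of them. The same index-collection idea can be sketched in base R with split(); this sketch uses only the trip_id column from the sample and assumes rows are already in time order within each trip:

```r
# trip_id column from the question's sample: 7 rows of trip 1, then 3 rows of trip 44498
trip_id <- c(rep(1L, 7), rep(44498L, 3))

# Row numbers grouped by trip: list(`1` = 1:7, `44498` = 8:10)
idx_by_trip <- split(seq_along(trip_id), trip_id)

# First and last row number per trip; a one-row trip yields the same index twice
keep <- unlist(lapply(idx_by_trip, function(i) i[c(1L, length(i))]), use.names = FALSE)
print(keep)  # [1]  1  7  8 10
```

With the full table, my.df[keep, ] gives the four rows that the second data.table step reshapes into one row per trip.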

【Comments】:

【Answer 3】:

Using the dplyr library, you can try the following:

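library(dplyr)

result <- my.df %>%
  arrange(trip_id, time) %>%          # time order within each trip
  group_by(trip_id) %>%               # one summary row per trip
  summarise(
    start_location_Long = first(location_lon),
    start_location_Lat  = first(location_lat),
    end_location_Long   = last(location_lon),
    end_location_Lat    = last(location_lat),
    start_time          = first(time),
    end_time            = last(time),
    from_vehicle_id     = first(vehicle_id)) %>%
  as.data.frame()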
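These approaches take "first" and "last" in row order, so the rows must be in time order within each trip. Sorting the character time column works because zero-padded HH:MM:SS strings sort in the same order as the clock times they encode. The hypothetical helper to_seconds() below makes that check explicit; it assumes every trip stays within one day, since a trip crossing midnight (for example 23:59:00 to 00:01:00) would be mis-ordered by either method:

```r
# Hypothetical helper: convert "HH:MM:SS" strings to seconds after midnight
to_seconds <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# A few times from the question's sample, deliberately out of order
times <- c("00:20:14", "00:05:24", "22:40:30", "00:09:14")

# Sorting the strings and sorting the numeric seconds give the same order
print(order(times))              # [1] 2 4 1 3
print(order(to_seconds(times)))  # [1] 2 4 1 3
```

If the real data does contain trips that span midnight, sorting on a full date-time column rather than the time of day would be needed.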

【Comments】: