【Question title】: How to create new columns from existing column values in R
【Posted】: 2021-05-01 00:04:51
【Question】:
This is what the sample looks like: 

                     vehicle_id     time trip_id location_lat location_lon
                          <chr>    <chr>   <int>        <dbl>        <dbl>
 1: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:05:24       1     13.67530     100.6345
 2: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:06:14       1     13.67534     100.6359
 3: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:08:14       1     13.67805     100.6307
 4: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:09:14       1     13.67829     100.6239
 5: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:15:14       1     13.66856     100.6324
 6: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:18:14       1     13.66252     100.6599
 7: Zz3yE90z++QmTX2QO5dHI78IK/Q 00:20:14       1     13.65382     100.6756
 8: /+bx80f3gOoPMoFBsS+3xX6jpi8 22:37:30   44498     13.91795     100.6832
 9: /+bx80f3gOoPMoFBsS+3xX6jpi8 22:38:30   44498     13.91173     100.6766
10: /+bx80f3gOoPMoFBsS+3xX6jpi8 22:40:30   44498     13.90366     100.6679

 library(data.table)

 my.df <- data.table(
   vehicle_id = rep(c("Zz3yE90z++QmTX2QO5dHI78IK/Q", "/+bx80f3gOoPMoFBsS+3xX6jpi8"), times = c(7, 3)),
   time = c("00:05:24", "00:06:14", "00:08:14", "00:09:14", "00:15:14", "00:18:14", "00:20:14", "22:37:30", "22:38:30", "22:40:30"),
   trip_id = rep(c(1L, 44498L), times = c(7, 3)),
   location_lat = c(13.6753, 13.67534, 13.67805, 13.67829, 13.66856, 13.66252, 13.65382, 13.91795, 13.91173, 13.90366),
   location_lon = c(100.63453, 100.63586, 100.63067, 100.62387, 100.63235, 100.65986, 100.67562, 100.68322, 100.67663, 100.66788)
 )

So I want to create a new result that contains the first and last row of each trip_id. My result should look like this.

# A tibble: 10 x 9
   trip_id start_location_Long start_location_Lat end_location_Long end_location_Lat start_time end_time from_vehicle_id               
     <int>               <dbl>              <dbl>             <dbl>            <dbl> <chr>      <chr>    <chr>                        
        1                101.               13.7              101.             13.7 00:05:24   00:41:14 Zz3yE90z++QmTX2QO5dHI78IK/Q 
    44498                101.               13.9              101.             13.9 22:37:30   22:40:30 /+bx80f3gOoPMoFBsS+3xX6jpi8 

Something like that. Thank you in advance.

【Comments】:

    Tags: r dplyr


    【Solution 1】:

    Another approach is:

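    library(dplyr)
    library(tidyr)

    my.df %>%
      group_by(trip_id, vehicle_id) %>%
      arrange(time) %>%
      slice(c(1, n())) %>%                               # first and last row of each trip
      mutate(row = c("start", "end")[row_number()]) %>%  # label the two rows
      # values_from also picks up `row`, hence row_start/row_end in the output
      pivot_wider(values_from = -c(vehicle_id, trip_id), names_from = row)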
    
      # A tibble: 2 x 10
    # Groups:   trip_id, vehicle_id [2]
      vehicle_id                  trip_id time_start time_end location_lat_start location_lat_end location_lon_start location_lon_end row_start row_end
      <chr>                         <int> <chr>      <chr>                 <dbl>            <dbl>              <dbl>            <dbl> <chr>     <chr>  
    1 Zz3yE90z++QmTX2QO5dHI78IK/Q       1 00:05:24   00:20:14               13.7             13.7               101.             101. start     end    
    2 /+bx80f3gOoPMoFBsS+3xX6jpi8   44498 22:37:30   22:40:30               13.9             13.9               101.             101. start     end 
    

    【Comments】:

      【Solution 2】:

      We can also do this in data.table, first getting the first and last rows of each trip and then reshaping:

      # Keep only the first and last row of each trip (.I gives row indices, .N the group size)
      new_df <- my.df[ my.df[order(trip_id), .I[c(1L,.N)], by=trip_id]$V1 ]
      # Each trip now has two rows: element 1 is the start, element 2 is the end
      new_df <- new_df[, list(start_location_Long = location_lon[1], start_location_Lat = location_lat[1], end_location_Long = location_lon[2], end_location_Lat = location_lat[2], start_time = time[1], end_time = time[2], from_vehicle_id = vehicle_id[1]), by = trip_id ]
      
      > new_df
         trip_id start_location_Long start_location_Lat end_location_Long end_location_Lat start_time end_time             from_vehicle_id
      1:       1            100.6345           13.67530          100.6756         13.65382   00:05:24 00:20:14 Zz3yE90z++QmTX2QO5dHI78IK/Q
      2:   44498            100.6832           13.91795          100.6679         13.90366   22:37:30 22:40:30 /+bx80f3gOoPMoFBsS+3xX6jpi8
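
      The two steps above could probably be collapsed into a single grouped call. This is only a minimal sketch, assuming rows should be sorted by time within each trip; `trips`, `trip_summary` and the vehicle IDs "A"/"B" are made-up stand-ins for my.df, not part of the original answer:

```r
library(data.table)

# Hypothetical stand-in for my.df: same columns, shortened IDs and fewer rows.
trips <- data.table(
  vehicle_id   = c("A", "A", "A", "B", "B"),
  time         = c("00:05:24", "00:06:14", "00:20:14", "22:37:30", "22:40:30"),
  trip_id      = c(1L, 1L, 1L, 44498L, 44498L),
  location_lat = c(13.67530, 13.67534, 13.65382, 13.91795, 13.90366),
  location_lon = c(100.63453, 100.63586, 100.67562, 100.68322, 100.66788)
)

# Sort by trip and time, then within each trip take the first element ([1L])
# and the last element ([.N]) of every column of interest.
trip_summary <- trips[order(trip_id, time), .(
  start_location_Long = location_lon[1L],
  start_location_Lat  = location_lat[1L],
  end_location_Long   = location_lon[.N],
  end_location_Lat    = location_lat[.N],
  start_time          = time[1L],
  end_time            = time[.N],
  from_vehicle_id     = vehicle_id[1L]
), by = trip_id]

print(trip_summary)
```

      Sorting the "HH:MM:SS" strings as plain text works here only because they are zero-padded and all fall within a single day.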
      
      

      【Comments】:

        【Solution 3】:

        Using the "dplyr" library, you can try the following:

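        library(dplyr)
        
        # Group by trip, not by time: grouping by time would leave one row per group
        result <- my.df %>% group_by(trip_id, vehicle_id) %>%
          arrange(time) %>%
          summarise(
            start_time = first(time),
            end_time = last(time),
            start_location_lat = first(location_lat),
            end_location_lat = last(location_lat),
            start_location_lon = first(location_lon),
            end_location_lon = last(location_lon),
            .groups = "drop") %>%
          as.data.frame()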
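
        If you would rather not depend on any package, the same first/last idea can be sketched in base R with `duplicated()`. This is only an illustration under the assumption that each trip's rows should be sorted by time; `trips`, `starts`, `ends`, `trip_summary` and the IDs "A"/"B" are hypothetical names, not from the question:

```r
# Hypothetical stand-in for my.df: same columns, shortened IDs and fewer rows.
trips <- data.frame(
  vehicle_id   = c("A", "A", "A", "B", "B"),
  time         = c("00:05:24", "00:06:14", "00:20:14", "22:37:30", "22:40:30"),
  trip_id      = c(1L, 1L, 1L, 44498L, 44498L),
  location_lat = c(13.67530, 13.67534, 13.65382, 13.91795, 13.90366),
  location_lon = c(100.63453, 100.63586, 100.67562, 100.68322, 100.66788),
  stringsAsFactors = FALSE
)

# Sort by trip and time so the first/last row of each trip is its start/end.
trips <- trips[order(trips$trip_id, trips$time), ]

# !duplicated() keeps the first row per trip; fromLast = TRUE keeps the last.
starts <- trips[!duplicated(trips$trip_id), ]
ends   <- trips[!duplicated(trips$trip_id, fromLast = TRUE), ]

# Both subsets have one row per trip in the same order, so they line up.
trip_summary <- data.frame(
  trip_id             = starts$trip_id,
  start_location_Long = starts$location_lon,
  start_location_Lat  = starts$location_lat,
  end_location_Long   = ends$location_lon,
  end_location_Lat    = ends$location_lat,
  start_time          = starts$time,
  end_time            = ends$time,
  from_vehicle_id     = starts$vehicle_id,
  stringsAsFactors    = FALSE
)

print(trip_summary)
```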
        

        【Comments】:
