【Question Title】:Converting columns into rows in R [duplicate]
【Posted】:2023-12-27 05:56:01
【Question】:

I created the following data frame with this code:

test <- data.frame(dis = c(10,20,30,40),dur=c(30,40,60,90),method=c("car","car","Bicycle","Bicycle"),to_lon=c(-1.980,-1.5678,-1.324,-1.456),to_lat=c(55.3009,55.3416,55.1123,55.2234),from_lon=c(-1.4565,-1.3424,-1.4566,-1.1111),from_lat=c(76.8888,65.8999,76.9088,25.3344))

 dis dur  method  to_lon  to_lat from_lon from_lat
1  10  30     car -1.9800 55.3009  -1.4565  76.8888
2  20  40     car -1.5678 55.3416  -1.3424  65.8999
3  30  60 Bicycle -1.3240 55.1123  -1.4566  76.9088
4  40  90 Bicycle -1.4560 55.2234  -1.1111  25.3344

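I want to reshape this data frame so that one row holds to_lat and to_lon and the next row holds from_lat and from_lon. The remaining columns need no changes and can simply be repeated. The desired result should look like this: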

     dis dur  method longitude latitude
from  10  30     car     -1.98  55.3009
to    10  30     car   -1.4565  76.8888
from  20  40     car   -1.5678  55.3416
to    20  40     car   -1.3424  65.8999
from  30  60 Bicycle    -1.324  55.1123
to    30  60 Bicycle   -1.4566  76.9088
from  40  90 Bicycle    -1.456  55.2234
to    40  90 Bicycle   -1.1111  25.3344

Any help would be much appreciated.

Thanks.

【Comments】:

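  • In addition to @akrun's answer, see this page for reshape2/tidyr solutions (given your tags): cookbook-r.com/Manipulating_data/…
  • Well, I haven't found anything there that helps reshape the data into the desired form described above. Any ideas?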

Tags: r transform reshape2 tidyr


【Solution 1】:

We can use melt from data.table, which can take multiple measure columns.

library(data.table)
dM <- melt(setDT(test), measure=patterns('lon', 'lat'), 
          value.name=c('longitude', 'latitude'))
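#change the 'variable' column from numeric index to 'from/to'
#(index 1 is the to_* pair and index 2 the from_* pair, following column order;
# the labels below reproduce the asker's expected output)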
dM[, variable:= c('from', 'to')[variable]]
#create a sequence column grouped by 'variable'
dM[,i1:= 1:.N ,variable]
#order based on the 'i1'
res <- dM[order(i1)][,i1:=NULL]
res
#    dis dur  method variable longitude latitude
#1:  10  30     car     from   -1.9800  55.3009
#2:  10  30     car       to   -1.4565  76.8888
#3:  20  40     car     from   -1.5678  55.3416
#4:  20  40     car       to   -1.3424  65.8999
#5:  30  60 Bicycle     from   -1.3240  55.1123
#6:  30  60 Bicycle       to   -1.4566  76.9088
#7:  40  90 Bicycle     from   -1.4560  55.2234
#8:  40  90 Bicycle       to   -1.1111  25.3344
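
The hard-coded c('from', 'to') above depends on the order in which the columns appear in test. As a minimal sketch, assuming every longitude column is named <prefix>_lon, the labels could instead be derived from the column names. The names lon_cols and labels are introduced here for illustration only. Note that this attaches 'to' to the to_* values, whereas the expected output in the question attaches 'from' to them.

```r
library(data.table)

test <- data.frame(dis = c(10, 20, 30, 40), dur = c(30, 40, 60, 90),
                   method = c("car", "car", "Bicycle", "Bicycle"),
                   to_lon = c(-1.980, -1.5678, -1.324, -1.456),
                   to_lat = c(55.3009, 55.3416, 55.1123, 55.2234),
                   from_lon = c(-1.4565, -1.3424, -1.4566, -1.1111),
                   from_lat = c(76.8888, 65.8999, 76.9088, 25.3344))

# longitude columns in data-frame order (illustrative helper names)
lon_cols <- grep("lon", names(test), value = TRUE)   # "to_lon" "from_lon"
labels   <- sub("_lon$", "", lon_cols)               # "to" "from"

# as.data.table() copies, so 'test' itself is left untouched
dM <- melt(as.data.table(test), measure.vars = patterns("lon", "lat"),
           value.name = c("longitude", "latitude"))

# map the numeric index in 'variable' to the prefix of the column it came from
dM[, variable := labels[as.integer(variable)]]
```

The rest of the answer (adding i1 and ordering by it) applies unchanged to this dM.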

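【Discussion】:

    【Solution 2】:

    This may not be the most elegant solution, but it should work and is hopefully easy to follow:

    We split the data into two data frames: one holding the "from" longitude and latitude (called testF) and one holding the "to" data (called test). We then use rbind to insert the rows of testF at the appropriate positions in test.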

    test <- data.frame(dis = c(10,20,30,40),dur=c(30,40,60,90),method=c("car","car","Bicycle","Bicycle"),to_lon=c(-1.980,-1.5678,-1.324,-1.456),to_lat=c(55.3009,55.3416,55.1123,55.2234),from_lon=c(-1.4565,-1.3424,-1.4566,-1.1111),from_lat=c(76.8888,65.8999,76.9088,25.3344))
    
    testF <- test[,c(1:3,6,7)]
    names(testF)[4:5] <- c("longitude", "latitude")
    test <- test[,1:5]
    names(test)[4:5] <- c("longitude", "latitude")
    
    for(i in dim(test)[1]:1) {
      test <- rbind(test[1:i,], testF[i,], test[-(1:i),])
    }
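
Because the loop rebuilds the whole data frame on every iteration, it can become slow on large inputs. A loop-free sketch of the same idea in base R is to stack the two halves once and interleave them with a single order() call. The names keep, to_rows, from_rows, and stacked are illustrative, not part of the original answer.

```r
test <- data.frame(dis = c(10, 20, 30, 40), dur = c(30, 40, 60, 90),
                   method = c("car", "car", "Bicycle", "Bicycle"),
                   to_lon = c(-1.980, -1.5678, -1.324, -1.456),
                   to_lat = c(55.3009, 55.3416, 55.1123, 55.2234),
                   from_lon = c(-1.4565, -1.3424, -1.4566, -1.1111),
                   from_lat = c(76.8888, 65.8999, 76.9088, 25.3344))

n    <- nrow(test)
keep <- c("dis", "dur", "method")

# one data frame per direction, with identical column names
to_rows   <- data.frame(test[keep], longitude = test$to_lon,   latitude = test$to_lat)
from_rows <- data.frame(test[keep], longitude = test$from_lon, latitude = test$from_lat)

stacked <- rbind(to_rows, from_rows)

# first key: original row number; second key: 1 = "to" row, 2 = "from" row
res <- stacked[order(rep(seq_len(n), times = 2), rep(1:2, each = n)), ]
rownames(res) <- NULL
res
```

As in the loop version, each "to" row is followed directly by its "from" row.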
    

    【Discussion】:

    • I agree with your solution, but with more than a million rows to process, the loop can quickly add to the running time. Thanks for the solution.
    【Solution 3】:

    Here is an alternative that uses the package tidyr (a popular data-reshaping package) and avoids the for loop.

    library(tidyr)
    
    test <- data.frame(dis = c(10,20,30,40),dur=c(30,40,60,90),method=c("car","car","Bicycle","Bicycle"),to_lon=c(-1.980,-1.5678,-1.324,-1.456),to_lat=c(55.3009,55.3416,55.1123,55.2234),from_lon=c(-1.4565,-1.3424,-1.4566,-1.1111),from_lat=c(76.8888,65.8999,76.9088,25.3344))
    test$id <- 1:dim(test)[1]
    
    # gather latitude columns
    d1 <- gather(data = test, 
                 key = direction, 
                 value = latitude, 
                 to_lat, from_lat)
    
    # gather longitude columns
    d2 <- gather(data = test, 
                 key = direction, 
                 value = longitude, 
                 to_lon, from_lon)
    
    d3 <- cbind(d1[,c("direction","dis","dur","method","latitude")],d2[,c("longitude","id"),drop=FALSE])
    
    # Create names
    dir <- unlist(strsplit(d3$direction,"_"))
    dir <- dir[seq(from = 1, to = length(dir), by = 2)]
    
    # Factor and sort
    d3$direction <- factor(dir)
    d3[order(d3$id),]
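
gather() has since been superseded in tidyr by pivot_longer(), which can split column names such as to_lon into a direction part and a value-column part in one step. The following is a sketch, assuming tidyr 1.0.0 or later. The name long is illustrative.

```r
library(tidyr)

test <- data.frame(dis = c(10, 20, 30, 40), dur = c(30, 40, 60, 90),
                   method = c("car", "car", "Bicycle", "Bicycle"),
                   to_lon = c(-1.980, -1.5678, -1.324, -1.456),
                   to_lat = c(55.3009, 55.3416, 55.1123, 55.2234),
                   from_lon = c(-1.4565, -1.3424, -1.4566, -1.1111),
                   from_lat = c(76.8888, 65.8999, 76.9088, 25.3344))

# "to_lon" is split at "_": the first piece goes into 'direction',
# the second piece (".value") names the output column ("lon" or "lat")
long <- pivot_longer(test,
                     cols = to_lon:from_lat,
                     names_to = c("direction", ".value"),
                     names_sep = "_")

# rename to the column names requested in the question
names(long)[names(long) == "lon"] <- "longitude"
names(long)[names(long) == "lat"] <- "latitude"
long
```

The result keeps the two rows for each original record next to each other, so no id column or final sort is needed here.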
    

    【Discussion】:
