【Question title】: Split a column into 2 in R
【Posted】: 2017-05-10 16:42:06
【Question】:
 I have this data frame

      CC.Number Date Time Accident.Type Location.1
    1 12T008826 07/01/2012 1630 PD (39.26699, -76.560642)
    2 12L005385 07/02/2012 1229 PD (39.000549, -76.399312)
    3 12L005388 07/02/2012 1229 PD (39.00058, -76.399267)
    4 12T008851 07/02/2012 445 PI (39.26367, -76.56648)
    5 12T008858 07/02/2012 802 PD (39.240862, -76.599017)
    6 12T008860 07/02/2012 832 PD (39.27022, -76.63926)

I want to split the Location.1 column into "alt" and "lng" columns, like this:

  CC.Number       Date Time Accident.Type      alt       lng
1 12T008826 07/01/2012 1630            PD  39.26699    -76.560642
2 12L005385 07/02/2012 1229            PD  39.000549   -76.399312
3 12L005388 07/02/2012 1229            PD  39.00058    -76.399267

I tried

location <- md$Location.1
location1 <- substring(location, 2)
location2 <- substr(location1, 1, nchar(location1)-1 )
location3 <-  strsplit(location2, ",")

but I got stuck converting location3 from a list to a data frame.

I also tried

locdf <- data.frame(location2)
colnames(locdf)[1] = c("x")
df <- separate(location, col=x,into = c("lat","log"), sep = ",")

but I get an error:

Error in UseMethod("separate_"): no applicable method for 'separate_' applied to an object of class "character"

【Comments on the question】:

    Tags: r dataframe split


    【Solution 1】:

    separate from tidyr also works, as long as it is given the data frame rather than a character vector:

    library(tidyr)
    # Sub out the parentheses
    df$Location.1 <- gsub("[()]", "", df$Location.1)
    
    separate(df, col = Location.1, into = c("lat","long"), sep = ",")
    #  CC.Number       Date Time Accident.Type       lat        long
    #1 12T008826 07/01/2012 1630            PD  39.26699  -76.560642
    #2 12L005385 07/02/2012 1229            PD 39.000549  -76.399312
    #3 12L005388 07/02/2012 1229            PD  39.00058  -76.399267
    #4 12T008851 07/02/2012  445            PI  39.26367   -76.56648
    #5 12T008858 07/02/2012  802            PD 39.240862  -76.599017
    #6 12T008860 07/02/2012  832            PD  39.27022   -76.63926
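
Note that splitting on a bare comma leaves a leading space in long, and both new columns stay character. A minimal sketch of one way around that, assuming a small made-up data frame acc (hypothetical, only mirroring the layout in the question): split on the comma plus any following whitespace and let tidyr convert the column types.

```r
library(tidyr)

# Hypothetical sample data mirroring the question's Location.1 column
acc <- data.frame(
  CC.Number  = c("12T008826", "12L005385"),
  Location.1 = c("(39.26699, -76.560642)", "(39.000549, -76.399312)"),
  stringsAsFactors = FALSE
)

# Strip the parentheses first
acc$Location.1 <- gsub("[()]", "", acc$Location.1)

# sep is treated as a regular expression: a comma plus optional whitespace.
# convert = TRUE runs type.convert() on the new columns, so they become numeric.
res <- separate(acc, col = Location.1, into = c("lat", "long"),
                sep = ",\\s*", convert = TRUE)
str(res)
```

With numeric columns the coordinates can be used directly for arithmetic or plotting, without a later as.numeric() step.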
    

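    【Discussion】:

      【Solution 2】:

      We can use extract from tidyr: capture two groups that contain only digits and dots (with an optional minus sign in the second group) and discard the rest of "Location.1".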

      library(tidyr)
      df1 %>% 
        extract(Location.1, into = c('alt', 'lng'), "\\(([0-9.]+),\\s+(-*[0-9.]+).")
      # CC.Number       Date Time Accident.Type       alt        lng
      #1 12T008826 07/01/2012 1630            PD  39.26699 -76.560642
      #2 12L005385 07/02/2012 1229            PD 39.000549 -76.399312
      #3 12L005388 07/02/2012 1229            PD  39.00058 -76.399267
      #4 12T008851 07/02/2012  445            PI  39.26367  -76.56648
      #5 12T008858 07/02/2012  802            PD 39.240862 -76.599017
      #6 12T008860 07/02/2012  832            PD  39.27022  -76.63926
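
The pattern above requires at least one space after the comma and returns character columns. A hedged variant, assuming the hypothetical data frame acc below (not from the question) and that some rows may lack the space: \\s* tolerates zero or more spaces, -? allows a sign on either coordinate, and convert = TRUE asks extract to return numeric columns.

```r
library(tidyr)

# Hypothetical sample data; the second row has no space after the comma
acc <- data.frame(
  CC.Number  = c("12T008826", "12L005385"),
  Location.1 = c("(39.26699, -76.560642)", "(39.000549,-76.399312)"),
  stringsAsFactors = FALSE
)

# Two capture groups: an optionally signed decimal on each side of the comma.
# Everything outside the groups (parentheses, comma, spaces) is discarded.
res <- extract(acc, Location.1, into = c("alt", "lng"),
               regex = "\\((-?[0-9.]+),\\s*(-?[0-9.]+)\\)",
               convert = TRUE)
str(res)
```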
      

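      【Discussion】:

        【Solution 3】:

        You can also do it this way. Assuming dat1 is the name of your original data set, we can use strsplit and gsub. First we use gsub to replace the commas and parentheses with nothing, then we use strsplit to split the values on the space: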

        df1 <- setNames(data.frame(do.call("rbind",strsplit(gsub("\\(|\\)|,","",dat1$Location.1),split=" "))),c("Lat","Long"))
        df2 <- data.frame(cbind(dat1[,1:(length(dat1)-1)],df1))
        
        # CC.Number     Date Time Accident.Type       Lat       Long
        # 1 12T008826 07/01/12 1630            PD  39.26699 -76.560642
        # 2 12L005385 07/02/12 1229            PD 39.000549 -76.399312
        # 3 12L005388 07/02/12 1229            PD  39.00058 -76.399267
        # 4 12T008851 07/02/12  445            PI  39.26367  -76.56648
        # 5 12T008858 07/02/12  802            PD 39.240862 -76.599017
        # 6 12T008860 07/02/12  832            PD  39.27022  -76.63926
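
The one-liner above can be unpacked into steps, which also covers the point where the question got stuck (turning the strsplit list into a data frame). A minimal sketch, assuming a hypothetical vector loc in the same "(lat, long)" format; it converts the pieces to numeric explicitly, since data.frame() on a character matrix yields character columns (or factors in R versions before 4.0).

```r
# Hypothetical sample values in the same "(lat, long)" format as Location.1
loc <- c("(39.26699, -76.560642)",
         "(39.000549, -76.399312)",
         "(39.00058, -76.399267)")

# Remove the parentheses, then split each string on the comma (plus any spaces)
parts <- strsplit(gsub("[()]", "", loc), ",\\s*")

# parts is a list of length-2 character vectors; stack them into a 2-column matrix
mat <- do.call(rbind, parts)

# Build the data frame with explicit numeric conversion and column names
coords <- data.frame(Lat = as.numeric(mat[, 1]),
                     Long = as.numeric(mat[, 2]))
coords
```

The result can then be attached to the other columns with cbind(), as in the second line of the answer.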
        

        【Discussion】:

          【Solution 4】:

          In base R, you can use trimws to remove the ( and ), and read.table to split at the ,.

          cbind(md[1:4], read.table(sep=",", text=trimws(md$Location.1, whitespace = "[ ()]"),
           col.names=c("alt", "lng")))
          #  CC.Number        Date Time  Accident.Type      alt       lng
          #1 12T008826  07/01/2012 1630             PD 39.26699 -76.56064
          #2 12L005385  07/02/2012 1229             PD 39.00055 -76.39931
          #3 12L005388  07/02/2012 1229             PD 39.00058 -76.39927
          #4 12T008851  07/02/2012  445             PI 39.26367 -76.56648
          #5 12T008858  07/02/2012  802             PD 39.24086 -76.59902
          #6 12T008860  07/02/2012  832             PD 39.27022 -76.63926
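
Two details may be worth a sketch here. The whitespace argument of trimws needs a reasonably recent R (3.6.0 or later, I believe), and the printed output above looks rounded only because print shows 7 significant digits by default. The following assumes a hypothetical vector loc and uses gsub as a stand-in for trimws on older R versions; it is an illustration, not the answer's exact method.

```r
# Hypothetical sample values with the leading spaces seen in the data below
loc <- c("  (39.26699, -76.560642)", " (39.000549, -76.399312)")

# gsub() removes spaces and parentheses without needing trimws(whitespace = )
clean <- gsub("[ ()]", "", loc)

# read.table() parses the comma-separated text straight into numeric columns
coords <- read.table(text = clean, sep = ",", col.names = c("alt", "lng"))

# The stored values keep full precision; ask print for more digits to see them
print(coords, digits = 10)
```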
          

          Data:

          md <- structure(list(CC.Number = c("12T008826", "12L005385", "12L005388", 
          "12T008851", "12T008858", "12T008860"), Date = c(" 07/01/2012", 
          " 07/02/2012", " 07/02/2012", " 07/02/2012", " 07/02/2012", " 07/02/2012"
          ), Time = c(1630L, 1229L, 1229L, 445L, 802L, 832L), Accident.Type = c("            PD", 
          "            PD", "            PD", "            PI", "            PD", 
          "            PD"), Location.1 = c("  (39.26699, -76.560642)", 
          " (39.000549, -76.399312)", "  (39.00058, -76.399267)", "   (39.26367, -76.56648)", 
          " (39.240862, -76.599017)", "   (39.27022, -76.63926)")), class = "data.frame", row.names = c(NA, 
          -6L))
          

          【Discussion】:
