【Question title】: Removing special characters and numbers from a column in a data frame
【Posted】: 2014-01-09 14:40:43
【Question】:

I have a data frame:

dput(Data1)
structure(list(Emp.ID = c(182038L, 191854L), Project.Acquired.Skill = structure(c(2L, 
1L), .Label = c("Architecting (10),Cognos TM1 (4),Support Function (3)", 
"SAS (76),SAS Analytics (76),SAS BI (76),SAS data modeling tool (63),ClearCase (18),SQL (18),SQL Server (18),SQL SERVER 2000 (18),SQL SERVER 2005 (18),Excel (16),Oracle (16),AS400 (10)"
), class = "factor")), .Names = c("Emp.ID", "Project.Acquired.Skill"
), class = "data.frame", row.names = c(NA, -2L))

str(Data1)
'data.frame':   2 obs. of  2 variables:
 $ Emp.ID                : int  182038 191854
 $ Project.Acquired.Skill: Factor w/ 2 levels "Architecting (10),Cognos TM1 (4),Support Function (3)",..: 2 1  

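One column is a factor with values like Architecting (10),Cognos TM1 (4),Support Function (3). I need to remove the counts, i.e. the digits (0-9), the whitespace before them and the parentheses (), to get Architecting,Cognos TM1,Support Function. I'm running into problems because the column is encoded as a factor.
My output should look like this: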

Emp ID  Project Acquired Skill
182038  SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER 2000,SQL SERVER 2005,Excel,Oracle,AS400
191854  Architecting,Cognos TM1,Support Function
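The factor encoding is less of an obstacle than it looks: gsub() coerces a factor argument to its character labels, so it returns a plain character vector. A minimal sketch, assuming a shortened two-row stand-in for Data1 (the strings are illustrative, not the full data), with a pattern that removes only a space followed by a parenthesised number:

```r
# Shortened stand-in for Data1; the skill column is a factor, as in the dput() output
Data1 <- data.frame(
  Emp.ID = c(182038L, 191854L),
  Project.Acquired.Skill = factor(c("SQL SERVER 2000 (18),AS400 (10)",
                                    "Architecting (10),Cognos TM1 (4)"))
)

skill <- Data1$Project.Acquired.Skill
# Underneath, a factor stores integer codes that point into its levels
codes <- as.integer(skill)

# gsub() works on the labels, so converting first gives the same result
pattern <- " \\([0-9]+\\)"   # a space, then "(digits)"
cleaned <- gsub(pattern, "", skill)
same <- identical(cleaned, gsub(pattern, "", as.character(skill)))

print(cleaned)
# [1] "SQL SERVER 2000,AS400"   "Architecting,Cognos TM1"
```

Digits that belong to a skill name (2000, TM1, AS400) survive because the pattern requires the parentheses.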

【Question comments】:

    Tags: r dataframe gsub


    【Solution 1】:

    Use a character-class regular expression in gsub():

    transform(Data1, Project.Acquired.Skill=gsub("\\s[0-9()]+","",Project.Acquired.Skill))
      Emp.ID
    1 182038
    2 191854
                                                                                                 Project.Acquired.Skill
    1 SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER,SQL SERVER,Excel,Oracle,AS400
    2                                                                          Architecting,Cognos TM1,Support Function
    
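The character class in this answer matches any whitespace followed by a run of digits or parentheses, so it also removes version numbers such as " 2000". That is why the output shows SQL SERVER twice instead of SQL SERVER 2000 and SQL SERVER 2005. A small comparison on a made-up string, contrasting it with a stricter pattern that only removes a parenthesised number:

```r
x <- "SQL SERVER 2000 (18),Cognos TM1 (4)"

# Class from Solution 1: whitespace followed by any run of digits or parentheses
loose <- gsub("\\s[0-9()]+", "", x)

# Stricter: whitespace followed by "(digits)" only
strict <- gsub("\\s\\([0-9]+\\)", "", x)

print(loose)
# [1] "SQL SERVER,Cognos TM1"
print(strict)
# [1] "SQL SERVER 2000,Cognos TM1"
```

"Cognos TM1" is kept by both, since in that case the space is followed by a letter.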

    【Comments】:

      【Solution 2】:
      (Data1[, 2] <- gsub("\\s\\(\\d+\\)", "", Data1[, 2]))
      # [1] "SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER 2000,SQL SERVER 2005,Excel,Oracle,AS400"
      # [2] "Architecting,Cognos TM1,Support Function"
      
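Assigning the gsub() result back replaces the factor column with a character column. If a factor is still wanted downstream, it can be rebuilt with factor(). A sketch on a shortened, illustrative copy of the data (the strings below are made up to keep the example short):

```r
Data1 <- data.frame(
  Emp.ID = c(182038L, 191854L),
  Project.Acquired.Skill = factor(c("SAS (76),AS400 (10)",
                                    "Cognos TM1 (4),Support Function (3)"))
)

# Same pattern as Solution 2: whitespace, "(", one or more digits, ")"
Data1[, 2] <- gsub("\\s\\(\\d+\\)", "", Data1[, 2])
col_class <- class(Data1[, 2])   # the column is now character

# Re-encode as a factor; the levels are the cleaned strings
Data1[, 2] <- factor(Data1[, 2])
print(levels(Data1[, 2]))
# [1] "Cognos TM1,Support Function" "SAS,AS400"
```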

      【Comments】:

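        【Solution 3】:
        library(qdap)
        # strip() removes digits and punctuation except the kept commas;
        # lower.case = FALSE keeps the original capitalisation
        gsub(" ,", ",", strip(Data1[, 2], char.keep = ",", lower.case = FALSE))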
        
        ## [1] "SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER ,SQL SERVER ,Excel,Oracle,AS"
        ## [2] "Architecting,Cognos TM,Support Function" 
        
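Because strip() drops every digit, numbers that are part of a skill name (TM1, SQL SERVER 2000, AS400) are lost too, as the output above shows. A rough base-R approximation, assuming only that digits and punctuation other than commas are removed (this is not qdap's actual implementation), reproduces that effect without installing qdap:

```r
x <- "SQL SERVER 2000 (18),Cognos TM1 (4),AS400 (10)"

# Keep only letters, commas and spaces (digits and parentheses are removed)
kept <- gsub("[^[:alpha:], ]", "", x)
# Drop one space before each comma, then trim both ends
stripped <- trimws(gsub(" ,", ",", kept))

print(stripped)
# [1] "SQL SERVER ,Cognos TM,AS"
```

The leftover "SQL SERVER ," comes from the two spaces that surrounded "2000", of which gsub(" ,", ",", ...) removes only one.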

        【Comments】:
