【Question title】: R - Avoid concatenation when replacing string by number
【Posted】: 2021-04-20 13:01:43
【Question】:

It looks like a very simple problem, but so far I haven't found a solution.

Consider the following data frame:

dat <- data.frame(id=LETTERS[1:5],
                  land.use=c(3,4,9,34,39))

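I need to replace the numbers in the land.use column with strings. The problem is that 3, 4 and 34 each map to a different string.

However, R insists on replacing 34 with the concatenation of the strings for 3 and 4.

For example: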

dat$land.use <- gsub("3","Bare soil", dat$land.use)
dat$land.use <- gsub("4","Primary Forest", dat$land.use)
dat$land.use <- gsub("9","Secondary Forest", dat$land.use)
dat$land.use <- gsub("34","Wheat", dat$land.use)
dat$land.use <- gsub("39","Soybean", dat$land.use)

> dat
  id                  land.use
1  A                 Bare soil # This is OK
2  B            Primary Forest # This is OK
3  C          Secondary Forest # This is OK
4  D   Bare soilPrimary Forest # This should be Wheat
5  E Bare soilSecondary Forest # This should be Soybean

What am I doing wrong?

【Question comments】:

    Tags: r replace gsub


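    【Answer 1】:

    When you want an exact match, don't use partial-matching functions (gsub, grep, etc.): gsub("3", ...) also matches the "3" inside "34" and "39". Instead, create a lookup table and perform a join.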

    lookup_table <- data.frame(land.use = c(3, 4, 9, 34, 39), 
                               value = c("Bare soil", "Primary Forest", 
                               "Secondary Forest", "Wheat", "Soybean"))
    
    merge(dat, lookup_table, all.x = TRUE, by = 'land.use')
    
    #  land.use id            value
    #1        3  A        Bare soil
    #2        4  B   Primary Forest
    #3        9  C Secondary Forest
    #4       34  D            Wheat
    #5       39  E          Soybean
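To see why the partial matching in the question goes wrong, here is a minimal trace of what the question's sequential gsub() calls do to the single value 34 (the variable name x is only for illustration):

```r
# gsub() coerces numbers to character, so 34 becomes "34"
x <- as.character(34)

# gsub("3", ...) matches the "3" *inside* "34", not only the whole value 3
x <- gsub("3", "Bare soil", x)
print(x)  # [1] "Bare soil4"

# The leftover "4" is then matched by the next pattern
x <- gsub("4", "Primary Forest", x)
print(x)  # [1] "Bare soilPrimary Forest"

# By now "34" no longer exists, so the intended pattern never matches
x <- gsub("34", "Wheat", x)
print(x)  # [1] "Bare soilPrimary Forest"

# An exact comparison, by contrast, does not treat "3" as matching "34"
print(as.character(34) == "3")  # [1] FALSE
```

This is why an exact-match approach (a join, match(), or anchored patterns) is needed rather than plain substring replacement.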
    

    【Comments】:

      【Answer 2】:

      In this case I would use match to replace the numbers with strings:

      c("Bare soil","Primary Forest","Secondary Forest","Wheat",
        "Soybean")[match(dat$land.use, c(3,4,9,34,39))]
      #[1] "Bare soil"        "Primary Forest"   "Secondary Forest" "Wheat"           
      #[5] "Soybean"         
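match(x, table) returns, for each element of x, the position of its first match in table, or NA when there is none, so indexing the label vector with that result yields either the label or NA. A small sketch with a hypothetical extra code 7 that is not in the lookup (codes, labels and values are illustrative names):

```r
codes  <- c(3, 4, 9, 34, 39)
labels <- c("Bare soil", "Primary Forest", "Secondary Forest", "Wheat", "Soybean")

# Hypothetical input: 7 does not appear in `codes`
values <- c(34, 3, 7)

# Positions of each value in `codes`: 4, 1 and NA
idx <- match(values, codes)
print(idx)

# Indexing by NA gives NA, so unknown codes surface as missing labels
print(labels[idx])
```

Because match() compares whole values, 34 never matches 3 or 4, and the result does not depend on the order of the lookup.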
      

      To keep your approach, you have to anchor the patterns with ^ and $ so that each one matches only the whole value:

      dat$land.use <- sub("^3$","Bare soil", dat$land.use)
      dat$land.use <- sub("^4$","Primary Forest", dat$land.use)
      dat$land.use <- sub("^9$","Secondary Forest", dat$land.use)
      dat$land.use <- sub("^34$","Wheat", dat$land.use)
      dat$land.use <- sub("^39$","Soybean", dat$land.use)
      dat
      #  id         land.use
      #1  A        Bare soil
      #2  B   Primary Forest
      #3  C Secondary Forest
      #4  D            Wheat
      #5  E          Soybean
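The five sub() calls above can also be driven from a single lookup, building each anchored pattern with paste0(). Because every pattern is anchored with ^ and $, the order of the replacements no longer matters. A minimal sketch (codes, labels and res are illustrative names, not from the answer):

```r
dat <- data.frame(id = LETTERS[1:5],
                  land.use = c(3, 4, 9, 34, 39))

codes  <- c(3, 4, 9, 34, 39)
labels <- c("Bare soil", "Primary Forest", "Secondary Forest", "Wheat", "Soybean")

# Work on a character copy of the column
res <- as.character(dat$land.use)
for (i in seq_along(codes)) {
  # e.g. "^3$" only matches a string that is exactly "3", never "34" or "39"
  res <- sub(paste0("^", codes[i], "$"), labels[i], res)
}
dat$land.use <- res
print(dat)
```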
      

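      【Comments】:

        【Answer 3】:

        Depending on what you plan to do next, you may also want a factor() variable. You can create it directly as below, or use one of the other methods and convert the result with as.factor() afterwards.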

        dat$land.use.factor <- factor(dat$land.use, 
                                      levels = c(3, 4, 9, 34, 39),
                                      labels = c("Bare soil", "Primary Forest", 
                                                 "Secondary Forest", "Wheat", "Soybean"))
        
        # > dat
        #    id land.use  land.use.factor
        # 1   A        3        Bare soil
        # 2   B        4   Primary Forest
        # 3   C        9 Secondary Forest
        # 4   D       34            Wheat
        # 5   E       39          Soybean
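Two properties of factor() are worth knowing here: any value not listed in levels becomes NA, and as.character() turns the factor back into plain label strings. A small check, using a hypothetical code 7 that is not among the levels:

```r
f <- factor(c(34, 7, 3),
            levels = c(3, 4, 9, 34, 39),
            labels = c("Bare soil", "Primary Forest",
                       "Secondary Forest", "Wheat", "Soybean"))

print(is.na(f))         # TRUE only for the unlisted code 7
print(as.character(f))  # labels as plain strings, NA for 7
print(levels(f))        # all five labels, in the order given to `labels`
```

Unlike the string approaches, the factor keeps all five categories as levels even if some of them do not occur in the data.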
        

        【Comments】:

          【Answer 4】:

          We can use left_join:

          library(dplyr)
          # keydat is defined under "Data" below; dat is the question's data frame
          left_join(dat, keydat, by = 'land.use')
          

          Data:

          keydat <- data.frame(land.use = c(3, 4, 9, 34, 39), 
                                     value = c("Bare soil", "Primary Forest", 
                                     "Secondary Forest", "Wheat", "Soybean"))
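One practical difference from the merge() answer: merge() sorts its result by the key column by default, whereas dplyr's left_join() keeps the rows of its first argument in their original order. In the question's data the rows are already sorted by land.use, so both look the same. A base-R sketch of restoring the original order after merge(), using a hypothetical shuffled copy dat2 and assuming id is unique:

```r
keydat <- data.frame(land.use = c(3, 4, 9, 34, 39),
                     value = c("Bare soil", "Primary Forest",
                               "Secondary Forest", "Wheat", "Soybean"))

# Hypothetical input whose rows are NOT sorted by land.use
dat2 <- data.frame(id = c("D", "A", "E", "B", "C"),
                   land.use = c(34, 3, 39, 4, 9))

res <- merge(dat2, keydat, all.x = TRUE, by = "land.use")
print(res$id)  # rows now follow land.use, so ids come out A, B, C, D, E

# Put the rows back in dat2's order (assumes id is unique)
res <- res[match(dat2$id, res$id), ]
rownames(res) <- NULL
print(res)
```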
          

          【Comments】:
