【Question title】: Replace missing values in one table with values from another table by joining column names to rows
【Posted】: 2017-03-16 17:51:33
【Question】:

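I am trying to replace the missing values in one table with values from another table, matching each column name of the first table to a row of the second. Here is an example: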

df1

A  B  C  D
1  3  4  *
4  *  5  9
0  *  2  *
1  2  *  4

df2

Variable  Value
A  2
B  1
C  9
D  0

Resulting dataset:

A  B  C  D
1  3  4  0
4  1  5  9
0  1  2  0
1  2  9  4

【Comments】:

    Tags: r missing-data


    【Solution 1】:

    Another option, using stack and unstack:

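    # df is the main table, df1 the lookup table (see Data below)
    d1 <- stack(df)   # long format: columns "values" and "ind"
    d1$values[d1$values == '*'] <- df1$Value[match(d1$ind, df1$Variable)][d1$values == '*']
    unstack(d1, values ~ ind)   # back to wide format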
    #  A B C D
    #1 1 3 4 0
    #2 4 1 5 9
    #3 0 1 2 0
    #4 1 2 9 4
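One thing to keep in mind with the stack/unstack route is that the filled table still has character columns, because the "*" markers forced everything to text. The following is a minimal sketch, assuming the same data as in this answer (rebuilt here with data.frame() so the block runs on its own), that converts the result to numeric once no "*" remains. The names is_star and res are illustrative only.

```r
# Same data as in this answer: df is the main table, df1 the lookup table
df <- data.frame(A = c(1, 4, 0, 1),
                 B = c("3", "*", "*", "2"),
                 C = c("4", "5", "2", "*"),
                 D = c("*", "9", "*", "4"),
                 stringsAsFactors = FALSE)
df1 <- data.frame(Variable = c("A", "B", "C", "D"),
                  Value = c(2L, 1L, 9L, 0L),
                  stringsAsFactors = FALSE)

# Long format: "values" is character, "ind" holds the column names
d1 <- stack(df)
is_star <- d1$values == "*"

# Look up the replacement by column name, keep only the "*" positions
d1$values[is_star] <- df1$Value[match(d1$ind, df1$Variable)][is_star]

# Back to wide format; the columns are still character at this point
res <- unstack(d1, values ~ ind)

# Convert every column to numeric now that no "*" is left
res[] <- lapply(res, as.numeric)
str(res)
```

Assigning to res[] rather than res keeps the data frame structure and only swaps the column contents.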
    

    Data (in this answer df is the main table and df1 is the lookup table)

    df <- structure(list(A = c(1, 4, 0, 1), B = c("3", "*", "*", "2"), 
        C = c("4", "5", "2", "*"), D = c("*", "9", "*", "4")), .Names = c("A", 
    "B", "C", "D"), row.names = c(NA, -4L), class = "data.frame")
    
    df1 <- structure(list(Variable = c("A", "B", "C", "D"), Value = c(2L, 
    1L, 9L, 0L)), .Names = c("Variable", "Value"), row.names = c(NA, 
    -4L), class = "data.frame")
    

    【Comments】:

      【Solution 2】:

      We can use Map:

      df1[as.character(df2$Variable)] <- Map(function(x, y)
          replace(x, is.na(x), y), df1[as.character(df2$Variable)], df2$Value)
      

      If the missing values are not NA but the literal string "*", then:

      df1[as.character(df2$Variable)] <- Map(function(x, y)
          replace(x, x=="*", y), df1[as.character(df2$Variable)], df2$Value)
      df1
      #  A B C D
      #1 1 3 4 0
      #2 4 1 5 9
      #3 0 1 2 0
      #4 1 2 9 4
      

      If the columns of 'df1' are not character, convert them first:

      df1[] <- as.matrix(df1)
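The two variants above can also be combined: turn "*" into NA first, make the columns numeric, and then use the is.na() version of the Map call. This is only a sketch under the assumption that every non-"*" entry is a number; df1_num and vars are illustrative names, not part of the original answer.

```r
df1 <- data.frame(A = c(1L, 4L, 0L, 1L),
                  B = c("3", "*", "*", "2"),
                  C = c("4", "5", "2", "*"),
                  D = c("*", "9", "*", "4"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Variable = c("A", "B", "C", "D"),
                  Value = c(2L, 1L, 9L, 0L),
                  stringsAsFactors = FALSE)

# Replace "*" with NA, then convert each column to numeric
df1_num <- df1
df1_num[] <- lapply(df1, function(x) as.numeric(replace(x, x == "*", NA)))

# Fill the NAs of each column with the value looked up by column name
vars <- as.character(df2$Variable)
df1_num[vars] <- Map(function(x, y) replace(x, is.na(x), y),
                     df1_num[vars], df2$Value)
df1_num
```

The result has numeric columns, so no later type conversion is needed.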
      

      Data

      df1 <- structure(list(A = c(1L, 4L, 0L, 1L), B = c("3", "*", "*", "2"
       ), C = c("4", "5", "2", "*"), D = c("*", "9", "*", "4")), .Names = c("A", 
       "B", "C", "D"), class = "data.frame", row.names = c(NA, -4L))
      df2 <- structure(list(Variable = c("A", "B", "C", "D"), Value = c(2L, 
       1L, 9L, 0L)), .Names = c("Variable", "Value"), class = "data.frame",
        row.names = c(NA, -4L))
      

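      【Comments】:

      • This gives an error: Error in mapply(FUN = f, ..., SIMPLIFY = FALSE) : zero-length inputs cannot be mixed with those of non-zero length
      • @shivanigupta You should show the output of your data. I don't get any error.
      • @shivanigupta I have updated the post with the data I used. Please check whether you still get the error with the data in my post.
      • Yes, I think I made a syntax error. It works fine. Thank you very much!!
      【Solution 3】:

      Find the column names of the "*" cells, match them against the Variable column of df2, and extract the corresponding Value: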

      x <- which(df1=="*", arr.ind = TRUE)
      df1[x] <- df2$Value[match(names(df1)[x[, 2]], df2$Variable)]
      
      #  A B C D
      #1 1 3 4 0
      #2 4 1 5 9
      #3 0 1 2 0
      #4 1 2 9 4
      

      This assumes that df1 has character columns; if it does not, convert them:

      df1[] <- lapply(df1, as.character)
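To see how the index-based replacement works, it can help to look at the intermediate objects one at a time. The sketch below uses the question's data; cols and fill are illustrative names introduced here.

```r
df1 <- data.frame(A = c(1L, 4L, 0L, 1L),
                  B = c("3", "*", "*", "2"),
                  C = c("4", "5", "2", "*"),
                  D = c("*", "9", "*", "4"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Variable = c("A", "B", "C", "D"),
                  Value = c(2L, 1L, 9L, 0L),
                  stringsAsFactors = FALSE)

# One row per "*" cell, with the columns "row" and "col"
x <- which(df1 == "*", arr.ind = TRUE)
x

# Column name of each "*" cell
cols <- names(df1)[x[, "col"]]

# Replacement value for each "*" cell, looked up by column name
fill <- df2$Value[match(cols, df2$Variable)]

# A two-column index matrix addresses individual cells of the data frame
df1[x] <- fill
df1
```

Because match() works on names, this version does not depend on the rows of df2 being in the same order as the columns of df1.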
      

      【Comments】:

        【Solution 4】:

        We can build a lookup table and then update the cells that match:

        # make a lookup table same size as df1
        df2Lookup <-
          matrix(rep(df2$Value, nrow(df1)), nrow = nrow(df1), byrow = TRUE)
        
        # then update on "*"
        df1[ df1 == "*" ] <- df2Lookup[ df1 == "*" ]
        
        #result
        df1
        #   A B C D
        # 1 1 3 4 0
        # 2 4 1 5 9
        # 3 0 1 2 0
        # 4 1 2 9 4
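The lookup matrix above relies on the rows of df2 being in the same order as the columns of df1. If that cannot be guaranteed, a possible variation is to align the values with match() before building the matrix. In this sketch the rows of df2 are shuffled on purpose, and vals and stars are illustrative names.

```r
df1 <- data.frame(A = c(1L, 4L, 0L, 1L),
                  B = c("3", "*", "*", "2"),
                  C = c("4", "5", "2", "*"),
                  D = c("*", "9", "*", "4"),
                  stringsAsFactors = FALSE)

# Lookup rows deliberately shuffled to show the alignment step
df2 <- data.frame(Variable = c("D", "B", "A", "C"),
                  Value = c(0L, 1L, 2L, 9L),
                  stringsAsFactors = FALSE)

# Reorder the lookup values to follow the column order of df1
vals <- df2$Value[match(names(df1), df2$Variable)]

# Lookup matrix with the same shape as df1: every row repeats vals
df2Lookup <- matrix(vals, nrow = nrow(df1), ncol = ncol(df1), byrow = TRUE)

# Update only the "*" cells
stars <- df1 == "*"
df1[stars] <- df2Lookup[stars]
df1
```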
        

        【Comments】:
