[Question title]: Iterate R function over levels of a factor
[Posted]: 2020-05-22 16:21:53
[Question description]:

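I am developing a function that adjusts exposed surface areas according to how these geometric shapes are joined together. In practice I often have missing data for some shapes, so the function needs to handle that.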

I want to apply the same kind of adjustment to each subject ("ind") in the data set.

testdata <- 
  data.frame(ind = rep(paste(letters[1:10]), each =2), A = rnorm(20, mean = 10, sd = 3), shape = rep(c("sphere", "ellipsoid"), 10), 
             x = rnorm(10, mean = 5, sd = 1))


funct <- function(A, shape, x, subject) {
  #Create NA aware function to deal with missing factor levels
  sum_ <- function(...) sum(..., na.rm=T) 

  radius <- x / 2

   A <- dplyr::case_when(
    shape %in% "sphere" ~ A - sum_((pi * radius[which(shape %in% 'cylinder')]^2)), 
    shape %in% "cylinder" ~ A -  sum_(2*(pi * radius^2)), 
    shape %in% "ellipsoid" ~ A -  sum_((0.2 * A[which(shape %in% "sphere")]), (2 * pi * radius[which(shape == "cylinder")]))
  )
  return(A)
}

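This function produces the expected output, but only when I make very simple adjustments (e.g. adding or subtracting). When I actually run the code above, the results are wrong.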

So I tried adding a loop inside the function, but with no luck:

funct <- function(A, shape, x, subject) {
  #Create NA aware function to deal with missing factor levels
  sum_ <- function(...) sum(..., na.rm=T) 

  radius <- x / 2

  for(levels in levels(subject)) {
   A <- dplyr::case_when(
    shape %in% "sphere" ~ A - sum_((pi * radius[which(shape %in% 'cylinder')]^2)), 
    shape %in% "cylinder" ~ A -  sum_(2*(pi * radius^2)), 
    shape %in% "ellipsoid" ~ A -  sum_((0.2 * A[which(shape %in% "sphere")]), (2 * pi * radius[which(shape == "cylinder")]))
  )
  }
  return(A)
} 

This is what I get:

testdata$result <- funct(A = testdata$A, shape = testdata$shape, x = testdata$x, subject = testdata$ind)

This is what I want:

library(dplyr)

# NA-aware sum, as defined inside funct above
sum_ <- function(...) sum(..., na.rm = TRUE)

testdata <- 
  testdata %>%
  mutate(radius = x / 2) %>%
  group_by(ind) %>%
  mutate(expected = case_when(
    shape %in% "sphere" ~ A - sum_((pi * radius[which(shape %in% 'cylinder')]^2)), 
    shape %in% "cylinder" ~ A -  sum_(2*(pi * radius^2)), 
    shape %in% "ellipsoid" ~ A -  sum_((0.2 * A[which(shape %in% "sphere")]), (2 * pi * radius[which(shape == "cylinder")]))
    )
  )

Any suggestions on how to do this properly?

[Question comments]:

    Tags: r loops iteration


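    [Solution 1]:

    Consider by, a base R function that does exactly what you need: it iterates over the levels of a factor. Note, however, that by passes a subsetted data.frame to the function you supply and returns the function's outputs as a list.

    So adjust the function to take only a data frame, and qualify every column with that data frame. Then unlist the result to map it cleanly into a new column: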
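To make that behaviour concrete, here is a minimal sketch on toy data (the names toy, g and v are made up for illustration and are not from the question). The function is called once per level of the grouping factor and only ever sees that level's rows:

```r
# Toy data: group "a" has two rows, group "b" has one (illustrative names)
toy <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 10))

# by() calls the function once per level of toy$g with the matching rows only
res <- by(toy, toy$g, function(d) d$v * 2)

# The per-group results come back list-like, in level order; unlist() flattens
flat <- unlist(res)
print(flat)  # values 2, 4, 20
```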

    funct <- function(df) {
      #Create NA aware function to deal with missing factor levels
      sum_ <- function(...) sum(..., na.rm=T)
      radius <- df$x / 2
    
      A <- dplyr::case_when(
        df$shape %in% "sphere" ~ df$A - sum_((pi * radius[which(df$shape %in% 'cylinder')]^2)), 
        df$shape %in% "cylinder" ~ df$A -  sum_(2*(pi * radius^2)), 
        df$shape %in% "ellipsoid" ~ df$A -  sum_((0.2 * df$A[which(df$shape %in% "sphere")]), 
                                                 (2 * pi * radius[which(df$shape == "cylinder")]))
      )
      return(A)
    }
    
    testdata$A_new <- unlist(by(testdata, testdata$ind, funct))
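One caveat, sketched under assumptions: unlist(by(...)) returns values in group order, so the assignment above lines up with the rows only because testdata is already sorted by ind. Base R's split() and unsplit() put each group's values back in their original row positions. The helper name apply_by_group and the stand-in function centre are hypothetical, used here so the example runs without dplyr:

```r
# Hypothetical helper: apply f to each group's rows, return values in row order
apply_by_group <- function(df, group, f) {
  pieces <- lapply(split(df, group), f)  # one result vector per level
  unsplit(pieces, group)                 # reassemble in original row order
}

# Toy data whose rows are deliberately NOT sorted by group
toy <- data.frame(ind = c("b", "a", "b", "a"), A = c(10, 1, 20, 3))

# Stand-in for funct(): subtract the group's mean from each value
centre <- function(d) d$A - mean(d$A)

toy$A_new <- apply_by_group(toy, toy$ind, centre)  # aligned with the rows
toy$A_by  <- unlist(by(toy, toy$ind, centre))      # group order, misaligned here
print(toy)
```

With sorted data the two columns agree; with unsorted data only the split()/unsplit() version matches each row to its own group's result.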
    

    Output

    testdata 
    #    ind         A     shape        x     A_new
    # 1    a 10.762472    sphere 5.378723 10.762472
    # 2    a  9.793703 ellipsoid 5.673754  7.641209
    # 3    b  9.268827    sphere 5.274687  9.268827
    # 4    b 10.931723 ellipsoid 5.870127  9.077957
    # 5    c  3.697459    sphere 4.045259  3.697459
    # 6    c 10.693726 ellipsoid 4.536622  9.954235
    # 7    d  5.136058    sphere 4.558130  5.136058
    # 8    d  8.343007 ellipsoid 4.486902  7.315795
    # 9    e 15.532064    sphere 4.044176 15.532064
    # 10   e  9.534384 ellipsoid 6.179843  6.427971
    # 11   f 11.287207    sphere 5.378723 11.287207
    # 12   f  7.261763 ellipsoid 5.673754  5.004321
    # 13   g 10.312481    sphere 5.274687 10.312481
    # 14   g 13.473335 ellipsoid 5.870127  11.41084
    # 15   h  8.015844    sphere 4.045259  8.015844
    # 16   h  5.421344 ellipsoid 4.536622  3.818175
    # 17   i 12.234275    sphere 4.558130 12.234275
    # 18   i  4.954581 ellipsoid 4.486902  2.507726
    # 19   j  4.168281    sphere 4.044176  4.168281
    # 20   j  4.869905 ellipsoid 6.179843  4.036249
    

    By the way, you can stay entirely in base R by using nested ifelse inside the function:

      val1 <- df$A - sum_((pi * radius[which(df$shape %in% 'cylinder')]^2))
      val2 <- df$A - sum_(2*(pi * radius^2))
      val3 <- df$A - sum_((0.2 * df$A[which(df$shape %in% "sphere")]),
                          (2 * pi * radius[which(df$shape == "cylinder")]))
    
      A <- ifelse(df$shape %in% "sphere", val1, 
                  ifelse(df$shape %in% "cylinder", val2, 
                         ifelse(df$shape %in% "ellipsoid", val3, NA)
                  )
           )
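For reference, the ifelse fragment above could be assembled into a complete function roughly as follows. This is a sketch, not the answerer's exact code: the name funct_base and the one-subject test data are hypothetical, and the values are chosen so the result is easy to check by hand.

```r
# Hypothetical name funct_base: the ifelse() fragment wrapped into a function
funct_base <- function(df) {
  sum_ <- function(...) sum(..., na.rm = TRUE)  # NA-aware sum
  radius <- df$x / 2

  # One candidate value per shape rule, each computed over the whole subset
  val1 <- df$A - sum_(pi * radius[df$shape %in% "cylinder"]^2)
  val2 <- df$A - sum_(2 * (pi * radius^2))
  val3 <- df$A - sum_(0.2 * df$A[df$shape %in% "sphere"],
                      2 * pi * radius[df$shape %in% "cylinder"])

  # Pick the candidate matching each row's shape; unknown shapes become NA
  ifelse(df$shape %in% "sphere", val1,
         ifelse(df$shape %in% "cylinder", val2,
                ifelse(df$shape %in% "ellipsoid", val3, NA)))
}

# One subject with a sphere and an ellipsoid, no cylinder (illustrative values)
one <- data.frame(ind = "a", A = c(10, 8),
                  shape = c("sphere", "ellipsoid"), x = c(4, 4))
res <- funct_base(one)
print(res)  # sphere keeps A = 10; ellipsoid gets 8 - 0.2 * 10 = 6
```

It can then be passed to by in the same way as the case_when version.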
    

    [Discussion]:

    • Thanks - I will take these notes into account!
    [Solution 2]:

    Why not wrap a function around the code that works?

    funct <- function(DF){
      stopifnot(require('dplyr'))
      # NA-aware sum; defined here so the function does not rely on a global sum_
      sum_ <- function(...) sum(..., na.rm = TRUE)
      DF %>%
        mutate(radius = x/2) %>%
        group_by(ind) %>%
        mutate(expected = case_when(
          shape %in% "sphere" ~ A - sum_((pi * radius[which(shape %in% 'cylinder')]^2)), 
          shape %in% "cylinder" ~ A -  sum_(2*(pi * radius^2)), 
          shape %in% "ellipsoid" ~ A -  sum_((0.2 * A[which(shape %in% "sphere")]), (2 * pi * radius[which(shape == "cylinder")]))
        )
        ) %>%
        pull(expected)
    }
    funct(testdata)
    # [1]  6.3788028  9.5565272 13.2533235  0.3122422 11.2873741  9.2606929
    # [7]  8.2757801  6.7049484  8.3066440  5.6685577  8.5684219  5.2911563
    #[13]  7.6712383  8.6591288 12.8784822  7.0934471  8.4669715  5.5730195
    #[19]  7.4884850 15.7498085
    

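    Alternatively, to return the data frame with the new column expected, replace the last line of the pipeline, pull(expected), with select(-radius). Then assign the result directly: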

    testdata <- funct(testdata)
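Written out in full, the select(-radius) variant might look like the sketch below. The name funct_df and the toy data are hypothetical; defining sum_ inside the function and adding ungroup() before select() are my own assumptions rather than part of the answer, made so that the function is self-contained and the returned data frame is no longer grouped by ind.

```r
library(dplyr)

# Hypothetical name funct_df: same pipeline, but returns the whole data frame
funct_df <- function(DF) {
  sum_ <- function(...) sum(..., na.rm = TRUE)  # NA-aware sum
  DF %>%
    mutate(radius = x / 2) %>%
    group_by(ind) %>%
    mutate(expected = case_when(
      shape %in% "sphere"    ~ A - sum_(pi * radius[shape %in% "cylinder"]^2),
      shape %in% "cylinder"  ~ A - sum_(2 * (pi * radius^2)),
      shape %in% "ellipsoid" ~ A - sum_(0.2 * A[shape %in% "sphere"],
                                        2 * pi * radius[shape %in% "cylinder"])
    )) %>%
    ungroup() %>%        # assumption: drop the grouping before returning
    select(-radius)      # remove the helper column
}

# Two subjects, each with a sphere and an ellipsoid (illustrative values)
toy <- data.frame(ind = c("a", "a", "b", "b"), A = c(10, 8, 20, 5),
                  shape = rep(c("sphere", "ellipsoid"), 2), x = 4)
out <- funct_df(toy)
print(out)  # expected: 10, 6, 20, 1
```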
    

    [Discussion]:

    • Thanks Rui - this is very helpful!