Using the approx function within tapply or by in R: answers

【Question title】: Using approx function within tapply or by in R
【Posted】: 2020-02-01 15:40:23
【Question】:

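I have temperature profiler (tp) data with date, depth and temperature. The depths are not exactly the same on each date, so I need to unify them to a common set of depths and set the temperature at each of those depths by linear interpolation. I was able to do this with a loop using the `approx` function (see the first part of the code below). But I know I should be able to do it better without a loop (considering that I will have about 600,000 rows). I tried to do it with the `by` function, but did not manage to convert the result (a list) into a data frame or matrix (see the second part of the code). Keep in mind that the number of rounded depths will not always be the same as in the example. The rounded depth is in the Depth2 column and the interpolated temperature goes into Temp2. What is the "right" way to solve this?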

# create df manually
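# Date holds date strings, so allocate it as character rather than double
tp <- data.frame(Date=character(31), Depth=double(31), Temperature=double(31), stringsAsFactors=FALSE)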
tp$Date[1:11] <- '2009-12-17' ; tp$Date[12:22] <- '2009-12-18'; tp$Date[23:31] <- '2009-12-19' 
tp$Depth <- c(24.92,25.50,25.88,26.33,26.92,27.41,27.93,28.37,28.82,29.38,29.92,25.07,25.56,26.06,26.54,27.04,27.53,28.03,28.52,29.02,29.50,30.01,25.05,25.55,26.04,26.53,27.02,27.52,28.01,28.53,29.01)
tp$Temperature <- c(19.08,19.06,19.06,18.87,18.67,17.27,16.53,16.43,16.30,16.26,16.22,17.62,17.43,17.11,16.72,16.38,16.28,16.20,16.15,16.13,16.11,16.08,17.54,17.43,17.32,17.14,16.89,16.53,16.28,16.20,16.13)

# create rounded depth column
tp$Depth2 <- round(tp$Depth)

# loop on date to calculate linear approximation for rounded depth
dtgrp <- tp[!duplicated(tp[,1]),1]
for (i in dtgrp) {
  x1 <- tp[tp$Date == i, "Depth"]  
  y1 <- tp[tp$Date == i, "Temperature"]
  x2 <- tp[tp$Date == i, "Depth2"]
  tpa <- approx(x=x1,y=y1,xout=x2, rule=2)
  tp[tp$Date == i, "Temp2"] <- tpa$y
}
# reduce result to rounded depth
tp1 <- tp[!duplicated(tp[,-c(2:3)]),-c(2:3)]

# not part of the question, but the end need is for a matrix, so this complete it:
library(reshape2)
tpbydt <- acast(tp1, Date~Depth2, value.var="Temp2")

# second part: I tried to use the by function (instead of the loop) but got lost when trying to convert the result to a data frame or matrix
rdpth <- function(x1,y1,x2) {
  tpa <- approx(x=x1,y=y1,xout=x2, rule=2)
  return(tpa)
}
tp2 <- by(tp, tp$Date,function(tp) rdpth(tp$Depth,tp$Temperature,tp$Depth2), simplify = TRUE)
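
Both parts of the code above rely on `approx` with `rule=2`. As a minimal sketch of what that call does, using made-up toy values (`depth` and `temp` here are hypothetical, not the profiler data): points inside the measured range are linearly interpolated, and with `rule = 2` points outside the range take the value at the nearest endpoint instead of `NA`.

```r
# Toy data (hypothetical values, not taken from the question)
depth <- c(1, 2, 3)
temp  <- c(10, 20, 30)

# rule = 2: xout values outside range(depth) get the nearest endpoint's value
clamped <- approx(x = depth, y = temp, xout = c(0, 1.5, 4), rule = 2)$y
clamped
# [1] 10 15 30

# rule = 1 (the default): xout values outside range(depth) become NA
strict <- approx(x = depth, y = temp, xout = c(0, 1.5, 4), rule = 1)$y
strict
# [1] NA 15 NA
```

This is why a rounded depth such as 30 still gets a temperature when the deepest measurement on that date is slightly shallower.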

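【Question comments】:

Tags: r tapply function-approximation

【Solution 1】:

You are very close with the `by` call, but remember that it returns a list of objects. So consider building a list of data frames and row-binding them at the end: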

df_list <- by(tp, tp$Date, function(sub) {
  tpa <- approx(x=sub$Depth, y=sub$Temperature, xout=sub$Depth2, rule=2)

  df <- unique(data.frame(Date = sub$Date, 
                          Depth2 = sub$Depth2,
                          Temp2 = tpa$y,
                          stringsAsFactors = FALSE))
  return(df)
})    

tp2 <- do.call(rbind, unname(df_list))

tp2
#          Date Depth2    Temp2
# 1  2009-12-17     25 19.07724
# 2  2009-12-17     26 19.00933
# 5  2009-12-17     27 18.44143
# 7  2009-12-17     28 16.51409
# 9  2009-12-17     29 16.28714
# 11 2009-12-17     30 16.22000
# 12 2009-12-18     25 17.62000
# 21 2009-12-18     26 17.14840
# 4  2009-12-18     27 16.40720
# 6  2009-12-18     28 16.20480
# 8  2009-12-18     29 16.13080
# 10 2009-12-18     30 16.08059
# 13 2009-12-19     25 17.54000
# 22 2009-12-19     26 17.32898
# 41 2009-12-19     27 16.90020
# 61 2009-12-19     28 16.28510
# 81 2009-12-19     29 16.13146

If you reset row.names, this is exactly the same as your tp1 output:

identical(data.frame(tp1, row.names = NULL),
          data.frame(tp2, row.names = NULL))
# [1] TRUE
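
The question mentions that the end goal is a Date by Depth2 matrix, built there with `reshape2::acast`. As a hedged base-R sketch of that last step, `tapply` with two grouping vectors may also do the job. The data frame `tp2_small` below is an assumption for illustration: it holds three rows copied from the tp2 output above so that the snippet runs on its own.

```r
# Three rows copied from the tp2 output above (illustrative subset only)
tp2_small <- data.frame(
  Date   = c("2009-12-18", "2009-12-18", "2009-12-19"),
  Depth2 = c(29, 30, 29),
  Temp2  = c(16.13080, 16.08059, 16.13146),
  stringsAsFactors = FALSE
)

# Two grouping vectors give a Date x Depth2 matrix. Each cell holds a single
# value here, so the summary function simply returns that value.
tp_mat <- tapply(tp2_small$Temp2,
                 list(Date = tp2_small$Date, Depth2 = tp2_small$Depth2),
                 function(v) v[1])
tp_mat
```

Date and depth combinations that do not occur, such as depth 30 on 2009-12-19, come out as `NA`.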

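【Comments】:

Thanks! This works perfectly. Though I admit I'm not sure I understand it - I have little experience with functions. Is "sub" defined as a sub-of-tp inside the "by" statement?
sub is the subsetted data frame for each unique tp$Date.
I'm new here, so I don't have enough reputation to express my thanks...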
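
To illustrate the reply about `sub`: it is only the argument name of the anonymous function, and `by` calls that function once per group with the matching rows. A small sketch follows; the data frame `toy` and its values are hypothetical.

```r
# Hypothetical toy data, only to show what `by` passes to the function
toy <- data.frame(Date  = c("d1", "d1", "d2"),
                  Depth = c(25.1, 26.2, 25.4),
                  stringsAsFactors = FALSE)

# The function returns its input unchanged, so `pieces` holds one
# sub-data-frame per unique Date
pieces <- by(toy, toy$Date, function(sub) sub)

# Count the rows each call received: 2 rows for d1, 1 row for d2
rows_per_date <- unlist(lapply(pieces, nrow))
rows_per_date
```

Any other argument name would work the same way; `sub` has no special meaning to `by`.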