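【Question title】:How do I use the apply function over the columns of an xts object?
【Posted】:2021-06-06 21:27:03
【Question description】:

I have the following xts object in R and am trying to compute, for each row, the maximum of the last three columns (b, c and d). The first row may yield NA.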

library(xts)

    x <-  structure(c(3.081786, 3.001786, 3.063214, 3.07, 3.0875, 
0.167143, 0.0760719999999999, 0.0642850000000004, 0.0446430000000002, 
0.279643, NA, 0.0767860000000002, 0.019285, 0.0528569999999999, 0.268214, 
NA, 0.000714000000000325, 0.0450000000000004, 0.00821399999999972, 
0.0114290000000001), class = c("xts", "zoo"), .indexCLASS = "Date", 
.indexTZ = "UTC", tclass = "Date", tzone = "UTC", src = "yahoo", updated = 
structure(1615250183.87979, class = c("POSIXct", "POSIXt")), index = 
structure(c(1167782400, 1167868800, 1167955200, 1168214400, 1168300800), 
tzone = "UTC", tclass = "Date"), .Dim = 5:4, .Dimnames = list( NULL, 
c("a", "b", "c", "d")))


>x

                  a        b        c        d
2007-01-03 3.081786 0.167143       NA       NA
2007-01-04 3.001786 0.076072 0.076786 0.000714
2007-01-05 3.063214 0.064285 0.019285 0.045000
2007-01-08 3.070000 0.044643 0.052857 0.008214
2007-01-09 3.087500 0.279643 0.268214 0.011429

I thought the following would work, but it does not give the expected result and has the unintended side effect of producing duplicated rows.

x$e <- apply(x[,2:4],1,max)
> x

                  a        b        c        d        e
2007-01-03 3.081786 0.167143       NA       NA       NA
2007-01-03       NA       NA       NA       NA       NA
2007-01-04 3.001786 0.076072 0.076786 0.000714       NA
2007-01-04       NA       NA       NA       NA 0.076786
2007-01-05 3.063214 0.064285 0.019285 0.045000       NA
2007-01-05       NA       NA       NA       NA 0.064285
2007-01-08 3.070000 0.044643 0.052857 0.008214       NA
2007-01-08       NA       NA       NA       NA 0.052857
2007-01-09 3.087500 0.279643 0.268214 0.011429       NA
2007-01-09       NA       NA       NA       NA 0.279643

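Two questions:

  1. What is the correct way to compute the row-wise maximum of the last three columns without producing duplicated rows?
  2. Why are duplicated rows produced when running apply(x[,2:4],1,max)?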

【Comments】:

  • Works for me: x$e <- apply(x[,2:4],1,max). Can you provide the exact structure of x via dput(x)?
  • > dput(x) structure(c(3.081786, 3.001786, 3.063214, 3.07, 3.0875, 0.167143, 0.0760719999999999, 0.0642850000000004, 0.0446430000000002, 0.279643, NA, 0.0767860000000002, 0.019285, 0.0528569999999999, 0.268214, NA, 0.000714000000000325, 0.0450000000000004, 0.00821399999999972, 0.0114290000000001), class = c("xts", "zoo"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", src = "yahoo", updated = structure(1615250183.87979, class = c("POSIXct", "POSIXt")), index = structure(c(1167782400, 1167868800, 1167955200, 1168214400, 1168300800), tzone = "UTC", tclass = "Date"), .Dim = 5:4, .Dimnames = list(NULL, c("a", "b", "c", "d")))
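  • It looks odd because apply produces a named vector. For example, x$e <- unname(apply(x[,2:4], 1, max)) works.
  • The rows are not duplicated for me. The number of rows stays the same. Are you sure your dput is correct? For me the row labels are numbers such as 1167782400 and 1167868800, and a lot of attributes are printed when I type x at the console.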
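
To make the named-vector comment concrete, here is a minimal sketch in base R. It assumes a plain matrix m with dates as row names as a stand-in for x[, 2:4]; m, named_max and row_max are illustrative names, not part of the original post. It only shows that apply() over rows carries the row names into the result and that unname() strips them. It does not reproduce or explain the duplicated rows in xts itself.

```r
# Stand-in for columns b, c, d of x: a plain matrix with dates as row names
# (an assumption for illustration; the original x is an xts object).
m <- matrix(
  c(0.167143, 0.076072, 0.064285, 0.044643, 0.279643,
    NA,       0.076786, 0.019285, 0.052857, 0.268214,
    NA,       0.000714, 0.045000, 0.008214, 0.011429),
  ncol = 3,
  dimnames = list(
    c("2007-01-03", "2007-01-04", "2007-01-05", "2007-01-08", "2007-01-09"),
    c("b", "c", "d")
  )
)

# apply() over rows keeps the row names as names of the result.
named_max <- apply(m, 1, max, na.rm = TRUE)
names(named_max)

# unname() drops the names, leaving a bare numeric vector.
row_max <- unname(named_max)
row_max
```

With the real object, the assignment suggested in the comment above is x$e <- unname(apply(x[, 2:4], 1, max)). Adding na.rm = TRUE, as in the sketch, is optional and makes the first row return 0.167143 instead of NA.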

Tags: r xts


【Solution 1】:

Here is a solution using matrixStats::rowMaxs, which works and is more efficient.

x$e <- matrixStats::rowMaxs(x, cols=2:4, na.rm=TRUE)
x
#                   a        b        c        d        e
# 2007-01-03 3.081786 0.167143       NA       NA 0.167143
# 2007-01-04 3.001786 0.076072 0.076786 0.000714 0.076786
# 2007-01-05 3.063214 0.064285 0.019285 0.045000 0.064285
# 2007-01-08 3.070000 0.044643 0.052857 0.008214 0.052857
# 2007-01-09 3.087500 0.279643 0.268214 0.011429 0.279643

Where:

class(x)
# [1] "xts" "zoo"
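
If an extra package is not wanted, a base R alternative is pmax(), which takes the element-wise maximum across several vectors. The sketch below is a swapped-in technique, not the answer's method, and it assumes plain numeric vectors b, c_col and d as stand-ins for the three columns of x (these names are hypothetical). It also shows how na.rm decides whether the first row comes out as NA, which the question said was acceptable.

```r
# Stand-ins for columns b, c and d of x (assumed plain numeric vectors).
b     <- c(0.167143, 0.076072, 0.064285, 0.044643, 0.279643)
c_col <- c(NA,       0.076786, 0.019285, 0.052857, 0.268214)
d     <- c(NA,       0.000714, 0.045000, 0.008214, 0.011429)

# na.rm = FALSE (the default): any NA in a row makes that row's result NA.
max_keep_na <- pmax(b, c_col, d)

# na.rm = TRUE: NAs are ignored as long as the row has one non-NA value.
max_drop_na <- pmax(b, c_col, d, na.rm = TRUE)

max_keep_na
max_drop_na
```

For the xts object itself, the same call would be applied to the column values, for example after extracting them with coredata(x).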

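【Comments】:

  • FYI, matrixStats::rowMaxs(x, cols=2:4, na.rm=TRUE) is more efficient than matrixStats::rowMaxs(x[, 2:4], na.rm=TRUE)
  • @HenrikB Wow, it is indeed about 70% faster, I'm impressed!
  • There is a simple reason: x[, 2:4] has to temporarily allocate memory for a new object of that size, copy the values over, and later leave it for the garbage collector to clean up, whereas cols=2:4 operates directly on the three existing columns with no extra overhead.
  • @HenrikB I figured it was something like that. Thanks for the insight!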
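
The difference discussed in these comments can be checked with a rough timing sketch. It assumes a large random matrix big as hypothetical test data (not the poster's x) and uses system.time() over repeated calls. Actual timings depend on the machine and the matrixStats version, so no figure is claimed here.

```r
library(matrixStats)

set.seed(1)
# Hypothetical test data: 200,000 rows and 4 columns, standing in for a large x.
big <- matrix(rnorm(8e5), ncol = 4)

# Variant 1: let rowMaxs() select the columns internally via cols=.
t_cols <- system.time(
  for (i in 1:50) via_cols <- rowMaxs(big, cols = 2:4, na.rm = TRUE)
)

# Variant 2: subset first, which allocates and fills a temporary matrix.
t_subset <- system.time(
  for (i in 1:50) via_subset <- rowMaxs(big[, 2:4], na.rm = TRUE)
)

# Both variants compute the same values; only the cost may differ.
stopifnot(isTRUE(all.equal(via_cols, via_subset)))

# Elapsed seconds for each variant.
c(cols = t_cols[["elapsed"]], subset = t_subset[["elapsed"]])
```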