[Posted]: 2020-04-15 09:23:16
[Question]:
I have a dataset ret.daily containing the daily returns of cryptocurrencies. A small part of it looks like this:
Bitcoin Ethereum XRP Bitcoin.Cash Bitcoin.SV ...
2018-01-01 -0.04 0.02 0.04 -0.04 NA
2018-01-02 0.09 0.13 0.04 0.11 NA
2018-01-03 0.01 0.08 0.23 -0.04 NA
...
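Then, for each day, I assigned every coin to one of 5 groups based on its price quintile, giving price.groups.daily (the coins shown are just the largest ones I included; there are many others, so each of the 5 groups contains several coins):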
Bitcoin Ethereum XRP Bitcoin.Cash Bitcoin.SV ...
2018-01-01 5 5 4 5 NA
2018-01-02 5 5 4 5 NA
2018-01-03 5 5 4 5 NA
...
What I then want to do is take the mean return of each group for each day and build a new matrix like this:
1 2 3 4 5
2018-01-01 Mean(groups 1 numbers) Mean(groups 2 numbers) ... ... mean(-0.04, 0.02,-0.04,...)
2018-01-02 Mean(groups 1 numbers) Mean(groups 2 numbers)
2018-01-03 Mean(groups 1 numbers) Mean(groups 2 numbers)
...
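For illustration, here is a minimal sketch of the matrix described above. It assumes ret.daily and price.groups.daily share coin names as column names and dates as row names. The toy values and the helper name group_means_by_day are hypothetical, not part of my real data or code.

```r
# Toy data with made-up values (assumption: same coins and dates in both tables).
ret.daily <- data.frame(
  Bitcoin    = c(-0.04, 0.09),
  Ethereum   = c( 0.02, 0.13),
  XRP        = c( 0.04, 0.04),
  Bitcoin.SV = c(NA_real_, NA_real_),
  row.names  = c("2018-01-01", "2018-01-02")
)
price.groups.daily <- data.frame(
  Bitcoin    = c(5, 5),
  Ethereum   = c(5, 5),
  XRP        = c(4, 4),
  Bitcoin.SV = c(NA_real_, NA_real_),
  row.names  = c("2018-01-01", "2018-01-02")
)

# Hypothetical helper: one row per day, one column per group.
group_means_by_day <- function(ret, grp, n.groups = 5) {
  # Keep only the dates and coins present in both tables so cells line up.
  days  <- intersect(rownames(ret), rownames(grp))
  coins <- intersect(colnames(ret), colnames(grp))
  ret <- as.matrix(ret[days, coins, drop = FALSE])
  grp <- as.matrix(grp[days, coins, drop = FALSE])

  out <- t(sapply(seq_along(days), function(i) {
    # Fixed factor levels give one slot per group, even for empty groups.
    g <- factor(grp[i, ], levels = seq_len(n.groups))
    tapply(ret[i, ], g, mean, na.rm = TRUE)
  }))
  dimnames(out) <- list(days, as.character(seq_len(n.groups)))
  out
}

group.means.daily <- group_means_by_day(ret.daily, price.groups.daily)
print(group.means.daily)
```

Groups with no coins on a given day come out as NA, and coins whose group is NA are ignored. The intersect() calls matter when the two tables cover slightly different date ranges.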
To do the grouping, I did the following (where price.daily is the daily price data, which is what I used to sort the coins into groups):
col.daily <- seq(1,length(price.daily$Bitcoin))
quantile.daily = sapply(col.daily, function(y) {quantile(x = unlist(price.daily[y,] ), seq(0,1, length=6),na.rm = TRUE )})
quantile.daily.t = t(quantile.daily)
rownames(quantile.daily.t) = rownames(price.daily)
combined.daily = cbind(price.daily, quantile.daily.t)
price.groups.daily = as.data.frame(t(apply(combined.daily, 1, function(x) findInterval(x[1:ncol(price.daily)], x[(1 + ncol(price.daily)):ncol(combined.daily)]))))
colnames(price.groups.daily) = colnames(price.daily)
price.groups.daily[price.groups.daily == 6] = 5
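I added the last line because I did not know how else to handle the case where the maximum value equals the upper end of the last interval, but it works fine. I imagine the group means could also be computed with one of the apply functions. I am just not sure how, because for the grouping I could rely on functions such as quantile and findInterval, which did exactly what I wanted. Is there a function that would work in this case?
Edit: added some of my data using dput(head(price.groups.daily[1:5])) (my data starts in 2014, but my example starts in 2018 because most of the coins did not exist before then):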
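As a side note on the maximum-value case, findInterval has a rightmost.closed argument that treats the last interval as closed, so the row maximum falls in group 5 without the extra replacement step. Below is a minimal sketch under that assumption. The toy prices and the helper name quintile_groups are hypothetical.

```r
# Made-up prices: one row per day, one column per coin.
price.daily <- data.frame(
  A = c(100, 10), B = c(50, 20), C = c(0.5, 30),
  D = c(NA, 40),  E = c(7, NA),  F = c(20, 60),
  row.names = c("2018-01-01", "2018-01-02")
)

# Hypothetical helper: quintile group (1 to 5) of each coin within each day.
quintile_groups <- function(prices) {
  m <- as.matrix(prices)
  groups <- t(apply(m, 1, function(p) {
    # A day with no prices at all has no quantiles; return NA for every coin.
    if (all(is.na(p))) return(rep(NA_integer_, length(p)))
    breaks <- quantile(p, probs = seq(0, 1, length.out = 6), na.rm = TRUE)
    # rightmost.closed = TRUE puts the row maximum in group 5 rather than 6.
    findInterval(p, breaks, rightmost.closed = TRUE)
  }))
  dimnames(groups) <- dimnames(m)
  as.data.frame(groups)
}

groups <- quintile_groups(price.daily)
print(groups)
```

This works row by row, so the intermediate cbind of prices and quantiles is not needed. Missing prices stay NA in the result.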
structure(list(Bitcoin = c(5, 5, 5, 5, 5, 5), Ethereum = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), XRP = c(1L, 1L, 1L, 1L, 1L, 2L), Bitcoin.Cash = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), Bitcoin.SV = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_)), row.names = c("2014-01-01", "2014-01-02",
"2014-01-03", "2014-01-04", "2014-01-05", "2014-01-06"), class = "data.frame")
And for dput(head(ret.daily[1:5])):
structure(list(Bitcoin = c(0.0201473710988784, 0.048620314369761,
0.0826106401572204, 0.0209460599834816, -0.17281055170073, 0.0495261478685647
), Ethereum = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), XRP = c(-0.0390090806022911, 0.0180075172268452, -0.108767309981219,
0.0184572292482077, -0.111605656954607, 0.0104300601469132),
Bitcoin.Cash = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), Bitcoin.SV = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_)), row.names = c("2014-01-03",
"2014-01-04", "2014-01-05", "2014-01-06", "2014-01-07", "2014-01-08"
), class = "data.frame")
[Discussion]:
Tags: r grouping apply lapply sapply