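Posted: 2018-02-26 18:41:32
Question:
I am comparing some forecast data with actual values. The forecasts come from three different providers. However, the timestamps of the actual data and the forecast data are not the same, and I want to compute the error at each forecast point.
In the snapshot below, I want to get the difference between each provider's forecast and the actual value. The circled points are forecasts for which no actual reading exists at that exact time, but there is a clear trend in the actual data around them. I think I could use a piecewise approximation, but I don't know how to do it. I have seen the answers posted in Need a R package for piecewise linear regression?, but they were not very helpful.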
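One way to read the "piecewise approximation" idea is plain linear interpolation between consecutive actual readings. A minimal sketch, assuming base R's `stats` functions are acceptable: `approxfun()` turns the actual readings into a piecewise-linear function of time that can be evaluated at any forecast timestamp. The three points are the first three `Actual` `degc` values from the sample data below and `-1` is Provider W's first forecast (at 1516226400); the variable names are illustrative only.

```r
# Sketch only: treat the actual readings as a piecewise-linear function of time.
# Points taken from the first three "Actual" rows of the sample data.
act_time <- as.POSIXct(c(1516221000, 1516224600, 1516228200),
                       origin = "1970-01-01", tz = "UTC")
act_degc <- c(2.25, 1.69, 2.22)

# approxfun() returns a function that interpolates linearly between the points;
# rule = 1 makes it return NA outside the observed time range (no extrapolation)
actual_at <- approxfun(as.numeric(act_time), act_degc, method = "linear", rule = 1)

# Provider W's first forecast falls halfway between the 2nd and 3rd reading
w_time <- as.POSIXct(1516226400, origin = "1970-01-01", tz = "UTC")
w_degc <- -1
a_hat  <- actual_at(as.numeric(w_time))  # 1.69 + 0.5 * (2.22 - 1.69) = 1.955
err    <- a_hat - w_degc                 # actual minus forecast = 2.955
```

If a smooth curve is preferred to straight segments, `splinefun()` (for example with `method = "natural"`) can be swapped in for `approxfun()`; splines can overshoot between readings, though, so linear interpolation is the more conservative choice.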
Sample data (1 day):
> dput(dt)
structure(list(tme = structure(c(1516221000, 1516224600, 1516228200,
1516231800, 1516235400, 1516239000, 1516242600, 1516246200, 1516249800,
1516253400, 1516257000, 1516260600, 1516264200, 1516267800, 1516271400,
1516275000, 1516278600, 1516282200, 1516285800, 1516289400, 1516293000,
1516296600, 1516300200, 1516303800, 1516307400, 1516226400, 1516230000,
1516233600, 1516237200, 1516240800, 1516244400, 1516248000, 1516251600,
1516255200, 1516258800, 1516262400, 1516266000, 1516269600, 1516273200,
1516276800, 1516280400, 1516284000, 1516287600, 1516291200, 1516294800,
1516298400, 1516302000, 1516305600, 1516221000, 1516224600, 1516228200,
1516231800, 1516235400, 1516239000, 1516242600, 1516246200, 1516249800,
1516253400, 1516257000, 1516260600, 1516264200, 1516267800, 1516271400,
1516275000, 1516278600, 1516282200, 1516285800, 1516289400, 1516293000,
1516296600, 1516300200, 1516303800, 1516307400, 1516233600, 1516244400,
1516255200, 1516266000, 1516276800, 1516287600, 1516298400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), degc = c(2.25, 1.69, 2.22, 2.22, 1.65, 1.12, 2.22,
1.1, 1.13, 2.82, 5.58, 7.8, 7.85, 8.43, 10.05, 10.06, 10.07,
10.03, 8.89, 6.17, 5.04, 5.01, 3.92, 2.29, 2.29, -1, -1, -1,
-1, -1, 0, 1, 2, 4, 6, 7, 8, 8, 9, 9, 9, 7, 6, 4, 3, 2, 2, 1,
-0.16, -1.13, -2.19, -2.98, -3.48, -3.86, -3.84, -2.96, -1.16,
0.91, 2.61, 3.92, 4.84, 5.59, 6.68, 7.41, 6.82, 5.08, 3.07, 1.56,
0.51, -0.36, -1.15, -1.86, -2.53, -0.2, -0.9, 4.1, 6.9, 8.1,
3.6, 2.6), rh = c(0.55, 0.6, 0.51, 0.51, 0.6, 0.52, 0.55, 0.57,
0.6, 0.49, 0.44, 0.41, 0.38, 0.36, 0.33, 0.33, 0.31, 0.33, 0.35,
0.39, 0.4, 0.4, 0.43, 0.49, 0.49, 73, 73, 75, 75, 75, 71, 67,
59, 52, 47, 42, 39, 37, 35, 34, 37, 43, 48, 51, 54, 58, 61, 62,
0.61, 0.64, 0.67, 0.7, 0.72, 0.74, 0.74, 0.71, 0.65, 0.58, 0.54,
0.52, 0.51, 0.5, 0.46, 0.44, 0.45, 0.5, 0.57, 0.61, 0.64, 0.65,
0.67, 0.69, 0.71, 59.1, 62.6, 43.9, 36.7, 33.2, 46.4, 50.1),
type = c("Actual", "Actual", "Actual", "Actual", "Actual",
"Actual", "Actual", "Actual", "Actual", "Actual", "Actual",
"Actual", "Actual", "Actual", "Actual", "Actual", "Actual",
"Actual", "Actual", "Actual", "Actual", "Actual", "Actual",
"Actual", "Actual", "Provider W", "Provider W", "Provider W",
"Provider W", "Provider W", "Provider W", "Provider W", "Provider W",
"Provider W", "Provider W", "Provider W", "Provider W", "Provider W",
"Provider W", "Provider W", "Provider W", "Provider W", "Provider W",
"Provider W", "Provider W", "Provider W", "Provider W", "Provider W",
"Provider D", "Provider D", "Provider D", "Provider D", "Provider D",
"Provider D", "Provider D", "Provider D", "Provider D", "Provider D",
"Provider D", "Provider D", "Provider D", "Provider D", "Provider D",
"Provider D", "Provider D", "Provider D", "Provider D", "Provider D",
"Provider D", "Provider D", "Provider D", "Provider D", "Provider D",
"Provider B", "Provider B", "Provider B", "Provider B", "Provider B",
"Provider B", "Provider B")), .Names = c("tme", "degc", "rh",
"type"), row.names = c(NA, -80L), class = c("data.table", "data.frame"
))
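I'm really not sure how to proceed. I need to repeat this exercise on multiple datasets (each with hundreds of rows) with up to 30 variables (the sample data has only two).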
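A minimal sketch of how the interpolation could be generalised to every provider and every variable at once, assuming the column layout of the sample (`tme` as POSIXct, `type` as a character label, all other columns numeric). `forecast_errors()` and `rmse_by_provider()` are hypothetical helper names, not functions from any package; the error is taken as actual minus forecast, and forecast times outside the range of the actual readings get `NA`.

```r
# Hypothetical helper: for every non-"Actual" row, linearly interpolate each
# variable of the "Actual" series at that row's timestamp and return
# actual - forecast. Assumes columns tme (POSIXct), type, numeric variables.
forecast_errors <- function(dt, vars = setdiff(names(dt), c("tme", "type"))) {
  dt  <- as.data.frame(dt)                 # also accepts a data.table
  act <- dt[dt$type == "Actual", ]
  act <- act[order(act$tme), ]
  fc  <- dt[dt$type != "Actual", ]
  out <- fc[, c("tme", "type")]
  for (v in vars) {
    # piecewise-linear interpolation; rule = 1 gives NA outside the actual range
    a_hat <- approx(as.numeric(act$tme), act[[v]],
                    xout = as.numeric(fc$tme), rule = 1)$y
    out[[paste0(v, "_actual")]] <- a_hat
    out[[paste0(v, "_err")]]    <- a_hat - fc[[v]]
  }
  out
}

# Hypothetical helper: root-mean-square error per provider for one variable
rmse_by_provider <- function(errs, var) {
  e <- errs[[paste0(var, "_err")]]
  sapply(split(e, errs$type), function(x) sqrt(mean(x^2, na.rm = TRUE)))
}
```

On the question's data this would be `errs <- forecast_errors(dt)` followed by `rmse_by_provider(errs, "degc")`, and several datasets could be handled with `lapply()` over a list of tables. One caveat visible in the sample: `rh` for `Actual` and Provider D looks like a fraction (e.g. 0.55) while Provider W and Provider B look like percentages (e.g. 73, 59.1), so those columns would need rescaling to a common unit before their errors are meaningful.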
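Comments:
- At time t, I have a forecast value from Provider D (call it d_t), but the actual value is only known at t-10 min (a_{t-10}) and t+30 min (a_{t+30}). I want to interpolate between the actual values to get an estimate of the actual value at t (a_t) and the difference a_t - d_t.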
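Written out with the comment's notation (times in minutes), linear interpolation between the two known actual values weights each one by how close it is to t; here t lies 10 minutes after the first reading within a 40-minute gap:

$$
\hat a_t = a_{t-10} + \frac{t - (t-10)}{(t+30) - (t-10)}\,\bigl(a_{t+30} - a_{t-10}\bigr)
         = \tfrac{3}{4}\,a_{t-10} + \tfrac{1}{4}\,a_{t+30},
\qquad
e_t = \hat a_t - d_t .
$$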
Tags: r interpolation