Peak Detection in Time Series: Answers

【Question title】: Peak Detection in Time Series
【Posted】: 2012-08-28 07:48:11
【Question】:

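I'm currently working on a small project in which I want to compare two time series. The similarity measure is really vague: two time series are considered similar if they have roughly the same shape.

So I thought to myself: "If they only need to have the same shape, I can just compare the peaks of the two time series. If the peaks are in the same position, the time series will surely be similar."

My problem now is finding a good peak detection algorithm. I searched Google, but I only came up with the paper Simple Algorithms for Peak Detection in Time-Series. The problem is that the algorithms described in this paper work well with really extreme, thin peaks, but in most cases my time series have rather flat peaks, so they are not detected.

Does anybody know where I could find, or search for, an algorithm that would detect the peaks shown in the following image?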

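【Comments on the question】:

My high school maths is fuzzy, but wouldn't you want to compute a rolling first derivative (or, given the flatness, possibly a second derivative) and then look for changes?
I believe the ZigZag indicator should be useful to you: stockcharts.com/school/…

Tags: java time-series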

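【Answer 1】:

This is a late answer to the question, but the Dynamic Time Warping (DTW) algorithm is the right choice for this type of problem. Basically there are two time series: one is the template and the other is the sample. I recommend checking the source code of the DynamicTimeWarping class in the Smile library.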

http://haifengl.github.io/
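
To make the idea concrete, here is a minimal DTW sketch. It is not the Smile implementation: the class name `DtwSketch`, the absolute-difference local cost, and the unconstrained warping window are all assumptions chosen for brevity.

```java
import java.util.Arrays;

/** Minimal dynamic time warping sketch (hypothetical helper, not Smile's API). */
class DtwSketch {

    /** DTW distance between a and b, using |x - y| as the local cost. */
    static double distance(double[] a, double[] b) {
        int n = a.length, m = b.length;
        // cost[i][j] = cheapest alignment of the first i points of a
        // with the first j points of b
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                // extend the cheapest of: match, stretch a, stretch b
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + best;
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        double[] template = {0, 1, 2, 3, 2, 1, 0};
        double[] sample = {0, 0, 1, 2, 3, 2, 1, 0}; // same shape, one step late
        System.out.println(distance(template, sample));
    }
}
```

A small distance means the sample can be stretched in time to match the template. The two arrays in `main` align at zero cost because the sample is just a delayed copy. This version needs O(n·m) time and memory, so long series would probably call for a windowed variant.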

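【Comments】:

【Answer 2】:

I'm not sure about the correlation between time series or about specific peak detection algorithms, but here is a small maximum-peak detection algorithm I wrote. It doesn't detect minimum peaks, but it could easily be extended to do so by reversing the comparisons in the for loop.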

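// needs java.util.ArrayList and java.util.List; XYDataItem is assumed to be the JFreeChart class
List<XYDataItem> maxPoints = new ArrayList<>(); // list to store the maxima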
XYDataItem leftPeakPoint = new XYDataItem(0, 0);
int leftPeakPointIndex = 0;
XYDataItem rightPeakPoint = new XYDataItem(0, 0);
boolean first = true;
int index = -1;
List<XYDataItem> pointList = (List<XYDataItem>) lrpSeries.getItems();
for (XYDataItem point : pointList) {
    index++;
    if (first) {
        //initialize the first point
        leftPeakPoint = point;
        leftPeakPointIndex = index;
        first = false;
        continue;
    }
    if (leftPeakPoint.getYValue() < point.getYValue()) {
        leftPeakPoint = point;
        leftPeakPointIndex = index;
        rightPeakPoint = point;
    } else if (leftPeakPoint.getYValue() == point.getYValue()) {
        rightPeakPoint = point;
    } else {
        //determine if we are coming down off of a peak by looking at the Y value of the point before the
        //left most point that was detected as a part of a peak
        if (leftPeakPointIndex > 0) {
            XYDataItem prev = pointList.get(leftPeakPointIndex - 1);
            //if the point just before the left edge of the plateau is not higher than the plateau,
            //we have just passed a maximum: record it at the horizontal midpoint of the plateau
            if (prev.getYValue() <= leftPeakPoint.getYValue()) {
                double peakx = rightPeakPoint.getXValue() - ((rightPeakPoint.getXValue() - leftPeakPoint.getXValue()) / 2D);
                maxPoints.add(new XYDataItem(peakx, leftPeakPoint.getYValue()));
            }
        }
        leftPeakPoint = point;
        leftPeakPointIndex = index;
        rightPeakPoint = point;
    }
}

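As a result, each detected peak is centered on the flat section where consecutive data points have the same Y value. XYDataItem is just a class that holds an X and a Y value as doubles. It can easily be replaced with something equivalent.

【Comments】:

【Answer 3】:

You can use a very simple local extrema detector: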

// those are your points:
double[] f = {1, 2, 3, 4, 5, 6, 5, 4, 7, 8, 9, 3, 1, 4, 6, 8, 9, 7, 4, 1};
List<Integer> ext = new ArrayList<Integer> ();
for (int i = 0; i < f.length - 2; i++) {
  // a non-positive product means the slope changed sign (or one side is flat)
  if ((f[i+1]-f[i])*(f[i+2]-f[i+1]) <= 0) {
    ext.add(i+1);
  }
}
// now you have the indices of the extremes in your list `ext`

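This works well for smooth series. If your data is noisy, you should first pass it through a low-pass filter. A very simple low-pass filter is the moving average: every point is replaced by the average of its k nearest values, where k is the window size.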
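
One possible way to combine the two steps is sketched below. The class name `SmoothedExtrema`, the centered window, and the choice to shrink the window at the edges of the series are assumptions made for illustration, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;

/** Moving-average smoothing followed by the sign-change test (hypothetical helper). */
class SmoothedExtrema {

    /** Centered moving average with window k; the window shrinks at the edges. */
    static double[] movingAverage(double[] f, int k) {
        int half = k / 2;
        double[] out = new double[f.length];
        for (int i = 0; i < f.length; i++) {
            int from = Math.max(0, i - half);
            int to = Math.min(f.length - 1, i + half);
            double sum = 0.0;
            for (int j = from; j <= to; j++) {
                sum += f[j];
            }
            out[i] = sum / (to - from + 1);
        }
        return out;
    }

    /** Indices where the slope changes sign or is flat on one side. */
    static List<Integer> extrema(double[] f) {
        List<Integer> ext = new ArrayList<>();
        for (int i = 0; i < f.length - 2; i++) {
            if ((f[i + 1] - f[i]) * (f[i + 2] - f[i + 1]) <= 0) {
                ext.add(i + 1);
            }
        }
        return ext;
    }

    public static void main(String[] args) {
        double[] noisy = {1, 3, 2, 4, 3, 5, 4, 3, 4, 2, 3, 1};
        System.out.println("raw:      " + extrema(noisy));
        System.out.println("smoothed: " + extrema(movingAverage(noisy, 3)));
    }
}
```

A larger k removes more jitter but also flattens and shifts genuine peaks, so the window size has to be tuned against the data.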

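【Comments】:

【Answer 4】:

It seems you are simply looking for slope inversions (from positive to negative and vice versa). A rough Java algorithm could be (untested):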

List<Point> points = ... //all the points in your curve
List<Point> extremes = new ArrayList<Point> ();
Point previous = null; // no previous point yet
double previousSlope = 0;

for (Point p : points) {
    if (previous == null) { previous = p; continue; }
    double slope = p.getValue() - previous.getValue();
    if (slope * previousSlope < 0) { //look for sign changes
        extremes.add(previous);
    }
    previousSlope = slope;
    previous = p;
}

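Finally, a good way to measure similarity is correlation. In your case, I would look at the correlation of percentage moves (in other words, you want your 2 series to go up or down at the same time). This is typically what is done in finance, for example when calculating the correlation between the returns of 2 assets:

Create 2 new series containing the percentage move at each point of the 2 original series
Calculate the correlation between those 2 new series

For more background, read up on returns correlations. In summary, if your values are: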

Series 1  Series 2
 100        50
 98         49
 100        52
 102        54

The "returns" series will be:

Series 1  Series 2
 -2.00%     -2.00%
 +2.04%     +6.12%
 +2.00%     +3.85%

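You then calculate the correlation of those 2 return series (in this example: 0.96) to get a measure of how similar the 2 curves are. You may want to adjust the result for variance (i.e. if one shape has a much wider range than the other).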
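
The two steps above can be sketched as follows. The class name `ReturnsCorrelation` is hypothetical, the correlation is assumed to be the ordinary Pearson coefficient, and the code assumes each series has at least two points and no zero values.

```java
/** Percentage returns and their Pearson correlation (hypothetical helper). */
class ReturnsCorrelation {

    /** Relative move from each point to the next: (v[i+1] - v[i]) / v[i]. */
    static double[] returns(double[] v) {
        double[] r = new double[v.length - 1];
        for (int i = 0; i < r.length; i++) {
            r[i] = (v[i + 1] - v[i]) / v[i];
        }
        return r;
    }

    /** Pearson correlation coefficient of two equally long series. */
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("series must have equal length");
        }
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] series1 = {100, 98, 100, 102};
        double[] series2 = {50, 49, 52, 54};
        System.out.println(correlation(returns(series1), returns(series2)));
    }
}
```

For the sample values in the tables, this yields a coefficient that rounds to the 0.96 quoted in the answer. Because Pearson correlation already divides by both standard deviations, it ignores differences in scale between the two series.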

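【Comments】:

Thanks for your input. Looking for sign changes is a really nice and simple idea. Just one question: your second idea sounds interesting, but I don't quite understand it. Do you know where I can find more information on how it is done?
Given the sample data he showed, correlation would be a very good detector. +1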

【Answer 5】:

The peakdet algorithm proposed by Eli Billauer works very well and is easy to implement:

http://www.billauer.co.il/peakdet.html

The algorithm works especially well with noisy signals, where methods using the first derivative fail.
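
As a rough illustration of the delta-threshold idea behind this kind of detector, here is a Java sketch that reports maxima only. It is written from the general description of the approach and is not a port of the linked code; the class name `PeakDetSketch` and the parameter name `delta` are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Delta-threshold maximum detector (hypothetical sketch, maxima only). */
class PeakDetSketch {

    /** Indices of maxima that are followed by a drop of more than delta. */
    static List<Integer> maxima(double[] v, double delta) {
        List<Integer> peaks = new ArrayList<>();
        double mn = Double.POSITIVE_INFINITY;
        double mx = Double.NEGATIVE_INFINITY;
        int mxPos = -1;
        boolean lookForMax = true;
        for (int i = 0; i < v.length; i++) {
            double x = v[i];
            if (x > mx) { // new running maximum (first index wins on a plateau)
                mx = x;
                mxPos = i;
            }
            if (x < mn) { // new running minimum
                mn = x;
            }
            if (lookForMax) {
                if (x < mx - delta) { // dropped far enough: confirm the maximum
                    peaks.add(mxPos);
                    mn = x;
                    lookForMax = false;
                }
            } else if (x > mn + delta) { // rose far enough: start tracking a new maximum
                mx = x;
                mxPos = i;
                lookForMax = true;
            }
        }
        return peaks;
    }

    public static void main(String[] args) {
        double[] v = {0, 1, 5, 5, 5, 1, 0, 0.3, 0.1, 4, 4, 0};
        System.out.println(maxima(v, 1.0));
    }
}
```

A point only counts as a peak once the signal has fallen by more than `delta` from it, so jitter smaller than `delta` is ignored and no derivative is needed. In this sketch a flat peak is reported at its left edge, and a peak at the very end of the series is never confirmed.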

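【Comments】:

Even though the question is tagged Java, the OP asked for an algorithm that finds peaks in a time series. He did not specifically ask for a Java implementation. The article explains the MATLAB implementation in detail and also links to implementations in C, Python and Fortran. I'm sorry if anyone expecting a copy-and-paste solution was misled.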

【Answer 6】:

If you want something statistically more sound, you can measure the cross-correlation between the two series. You can check Wikipedia, or this site.
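
A minimal sketch of one way to do this: compute the Pearson correlation of the two series over their overlap at each candidate lag and keep the lag with the highest value. This is one common normalized variant, not necessarily the definition used by the linked pages; the class name `CrossCorrelationSketch` and the `maxLag` parameter are assumptions.

```java
/** Lagged correlation between two series (hypothetical helper). */
class CrossCorrelationSketch {

    /** Pearson correlation of x[i] and y[i + lag] over the overlapping part. */
    static double correlationAtLag(double[] x, double[] y, int lag) {
        int start = Math.max(0, -lag);
        int end = Math.min(x.length, y.length - lag); // exclusive
        int n = end - start;
        if (n < 2) {
            return Double.NaN; // not enough overlap to correlate
        }
        double meanX = 0.0, meanY = 0.0;
        for (int i = start; i < end; i++) {
            meanX += x[i];
            meanY += y[i + lag];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = start; i < end; i++) {
            double dx = x[i] - meanX;
            double dy = y[i + lag] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Lag in [-maxLag, maxLag] with the highest correlation. */
    static int bestLag(double[] x, double[] y, int maxLag) {
        int best = 0;
        double bestR = Double.NEGATIVE_INFINITY;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            double r = correlationAtLag(x, y, lag);
            if (r > bestR) { // NaN never compares greater, so it is skipped
                bestR = r;
                best = lag;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 3, 1, 0, 0, 2, 5, 2, 0};
        double[] y = {0, 0, 0, 1, 3, 1, 0, 0, 2, 5}; // x delayed by two steps
        System.out.println(bestLag(x, y, 3));
    }
}
```

The maximum correlation over all lags gives a similarity score that tolerates a time shift between the two series, and the lag at which it occurs estimates that shift.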

【Comments】:

Thanks for the links, this looks interesting.