【Question title】: Peak to Trough in time series data
【Posted】: 2017-08-26 03:41:42
【Question】:

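I am trying to find every instance where a stock index time series drops by 10% or more. I am struggling to program it so that it understands that order matters (I do not want rises of 10%).

Ideally the code would: pick a value, check whether the next value is 10% below it, and if not, check the one after that, continuing until it finds one, then record it. Then it would move to that "trough" or "valley" as the new starting point and keep checking whether the values after it are 10% or more below that value.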

I have an Excel file with dates in the first column and the index value in the second.

Below is my current code. I don't think its output can be correct, judging by a graph of the data.

# Import libraries (only the ones this snippet uses)
import pandas as pd
import numpy as np
import peakutils

# Load the CSV export of the Excel sheet
index = pd.read_csv(r"\Users\Reed_2\Desktop\Indexonly.csv")
print("as Pandas")
print(index.values)
# Convert to two NumPy arrays (as_matrix() was removed in pandas 1.0)
dates = index['Date'].to_numpy()
values = index['Index'].to_numpy()
print("values as NumPy")
print(values)
print("Date values")
print(dates)

# Find peaks
peaks = peakutils.indexes(values, thres=0.1, min_dist=1)

print ("peaks")
print(peaks)

a = np.asarray(peaks)
np.savetxt(r"C:\Users\Reed_2\Desktop\export.csv", a, delimiter=",")

I have access to Python, R Studio, and MATLAB. I prefer Python because I know it best.
Any help with this would be greatly appreciated.

【Question discussion】:

    Tags: python pandas numpy time-series peakutils


    【Solution 1】:

    Consider the series s, built as the cumulative product of random lognormal returns

    import numpy as np
    import pandas as pd

    np.random.seed([3,1415])
    s = pd.Series(
        np.random.lognormal(.005, .5, size=100),
        pd.date_range('2015-01-01', periods=100, freq='B')
    ).cumprod()
    
    s.plot()
    

    Use a generator to slice the index

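    def gen_slice(s, thresh):
        """Yield the first label, then each label whose value is more
        than `thresh` below the last yielded value."""
        sidx = s.index
        idx = sidx[0]
        # s.at replaces get_value, which was removed in pandas 1.0
        v = s.at[idx]
        yield idx
        for idx in sidx[1:]:
            v0 = s.at[idx]
            if (v0 / v) < 1 - thresh:
                v = v0
                yield idx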
    
    
    s.loc[list(gen_slice(s, .1))]
    
    2015-01-01    0.346504
    2015-01-02    0.184687
    2015-01-05    0.069298
    2015-01-06    0.022508
    2015-01-07    0.018996
    2015-01-26    0.014204
    2015-02-03    0.012777
    2015-05-01    0.008999
    2015-05-04    0.006039
    2015-05-06    0.004855
    dtype: float64
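As a small hand-checkable illustration, the same generator can be run on eight made-up prices (these values are not from the answer and are chosen only so the arithmetic is easy to follow). The sketch assumes a recent pandas, so it reads values with `s.at`.

```python
import pandas as pd

def gen_slice(s, thresh):
    # Yield the first label, then every label whose value is more than
    # `thresh` below the last yielded value.
    sidx = s.index
    v = s.at[sidx[0]]
    yield sidx[0]
    for idx in sidx[1:]:
        v0 = s.at[idx]
        if (v0 / v) < 1 - thresh:
            v = v0
            yield idx

# Hypothetical prices, for illustration only
toy = pd.Series([100.0, 95.0, 89.0, 92.0, 85.0, 80.0, 120.0, 107.0])
picked = toy.loc[list(gen_slice(toy, 0.1))]
print(picked.tolist())  # [100.0, 89.0, 80.0]
```

The rise to 120 is ignored, which is the order sensitivity the question asks for. The fall from 120 to 107 is not reported either, because the reference stays at 80 until some value is more than 10% below 80.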
    

    We can see that every percentage change is below -10%, i.e. each step is a drop of more than 10%

    s.loc[list(gen_slice(s, .1))].pct_change()
    
    2015-01-01         NaN
    2015-01-02   -0.467000
    2015-01-05   -0.624783
    2015-01-06   -0.675194
    2015-01-07   -0.156034
    2015-01-26   -0.252278
    2015-02-03   -0.100442
    2015-05-01   -0.295665
    2015-05-04   -0.328967
    2015-05-06   -0.195990
    dtype: float64
    

    We can plot where these events occur.

    idx = list(gen_slice(s, .1))
    
    ax = s.plot()
    ax.vlines(idx, s.min(), s.max())
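gen_slice measures each drop from the last recorded value and never raises that reference, so a decline that starts from a later, higher peak can go unreported. If a peak-to-trough reading is wanted, one possible variant (not part of the original answer) lets the reference ratchet upward with the running peak. The function name `drops_from_peak` and the sample prices are assumptions made for this sketch.

```python
import pandas as pd

def drops_from_peak(s, thresh=0.10):
    """Yield (peak_label, drop_label, pct_change) each time the series
    falls at least `thresh` below its running peak."""
    peak_idx, peak = s.index[0], s.iloc[0]
    for idx, v in s.items():
        if v > peak:
            # New high: move the reference up
            peak_idx, peak = idx, v
        elif v / peak - 1 <= -thresh:
            yield peak_idx, idx, v / peak - 1
            # Restart from the recorded low, as the question describes
            peak_idx, peak = idx, v

# Hypothetical prices, for illustration only
toy = pd.Series([100.0, 95.0, 89.0, 92.0, 85.0, 80.0, 120.0, 107.0])
events = list(drops_from_peak(toy, 0.10))
for peak_label, drop_label, change in events:
    print(peak_label, drop_label, round(change, 4))
```

On this input the sketch reports three declines, from position 0 to 2, from 3 to 5, and from 6 to 7. The last one is the fall from 120 to 107, which a fixed reference at 80 would not flag.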
    


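    See also the following:

    You may need to iterate over the elements of a series, or the rows of a dataframe, in such a way that the next element or row selected depends on the previously selected one. This is called path dependency.

    Consider the following time series s with irregular frequency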

    # standard Python community import conventions
    import numpy as np
    import pandas as pd
    
    # n is the number of observations
    n = 5000
    
    day = pd.Timestamp('2013-02-06')
    # irregular offsets spanning 28800 seconds (8 hours)
    seconds = pd.to_timedelta(np.random.rand(n) * 28800, unit='s')
    # start at 8 am
    start = pd.Timedelta(hours=8)
    # irregular time series, sorted chronologically
    tidx = (day + start + seconds).sort_values()
    
    s = pd.Series(np.random.randn(n), tidx, name='A').cumsum()
    s.plot();
    

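    Let's assume a path-dependent condition. Starting with the first member of the series, I want to grab each subsequent element such that the absolute difference between that element and the current reference element is greater than or equal to x.

    We will solve this problem using Python generators.

    Generator function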

    def mover(s, move_size=10):
        """Given a reference, find next value with
        an absolute difference >= move_size"""
        ref = None
        # Series.items() replaces iteritems(), which was removed in pandas 2.0
        for i, v in s.items():
            if ref is None or (abs(ref - v) >= move_size):
                yield i, v
                ref = v
    

    Then we can define a new series moves like so

    moves = pd.Series({i:v for i, v in mover(s, move_size=10)},
                      name='_{}_'.format(s.name))
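To make the path dependency easy to check by hand, here is a minimal sketch that runs the same generator on seven made-up values (the series `toy` is an assumption for illustration). It uses `Series.items()`, which assumes a reasonably recent pandas.

```python
import pandas as pd

def mover(s, move_size=10):
    """Yield (label, value) whenever the value differs from the last
    yielded value by at least move_size in absolute terms."""
    ref = None
    for i, v in s.items():
        if ref is None or abs(ref - v) >= move_size:
            yield i, v
            ref = v

# Hypothetical values, for illustration only
toy = pd.Series([0, 4, 11, 15, 22, 9, 30])
toy_moves = pd.Series(dict(mover(toy, move_size=10)))
print(toy_moves.tolist())  # [0, 11, 22, 9, 30]
```

Each selected value becomes the new reference, so 15 is skipped (only 4 away from 11) while 9 is kept (13 away from 22). Moves in either direction count because the comparison uses the absolute difference.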
    

    Plot them together

    moves.plot(legend=True)
    s.plot(legend=True)
    


    The analog for dataframes is:

    def mover_df(df, col, move_size=2):
        ref = None
        for i, row in df.iterrows():
            if ref is None or (abs(ref - row.loc[col]) >= move_size):
                yield row
                ref = row.loc[col]
    
    df = s.to_frame()
    moves_df = pd.concat(mover_df(df, 'A', 10), axis=1).T
    
    moves_df.A.plot(label='_A_', legend=True)
    df.A.plot(legend=True)
    

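    【Discussion】:

    • Hi, thanks for the great answer. However, I still don't quite understand move_size. Is it that, in this case, the OP wants a change of at least 10%, so it equals 10?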
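On the move_size question: in mover the comparison is abs(ref - v) >= move_size, so move_size is an absolute amount in the units of the series, and 10 means ten units, not 10%. The percentage case is what gen_slice handles with thresh=.1. A relative version of mover could look like the sketch below; `mover_pct` and the sample values are hypothetical and not part of the original answer.

```python
import pandas as pd

def mover_pct(s, pct=0.10):
    """Yield (label, value) whenever the value has moved at least `pct`
    (as a fraction) away from the last yielded value, in either direction."""
    ref = None
    for i, v in s.items():
        if ref is None or abs(v / ref - 1) >= pct:
            yield i, v
            ref = v

# Hypothetical values, for illustration only
toy = pd.Series([100.0, 105.0, 111.0, 100.0, 99.0, 120.0])
rel_moves = pd.Series(dict(mover_pct(toy, pct=0.10)))
print(list(rel_moves.index))  # [0, 2, 4, 5]
```

With an absolute move_size of 10 the same input would select positions 0, 2, 3 and 5 instead, because 111 to 100 is an 11-unit move but only about a 9.9% change. To keep only declines, as the question asks, the condition can be changed to v / ref - 1 <= -pct.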