Plotting a flow duration curve for a range of several timeseries in Python: answers

【Question title】: Plotting a flow duration curve for a range of several timeseries in Python
【Posted】: 2018-08-24 13:30:39
【Question】:

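Flow duration curves are a common way to visualize a timeseries in hydrology (and other fields). They allow an easy assessment of the high and low values in a timeseries and of how often certain values are reached. Is there an easy way to plot one in Python? I could not find any matplotlib tool that allows it. No other package seems to include it either, at least not with the possibility to easily plot a range of flow duration curves.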

An example of a flow duration curve (figure not reproduced here):

A general explanation of how to create one can be found here: http://www.renewablesfirst.co.uk/hydropower/hydropower-learning-centre/what-is-a-flow-duration-curve/

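So the basic calculation and plotting of a flow duration curve is quite simple: calculate the exceedance and plot it against the sorted timeseries (see the answer of ImportanceOfBeingErnest). It gets more difficult, though, if you have several timeseries and want to plot the range of the values for all exceedance probabilities. I present one solution in my answer to this thread, but would be glad to hear more elegant solutions. My solution also makes it easy to use the curve as a subplot, as it is common to have several timeseries for different locations that have to be plotted separately.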

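An example of what I mean by a range of flow duration curves (figure not reproduced here):

Here you can see three distinct curves. The black line is the measured value from a river, while the two shaded areas are the ranges of all model runs of those two models. So what would be the easiest way to calculate and plot a range of flow duration curves for several timeseries?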

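【Comments】:

From the link it seems you just want to plot the sorted flow rates against an arbitrary x scale...
Indeed, the plotting itself is trivial. The calculation is not, though, especially if you want a flow duration curve with a range. But the same goes for e.g. seaborn.kdeplot. So I thought this might help others who have the same problem as me.

Tags: python matplotlib time-series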

【Solution 1】:

If I understand the concept of a flow duration curve correctly, you just plot the flow as a function of the exceedance.

import numpy as np
import matplotlib.pyplot as plt

data = np.random.rayleigh(10,144)

sort = np.sort(data)[::-1]
exceedence = np.arange(1.,len(sort)+1) / len(sort)

plt.plot(exceedence*100, sort)
plt.xlabel("Exceedence [%]")
plt.ylabel("Flow rate")
plt.show()

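From this plot you can easily read off, for example, that a flow rate of about 11 or larger is expected roughly 60% of the time. The exact numbers vary between runs, because the sample data is random and unseeded.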
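The same reading can also be taken numerically instead of from the plot. The following is a minimal sketch, not part of the original answer: `flow_at_exceedance` is a hypothetical helper name, and it assumes that a flow exceeded p % of the time can be approximated by the (100 - p)th percentile of the data. `np.percentile` interpolates between samples, so the result may differ slightly from the rank-based curve plotted above.

```python
import numpy as np


def flow_at_exceedance(flows, exceedance_percent):
    """Return the flow that is equalled or exceeded `exceedance_percent` % of the time."""
    flows = np.asarray(flows, dtype=float)
    # A flow exceeded p % of the time is the (100 - p)th percentile of the data.
    return np.percentile(flows, 100.0 - exceedance_percent)


rng = np.random.default_rng(0)      # seeded so the numbers are repeatable
data = rng.rayleigh(10, 144)        # same distribution as in the answer

q60 = flow_at_exceedance(data, 60)
share = np.mean(data >= q60)        # fraction of samples at or above q60
print(f"Flow exceeded 60% of the time: {q60:.2f} (observed share: {share:.1%})")
```

The observed share is only approximately 60%, because 144 samples cannot be split at exactly that fraction.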

If there are several datasets, you can use fill_between to plot them as a range.

import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt

data0 = np.random.rayleigh(10,144)
data1 = np.random.rayleigh(9,144)
data2 = np.random.normal(10,5,144)

data = np.c_[data0, data1, data2]

exceedence = np.arange(1.,len(data)+1) /len(data)
sort = np.sort(data, axis=0)[::-1]

plt.fill_between(exceedence*100, np.min(sort, axis=1),np.max(sort, axis=1))

plt.xlabel("Exceedence [%]")
plt.ylabel("Flow rate")
plt.grid()
plt.show()
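The range plot above relies on all datasets having the same length, so that they share one exceedance axis. If the series differ in length, one possible workaround is to interpolate each curve onto a common exceedance grid first. This is only a sketch under that assumption; `fdc_on_grid` and the grid of 100 points are hypothetical choices, not something the original answer prescribes.

```python
import numpy as np


def fdc_on_grid(series, grid):
    """Interpolate the flow duration curve of one series onto `grid`.

    grid: exceedance probabilities as fractions in (0, 1], increasing.
    """
    flows = np.sort(np.asarray(series, dtype=float))[::-1]    # high to low
    exceedance = np.arange(1.0, len(flows) + 1) / len(flows)  # increasing
    return np.interp(grid, exceedance, flows)


rng = np.random.default_rng(42)
# Three series of different lengths
runs = [rng.rayleigh(10, 144), rng.rayleigh(9, 200), rng.normal(10, 5, 365)]

grid = np.linspace(0.01, 1.0, 100)
curves = np.vstack([fdc_on_grid(run, grid) for run in runs])

# Envelope over all series at every exceedance level
lower = curves.min(axis=0)
upper = curves.max(axis=0)
```

`lower` and `upper` can then be passed to `plt.fill_between(grid * 100, lower, upper)` in the same way as in the example above.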

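【Comments】:

You did understand it correctly. Thanks for the code example. You solved this in a more elegant way than I did. Still, I prefer my solution, as it allows finer adjustment and can plot a range of flow duration curves, which I really found difficult to solve. But if you have an idea for that too, I would be glad to hear it.
@F.Jehn I did not understand your code at all, so I thought a simpler example was needed if this question is to be useful to others; hence this answer.
That is a good idea. I tried to rephrase my question so that it is easier to understand what I meant.
So I updated this with what I think you mean by "range".
Great. It seems my solution was pretty bad. I have now incorporated your solution into mine. Thanks.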

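【Solution 2】:

EDIT: As my first answer was overly complicated and inelegant, I rewrote it to incorporate the solution by ImportanceOfBeingErnest. I still keep the new version here alongside the one by ImportanceOfBeingErnest, because I think the additional functionality might make it easier for other people to plot flow duration curves for their timeseries. If anyone has additional ideas, see: Github Repository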

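The features are:

Changing the percentiles of a range flow duration curve
Easy use as a standalone figure or as a subplot. If a subplot object is provided, the flow duration curve is drawn into it. If None is provided, a new one is created and returned
Separate kwargs for the range curve and its comparison
Switching the y-axis to a logarithmic scale with a keyword
An extended example to help understand its usage.

The code is as follows: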

# -*- coding: utf-8 -*-
"""
Created on Thu Mar 15 10:09:13 2018

@author: Florian Ulrich Jehn
"""
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np


def flow_duration_curve(x, comparison=None, axis=0, ax=None, plot=True, 
                        log=True, percentiles=(5, 95), decimal_places=1,
                        fdc_kwargs=None, fdc_range_kwargs=None, 
                        fdc_comparison_kwargs=None):
    """
    Calculates and plots a flow duration curve from x. 

    All observations/simulations are ordered and the empirical probability is
    calculated. This is then plotted as a flow duration curve. 

    When x has more than one dimension along axis, a range flow duration curve 
    is plotted. This means that for every probability a min and max flow is 
    determined. This is then plotted as a fill between. 

    Additionally a comparison can be given to the function, which is plotted in
    the same ax.

    :param x: numpy array or pandas dataframe, discharge of measurements or 
    simulations
    :param comparison: numpy array or pandas dataframe of discharge that should
    also be plotted in the same ax
    :param axis: int, axis along which x is iterated through
    :param ax: matplotlib subplot object, if not None, will plot in that 
    instance
    :param plot: bool, if False function will not show the plot, but simply
    return the ax object
    :param log: bool, if True the y-axis is logarithmic
    :param percentiles: tuple of int, percentiles that should be used for 
    drawing a range flow duration curve
    :param decimal_places: int, currently not used by this function
    :param fdc_kwargs: dict, matplotlib keywords for the normal fdc
    :param fdc_range_kwargs: dict, matplotlib keywords for the range fdc
    :param fdc_comparison_kwargs: dict, matplotlib keywords for the comparison 
    fdc

    return: subplot object with the flow duration curve in it
    """
    # Convert x to a pandas dataframe, for easier handling
    if not isinstance(x, pd.DataFrame):
        x = pd.DataFrame(x)

    # Transpose the dataframe if the timeseries run along the other axis
    if axis != 0:
        x = x.transpose()

    # Convert comparison to a dataframe as well
    if comparison is not None:
        if not isinstance(comparison, pd.DataFrame):
            comparison = pd.DataFrame(comparison)
        # Transpose it if necessary, also when it already was a dataframe
        if axis != 0:
            comparison = comparison.transpose()

    # Create an ax if necessary
    if ax is None:
        fig, ax = plt.subplots(1, 1)

    # Make the y scale logarithmic if needed
    if log:
        ax.set_yscale("log")

    # Determine if it is a range flow curve or a normal one by checking the 
    # dimensions of the dataframe
    # If there is only one timeseries, make a single fdc
    # (iloc selects the first column regardless of its label)
    if x.shape[1] == 1:
        plot_single_flow_duration_curve(ax, x.iloc[:, 0], fdc_kwargs)

    # Make a range flow duration curve
    else:
        plot_range_flow_duration_curve(ax, x, percentiles, fdc_range_kwargs)

    # Add a comparison to the plot if it is present
    if comparison is not None:
        ax = plot_single_flow_duration_curve(ax, comparison.iloc[:, 0],
                                             fdc_comparison_kwargs)

    # Name the x-axis
    ax.set_xlabel("Exceedence [%]")

    # show if requested
    if plot:
        plt.show()

    return ax


def plot_single_flow_duration_curve(ax, timeseries, kwargs):
    """
    Plots a single fdc into an ax.

    :param ax: matplotlib subplot object
    :param timeseries: list like iterable
    :param kwargs: dict, keyword arguments for matplotlib

    return: subplot object with a flow duration curve drawn into it
    """
    # Get the probability
    exceedence = np.arange(1., len(timeseries) + 1) / len(timeseries)
    exceedence *= 100
    # Plot the curve, check for empty kwargs
    if kwargs is not None:
        ax.plot(exceedence, sorted(timeseries, reverse=True), **kwargs)
    else:
        ax.plot(exceedence, sorted(timeseries, reverse=True))
    return ax


def plot_range_flow_duration_curve(ax, x, percentiles, kwargs):
    """
    Plots a single range fdc into an ax.

    :param ax: matplotlib subplot object
    :param x: dataframe of several timeseries
    :param percentiles: tuple of int, lower and upper percentile that bound
    the plotted range
    :param kwargs: dict, keyword arguments for matplotlib

    return: subplot object with a range flow duration curve drawn into it
    """
    # Get the probabilities
    exceedence = np.arange(1.,len(np.array(x))+1) /len(np.array(x))
    exceedence *= 100

    # Sort the data
    sort = np.sort(x, axis=0)[::-1]

    # Get the percentiles
    low_percentile = np.percentile(sort, percentiles[0], axis=1)
    high_percentile = np.percentile(sort, percentiles[1], axis=1)

    # Plot it, check for empty kwargs
    if kwargs is not None:
        ax.fill_between(exceedence, low_percentile, high_percentile, **kwargs)
    else:
        ax.fill_between(exceedence, low_percentile, high_percentile)
    return ax
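To see what the `percentiles` argument of the function above does in isolation, the band calculation can be run without any plotting. This is a minimal sketch that mirrors the steps in `plot_range_flow_duration_curve`; the name `fdc_band` and the random test data are assumptions made for illustration only.

```python
import numpy as np


def fdc_band(runs, percentiles=(5, 95)):
    """Return exceedance [%] and the lower/upper band of a range FDC.

    runs: 2-D array-like, shape (n_timesteps, n_runs), one column per run.
    """
    runs = np.asarray(runs, dtype=float)
    n_timesteps = runs.shape[0]
    exceedance = np.arange(1.0, n_timesteps + 1) / n_timesteps * 100
    # Sort every run from high to low flow
    sorted_runs = np.sort(runs, axis=0)[::-1]
    # Percentiles across the runs, one value per exceedance level
    low, high = np.percentile(sorted_runs, percentiles, axis=1)
    return exceedance, low, high


rng = np.random.default_rng(1)
runs = rng.rayleigh(10, size=(300, 75))   # 75 runs with 300 timesteps each

exceedance, low_5, high_95 = fdc_band(runs, (5, 95))
_, low_0, high_100 = fdc_band(runs, (0, 100))
```

With `(0, 100)` the band is the full min/max envelope of all runs, while `(5, 95)` gives a narrower band that is less sensitive to single extreme runs. Either pair of arrays can be handed to `ax.fill_between(exceedance, low, high)`.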

Usage:

# Create test data (assumes the imports and functions above are defined)
np_array_one_dim = np.random.rayleigh(5, [1, 300])
# np.vstack stacks the runs row-wise, giving 75 series of 300 values each;
# np.c_ would concatenate along the columns and give shape (25, 900) instead
np_array_75_dim = np.vstack([np.random.rayleigh(11, [25, 300]),
                             np.random.rayleigh(10, [25, 300]),
                             np.random.rayleigh(8, [25, 300])])
df_one_dim = pd.DataFrame(np.random.rayleigh(9, [1, 300]))
df_75_dim = pd.DataFrame(np.vstack([np.random.rayleigh(8, [25, 300]),
                                    np.random.rayleigh(15, [25, 300]),
                                    np.random.rayleigh(3, [25, 300])]))
df_75_dim_transposed = pd.DataFrame(np_array_75_dim.transpose())

# Call the function with all different arguments
fig, subplots = plt.subplots(nrows=2, ncols=3)
ax1 = flow_duration_curve(np_array_one_dim, ax=subplots[0,0], plot=False,
                          axis=1, fdc_kwargs={"linewidth":0.5})
ax1.set_title("np array one dim\nwith kwargs")

ax2 = flow_duration_curve(np_array_75_dim, ax=subplots[0,1], plot=False,
                          axis=1, log=False, percentiles=(0,100))
ax2.set_title("np array 75 dim\nchanged percentiles\nnolog")

ax3 = flow_duration_curve(df_one_dim, ax=subplots[0,2], plot=False, axis=1,
                          log=False, fdc_kwargs={"linewidth":0.5})
ax3.set_title("\ndf one dim\nno log\nwith kwargs")

ax4 = flow_duration_curve(df_75_dim, ax=subplots[1,0], plot=False, axis=1,
                          log=False)
ax4.set_title("df 75 dim\nno log")

ax5 = flow_duration_curve(df_75_dim_transposed, ax=subplots[1,1], 
                          plot=False)
ax5.set_title("df 75 dim transposed")

ax6 = flow_duration_curve(df_75_dim, ax=subplots[1,2], plot=False,
                          comparison=np_array_one_dim, axis=1, 
                          fdc_comparison_kwargs={"color":"black", 
                                                 "label":"comparison",
                                                 "linewidth":0.5},
                          fdc_range_kwargs={"label":"range_fdc"})
ax6.set_title("df 75 dim\n with comparison\nwith kwargs")
ax6.legend()

# Show the beauty
fig.tight_layout()
plt.show()

The result looks like this (figure not reproduced here):

【Comments】: