Negative Shannon Entropy: Answers

【Question title】: Negative Shannon Entropy
【Posted】: 2021-12-02 17:53:39
【Question】:

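I wrote a short script that computes the log returns of a stock and the Shannon entropy of that data. However, I get negative values for the Shannon entropy, which is very strange. I am using S = -p log p. Is the problem that p is not divided into discrete bins? How can I bin p so that the entropy is computed as S = -SUM_k(p_k log p_k)?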
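For reference, a short derivation (standard results, not part of the original post) showing why the two formulas behave differently. With discrete bin probabilities every term is non-negative, whereas for a continuous density f the analogous quantity, the differential entropy h(f), can be negative. For a normal density it has a closed form, and binning with width Δx (so that p_k ≈ f(x_k)Δx) links the two:

```latex
S = -\sum_{k} p_k \log p_k \;\ge\; 0, \qquad 0 \le p_k \le 1,\quad \sum_k p_k = 1

h(f) = -\int f(x)\,\log f(x)\,dx, \qquad
h\big(\mathcal{N}(\mu,\sigma^2)\big) = \tfrac{1}{2}\log\!\left(2\pi e\,\sigma^2\right)

h < 0 \iff \sigma < \frac{1}{\sqrt{2\pi e}} \approx 0.242

p_k \approx f(x_k)\,\Delta x \;\Longrightarrow\; S \approx h(f) - \log \Delta x
```

Daily log returns typically have σ far below 0.242, so a pdf-based sum is expected to come out negative, while the binned S stays non-negative.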

import yfinance as yf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm


plot_lreturnshist = False
plot_lreturns = True

#Download the data from yfinance: choose the ticker and the time period
AAPL = yf.Ticker("AAPL")
history = AAPL.history(period = "5y")
#Extract only the close data
Close = history["Close"]



#Set up a recurrence to add a column in our dataframe for the logarithmic returns of the stock
#Log returns are calculated as log_2(Close(day x)/Close(day x-1))  

logreturn = []
for i in range(len(Close)):
    if i == 0:
        logreturn.append(0) 
    else:
        x = np.log2(abs(Close.iloc[i]))-np.log2(abs(Close.iloc[i-1]))
        logreturn.append(x)
#Now we have an array with the logarithmic returns, we add it to the pandas dataframe
history["logreturn"] = logreturn
#We then pull it out for ease of use
lreturn = history["logreturn"]

if plot_lreturns:
    fig, ax = plt.subplots()
    ax.plot(lreturn, color = "dodgerblue")
    plt.show()


#We plot the data in a histogram and overlay a fitted normal density
if plot_lreturnshist:
    mu, std = norm.fit(lreturn)
    plt.hist(lreturn, bins=50, density=True, alpha=0.6, color='g', ec = 'black')
    
    xmin, xmax = plt.xlim()
    x = np.linspace(xmin, xmax, 100)
    p = norm.pdf(x, mu, std)
    plt.plot(x, p, 'k', linewidth=2)
    title = r"Fit results: $\mu$ = $%.2f$,  $\sigma$ = $%.2f$" % (mu, std)
    plt.title(title)
    plt.xlabel(r"$\log_2(Y_{t+1}/Y_t)$")

    plt.show()

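mu, std = norm.fit(lreturn)
#Define the evaluation grid explicitly; otherwise x is whatever was last assigned above
x = np.linspace(lreturn.min(), lreturn.max(), 100)
p = norm.pdf(x, mu, std)
S = np.sum(-p*np.log(p))
print(S)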
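A minimal sketch of the binning the question asks about, under stated assumptions: histogram the log returns with `np.histogram`, turn the counts into probabilities p_k, and apply S = -Σ p_k log p_k. The helper name `binned_entropy`, the 50-bin choice, and the synthetic normal returns (used instead of downloading AAPL data) are all illustrative assumptions. For comparison it also computes the differential entropy of the fitted normal with SciPy's frozen `norm(mu, std).entropy()`; the pdf-based sum in the question is roughly this integral divided by the grid spacing, so it shares its sign.

```python
import numpy as np
from scipy.stats import norm


def binned_entropy(values, bins=50):
    """Discrete Shannon entropy (in bits) of `values` after histogram binning (hypothetical helper)."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts / counts.sum()      # bin probabilities, they sum to 1
    p = p[p > 0]                   # empty bins contribute 0 (convention 0*log 0 = 0)
    return -np.sum(p * np.log2(p))


# Assumption: synthetic stand-in for ~5 years of daily log returns,
# drawn from a normal distribution (not real AAPL data)
rng = np.random.default_rng(0)
lreturn = rng.normal(loc=0.001, scale=0.02, size=1260)

S_binned = binned_entropy(lreturn, bins=50)

# Differential entropy (in nats) of the fitted normal, for comparison
mu, std = norm.fit(lreturn)
h_fitted = norm(mu, std).entropy()

print(S_binned)   # between 0 and log2(50) ≈ 5.64
print(h_fitted)   # negative, because std < 1/sqrt(2*pi*e) ≈ 0.242
```

The binned value depends on the number of bins (more bins, higher S), so it is most useful for comparing windows or assets with the same binning.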

【Discussion】:

Tags: python pandas finance entropy yfinance

【Answer 1】:

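I built an entropy indicator that uses a rolling volume histogram as its probability input, and I also got negative values. In thermodynamics, negative entropy means you gain heat, so perhaps here it indicates increasing market activity, but it doesn't tell you in which direction.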

You can find my attempt at this indicator in my library of indicators on GitHub. It is simply called "entropy".

Edit: based on your comment, I modified the entropy function, and it now gives positive values:

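import numpy as np

def entropy(c_close, c_volume, period, bins=2):
    # Work on plain NumPy arrays so that [j] indexing is positional
    c_close = np.asarray(c_close, dtype=float)
    c_volume = np.asarray(c_volume, dtype=float)
    size = len(c_close)
    out = np.full(size, np.nan)
    # ROLLING WINDOW
    for i in range(period - 1, size):
        e = i + 1
        s = e - period
        close_w = c_close[s:e]
        volume_w = c_volume[s:e]
        # HISTO BASED ON CLOSE / VOLUME
        min_w = np.min(close_w)
        range_w = np.max(close_w) - min_w
        if range_w == 0:
            continue  # flat window: bins are undefined, leave NaN
        sum_h = np.zeros(bins)
        for j in range(period):
            # Clamp so the window maximum lands in the last bin instead of index `bins`
            k = min(int((close_w[j] - min_w) * bins / range_w), bins - 1)
            sum_h[k] += volume_w[j] ** 2
        count = np.sqrt(sum_h)
        # NORMALIZE HISTO COUNT (CONVERT TO PROBA)
        count = count / np.sum(count)
        # DELETE PROBAS = 0 TO AVOID log2(0)
        count = count[np.nonzero(count)]
        # ENTROPY
        out[i] = -np.sum(count * np.log2(count))
    return out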
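A quick check on why the edited function can only return non-negative values: after normalisation the bin weights p_k form a discrete distribution over at most `bins` bins, so the standard bounds below apply (they are not stated in the answer itself):

```latex
0 \;\le\; -\sum_{k=1}^{\text{bins}} p_k \log_2 p_k \;\le\; \log_2(\text{bins})
```

With the default `bins=2`, each output value therefore lies in [0, 1] bit, reaching 1 only when both price bins carry equal weight.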

【Discussion】: