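【Posted】: 2021-12-02 17:53:39
【Question】:
I wrote a short script to compute the log returns of a stock and the Shannon entropy of that data. However, I get a negative value for the Shannon entropy, which is very strange. I am using S = -p log p. Is the problem that p is not defined over discrete intervals? How can I divide p into intervals so that the entropy is computed as S = -SUM_k(p_k log p_k)?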
import yfinance as yf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm
plot_lreturnshist = False
plot_lreturns = True
#Import the data from yfinance. What Ticker, what period of time we want
AAPL = yf.Ticker("AAPL")
history = AAPL.history(period = "5y")
#Extract only the close data
Close = history["Close"]
#Set up a recurrence to add a column in our dataframe for the logarithmic returns of the stock
#Log returns are calculated as log_2(Close(day x)/Close(day x-1))
logreturn = []
for i in range(len(Close)):
    if i == 0:
        logreturn.append(0)
    else:
        #Use .iloc for positional access, since Close is indexed by date
        x = np.log2(abs(Close.iloc[i]))-np.log2(abs(Close.iloc[i-1]))
        logreturn.append(x)
#Now we have an array with the logarithmic returns, we add it to the pandas dataframe
history["logreturn"] = logreturn
#We then pull it out for ease of use
lreturn = history["logreturn"]
if plot_lreturns == True:
    fig,ax = plt.subplots()
    ax.plot(lreturn, color = "dodgerblue")
#We plot the data in a histogram, by
if plot_lreturnshist == True:
    mu, std = norm.fit(lreturn)
    plt.hist(lreturn, bins=50, density=True, alpha=0.6, color='g', ec = 'black')
    xmin, xmax = plt.xlim()
    x = np.linspace(xmin, xmax, 100)
    p = norm.pdf(x, mu, std)
    plt.plot(x, p, 'k', linewidth=2)
    title = r"Fit results: $\mu$ = $%.2f$, $\sigma$ = $%.2f$" % (mu, std)
    plt.title(title)
    plt.xlabel(r"$\ln(Y_{t+1}/Y_t$)")
    plt.show()
mu, std = norm.fit(lreturn)
p = norm.pdf(x, mu, std)
S = np.sum(-p*np.log(p))
print(S)
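On the question of splitting p into intervals: norm.pdf returns a probability density rather than a probability, and a density can exceed 1 when the spread of the data is small, in which case -p*log(p) is negative. A minimal sketch of the binned alternative follows, assuming an equal-width histogram is an acceptable discretization. The function name shannon_entropy, the default of 50 bins, and the synthetic fake_returns array are illustrative choices, not part of the original script.

```python
import numpy as np

def shannon_entropy(samples, bins=50, base=2.0):
    """Estimate S = -sum_k p_k log(p_k) from an equal-width histogram."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts / counts.sum()      # bin probabilities p_k, which sum to 1
    p = p[p > 0]                   # empty bins contribute 0 (0*log 0 := 0)
    return float(-np.sum(p * np.log(p)) / np.log(base))

# Synthetic stand-in for the log returns (assumption: roughly normal, small spread)
rng = np.random.default_rng(0)
fake_returns = rng.normal(loc=0.0, scale=0.02, size=1000)

S = shannon_entropy(fake_returns)
print(S)
```

With the script above, the same function could presumably be called as shannon_entropy(lreturn.to_numpy()). The estimate depends on the number of bins and always lies between 0 and log2(bins), so it cannot go negative.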
【Discussion】:
Tags: python pandas finance entropy yfinance