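【Posted】: 2020-09-23 06:29:09
【Question】:
I am trying to compute the entropy of a pandas Series. Specifically, I group consecutive identical strings in Direction into runs, using this expression: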
diff_dir = df.iloc[0:,1].ne(df.iloc[0:,1].shift()).cumsum()
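This returns a run label that increases by one each time the string in Direction changes, so consecutive identical strings share the same label. For each run of identical Direction strings, I want to compute the entropy of X and Y.
For the example data, the run labels produced by this expression are: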
0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 2
8 3
9 3
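As a minimal sketch of how the labels above come about, the expression can be split into its two steps. The Direction values are the ones from the example; the intermediate names `changed` and `run_id` are only illustrative.

```python
import pandas as pd

# Direction values taken from the example in the question.
direction = pd.Series(['Left'] * 5 + ['Right'] * 3 + ['Left'] * 2)

# True wherever a value differs from the previous row.
# The first row is compared with NaN, so it always counts as a change.
changed = direction.ne(direction.shift())

# The running total of the change flags gives one integer label per run.
run_id = changed.cumsum()

print(pd.DataFrame({'Direction': direction, 'changed': changed, 'run_id': run_id}))
```

Passing `run_id` to `groupby` then groups rows by run rather than by Direction value, so the two separate Left runs stay separate.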
The code below used to work but now raises an error. I am not sure whether this started after an upgrade.
import pandas as pd
import numpy as np

def ApEn(U, m = 2, r = 0.2):
    '''
    Approximate Entropy
    Quantify the amount of regularity over time-series data.
    Input parameters:
    U = Time series
    m = Length of compared run of data (subseries length)
    r = Filtering level (tolerance). A positive number
    '''
    def _maxdist(x_i, x_j):
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])

    def _phi(m):
        x = [U.tolist()[i:i + m] for i in range(N - m + 1)]
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0)**(-1) * sum(np.log(C))

    N = len(U)
    return abs(_phi(m + 1) - _phi(m))

def Entropy(df):
    '''
    Calculate entropy for individual direction
    '''
    df = df[['Time','Direction','X','Y']]
    diff_dir = df.iloc[0:,1].ne(df.iloc[0:,1].shift()).cumsum()
    # Calculate ApEn grouped by direction.
    df['ApEn_X'] = df.groupby(diff_dir)['X'].transform(ApEn)
    df['ApEn_Y'] = df.groupby(diff_dir)['Y'].transform(ApEn)
    return df

df = pd.DataFrame(np.random.randint(0,50, size = (10, 2)), columns=list('XY'))
df['Time'] = range(1, len(df) + 1)
direction = ['Left','Left','Left','Left','Left','Right','Right','Right','Left','Left']
df['Direction'] = direction

# Calculate defensive regularity
entropy = Entropy(df)
Error:
return (N - m + 1.0)**(-1) * sum(np.log(C))
ZeroDivisionError: 0.0 cannot be raised to a negative power
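One plausible reading of the traceback: `_phi(m + 1)` uses `N - m` windows, so any run with `N <= m` rows (here the final Left run of length 2 with `m = 2`) makes `N - m + 1.0` equal to `0.0` inside that call. The sketch below is not the original function. It is a hypothetical variant, `apen_safe`, that assumes returning NaN for such short runs is acceptable and uses NumPy arrays in place of the nested list comprehensions.

```python
import numpy as np
import pandas as pd

def apen_safe(U, m=2, r=0.2):
    """Approximate entropy that returns NaN for runs too short to evaluate.

    Hypothetical helper for illustration, not part of the question's code.
    """
    values = np.asarray(U, dtype=float)
    N = len(values)
    # _phi(m + 1) needs at least one window of length m + 1.
    if N < m + 1:
        return np.nan

    def _phi(k):
        n_windows = N - k + 1
        windows = np.array([values[i:i + k] for i in range(n_windows)])
        # Chebyshev (max-abs) distance between every pair of windows.
        dist = np.abs(windows[:, None, :] - windows[None, :, :]).max(axis=2)
        # Fraction of windows within tolerance r of each window (self-match included).
        C = (dist <= r).sum(axis=1) / n_windows
        return np.log(C).sum() / n_windows

    return abs(_phi(m + 1) - _phi(m))

# Fixed example data (assumed values, chosen so the result is reproducible).
demo = pd.DataFrame({
    'Direction': ['Left'] * 5 + ['Right'] * 3 + ['Left'] * 2,
    'X': [1, 2, 3, 4, 5, 10, 10, 10, 7, 8],
})
run_id = demo['Direction'].ne(demo['Direction'].shift()).cumsum()
demo['ApEn_X'] = demo.groupby(run_id)['X'].transform(apen_safe)
print(demo)
```

Whether NaN is the right value for short runs depends on the analysis; dropping those runs or choosing a smaller `m` are alternatives.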
【Comments】:
-
Some of the groups after the groupby have size 1. Is that expected?
-
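Also, df['ApEn_X'] = df.groupby(diff_X)['X'].transform(ApEn) will not work, because if you have a group of, say, size > 1, then the length of df.groupby(diff_X)['X'].transform(ApEn) will be smaller than df and the assignment will fail. Can you explain the intent of diff_X = df.iloc[1:,1].ne(df.iloc[1:,1].shift()).cumsum() in the code?
-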
I have added more detail. It measures the length of each run of identical strings in Direction, up to the point where the value changes.
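To make the run lengths mentioned in the last comment concrete, here is a small sketch using the Direction values from the question. The names `run_id`, `sizes` and `run_length` are illustrative only.

```python
import pandas as pd

# Direction values taken from the example in the question.
direction = pd.Series(['Left'] * 5 + ['Right'] * 3 + ['Left'] * 2)

# Label each run of consecutive identical values.
run_id = direction.ne(direction.shift()).cumsum()

# One length per run, indexed by run label.
sizes = direction.groupby(run_id).size()

# The same lengths broadcast back to every row of the run.
run_length = direction.groupby(run_id).transform('size')

print(sizes)
print(run_length)
```

Checking `sizes` before applying ApEn shows which runs are too short for the chosen `m`.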