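【Posted】: 2016-06-20 04:41:03
【Question】:
I wrote the code shown below and am running into serious performance problems. In particular, the loop that runs 50 million times (for z in range(total):) seems very slow. Can I modify it to make it more efficient? Perhaps by changing how it stores the sums of the last 10 values in r1, r2?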
import numpy as np
import math
import scipy.stats as sp
# Define sample size
sample=4999999
cutoff=int((sample+1)/100)
# Define days for x-day VaR
xdays=10
# Calculate the whole sample size and extended total sample size
size=sample*xdays+xdays-1
total=size+xdays
cutoff_o=int((size+1)/100)
# Sample values for kurtosis
#kurt=[0.0000001,1.0,2.0,3.0,4.0,5.0,6.0,10.0]
kurt=[6.0]
# Number of repetitions
rep=2
# Define correlation coefficient
rho=0.5
# Loop for different iterations
for x in range(rep):
    uni=sp.uniform.rvs(size=total)
    # Loop for different values of kurtosis
    for y in kurt:
        df=(6.0/y)+4.0
        # Initialize arrays
        t_corr=np.empty(total)
        n_corr=np.empty(total)
        t_corr_2=np.empty(total)
        r1=np.empty(sample)
        r2=np.empty(size)
        r3=np.empty(sample)
        r4=np.empty(size)
        # Define t dist from uniform
        t_dist=sp.t.ppf(uni,df)
        n_dist=sp.norm.ppf(uni)
        # Loop to generate autocorrelated distributions
        for z in range(total):
            if z==0:
                t_corr[z]=t_dist[z]
                n_corr[z]=n_dist[z]
                t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)
            else:
                t_corr[z]=rho*t_dist[z-1] + math.sqrt((1-rho**2))*t_dist[z]
                n_corr[z]=rho*n_dist[z-1] + math.sqrt((1-rho**2))*n_dist[z]
                t_corr_2[z]=sp.t.ppf(sp.norm.cdf(n_corr[z]),df)
            if z>xdays-1:
                z_x=int(z/xdays)-1
                if (z%xdays)==0 and z_x<sample:
                    r1[z_x]= sum(t_corr[z-10:z])
                    r3[z_x]= sum(t_corr_2[z-10:z])
                r2[z-xdays]= sum(t_corr[z-10:z])
                r4[z-xdays]= sum(t_corr_2[z-10:z])
        print (np.partition(r1, cutoff-1)[cutoff-1], np.partition(r3, cutoff-1)[cutoff-1], np.partition(r2, cutoff_o-1)[cutoff_o-1], np.partition(r4, cutoff_o-1)[cutoff_o-1])
        print ()
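One possible direction for the r1/r2 part of the question, offered as a sketch rather than a tested replacement: the lagged combination and the 10-value sums can both be written as whole-array NumPy operations, so the per-element Python loop is not needed for them. The helper names `lagged_mix` and `rolling_sums` are hypothetical, the demo sizes are made up, and a standard normal sample stands in for `sp.t.ppf(uni, df)`.

```python
import numpy as np

def lagged_mix(x, rho):
    """out[0] = x[0]; out[z] = rho*x[z-1] + sqrt(1-rho**2)*x[z] for z >= 1."""
    out = np.empty(len(x), dtype=float)
    out[0] = x[0]
    out[1:] = rho * x[:-1] + np.sqrt(1.0 - rho**2) * x[1:]
    return out

def rolling_sums(a, window):
    """result[i] = sum(a[i:i+window]), computed from one cumulative sum."""
    c = np.concatenate(([0.0], np.cumsum(a)))
    return c[window:] - c[:-window]

# Small demo with made-up sizes (the question uses sample=4999999).
xdays = 10
sample = 5
size = sample * xdays + xdays - 1
total = size + xdays

rng = np.random.default_rng(0)
t_dist = rng.standard_normal(total)   # assumption: stand-in for sp.t.ppf(uni, df)

t_corr = lagged_mix(t_dist, rho=0.5)
roll = rolling_sums(t_corr, xdays)

r2 = roll[:size]              # overlapping windows, one per starting index
r1 = roll[::xdays][:sample]   # non-overlapping windows, every xdays-th start
```

Under the same assumptions, `sp.t.ppf(sp.norm.cdf(n_corr), df)` could be called once on the whole `n_corr` array instead of once per element, which is likely where much of the loop's time goes. Differences of a long cumulative sum can lose a little floating-point precision compared with summing each window directly, so the results are worth checking against the original loop on a small sample.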
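【Discussion】:
- Learn to use a profiler. I can guess at some improvements here, but I wouldn't want to guarantee that they would even improve speed. For example, you never seem to use more than ten elements of t_corr, n_corr, and t_corr_2, or more than a single element of t_dist and n_dist, so why create large arrays for them?
- If you are looking for performance improvements in your code, you should ask on CodeReview.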
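As a rough illustration of the profiler suggestion, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The function `slow_loop` is a hypothetical stand-in for the per-element loop in the question, not the original code.

```python
import cProfile
import io
import math
import pstats

def slow_loop(n):
    """Hypothetical stand-in for a per-element Python loop."""
    total = 0.0
    for z in range(n):
        total += math.sqrt(z)
    return total

# Collect timing data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
result = slow_loop(100_000)
profiler.disable()

# Write the five most expensive entries (by cumulative time) to a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

For a whole script, running `python -m cProfile -s cumulative script.py` gives a similar report without modifying the code. Profiling a reduced `sample` first keeps the run short while still showing which calls dominate.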
Tags: python performance loops python-3.x coding-efficiency