在 python/NumPy 中广播/矢量化内部和外部 for 循环答案

【问题标题】：Broadcasting/Vectorizing inner and outer for loops in python/NumPy在 python/NumPy 中广播/矢量化内部和外部 for 循环
【发布时间】：2020-07-09 09:01:08
【问题描述】：

目的

我使用vectorization 将双for loop 变成了单for loop。我现在想摆脱最后一个loop。

我想 slice 一个 Nx3 array 坐标并计算切片部分和剩余部分之间的距离不使用 for 循环。

两种情况

(1) 切片始终为3x3。

(2) 切片是可变的，即Mx3，其中 M 始终明显小于 N

将切片的 1 行与其余部分交互的交互向量化很简单。但是，我坚持使用 for 循环来执行（在大小为 3 的切片的情况下）3 个循环，以计算所有距离。

上下文：

Nx3 数组是原子坐标，切片是特定分子中的所有原子。我想计算给定分子与系统其余部分相互作用的能量。第一步是计算分子中每个原子与所有其他原子之间的距离。第二部分是在函数中使用这些距离来计算能量，这超出了本题的范围。

这是我的一个最小工作示例（我有 vectorized 内部循环，但是，需要（真的很想......）vectorize outer loop。那个循环并不总是只有大小为 3，python 在 for 循环中很慢。

最小的工作示例

import numpy as np 

box=10 # simulation box is size 10 for this example
r = np.random.rand(1000,3) * box  # avoids huge numbers later by scaling  coords

start=0 #fixed starting index for example (first atom)
end=2   #fixed ending index for example   (last atom)

rj=np.delete(r, np.arange(start,end), 0)
ri = r[np.arange(start,end),:]

atoms_in_molecule, coords = np.shape(ri)
energy = 0
for a in range(atoms_in_molecule):
    rij = ri[a,:] - rj                # I want to get rid of this 'a' index dependance
    rij = rij - np.rint(rij/box)*box  # periodic boundary conditions - necessary
    rij_sq = np.sum(rij**2,axis=1)

    # perform energy calculation using rij_sq
    ener = 4 * ((1/rij_sq)**12 - (1/rij_sq)**6)  # dummy LJ, do not optimize
    energy += np.sum(ener)

print(energy)

这个问题不是关于优化我已经拥有的矢量化。我玩过 pdist/cdist 和其他人。我想要的只是摆脱讨厌的 for 原子循环。我会优化其余的。

【问题讨论】：

标签： python-3.x numpy vectorization array-broadcasting numpy-slicing

【解决方案1】：

这里是如何做到的：

R = ri[:,None] - rj[None, :]
R = R - np.rint(R/box)*box
R_sq = np.sum(np.square(R), axis=2)

energy = np.sum(4 * ((1/R_sq)**12 - (1/R_sq)**6))

【讨论】：