Skip to content Skip to sidebar Skip to footer

Parallelize (not Symmetric) Loops In Python

The following code is written in python and it works, i.e. returns the expected result. However, it is very slow and I believe that can be optimized. G_tensor = numpy.matlib.identi

Solution 1:

Here is a proposition that should now work (I corrected a few mistakes) but that nonetheless sould give you the general idea of how verctorization can be applied to your code in order to make efficient use of numpy arrays. Everything is build in "one-pass" (ie without any for-loops) which is the "numpythonic" way:

import numpy as np
import math
N=2
k,alpha=1,1
G = np.zeros((N,3,N,3),dtype=complex)

# np.mgrid gives convenient arrays of indices that # can be used to write readable code
i,x_i,j,x_j = np.ogrid[0:N,0:3,0:N,0:3]
# A quick demo on how we can make the identity tensor with it
G[np.where((i == j) & (x_i == x_j))] = 1
#print(G.reshape(N*3,N*3))

positions=np.random.rand(N,3)
# Here I assumed position has shape [N,3] # I build arr[i,j]=position[i] - position[j] using broadcasting # by turning position into a column and a row 
R = np.linalg.norm(positions[None,:,:]-positions[:,None,:],axis=-1)
# R is now a N,N matrix of all the distances#we reshape R to N,1,N,1 so that it can be broadcated to N,3,N,3 
R=R.reshape(N,1,N,1)

r=positions[None,:,:]-positions[:,None,:]

krq = (k*R)**2
pf = -k**2*alpha*np.exp(1j*k*R)/(4*math.pi*R)
a = 1.+(1j*k*R-1.)/(krq)
b = (3.-3.*1j*k*R-krq)/(krq)

#print(np.isnan(pf[:,0,:,0]))# here we build all the combination rx*rx rx*ry etc...
comb_r=(r[:,:,:,None]*r[:,:,None,:]).transpose([0,2,1,3])
#we compute G without the pf*A term
G = pf*(b * comb_r/(R**2))
#we add pf*a term where it is due
G[np.where(x_i == x_j)] = (G + pf*a)[np.where(x_i == x_j)]
# we didn't bother with the identity or condition i!=j so we enforce it here
G[np.where(i == j)] = 0
G[np.where((i == j) & (x_i == x_j))] = 1

print(G.reshape(N*3,N*3))

Solution 2:

Python is not a fast language. Number crunching with python should always use for time critical parts code written in a compiled language. With compilation down to the CPU level you can speed up the code by a factor up to 100 and then still go for parallelization. So I would not look down to using more cores doing inefficient stuff, but to work more efficient. I see the following ways to speed up the code:

1) Better use of numpy: Can you do your calculations instead on scalar level directly on vector/matrix level? eg. rx = positions[:,0]-positions[0,:] (not checked if that is correct) but something along those lines.

If that is not possible with your kind of calculations, than you can go for option 2 or 3

2) Use cython. Cython compiles Python code to C, which is then compiled to your CPU. By using static typing at the right places you can make your code much faster, see cython tutorials eg.: http://cython.readthedocs.io/en/latest/src/quickstart/cythonize.html

3) If you are familiar with FORTRAN, it might be a good idea to write just this part in FORTRAN and then call it from Python using f2py. In fact, your code looks a lot like FORTRAN anyway. For C and C++ SWIG is one great tool to make compiled code available in Python, but there are plenty of other techniques (cython, Boost::Python, ctypes, numba etc.)

When you have done this, and it is still to slow, using GPU power with pyCUDA or parallelization with mpi4py or multiprocessing might be an option.

Post a Comment for "Parallelize (not Symmetric) Loops In Python"