Replace Elements In Numpy Array Avoiding Loops

October 06, 2023

I have a quite large 1d numpy array Xold with given values. These values shall be replaced according to the rule specified by a 2d numpy array Y: An example would be Xold=np.arra

Solution 1:

SELECTING THE FASTEST METHOD

The answers to this question provide a nice assortment of ways to replace elements in a numpy array. Let's check which one is the quickest.

TL;DR: Numpy indexing is the winner

def meth1():  # suggested by @Slam
    for old, new in Y:
        Xold[Xold == old] = new

def meth2():  # suggested by myself, convert y_dict = dict(Y) first
    [y_dict[i] if i in y_dict.keys() else i for i in Xold]

def meth3():  # suggested by @Eelco Hoogendoorn, import numpy_indexed as npi first
    npi.remap(Xold, keys=Y[:, 0], values=Y[:, 1])

def meth4():  # suggested by @Brad Solomon, import pandas as pd first
    pd.Series(Xold).map(pd.Series(Y[:, 1], index=Y[:, 0])).values

# suggested by @jdehesa. create Xnew = Xold.copy() and index
# idx = np.searchsorted(Xold, Y[:, 0]) first
def meth5():
    Xnew[idx] = Y[:, 1]

Not so surprising results

In [39]: timeit.timeit(meth1, number=1000000)
Out[39]: 12.08

In [40]: timeit.timeit(meth2, number=1000000)
Out[40]: 2.87

In [38]: timeit.timeit(meth3, number=1000000)
Out[38]: 55.39

In [12]: timeit.timeit(meth4, number=1000000)
Out[12]: 256.84

In [50]: timeit.timeit(meth5, number=1000000)
Out[50]: 1.12

So, the good old list comprehension is the second fastest, and the winning approach is numpy indexing combined with searchsorted().
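
The functions above rely on module-level names (Xold, Y, y_dict, Xnew, idx) whose setup is not shown. A minimal sketch of what that setup might look like, assuming the small sample data from Solution 3 and only the numpy-based methods (so no extra packages are needed). The name X_START and the reduced iteration count are choices made for this sketch, and timings will vary by machine and data size.

```python
import timeit
import numpy as np

# Assumed sample data (borrowed from Solution 3); the original benchmark data is not shown
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
X_START = np.array([0, 1, 2, 3, 4])

Xold = X_START.copy()
y_dict = dict(Y.tolist())                 # plain-int keys and values for meth2
idx = np.searchsorted(X_START, Y[:, 0])   # valid because X_START is sorted
Xnew = X_START.copy()

def meth1():
    # In-place boolean-mask replacement, one pass over Xold per rule
    for old, new in Y:
        Xold[Xold == old] = new

def meth2():
    # List comprehension with a dict lookup; unmapped values pass through
    return [y_dict.get(i, i) for i in Xold.tolist()]

def meth5():
    # Precomputed positions, one fancy-indexed assignment
    Xnew[idx] = Y[:, 1]

for f in (meth1, meth2, meth5):
    print(f.__name__, timeit.timeit(f, number=10000))
```

Note that meth1 modifies Xold in place, so only its first call does real replacement work; this is worth keeping in mind when comparing the numbers.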

Solution 2:

We can use np.searchsorted for the generic case in which the data in the first column of Y is not necessarily sorted -

sidx = Y[:,0].argsort()
out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]

Sample run -

In [53]: Xold
Out[53]: array([14, 10, 12, 13, 11])

In [54]: Y
Out[54]: 
array([[ 10,   0],
       [ 11, 100],
       [ 13, 300],
       [ 14, 400],
       [ 12, 200]])

In [55]: sidx = Y[:,0].argsort()
    ...: out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]

In [56]: out
Out[56]: array([400,   0, 200, 300, 100])
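
To see why the double indexing works, the one-liner can be unpacked into intermediate steps. The sketch below reuses the sample data from the run above; the intermediate names (sorted_keys, pos, rows) are introduced here only for illustration.

```python
import numpy as np

Xold = np.array([14, 10, 12, 13, 11])
Y = np.array([[10, 0], [11, 100], [13, 300], [14, 400], [12, 200]])

sidx = Y[:, 0].argsort()          # row order that sorts the keys
sorted_keys = Y[sidx, 0]          # keys in ascending order
# Position of each Xold value within the sorted keys
pos = np.searchsorted(Y[:, 0], Xold, sorter=sidx)
rows = sidx[pos]                  # map sorted positions back to rows of Y
out = Y[rows, 1]                  # pick the replacement values

print(sidx)   # [0 1 4 2 3]
print(pos)    # [4 0 2 3 1]
print(out)    # [400   0 200 300 100]
```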

If not all elements have corresponding mappings available, then we need to do a bit more work, like so -


sidx = Y[:,0].argsort()
sorted_indx = np.searchsorted(Y[:,0], Xold, sorter=sidx)
sorted_indx[sorted_indx==len(sidx)] = len(sidx)-1
idx_out = sidx[sorted_indx]
out = Y[idx_out,1]
out[Y[idx_out,0]!=Xold] = 0 # NA values as 0s
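
If unmapped values should keep their original value rather than become 0, one possible variation is to select with np.where. This is a sketch with made-up sample data, in which 99 and 7 have no mapping; the name found is introduced here.

```python
import numpy as np

Xold = np.array([14, 10, 99, 13, 7])   # 99 and 7 have no mapping (assumed sample data)
Y = np.array([[10, 0], [11, 100], [13, 300], [14, 400], [12, 200]])

sidx = Y[:, 0].argsort()
sorted_indx = np.searchsorted(Y[:, 0], Xold, sorter=sidx)
# Values larger than every key land one past the end; pull them back in range
sorted_indx[sorted_indx == len(sidx)] = len(sidx) - 1
idx_out = sidx[sorted_indx]
found = Y[idx_out, 0] == Xold            # True where the key really exists in Y
out = np.where(found, Y[idx_out, 1], Xold)
print(out)   # [400   0  99 300   7]
```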

Solution 3:

Here is one possibility:

import numpy as np

Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
# Check every X value against every Y first value
m = Xold == Y[:, 0, np.newaxis]
# Check which elements in X are among Y first values
# (so values that are not in Y are not replaced)
m_X = np.any(m, axis=0)
# Compute replacement
# Xold * (1 - m_X) are the non-replaced values
# np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X are the replaced values
Xnew = Xold * (1 - m_X) + np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X
print(Xnew)

Output:

[  0 100 200 300 400]
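
To make the broadcasting step concrete, here is a smaller sketch with made-up data in which X has a repeated value and one value (7) with no replacement. It uses np.where for the final selection instead of the arithmetic blend above, which is an equivalent choice made for readability in this sketch.

```python
import numpy as np

Xold = np.array([1, 7, 3, 1])
Y = np.array([[1, 100], [3, 300]])

# m[i, j] is True where Xold[j] equals the i-th key of Y; shape is (len(Y), len(Xold))
m = Xold == Y[:, 0, np.newaxis]
m_X = np.any(m, axis=0)                          # which Xold entries have a replacement
replaced = np.sum(Y[:, 1, np.newaxis] * m, axis=0)
Xnew = np.where(m_X, replaced, Xold)             # keep Xold where nothing matched

print(m.astype(int))
print(Xnew)   # [100   7 300 100]
```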

This method works for more or less every case (unsorted arrays, multiple repetitions of values in X, values in X not replaced, values in Y not replacing anything in X), except if you give two replacements for the same value in Y, which would be wrong anyway. However, its time and space complexity is the product of the sizes of X and Y. If your problem has additional constraints (data is sorted, no repetitions, etc.) it might be possible to do something better. For example, if X is sorted with no repeated elements and every value in Y replaces a value in X (like in your example), this would probably be faster:

import numpy as np

Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
idx = np.searchsorted(Xold, Y[:, 0])
Xnew = Xold.copy()
Xnew[idx] = Y[:, 1]
print(Xnew)
# [  0 100 200 300 400]
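
This shortcut relies on Xold being sorted. If it is not, but the other assumptions still hold (no repeats in Xold, every key in Y present in Xold), one possible workaround is to pass a sorter to np.searchsorted. A sketch with made-up data; the names order and pos are introduced here.

```python
import numpy as np

Xold = np.array([3, 0, 4, 1, 2])   # unsorted, no repeats
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])

order = Xold.argsort()                               # positions that sort Xold
pos = np.searchsorted(Xold, Y[:, 0], sorter=order)   # rank of each key in sorted Xold
Xnew = Xold.copy()
Xnew[order[pos]] = Y[:, 1]                           # translate ranks back to positions
print(Xnew)   # [300   0 400 100 200]
```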

Solution 4:

The first improvement you can make is to use numpy indexing, but you'll still have one loop:

for old, new in Y:
    Xold[Xold == old] = new
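
One thing to watch for with the in-place loop: if a replacement value is itself a key of a later rule, it gets replaced a second time. The sketch below uses made-up rules to show the effect, along with one way to avoid it by comparing against the untouched original and writing into a copy.

```python
import numpy as np

Y = np.array([[1, 2], [2, 3]])   # made-up rules: 1 -> 2 and 2 -> 3

# In-place loop: the 2 written by the first rule is rewritten by the second
X_inplace = np.array([1, 2])
for old, new in Y:
    X_inplace[X_inplace == old] = new

# Safer variant: build masks from the original, write into a copy
Xold = np.array([1, 2])
Xnew = Xold.copy()
for old, new in Y:
    Xnew[Xold == old] = new

print(X_inplace)   # [3 3]
print(Xnew)        # [2 3]
```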

Solution 5:

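You can combine integer-array indexing with the argsort method. This assumes the keys in the first column of Y are exactly the integers 0 to len(Y)-1, so that sorting the rows by key turns the second column into a lookup table that Xold can index directly.

Xnew = Y[Y[:, 0].argsort()][:, 1][Xold]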

Output

array([  0, 100, 200, 300, 400])
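
Indexing a table with Xold only works when the values of Xold are small non-negative integers that are valid positions in that table. When that holds but some values have no rule, a related approach is an explicit lookup table that defaults to the identity. This is a sketch under those assumptions; the names lut and size, and the extra value 6, are made up for illustration.

```python
import numpy as np

Xold = np.array([0, 1, 2, 3, 4, 6])   # 6 has no rule (made-up addition)
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])

# Identity table covering every value that can occur in Xold or as a key
size = max(Xold.max(), Y[:, 0].max()) + 1
lut = np.arange(size)
lut[Y[:, 0]] = Y[:, 1]     # overwrite the entries that have a rule
Xnew = lut[Xold]
print(Xnew)   # [  0 100 200 300 400   6]
```

The table needs memory proportional to the largest value, so this is only attractive when the value range is modest.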
