Replace Elements In Numpy Array Avoiding Loops
Solution 1:
SELECTING THE FASTEST METHOD
The answers to this question offered a nice assortment of ways to replace elements in a numpy array. Let's check which one is the quickest.
TL;DR: Numpy indexing is the winner
def meth1(): # suggested by @Slam
    for old, new in Y:
        Xold[Xold == old] = new

def meth2(): # suggested by myself, convert y_dict = dict(Y) first
    [y_dict[i] if i in y_dict.keys() else i for i in Xold]

def meth3(): # suggested by @Eelco Hoogendoom, import numpy_indexed as npi first
    npi.remap(Xold, keys=Y[:, 0], values=Y[:, 1])

def meth4(): # suggested by @Brad Solomon, import pandas as pd first
    pd.Series(Xold).map(pd.Series(Y[:, 1], index=Y[:, 0])).values

# suggested by @jdehesa. create Xnew = Xold.copy() and index
# idx = np.searchsorted(Xold, Y[:, 0]) first
def meth5():
    Xnew[idx] = Y[:, 1]
Not so surprising results
In[39]: timeit.timeit(meth1, number=1000000)
Out[39]: 12.08
In[40]: timeit.timeit(meth2, number=1000000)
Out[40]: 2.87
In[38]: timeit.timeit(meth3, number=1000000)
Out[38]: 55.39
In[12]: timeit.timeit(meth4, number=1000000)
Out[12]: 256.84
In[50]: timeit.timeit(meth5, number=1000000)
Out[50]: 1.12
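So the good old list comprehension is the second fastest, and the winning approach is numpy indexing combined with searchsorted(). Note that meth5 only times the final assignment: the copy of Xold and the searchsorted() call are done beforehand.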
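For readers who want to rerun a comparison like this, here is a minimal sketch that needs only numpy. The sample arrays, the function names and the repetition count are assumptions for illustration, not the setup behind the timings above, so absolute numbers will differ. Unlike meth5, the copy and the searchsorted() call are included in the timed function here.

```python
import timeit
import numpy as np

# Assumed sample data: sorted, unique X and a mapping in Y that covers all of it
Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
y_dict = dict(Y.tolist())

def loop_replace():
    # one boolean-mask assignment per (old, new) pair, on a fresh copy
    X = Xold.copy()
    for old, new in Y:
        X[X == old] = new
    return X

def dict_replace():
    # plain Python lookup, falling back to the original value
    return np.array([y_dict.get(i, i) for i in Xold.tolist()])

def searchsorted_replace():
    # valid only because Xold is sorted and every key in Y occurs in Xold
    X = Xold.copy()
    X[np.searchsorted(Xold, Y[:, 0])] = Y[:, 1]
    return X

expected = np.array([0, 100, 200, 300, 400])
for fn in (loop_replace, dict_replace, searchsorted_replace):
    # check correctness before looking at speed
    assert np.array_equal(fn(), expected)
    print(fn.__name__, timeit.timeit(fn, number=10000))
```

Checking that every candidate returns the same array before timing it avoids comparing a fast but wrong method against a slow but correct one.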
Solution 2:
We can use np.searchsorted for the generic case where the data in the first column of Y is not necessarily sorted -
sidx = Y[:,0].argsort()
out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]
Sample run -
In [53]: Xold
Out[53]: array([14, 10, 12, 13, 11])
In [54]: Y
Out[54]:
array([[ 10,   0],
       [ 11, 100],
       [ 13, 300],
       [ 14, 400],
       [ 12, 200]])
In [55]: sidx = Y[:,0].argsort()
    ...: out = Y[sidx[np.searchsorted(Y[:,0], Xold, sorter=sidx)],1]
In [56]: out
Out[56]: array([400,   0, 200, 300, 100])
If some elements of Xold have no corresponding mapping in Y, a bit more work is needed, like so -
sidx = Y[:,0].argsort()
sorted_indx = np.searchsorted(Y[:,0], Xold, sorter=sidx)
sorted_indx[sorted_indx==len(sidx)] = len(sidx)-1
idx_out = sidx[sorted_indx]
out = Y[idx_out,1]
out[Y[idx_out,0]!=Xold] = 0 # NA values as 0s
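The snippet above marks unmapped elements with 0. If the goal is instead to leave unmapped elements unchanged, one possible variation (a sketch, not part of the original answer) selects between the mapped value and the original with np.where. The sample data, including the unmapped value 99, and the names pos and found are assumptions for illustration.

```python
import numpy as np

# Assumed sample: 99 has no entry in the first column of Y
Xold = np.array([14, 10, 12, 99, 11])
Y = np.array([[10, 0], [11, 100], [13, 300], [14, 400], [12, 200]])

sidx = Y[:, 0].argsort()
pos = np.searchsorted(Y[:, 0], Xold, sorter=sidx)
pos[pos == len(sidx)] = len(sidx) - 1   # clip positions that fall past the end
idx_out = sidx[pos]

# True where the key found by searchsorted really equals the element of Xold
found = Y[idx_out, 0] == Xold

# Take the mapped value where a key matched, otherwise keep the original
out = np.where(found, Y[idx_out, 1], Xold)
print(out)  # values: 400, 0, 200, 99, 100
```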
Solution 3:
Here is one possibility:
import numpy as np
Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
# Check every X value against every Y first value
m = Xold == Y[:, 0, np.newaxis]
# Check which elements in X are among Y first values
# (so values that are not in Y are not replaced)
m_X = np.any(m, axis=0)
# Compute replacement
# Xold * (1 - m_X) are the non-replaced values
# np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X are the replaced values
Xnew = Xold * (1 - m_X) + np.sum(Y[:, 1, np.newaxis] * m, axis=0) * m_X
print(Xnew)
Output:
[  0 100 200 300 400]
This method works in more or less every case (unsorted arrays, repeated values in X, values in X that are not replaced, values in Y that do not match anything in X), except when Y gives two replacements for the same value, which would be an error anyway. However, its time and space complexity are both proportional to the product of the sizes of X and Y, because the mask m holds one entry for every pair of a Y row and an X element. If your problem has additional constraints (data is sorted, no repetitions, etc.) it might be possible to do something better. For example, if X is sorted with no repeated elements and every value in Y replaces a value in X (like in your example), this would probably be faster:
import numpy as np
Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
idx = np.searchsorted(Xold, Y[:, 0])
Xnew = Xold.copy()
Xnew[idx] = Y[:, 1]
print(Xnew)
# [  0 100 200 300 400]
Solution 4:
The first improvement you can make is to use numpy boolean indexing, although you will still have one loop:
for old, new in Y:
    Xold[Xold == old] = new
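One thing to watch with this in-place loop: if a replacement value is itself a key in a later row of Y, the element gets replaced twice. The sketch below uses made-up sample data to show the effect, along with one possible way around it, which is to build each mask from an untouched copy of the original array.

```python
import numpy as np

# Assumed sample: the new value 2 from the first pair is also the key of the second pair
X = np.array([1, 2, 1, 5])
Y = np.array([[1, 2], [2, 3]])

# In-place loop: elements turned into 2 by the first pair are hit again by the second
chained = X.copy()
for old, new in Y:
    chained[chained == old] = new

# Masks computed against the original X keep the replacements independent
safe = X.copy()
for old, new in Y:
    safe[X == old] = new

print(chained)  # values: 3, 3, 3, 5
print(safe)     # values: 2, 3, 2, 5
```

When no replacement value appears among the keys, as in the question's example, both loops give the same result.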
Solution 5:
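You can use indexing in combination with the argsort method. This assumes the values in Y[:, 0] are exactly the integers 0 to len(Y) - 1, so that the elements of Xold can be used directly as positions once Y is sorted by its first column.
Xnew = Y[Y[:,0].argsort()][:, 1][Xold]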
Output
array([ 0, 100, 200, 300, 400])
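The indexing trick above amounts to a lookup table indexed by the old value. A slightly more general sketch of that idea, which is not part of the original answer, builds the table explicitly so that values without an entry in Y map to themselves. It assumes all values are small non-negative integers, since the table needs one slot per possible value; the names lut and size and the extra element 6 are made up for illustration.

```python
import numpy as np

# Assumed sample: 6 has no entry in Y and should stay as it is
Xold = np.array([0, 1, 2, 3, 4, 6])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])

# Identity table covering every value that can occur as an index
size = max(Xold.max(), Y[:, 0].max()) + 1
lut = np.arange(size)

# Overwrite the slots that have a replacement
lut[Y[:, 0]] = Y[:, 1]

# A single fancy-indexing step performs all replacements at once
Xnew = lut[Xold]
print(Xnew)  # values: 0, 100, 200, 300, 400, 6
```

The memory cost grows with the largest value rather than with the number of elements, so this only pays off when the value range is modest.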