Python Pandas: How To Create A Binary Matrix From Column Of Lists?
I have a Python Pandas DataFrame like the following:

      1
0  a, b
1     c
2     d
3     e

a, b is a string representing a list of user features. How can I convert this into a binary matrix?
Solution 1:
I think you can use:
df = (df.iloc[:, 0].str.split(', ', expand=True)
        .stack()
        .reset_index(drop=True)
        .str.get_dummies())
print(df)
   a  b  c  d  e
0  1  0  0  0  0
1  0  1  0  0  0
2  0  0  1  0  0
3  0  0  0  1  0
4  0  0  0  0  1
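The result above has five rows, one per feature, because `reset_index(drop=True)` throws away the first index level that `stack()` creates, and that level is what records which original row each feature came from. A minimal sketch of a variant that keeps the link, swapping `stack` for `Series.explode` (assumes pandas 0.25 or later) and then collapsing duplicate index labels with `groupby(level=0).max()`. The sample frame with column label `1` is rebuilt from the question's printout.

```python
import pandas as pd

# Sample data mirroring the question: one column (labelled 1) of comma-separated strings
df = pd.DataFrame({1: ["a, b", "c", "d", "e"]})

# explode() gives each feature its own row but keeps the original index: 0, 0, 1, 2, 3
features = df.iloc[:, 0].str.split(", ").explode()

# One indicator column per feature, then merge rows sharing an index label back into one
matrix = features.str.get_dummies().groupby(level=0).max()
print(matrix)
```

Taking the `max` per original row keeps the result strictly 0/1 even if a feature were listed twice in the same string.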
EDITED:
print(df.iloc[:, 0].str.replace(' ', '').str.get_dummies(sep=','))
   a  b  c  d  e
0  1  1  0  0  0
1  0  0  1  0  0
2  0  0  0  1  0
3  0  0  0  0  1
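A self-contained sketch of the edited one-liner, assuming the question's sample data (column label `1`, features separated by a comma and a space). `Series.str.get_dummies` splits each string on `sep` and returns one 0/1 column per distinct feature, with the columns sorted. It also accepts a multi-character separator, so `sep=', '` is shown as an alternative that skips the `replace` step.

```python
import pandas as pd

# Sample data mirroring the question
df = pd.DataFrame({1: ["a, b", "c", "d", "e"]})

# Edited answer: remove the spaces, then split on a bare comma
matrix = df.iloc[:, 0].str.replace(" ", "").str.get_dummies(sep=",")

# Alternative: split directly on ", " without touching the feature names
matrix_alt = df.iloc[:, 0].str.get_dummies(sep=", ")

print(matrix)
```

The two give the same matrix here, but they differ when a feature name itself contains a space: `replace(' ', '')` would turn "user age" into "userage", while `sep=', '` leaves it intact.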
Solution 2:
I wrote a general function, with support for grouping, to do this a while back:
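import numpy as np
import pandas as pd

def sublist_uniques(data, sublist):
    # Collect every distinct item found in the list-valued column `sublist`
    categories = set()
    for _, row in data.iterrows():
        try:
            for item in row[sublist]:
                categories.add(item)
        except TypeError:
            pass  # skip cells that are not iterable, e.g. NaN
    return list(categories)  # set order, so the column order is arbitrary

def sublists_to_dummies(f, sublist, index_key=None):
    categories = sublist_uniques(f, sublist)
    frame = pd.DataFrame(columns=categories)
    for idx, row in f.iterrows():
        # Only rows whose cell really holds a list or array are encoded
        if isinstance(row[sublist], (list, np.ndarray)):
            # 0/1 vector marking which categories appear in this row
            row_vec = np.zeros(len(categories))
            for item in row[sublist]:
                row_vec[categories.index(item)] = 1
            if index_key is not None:
                key = row[index_key]
                if key in frame.index:
                    # Group already present: add this row's items to its counts
                    for item in row[sublist]:
                        frame.loc[key, item] += 1
                else:
                    frame.loc[key] = row_vec
            else:
                frame.loc[idx] = row_vec
    return frame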
In [15]: a
Out[15]:
   a group     labels
0  1   new     [a, d]
1  2   old  [a, g, h]
2  3   new  [i, m, a]

In [16]: sublists_to_dummies(a, 'labels')
Out[16]:
   a  d  g  i  h  m
0  1  1  0  0  0  0
1  1  0  1  0  1  0
2  1  0  0  1  0  1

In [17]: sublists_to_dummies(a, 'labels', 'group')
Out[17]:
     a  d  g  i  h  m
new  2  1  0  1  0  1
old  1  0  1  0  1  0
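For larger frames, the row-by-row `iterrows` loop can be replaced by a vectorized version. This is a different technique from the function above, sketched under the assumption of pandas 0.25 or later: `DataFrame.explode` puts each label on its own row (repeating the original index), `pd.get_dummies` one-hot encodes the labels, and a `groupby` sum folds them back either per original row or per group. The frame `a` is rebuilt from the session above.

```python
import pandas as pd

# Rebuild the example frame from the session above
a = pd.DataFrame({
    "a": [1, 2, 3],
    "group": ["new", "old", "new"],
    "labels": [["a", "d"], ["a", "g", "h"], ["i", "m", "a"]],
})

# One row per (original row, label) pair; the original index repeats
exploded = a.explode("labels")

# One integer indicator column per distinct label
dummies = pd.get_dummies(exploded["labels"], dtype=int)

# Per original row, comparable to sublists_to_dummies(a, 'labels')
per_row = dummies.groupby(level=0).sum()

# Per group, comparable to sublists_to_dummies(a, 'labels', 'group')
per_group = dummies.groupby(exploded["group"]).sum()

print(per_row)
print(per_group)
```

Unlike the function, the columns come out in sorted order, and every cell is a count, so a label repeated inside one list would show up as 2 even in the per-row result; `.clip(upper=1)` turns it back into a strictly binary matrix.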