
Python Pandas: How To Create A Binary Matrix From Column Of Lists?

I have a Python Pandas DataFrame like the following:

      1
0  a, b
1     c
2     d
3     e

a, b is a string representing a list of user features. How can I convert this into a bi

Solution 1:

I think you can use:

# Parentheses let the method chain span several lines.
df = (df.iloc[:, 0].str.split(', ', expand=True)
        .stack()
        .reset_index(drop=True)
        .str.get_dummies())

print(df)
   a  b  c  d  e
0  1  0  0  0  0
1  0  1  0  0  0
2  0  0  1  0  0
3  0  0  0  1  0
4  0  0  0  0  1
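Note that the snippet above returns one row per list item (five rows for four input rows), because reset_index(drop=True) discards the original row labels. A minimal sketch of one way to keep one row per input row is to leave the stacked index in place and group on its outer level. The sample frame is reconstructed from the question, so the column label 1 is an assumption.

```python
import pandas as pd

# Sample data reconstructed from the question (column label 1 is assumed).
df = pd.DataFrame({1: ["a, b", "c", "d", "e"]})

# After stack(), index level 0 is still the original row label.
stacked = df.iloc[:, 0].str.split(", ", expand=True).stack()

# One indicator row per item, then collapse back to one row per original row.
per_row = stacked.str.get_dummies().groupby(level=0).max()
print(per_row)
```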

EDITED:

# Applied to the original df: strip spaces, then split on commas.
print(df.iloc[:, 0].str.replace(' ', '').str.get_dummies(sep=','))
   a  b  c  d  e
0  1  1  0  0  0
1  0  0  1  0  0
2  0  0  0  1  0
3  0  0  0  0  1
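For reference, here is a self-contained sketch of the edited one-liner. The sample frame is reconstructed from the question, so the column label 1 and the variable name binary are assumptions.

```python
import pandas as pd

# Sample data reconstructed from the question (column label 1 is assumed).
df = pd.DataFrame({1: ["a, b", "c", "d", "e"]})

# Remove the spaces so that every item is separated by a bare comma,
# then let str.get_dummies split on that comma and build the indicators.
binary = df.iloc[:, 0].str.replace(" ", "").str.get_dummies(sep=",")
print(binary)
```

This keeps the original index, so each row of the result lines up with the row it came from.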

Solution 2:

I wrote a general function, with support for grouping, to do this a while back:

import numpy as np
import pandas as pd

def sublist_uniques(data, sublist):
    # Collect every distinct item that appears in the list column.
    categories = set()
    for d, t in data.iterrows():
        try:
            for j in t[sublist]:
                categories.add(j)
        except TypeError:
            # Skip cells that are not iterable (e.g. NaN).
            pass
    return list(categories)

def sublists_to_dummies(f, sublist, index_key=None):
    categories = sublist_uniques(f, sublist)
    frame = pd.DataFrame(columns=categories)
    for d, i in f.iterrows():
        # `type(x) == list or np.array` is always true; use isinstance instead.
        if isinstance(i[sublist], (list, np.ndarray)):
            try:
                # Indicator vector for the items in this row.
                row = np.zeros(len(categories))
                for j in i[sublist]:
                    row[categories.index(j)] = 1
                if index_key is not None:
                    key = i[index_key]
                    if key in frame.index:
                        # Group already present: add to its counts.
                        for j in i[sublist]:
                            frame.loc[key, j] += 1
                    else:
                        frame.loc[key] = row
                else:
                    frame.loc[d] = row
            except Exception:
                pass
    return frame

In [15]: a
Out[15]:
   a group     labels
0  1   new     [a, d]
1  2   old  [a, g, h]
2  3   new  [i, m, a]

In [16]: sublists_to_dummies(a, 'labels')
Out[16]:
   a  d  g  i  h  m
0  1  1  0  0  0  0
1  1  0  1  0  1  0
2  1  0  0  1  0  1

In [17]: sublists_to_dummies(a, 'labels', 'group')
Out[17]:
     a  d  g  i  h  m
new  2  1  0  1  0  1
old  1  0  1  0  1  0
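
On pandas 0.25 or later, a similar result can probably be had without explicit loops by combining DataFrame.explode with pd.crosstab. This is a different technique from the function above, offered only as a sketch. The frame mirrors the In[15] example, and the names exploded, per_row and per_group are illustrative.

```python
import pandas as pd

# Hypothetical frame mirroring the In[15] example.
a = pd.DataFrame({
    "a": [1, 2, 3],
    "group": ["new", "old", "new"],
    "labels": [["a", "d"], ["a", "g", "h"], ["i", "m", "a"]],
})

# One row per (original row, label) pair; reset_index() stores the
# original row label in a column named "index".
exploded = a.explode("labels").reset_index()

# Count each label per original row.
per_row = pd.crosstab(exploded["index"], exploded["labels"])

# Count each label per group, like the index_key='group' call.
per_group = pd.crosstab(exploded["group"], exploded["labels"])

print(per_row)
print(per_group)
```

Unlike the function above, crosstab returns the label columns in sorted order. It also counts occurrences, so if a list can repeat a label, per_row.clip(upper=1) would turn the counts into a strictly binary matrix.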
