Create A Frequency Matrix For Bigrams From A List Of Tuples, Using NumPy Or Pandas
I am very new to Python. I have a list of tuples in which I created bigrams, and I want to turn them into a frequency matrix. This question is pretty close to my needs. My list looks like this:
my_list = [('we', 'consider'), ('what', 'to'), ('use', 'the'), ('words', 'of')]
Solution 1:
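You can create a frequency DataFrame whose index and columns are both the sorted list of words, then increment the cell addressed by each bigram's (first word, second word) labels: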
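import pandas as pd
# Every distinct word, sorted, is used as both a row and a column label
words = sorted(list(set([item for t in my_list for item in t])))
df = pd.DataFrame(0, columns=words, index=words)
# Row = first word of the bigram, column = second word
for i in my_list:
    df.at[i[0], i[1]] += 1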
Output:
          consider  of  the  to  use  we  what  words
consider         0   0    0   0    0   0     0      0
of               0   0    0   0    0   0     0      0
the              0   0    0   0    0   0     0      0
to               0   0    0   0    0   0     0      0
use              0   0    1   0    0   0     0      0
we               1   0    0   0    0   0     0      0
what             0   0    0   1    0   0     0      0
words            0   1    0   0    0   0     0      0
Note that in this version the order within a bigram matters. If you don't care about order, sort the contents of each tuple first:
my_list = [tuple(sorted(i)) for i in my_list]
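To see what that normalization does, here is a small sketch on a hypothetical list (not from the question) that contains the same pair in both orders. It reuses the loop from above; after sorting, both tuples land in the same cell.

```python
import pandas as pd

# Hypothetical input: the same pair appears in both orders
my_list = [('we', 'consider'), ('consider', 'we'), ('what', 'to')]

# Normalize each bigram so ('we', 'consider') and ('consider', 'we') match
my_list = [tuple(sorted(i)) for i in my_list]

words = sorted(set(item for t in my_list for item in t))
df = pd.DataFrame(0, columns=words, index=words)
for i in my_list:
    df.at[i[0], i[1]] += 1

print(df.at['consider', 'we'])  # 2
print(df.at['we', 'consider'])  # 0
```

Because the row and column labels are sorted too, every count ends up on or above the diagonal, so the matrix is upper-triangular rather than symmetric.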
Another way is to use Counter to do the counting, though I expect the performance to be similar (again, if the order within bigrams matters, remove sorted from the frequency_list line):
import pandas as pd
from collections import Counter
# Count each bigram; sorted() makes the count order-insensitive
frequency_list = Counter(tuple(sorted(i)) for i in my_list)
words = sorted(list(set([item for t in my_list for item in t])))
df = pd.DataFrame(0, columns=words, index=words)
for k, v in frequency_list.items():
    df.at[k[0], k[1]] = v
Output:
          consider  of  the  to  use  we  what  words
consider         0   0    0   0    0   1     0      0
of               0   0    0   0    0   0     0      1
the              0   0    0   0    1   0     0      0
to               0   0    0   0    0   0     1      0
use              0   0    0   0    0   0     0      0
we               0   0    0   0    0   0     0      0
what             0   0    0   0    0   0     0      0
words            0   0    0   0    0   0     0      0
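For comparison, pandas can also do the counting itself with pd.crosstab, a technique neither answer uses. A sketch, assuming the question's four-bigram my_list: crosstab counts each (first, second) combination, and reindex pads the result out to the same square word-by-word layout as above. The column names first and second are arbitrary labels chosen here.

```python
import pandas as pd

my_list = [('we', 'consider'), ('what', 'to'), ('use', 'the'), ('words', 'of')]

# One row per bigram; 'first' and 'second' are arbitrary column names
pairs = pd.DataFrame(my_list, columns=['first', 'second'])
words = sorted(set(pairs['first']) | set(pairs['second']))

# crosstab counts each (first, second) combination; reindex makes the
# matrix square over all words, filling absent pairs with 0
df = (pd.crosstab(pairs['first'], pairs['second'])
        .reindex(index=words, columns=words, fill_value=0))
# Drop the 'first'/'second' axis names so the printout matches the loop version
df.index.name = None
df.columns.name = None
print(df)
```

For an order-insensitive matrix, apply the same tuple(sorted(i)) normalization to my_list before building pairs.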
Solution 2:
If you don't care too much about speed, you could use a for loop over every (first word, second word) pair:
import pandas as pd
import numpy as np
from itertools import product

my_list = [('we', 'consider'), ('what', 'to'), ('use', 'the'), ('words', 'of')]
# First words label the rows, second words label the columns
index = pd.DataFrame(my_list)[0].unique()
columns = pd.DataFrame(my_list)[1].unique()
# Shape is (number of rows, number of columns)
df = pd.DataFrame(np.zeros(shape=(len(index), len(columns)), dtype=int),
                  columns=columns, index=index)
for idx, col in product(index, columns):
    # One .loc call (not df[col].loc[idx]) so the write reaches df itself
    df.loc[idx, col] = my_list.count((idx, col))
print(df)
Output:
       consider  to  the  of
we            1   0    0   0
what          0   1    0   0
use           0   0    1   0
words         0   0    0   1
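The loop above calls my_list.count for every (row, column) cell, so each cell rescans the whole list; that is the speed cost being accepted. Since the title also mentions NumPy, here is a sketch of a swapped-in technique that is not from either answer: map each word to an integer position, then accumulate all bigrams in a single np.add.at call. It assumes the square, sorted-vocabulary layout of Solution 1 rather than the first-word-by-second-word layout of Solution 2, and the pos dictionary is a name chosen here for illustration.

```python
import numpy as np
import pandas as pd

my_list = [('we', 'consider'), ('what', 'to'), ('use', 'the'), ('words', 'of')]

words = sorted(set(w for pair in my_list for w in pair))
pos = {w: k for k, w in enumerate(words)}  # word -> row/column position

# Integer positions of the first and second word of every bigram
rows = np.array([pos[a] for a, _ in my_list], dtype=int)
cols = np.array([pos[b] for _, b in my_list], dtype=int)

counts = np.zeros((len(words), len(words)), dtype=int)
# np.add.at is unbuffered, so a bigram that occurs several times is counted
# every time (plain counts[rows, cols] += 1 would count it only once)
np.add.at(counts, (rows, cols), 1)

df = pd.DataFrame(counts, index=words, columns=words)
print(df)
```

Order within a bigram matters here as well; normalize with tuple(sorted(i)) first if it should not.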