Find And Select The Most Frequent Data Of Column In Pandas Dataframe
Solution 1:
Pandas 0.15.2 has a DataFrame.mode() method. It might be of use to someone looking for this, as I was.
See the pandas docs for DataFrame.mode().
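Edit: To get the value itself, select the column and take the first entry of the result (mode() returns more than one row when values tie):
df['Column'].mode()[0]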
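As a minimal sketch of how that might look in practice: the frame `df` and its `color` column below are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"color": ["red", "blue", "red", "green", "red", "blue"]})

# Series.mode() returns a Series, because several values can tie
# for most frequent.
modes = df["color"].mode()

# Take the first entry to get a single value
# (tied values come back in sorted order).
most_common = modes.iloc[0]
print(most_common)
```

Using .iloc[0] rather than [0] makes it explicit that the first row is wanted by position, not by label.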
Solution 2:
This is not as straightforward as it could (and should) be.
As you probably know, the statistics jargon for the most common value is the "mode." NumPy does not have a built-in function for this, but SciPy does. Import it like so:
from scipy.stats.mstats import mode
It does more than simply return the most common value, as you can read about in the docs, so it's convenient to define a function that uses mode to just get the most common value.
f = lambda x: mode(x, axis=None)[0]
And now, instead of value_counts(), use apply(f). Here is an example:
In [20]: DataFrame([1,1,2,2,2,3], index=[1,1,1,2,2,2]).groupby(level=0).apply(f)
Out[20]:
1    1.0
2    2.0
dtype: object
Update: SciPy's mode does not work with strings. For your string data, you'll need to define a more general mode function. This answer should do the trick.
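One possible pandas-only sketch of such a general mode function follows. It is not necessarily what the linked answer does; the helper name `most_common_value` and the `group` and `label` columns are hypothetical, chosen for illustration. It relies on Series.mode(), which works for strings as well as numbers.

```python
import pandas as pd

def most_common_value(values):
    """Return the most frequent value in a Series.

    On ties, Series.mode() returns the tied values in sorted order,
    so the first of them is returned. Raises IndexError if the Series
    holds no non-missing values.
    """
    return values.mode().iloc[0]

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2],
    "label": ["a", "a", "b", "b", "b", "c"],
})

# One most-common label per group.
result = df.groupby("group")["label"].agg(most_common_value)
print(result)
```

This avoids the SciPy dependency altogether, at the cost of having to decide what should happen for a group with no non-missing values.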
Solution 3:
For the whole dataframe, you can use:
dataframe.mode()
For a specific column:
dataframe.mode()['Column'][0]
The second case is more useful for imputing missing values.
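A minimal sketch of that kind of imputation, assuming a hypothetical frame with missing entries in 'Column' (the data is invented for illustration):

```python
import pandas as pd

# Hypothetical data with missing entries, for illustration only.
df = pd.DataFrame({"Column": ["x", None, "y", "x", None]})

# mode() ignores missing values by default, so this is the most
# frequent non-missing value in the column.
fill_value = df["Column"].mode()[0]

# Replace the missing entries with that value.
filled = df["Column"].fillna(fill_value)
print(filled.tolist())
```

Note that if the column contains only missing values, mode() returns an empty result and the [0] lookup fails, so a real pipeline would likely need to guard against that case.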