How To Categorize A Range Of Values In Pandas Dataframe
Suppose I have the following DataFrame:
    Area
0  14.68
1  40.54
2  10.82
3   2.31
4  22.3
I want to categorize these values into ranges, such as A: [1,10], B: [11,20], C...
Solution 1:
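For me, this works: take cat.codes from the result of pd.cut and use them to index the list a, converted to a NumPy array (bins is assumed to be defined already; its definition is not shown here):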
a = list('ABCDEF')
df['new'] = np.array(a)[pd.cut(df["Area"], bins = bins).cat.codes]
print (df)
Area new
0 14.68 B
1 40.54 C
2 10.82 A
3 2.31 A
4 22.30 C
5 600.00 F
catDf = pd.Series(np.array(a)[pd.cut(df["Area"], bins = bins).cat.codes], index=df.index)
print (catDf)
0 B
1 C
2 A
3 A
4 C
5 F
dtype: object
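The snippet above relies on a bins value that is not shown. As a minimal, self-contained sketch, the bin edges below are an assumption, chosen only because they reproduce the printed output above; they are not necessarily the ones originally used:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.30, 600.00]})

# Assumed (hypothetical) edges, giving the intervals
# (0, 11], (11, 20], (20, 50], (50, 100], (100, 500], (500, 1000]
bins = [0, 11, 20, 50, 100, 500, 1000]
a = list("ABCDEF")

# cat.codes holds the integer position of each row's interval (0 for the first bin)
codes = pd.cut(df["Area"], bins=bins).cat.codes

# Use those positions to pick the matching letter from the label array
df["new"] = np.array(a)[codes.to_numpy()]
print(df)
```

One thing to watch for: a value that falls outside every bin gets code -1, and indexing with -1 silently picks the last label ("F" here), so it may be worth checking (codes == -1).any() before indexing.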
Solution 2:
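Assuming that bins is a global list of (low, high) pairs, you could do this:
def number_to_bin(number):
    ALPHABETS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    # Each entry of bins is a (low, high) pair, inclusive on both ends
    for i, (low, high) in enumerate(bins):
        if low <= number <= high:
            return ALPHABETS[i]
    # Falls through and returns None when no range contains number

df["Area"] = df["Area"].apply(number_to_bin)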
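To try this approach without relying on a global, here is a small sketch that passes the ranges in as an argument. The bins list and the helper name number_to_label are illustrative assumptions, not part of the answer above:

```python
import pandas as pd

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def number_to_label(number, bins):
    """Return the letter of the first (low, high) range containing number, else None."""
    for i, (low, high) in enumerate(bins):
        if low <= number <= high:
            return ALPHABET[i]
    return None

# Hypothetical inclusive ranges, in the spirit of A: [1,10], B: [11,20] from the question
bins = [(1, 10), (11, 20), (21, 50)]

df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.3]})
labels = [number_to_label(x, bins) for x in df["Area"]]
df["label"] = labels
print(labels)  # ['B', 'C', None, 'A', 'C']
```

With integer-style ranges like these, 10.82 falls in the gap between 10 and 11 and gets None, so for continuous data contiguous edges such as (10, 20) are probably what you want.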
Solution 3:
You can specify the labels as follows (note: I am not sure which ranges you used):
pd.cut(df.Area, [1,10, 20, 50, 100], labels=['A', 'B', 'C', 'D'])
0 B
1 C
2 B
3 A
4 C
Name: Area, dtype: category
Categories (4, object): [A < B < C < D]
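If the top range should be open-ended and the intervals closed on the left, pd.cut accepts np.inf as an edge and a right=False flag. A sketch, with edges that are again only an assumption about the intended ranges:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.3, 600.0]})

# Assumed edges, giving the intervals [0, 10), [10, 20), [20, 50), [50, inf)
edges = [0, 10, 20, 50, np.inf]

# right=False closes each interval on the left; one label per interval
df["category"] = pd.cut(df["Area"], bins=edges, labels=["A", "B", "C", "D"], right=False)
print(df)
```

Values that fall outside all of the edges come back as NaN rather than raising an error.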