
Validating Dataframe Column Data

I have the pseudocode below, which I need to implement using pandas: if group_min_size && group_max_size if group_min_size == 0 && group_max_size > 0 if

Solution 1:

Translate the pseudocode one condition at a time. Begin by creating the boolean Series:

# Boolean Series for group_min_size (comparisons with NaN evaluate to False)
min_equal_0 = df['group_min_size'] == 0
min_above_0 = df['group_min_size'] > 0
min_above_equal_2 = df['group_min_size'] >= 2
min_below_2 = df['group_min_size'] < 2

# Boolean Series for group_max_size
max_equal_0 = df['group_max_size'] == 0
max_above_0 = df['group_max_size'] > 0
max_above_equal_2 = df['group_max_size'] >= 2
max_below_2 = df['group_max_size'] < 2
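To try these comparisons in isolation, here is a minimal, self-contained sketch. The DataFrame and its values are hypothetical, made up for illustration; only the column names `group_min_size` and `group_max_size` come from the question. It also shows that every comparison against NaN evaluates to False.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "group_min_size": [0, 0, 3, 1, np.nan],
    "group_max_size": [5, 1, 0, 0, 2],
})

# Element-wise comparisons return boolean Series aligned with df's index.
min_equal_0 = df["group_min_size"] == 0
min_above_0 = df["group_min_size"] > 0
min_above_equal_2 = df["group_min_size"] >= 2
min_below_2 = df["group_min_size"] < 2

max_equal_0 = df["group_max_size"] == 0
max_above_0 = df["group_max_size"] > 0
max_above_equal_2 = df["group_max_size"] >= 2
max_below_2 = df["group_max_size"] < 2

# Comparisons against NaN are always False, so row 4 is False in every min_* Series.
print(min_equal_0.tolist())  # [True, True, False, False, False]
```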

Next, build the masks that follow the pseudocode:

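# False for rows where group_min_size == 0 and group_max_size > 0
first_mask = ~(min_equal_0 & max_above_0 & (max_below_2 | max_above_equal_2))
# False for rows where group_max_size == 0 and group_min_size > 0
second_mask = ~(max_equal_0 & min_above_0 & (min_below_2 | min_above_equal_2))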
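As written, `(max_below_2 | max_above_equal_2)` is True for every non-null value, since any number is either below 2 or at least 2. It is False only for NaN, and a NaN there already makes `max_above_0` False. So, in this sketch at least, the masks behave exactly like shorter versions without the inner test. Whether the truncated pseudocode intended a different inner condition is not clear from the question; the sample data below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (column names from the question).
df = pd.DataFrame({
    "group_min_size": [0, 0, 3, 1, np.nan, 0],
    "group_max_size": [5, 1, 0, 0, 2, np.nan],
})
mn, mx = df["group_min_size"], df["group_max_size"]

# Masks exactly as in the answer.
first_mask = ~((mn == 0) & (mx > 0) & ((mx < 2) | (mx >= 2)))
second_mask = ~((mx == 0) & (mn > 0) & ((mn < 2) | (mn >= 2)))

# Simplified masks: the inner "< 2 or >= 2" test is dropped.
first_simple = ~((mn == 0) & (mx > 0))
second_simple = ~((mx == 0) & (mn > 0))

# The two formulations agree on every row, including rows with NaN.
print(first_mask.equals(first_simple), second_mask.equals(second_simple))  # True True
```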

Combining the two masks (True means the row passes both checks):

>>> first_mask & second_mask

0    False
1     True
2    False
3    False
4     True
5     True
6     True
7     True
8     True
dtype: bool
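Because the combined mask is True for rows that pass both checks, it can be used directly with boolean indexing to keep the valid rows or to pull out the invalid ones. A minimal sketch with hypothetical data; the `is_valid` column name is made up for illustration:

```python
import pandas as pd

# Hypothetical sample data (column names from the question).
df = pd.DataFrame({
    "group_min_size": [0, 2, 3, 1],
    "group_max_size": [5, 4, 0, 3],
})
mn, mx = df["group_min_size"], df["group_max_size"]

first_mask = ~((mn == 0) & (mx > 0) & ((mx < 2) | (mx >= 2)))
second_mask = ~((mx == 0) & (mn > 0) & ((mn < 2) | (mn >= 2)))
valid = first_mask & second_mask

# Store the result alongside the data (hypothetical column name).
df["is_valid"] = valid

# Boolean indexing: keep passing rows, or inspect failing ones.
valid_rows = df[valid]
invalid_rows = df[~valid]
print(invalid_rows.index.tolist())  # [0, 2]
```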

If rows with NaN in either column should also count as invalid (False), add not-null checks:

>>> min_is_not_null = df['group_min_size'].notnull()
>>> max_is_not_null = df['group_max_size'].notnull()
>>> min_is_not_null & max_is_not_null & first_mask & second_mask
0    False
1     True
2    False
3    False
4    False
5     True
6     True
7     True
8     True
dtype: bool
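The steps can be wrapped in one helper so that NaN handling becomes a parameter. This is a sketch only: the function name `validate_group_sizes` and the `nan_is_invalid` flag are hypothetical, and the inner `< 2 or >= 2` test is kept exactly as in the answer.

```python
import numpy as np
import pandas as pd


def validate_group_sizes(df: pd.DataFrame, nan_is_invalid: bool = True) -> pd.Series:
    """Return a boolean Series: True where a row passes both checks.

    Hypothetical helper that mirrors the masks from the answer.
    """
    mn = df["group_min_size"]
    mx = df["group_max_size"]

    first_mask = ~((mn == 0) & (mx > 0) & ((mx < 2) | (mx >= 2)))
    second_mask = ~((mx == 0) & (mn > 0) & ((mn < 2) | (mn >= 2)))
    result = first_mask & second_mask

    if nan_is_invalid:
        # Rows with NaN in either column are reported as False.
        result = result & mn.notnull() & mx.notnull()
    return result


# Hypothetical sample data: the last row has a missing minimum.
df = pd.DataFrame({
    "group_min_size": [0, 2, np.nan],
    "group_max_size": [5, 4, 3],
})
print(validate_group_sizes(df).tolist())                        # [False, True, False]
print(validate_group_sizes(df, nan_is_invalid=False).tolist())  # [False, True, True]
```

Keeping the NaN check behind a flag makes the two behaviours shown in the answer (with and without not-null checks) easy to switch between.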
