Split Pandas Dataframe By String
Solution 1:
1) One approach is to do it on the fly: read the file line by line and start a new chunk whenever a NewEntry line appears.
2) The other way, if the dataframe already exists, is to find the NewEntry rows and slice the dataframe into multiple ones, stored in a dict (dff = {}).
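Approach 1 is only described in words here, so the following is a minimal sketch of how it could look, not code from the original answer. It assumes a comma-separated file with two fields per line; the helper name split_on_marker and the column names are hypothetical, and values are kept as strings.

```python
import io
import pandas as pd

def split_on_marker(lines, marker="NewEntry", names=("col1", "col2")):
    """Yield one DataFrame per block of lines separated by `marker` lines.

    Hypothetical helper: assumes two comma-separated fields per line.
    """
    block = []
    for line in lines:
        fields = line.strip().split(",")
        if fields[0] == marker:
            # A marker line closes the current block (if it has any rows).
            if block:
                yield pd.DataFrame(block, columns=list(names))
            block = []
        elif line.strip():
            # Ordinary data line: keep its fields (as strings).
            block.append(fields)
    # Emit whatever follows the last marker.
    if block:
        yield pd.DataFrame(block, columns=list(names))

# Stand-in for an open file handle; a real file object works the same way.
sample = io.StringIO("foo,1234\nbar,4567\nstuff,7894\nNewEntry,\nmorestuff,1345\n")
frames = list(split_on_marker(sample))
```

Because the helper is a generator, the whole file never has to be held in memory as one dataframe. Convert col2 with pd.to_numeric afterwards if numeric values are needed.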
df
        col1  col2
0        foo  1234
1        bar  4567
2      stuff  7894
3   NewEntry   NaN
4  morestuff  1345
Find the NewEntry rows, add [-1] and [len(df.index)] for boundary conditions
rows = [-1] + np.where(df['col1']=='NewEntry')[0].tolist() + [len(df.index)]
[-1, 3L, 5]
Create the dict of dataframes
dff = {}
for i, r in enumerate(rows[:-1]):
    dff[i] = df[r+1: rows[i+1]]
Dict of dataframes {0: dataframe1, 1: dataframe2}
dff
{0:     col1  col2
 0    foo  1234
 1    bar  4567
 2  stuff  7894,
 1:         col1  col2
 4  morestuff  1345}
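For convenience, the steps of Solution 1 can be assembled into one self-contained script. This is a sketch under two assumptions: the sample frame is rebuilt by hand from the table shown above, and df.iloc is swapped in for the plain df[...] slice to make the positional slicing explicit.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the table in this answer (an assumption).
df = pd.DataFrame({
    "col1": ["foo", "bar", "stuff", "NewEntry", "morestuff"],
    "col2": [1234, 4567, 7894, np.nan, 1345],
})

# Positions of the marker rows, padded with -1 and len(df) as boundaries.
rows = [-1] + np.where(df["col1"] == "NewEntry")[0].tolist() + [len(df.index)]

dff = {}
for i, r in enumerate(rows[:-1]):
    # Take the rows strictly between this boundary and the next one.
    dff[i] = df.iloc[r + 1 : rows[i + 1]]
```

If a marker sits on the first or last row, or two markers are adjacent, the corresponding entry in dff is an empty dataframe; filter those out if they are not wanted.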
Dataframe 1
dff[0]
    col1  col2
0    foo  1234
1    bar  4567
2  stuff  7894
Dataframe 2
dff[1]
        col1  col2
4  morestuff  1345

Solution 2:
Using your example data, concatenated 3 times (I named the columns 'a', 'b', 'c' for convenience), we first find the indices where column 'a' is 'New Entry' and then produce a list of tuples of these positions, stepwise, to mark the beginning and end of each range.
We can then iterate over this list of tuples, slice the original df, and append each slice to a list:
In [22]:
import io
import pandas as pd

t = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t), header=None, names=['a','b','c'])
df = pd.concat([df]*3, ignore_index=True)
df
Out[22]:
            a     b   c
0         foo  1234 NaN
1         bar  4567 NaN
2       stuff  7894 NaN
3   New Entry   NaN NaN
4   morestuff  1345 NaN
5         foo  1234 NaN
6         bar  4567 NaN
7       stuff  7894 NaN
8   New Entry   NaN NaN
9   morestuff  1345 NaN
10        foo  1234 NaN
11        bar  4567 NaN
12      stuff  7894 NaN
13  New Entry   NaN NaN
14  morestuff  1345 NaN
In [30]:
# index labels of the marker rows
idx = df[df['a']=='New Entry'].index
# first range runs from the top of the frame to the first marker
idx_list = [(0, idx[0])]
# remaining ranges run from one marker to the next
idx_list = idx_list + list(zip(idx, idx[1:]))
idx_list
Out[30]:
[(0, 3), (3, 8), (8, 13)]
In [31]:
df_list = []
for i in idx_list:
    print(i)
    if i[0] == 0:
        # first range starts on a data row, so keep its first row
        df_list.append(df[i[0]:i[1]])
    else:
        # later ranges start on a marker row, so skip it
        df_list.append(df[i[0]+1:i[1]])
df_list
(0, 3)
(3, 8)
(8, 13)
Out[31]:
[       a     b   c
 0    foo  1234 NaN
 1    bar  4567 NaN
 2  stuff  7894 NaN,
             a     b   c
 4   morestuff  1345 NaN
 5         foo  1234 NaN
 6         bar  4567 NaN
 7       stuff  7894 NaN,
             a     b   c
 9   morestuff  1345 NaN
 10        foo  1234 NaN
 11        bar  4567 NaN
 12      stuff  7894 NaN]
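Note that the ranges above end at the last marker (13), so the final row (14) never reaches df_list. One way to keep the trailing block is a different technique from the one in this answer: label each row with a running count of the markers seen so far and group on that label. The sketch below assumes the same sample data; the names is_marker and block_id are illustrative.

```python
import io
import pandas as pd

t = """foo,1234,
bar,4567
stuff,7894
New Entry,,
morestuff,1345"""
df = pd.read_csv(io.StringIO(t), header=None, names=["a", "b", "c"])
df = pd.concat([df] * 3, ignore_index=True)

# True on the separator rows.
is_marker = df["a"] == "New Entry"
# Running count of markers: every row after the n-th marker gets label n.
block_id = is_marker.cumsum()
# Drop the marker rows, then collect one dataframe per label.
df_list = [block for _, block in df[~is_marker].groupby(block_id[~is_marker])]
```

With this data the list holds four dataframes, the last one containing only row 14. Blocks that would be empty (two markers in a row) simply do not appear, since groupby only produces groups that have rows.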