
Merging 1300 Data Frames Into A Single Frame Becomes Really Slow

I have 1300 CSV files in a directory. Each file has a date in the first column, followed by daily data for the last 20-30 years, which spans another 8 columns. So like this, Data1.c

Solution 1:

The reason your loop slows down is that at each .append(), the DataFrame has to create a copy of itself in order to allocate more memory, as described here.
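To make the cost concrete, here is a minimal sketch of the two patterns, assuming dataframes is a list of frames that have already been loaded (the name and toy data are illustrative). The loop version copies the accumulated frame on every iteration, so total work grows roughly quadratically with the number of files; DataFrame.append itself was removed in pandas 2.0, but calling pd.concat inside the loop has the same cost:

import pandas as pd

# Stand-in for the 1300 loaded files
dataframes = [pd.DataFrame({'Data': range(3)}) for _ in range(1300)]

# Slow: the accumulated frame is copied on every iteration
frame = pd.DataFrame()
for df in dataframes:
    frame = pd.concat([frame, df])

# Fast: collect everything first, copy once at the end
frame = pd.concat(dataframes)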

If it all fits in memory, you could first fill a fixed-size list (1300 entries) with all the data frames and then call pd.concat(list_of_dataframes) once, which should avoid the issue you are having right now. Your code could be adjusted like this:

import os
import pandas as pd

# Pre-allocate one slot per file instead of growing a DataFrame inside the loop
lst = [None] * 1300

for i, filename in enumerate(os.listdir(filepath)):
    file_path = os.path.join(filepath, filename)
    df = pd.read_csv(file_path, index_col=0)
    # Reshape each file to a single 'Data' column plus a 'Source' label of the form "<file>-<column>"
    df = pd.concat([df[[col]].assign(Source=f'{filename[:-4]}-{col}').rename(columns={col: 'Data'})
                    for col in df])
    lst[i] = df

# One final concatenation copies the data only once
frame = pd.concat(lst)
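A more compact variant of the same idea, assuming the files of interest all end in .csv (the glob pattern and variable names here are illustrative, not from the original post), appends the per-file frames to an ordinary Python list and skips anything else that might sit in the directory:

import glob
import os
import pandas as pd

csv_paths = sorted(glob.glob(os.path.join(filepath, '*.csv')))

frames = []
for path in csv_paths:
    df = pd.read_csv(path, index_col=0)
    name = os.path.splitext(os.path.basename(path))[0]
    # Same reshaping as above: one 'Data' column plus a 'Source' label per original column
    frames.append(pd.concat([df[[c]].assign(Source=f'{name}-{c}').rename(columns={c: 'Data'})
                             for c in df]))

frame = pd.concat(frames)  # single copy at the end

Appending to a Python list is cheap because only references are stored; the expensive copy still happens exactly once, in the final pd.concat.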
