
Fill An Empty Dataframe With Common Index Values From Another Dataframe

I have a dataframe holding a time series that spans one month at a frequency of one second. The problem is that the time step between records is not always 1 second. time c1 c2 2013-01-01 0

Solution 1:

I don't think you need a second dataframe. If you call resample without a fill_method, it will store NaNs for the missing periods:

df.resample("s").max()
Out[62]:
                      c1   c2
time
2013-01-01 00:00:01  5.0  3.0
2013-01-01 00:00:02  NaN  NaN
2013-01-01 00:00:03  7.0  2.0
2013-01-01 00:00:04  1.0  5.0
2013-01-01 00:00:05  4.0  3.0
2013-01-01 00:00:06  5.0  6.0
2013-01-01 00:00:07  NaN  NaN
2013-01-01 00:00:08  NaN  NaN
2013-01-01 00:00:09  4.0  2.0
2013-01-01 00:00:10  7.0  8.0

max() here is just an arbitrary aggregation, needed so that the resampler returns a dataframe. You can replace it with mean, min, etc., and as long as you have no duplicate timestamps the result is the same. If you do have duplicates, they will be aggregated by that function.
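To make the duplicate behaviour concrete, here is a minimal sketch. The sample values and the duplicated timestamp are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical irregular data: 00:00:02 is missing and 00:00:03 appears twice.
idx = pd.to_datetime([
    "2013-01-01 00:00:01",
    "2013-01-01 00:00:03",
    "2013-01-01 00:00:03",  # duplicate timestamp
    "2013-01-01 00:00:04",
])
df = pd.DataFrame({"c1": [5, 7, 9, 1], "c2": [3, 2, 8, 5]}, index=idx)

# One row per second; empty seconds become NaN, duplicates collapse to their max.
filled = df.resample("s").max()
print(filled)
```

With mean() instead of max(), the two rows at 00:00:03 would be averaged rather than reduced to the larger value, so the choice of aggregation only matters where duplicates exist.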

As Paul H suggested in the comments, you can use df.resample("s").asfreq() without any aggregation. It skips an unnecessary step of aggregation so it is probably more efficient. It will raise an error if you have duplicate values in the index.
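A small sketch of the asfreq() variant, again on made-up sample values. It checks that asfreq() and max() agree when the index is unique, and that asfreq() refuses a duplicated index; catching ValueError specifically is an assumption based on how pandas reports reindexing on duplicate labels.

```python
import pandas as pd

# Hypothetical data with a unique but irregular index (00:00:02 is missing).
idx = pd.to_datetime([
    "2013-01-01 00:00:01",
    "2013-01-01 00:00:03",
    "2013-01-01 00:00:04",
])
df = pd.DataFrame({"c1": [5.0, 7.0, 1.0], "c2": [3.0, 2.0, 5.0]}, index=idx)

by_asfreq = df.resample("s").asfreq()  # no aggregation, just a regular grid
by_max = df.resample("s").max()        # same grid, built through an aggregation
same = by_asfreq.equals(by_max)
print(same)

# Add a duplicate of the first row and try again.
dup = pd.concat([df, df.iloc[[0]]]).sort_index()
raised = False
try:
    dup.resample("s").asfreq()
except ValueError as exc:
    raised = True
    print("asfreq failed on duplicates:", exc)
```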

Solution 2:

You need to reindex the dataframe.

import pandas
# filename and options stand for your own file path and read_table arguments
df = pandas.read_table(filename, **options)
N = 86400 * 31  # seconds in a 31-day month
dates = pandas.date_range(df.index[0], periods=N-1, freq='1s')
df = df.reindex(dates)

Here's a reproducible demonstration:

df = pandas.DataFrame(
    data={'A': range(0, 10), 'B': range(0, 20, 2)},
    index=pandas.date_range('2012-01-01', freq='2s', periods=10)
).reindex(pandas.date_range('2012-01-01', freq='1s', periods=25))
print(df)
                       A     B
2012-01-01 00:00:00  0.0   0.0
2012-01-01 00:00:01  NaN   NaN
2012-01-01 00:00:02  1.0   2.0
2012-01-01 00:00:03  NaN   NaN
2012-01-01 00:00:04  2.0   4.0
2012-01-01 00:00:05  NaN   NaN
2012-01-01 00:00:06  3.0   6.0
2012-01-01 00:00:07  NaN   NaN
2012-01-01 00:00:08  4.0   8.0
2012-01-01 00:00:09  NaN   NaN
2012-01-01 00:00:10  5.0  10.0
2012-01-01 00:00:11  NaN   NaN
2012-01-01 00:00:12  6.0  12.0
2012-01-01 00:00:13  NaN   NaN
2012-01-01 00:00:14  7.0  14.0
2012-01-01 00:00:15  NaN   NaN
2012-01-01 00:00:16  8.0  16.0
2012-01-01 00:00:17  NaN   NaN
2012-01-01 00:00:18  9.0  18.0
2012-01-01 00:00:19  NaN   NaN
2012-01-01 00:00:20  NaN   NaN
2012-01-01 00:00:21  NaN   NaN
2012-01-01 00:00:22  NaN   NaN
2012-01-01 00:00:23  NaN   NaN
2012-01-01 00:00:24  NaN   NaN
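If you would rather not hard-code the number of periods, one possible variation (not part of the original answer) is to build the target index from the first and last timestamps actually present in the data. The sketch below reuses the demonstration frame; full_index and regular are illustrative names.

```python
import pandas as pd

# Same demonstration data as above: one record every two seconds.
df = pd.DataFrame(
    {"A": range(10), "B": range(0, 20, 2)},
    index=pd.date_range("2012-01-01", freq="2s", periods=10),
)

# Complete one-second grid spanning exactly the observed time range.
full_index = pd.date_range(df.index.min(), df.index.max(), freq="s")

# Rows for seconds that were never recorded are created and filled with NaN.
regular = df.reindex(full_index)
print(len(regular), int(regular["A"].isna().sum()))
```

This only fills gaps between the first and last record. To cover a whole calendar month regardless of when the data starts or ends, pass explicit start and end timestamps to date_range instead.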

Solution 3:

If you already set up the indexes in the "nan" data frame, I think you should be able to just use loc. Indexing is a really important thing to master when using Pandas. It will save you a whole lot of time, make your code a lot cleaner and can really improve your performance.

Careful though: for the trick below to work as is, both dataframes need the same columns, and every index label of df1 must already exist in df2.

>>> import pandas as pd
>>> import numpy as np

>>> df1 = pd.DataFrame(np.random.rand(10, 3), columns=['A', 'B', 'C'])
>>> df1
          A         B         C
0  0.171502  0.258416  0.118326
1  0.215456  0.462122  0.858173
2  0.373549  0.946400  0.579845
3  0.606289  0.289552  0.473658
4  0.885899  0.783747  0.089975
5  0.674208  0.639710  0.105642
6  0.404775  0.541389  0.268101
7  0.374609  0.693916  0.743575
8  0.074773  0.150072  0.135555
9  0.230431  0.202417  0.466538

>>> df2 = pd.DataFrame(np.nan, index=range(15), columns=['A', 'B', 'C'])
>>> df2
     A   B   C
0  NaN NaN NaN
1  NaN NaN NaN
2  NaN NaN NaN
3  NaN NaN NaN
4  NaN NaN NaN
5  NaN NaN NaN
6  NaN NaN NaN
7  NaN NaN NaN
8  NaN NaN NaN
9  NaN NaN NaN
10 NaN NaN NaN
11 NaN NaN NaN
12 NaN NaN NaN
13 NaN NaN NaN
14 NaN NaN NaN

>>> df2.loc[df1.index] = df1    # This is where the magic happens
>>> df2
           A         B         C
0   0.171502  0.258416  0.118326
1   0.215456  0.462122  0.858173
2   0.373549  0.946400  0.579845
3   0.606289  0.289552  0.473658
4   0.885899  0.783747  0.089975
5   0.674208  0.639710  0.105642
6   0.404775  0.541389  0.268101
7   0.374609  0.693916  0.743575
8   0.074773  0.150072  0.135555
9   0.230431  0.202417  0.466538
10       NaN       NaN       NaN
11       NaN       NaN       NaN
12       NaN       NaN       NaN
13       NaN       NaN       NaN
14       NaN       NaN       NaN
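The same idea carries over to a time index like the one in the question. The sketch below is an illustration under assumed data: the frame names empty and data, the column values, and the timestamps are all hypothetical. It restricts the assignment to the labels both frames share, so a record that falls outside the target grid is skipped instead of raising a KeyError.

```python
import numpy as np
import pandas as pd

# Target grid: ten consecutive seconds, all NaN to begin with.
full_index = pd.date_range("2013-01-01 00:00:01", periods=10, freq="s")
empty = pd.DataFrame(np.nan, index=full_index, columns=["c1", "c2"])

# Hypothetical records every four seconds; the last one (00:00:13) is off the grid.
data = pd.DataFrame(
    {"c1": [5.0, 4.0, 4.0, 7.0], "c2": [3.0, 3.0, 2.0, 8.0]},
    index=pd.date_range("2013-01-01 00:00:01", periods=4, freq="4s"),
)

# Keep only timestamps present in both frames, then copy those rows across.
common = empty.index.intersection(data.index)
empty.loc[common] = data.loc[common]
print(empty)
```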
