How To Get Missing Date In Columns Using Python Pandas

January 21, 2023

I have the following data frame, and I want to get the missing dates along with their keys in pandas.

      size  number    key      date
0  153.2 K   12345  Hello  20181001
1

Solution 1:

You can do this with some fancy reshaping like this:

(df.pivot(index='date', columns='key')
   .reindex(np.arange(df['date'].min(), df['date'].max()+1))
   .stack('key', dropna=False)
   .loc[lambda x: x['size'].isna()]
   .index
   .to_frame(index=False))

Output:

       date    key
0  20181002  Hello
1  20181002     No

How?

1. Reshape the dataframe with pivot so that there is a single date per row.
2. Reindex the dataframe over the full range of dates to add the missing ones.
3. Reshape the dataframe again by stacking key, keeping the NaN values.
4. Filter the dataframe down to the missing values using isna.
5. Convert the remaining index to a dataframe with to_frame.
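
The steps above can be sketched end to end on a small frame. This is a minimal sketch, not the answer's exact code: the sample rows are made up (the question's data is truncated), only the size column is pivoted, and the NaN test is applied before stacking so that stack's dropna argument, whose handling has changed across pandas versions, is not needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's real frame is truncated.
df = pd.DataFrame({
    "size": ["153.2 K"] * 6,
    "number": [12345] * 6,
    "key": ["Hello", "No"] * 3,
    "date": [20181001, 20181001, 20181003, 20181003, 20181004, 20181004],
})

# Step 1: one row per date, one column per key.
wide = df.pivot(index="date", columns="key", values="size")

# Step 2: add a row for every integer between the first and last date.
full = wide.reindex(np.arange(df["date"].min(), df["date"].max() + 1))

# Steps 3-4: flag the NaN cells, then stack to one row per (date, key) pair.
is_missing = full.isna().stack()

# Step 5: keep the flagged pairs and turn the index into a dataframe.
missing = is_missing[is_missing].index.to_frame(index=False)
print(missing)
```

With this sample, the only pairs left are the two keys on 20181002.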

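Update, to address the date concern mentioned by @Cimbali below: convert the column to real datetimes and reindex with pd.date_range, so that the range follows the calendar rather than consecutive integers.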

df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
(df.pivot(index='date', columns='key')
   .reindex(pd.date_range(df['date'].min(), df['date'].max(), freq='D'))
   .stack('key', dropna=False)
   .loc[lambda x: x['size'].isna()]
   .index
   .to_frame(index=False))

Output:

           0    key
0 2018-10-02  Hello
1 2018-10-02     No
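
To see why the conversion to datetimes matters, here is a small sketch comparing the two ways of building the range when it crosses a month boundary. The two dates are hypothetical and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical first and last dates, stored as integers like in the question.
first, last = 20181030, 20181102

# Integer range: counts every number in between, including non-dates
# such as 20181032.
as_ints = np.arange(first, last + 1)

# Calendar range: only real days between the two dates.
as_days = pd.date_range(
    pd.to_datetime(str(first), format="%Y%m%d"),
    pd.to_datetime(str(last), format="%Y%m%d"),
    freq="D",
)

print(len(as_ints))  # 73
print(len(as_days))  # 4
print(as_days.strftime("%Y%m%d").tolist())
```

Reindexing on the integer range would therefore report dozens of "missing" dates that do not exist on the calendar.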

Solution 2:

If we align the dates along one dimension, it becomes easier to see the common values (on the index) and where to fill (on the columns). We can do this with pivot_table. (The value here is just a placeholder with all 1s.)

>>> tab = df.assign(value=1).pivot_table(index='key', columns='date', values='value')
>>> tab
date   20181001  20181003  20181004
key                                
Hello         1         1         1
No            1         1         1

melt allows us to do the opposite transformation:

>>> tab.reset_index().melt(id_vars='key').drop(columns='value')
     key      date
0  Hello  20181001
1     No  20181001
2  Hello  20181003
3     No  20181003
4  Hello  20181004
5     No  20181004

So if we want an intermediate step to add missing dates, we should probably convert them to dates first and use pd.date_range:

>>> avail_dates = pd.to_datetime(tab.columns, format='%Y%m%d')
>>> avail_dates
DatetimeIndex(['2018-10-01', '2018-10-03', '2018-10-04'], dtype='datetime64[ns]', name='date', freq=None)
>>> all_dates = pd.date_range(avail_dates.min(), avail_dates.max(), freq='D')
>>> tab_filled = tab.reindex(all_dates.strftime('%Y%m%d').astype(int), axis='columns')
>>> tab_filled
       20181001  20181002  20181003  20181004
key                                          
Hello         1       NaN         1         1
No            1       NaN         1         1

Finally, keep only the new columns and apply the same melt trick:

>>> missing = tab_filled.drop(columns=tab.columns).reset_index().melt('key').drop(columns=['value'])
>>> missing
     key  variable
0  Hello  20181002
1     No  20181002

Here’s a shorter variant on the same principle, where we first build the dates, then a synthetic dataframe that we can melt:

>>> dates = pd.date_range(
...     *pd.to_datetime(df['date'], format='%Y%m%d').agg(['min', 'max']), freq='D'
... ).strftime('%Y%m%d').astype(int)
>>> dates
Int64Index([20181001, 20181002, 20181003, 20181004], dtype='int64')
>>> pd.DataFrame(index=pd.Index(df['key'].unique(), name='key'),
...              columns=dates.difference(df['date']))\
... .reset_index().melt('key').drop(columns=['value'])
     key  variable
0  Hello  20181002
1     No  20181002
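
Both variants above pair every key with dates that are absent from the whole frame. If a date may be missing for only some keys, one possible alternative (a different technique from the answer, shown here only as a sketch on made-up rows) is to build every expected (key, date) pair with pd.MultiIndex.from_product and subtract the pairs that are present:

```python
import pandas as pd

# Hypothetical sample: "No" has no row for 20181004, and nobody has 20181002.
df = pd.DataFrame({
    "key": ["Hello", "No", "Hello", "No", "Hello"],
    "date": [20181001, 20181001, 20181003, 20181003, 20181004],
})

# Every calendar day between the first and last date, back as integers.
days = pd.to_datetime(df["date"], format="%Y%m%d")
all_dates = (
    pd.date_range(days.min(), days.max(), freq="D")
    .strftime("%Y%m%d")
    .astype(int)
)

# All (key, date) pairs we expect versus the pairs that actually occur.
expected = pd.MultiIndex.from_product(
    [df["key"].unique(), all_dates], names=["key", "date"]
)
present = pd.MultiIndex.from_frame(df[["key", "date"]])

# Set difference leaves exactly the missing pairs.
missing = expected.difference(present).to_frame(index=False)
print(missing)
```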
