How Can I Use Map With Multi-index In Pandas?
Solution 1:
A vectorized approach:
df['gene'] = df.index  # each row's index value arrives as a tuple
df['gene'] = df['gene'].map(gene_d)
df = df.set_index('gene', append=True)
Resulting df:
                             A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA  1  1  1
       +      5678    geneB  2  2  2
              9876    geneC  3  3  3
chrom2 +      13579   geneD  4  4  4
              8497    geneE  5  5  5
       -      98765   geneF  6  6  6
              76856   geneG  7  7  7
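The snippet above assumes that df and gene_d already exist. Here is a minimal self-contained sketch of the same steps. The frame and the dictionary are reconstructed from the output shown above, so treat their exact construction as an assumption rather than the asker's original code.

```python
import pandas as pd

# Sample data reconstructed from the output table (an assumption).
index = pd.MultiIndex.from_tuples(
    [
        ("chrom1", "-", 1234),
        ("chrom1", "+", 5678),
        ("chrom1", "+", 9876),
        ("chrom2", "+", 13579),
        ("chrom2", "+", 8497),
        ("chrom2", "-", 98765),
        ("chrom2", "-", 76856),
    ],
    names=["chrom", "strand", "abs_pos"],
)
values = list(range(1, 8))
df = pd.DataFrame({"A": values, "B": values, "C": values}, index=index)

gene_d = {
    ("chrom1", "-", 1234): "geneA",
    ("chrom1", "+", 5678): "geneB",
    ("chrom1", "+", 9876): "geneC",
    ("chrom2", "+", 13579): "geneD",
    ("chrom2", "+", 8497): "geneE",
    ("chrom2", "-", 98765): "geneF",
    ("chrom2", "-", 76856): "geneG",
}

# Copy the index into a column (one tuple per row), map the tuples
# through the dictionary, then append the result as a fourth index level.
df["gene"] = df.index
df["gene"] = df["gene"].map(gene_d)
df = df.set_index("gene", append=True)
print(df)
```

Tuples that are missing from the dictionary come back as NaN, which is worth checking for before the column is moved into the index.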
Solution 2:
Make gene_d into a dataframe:
df1 = pd.DataFrame.from_dict(gene_d, orient='index').rename(columns={0:'gene'})
Give it a MultiIndex:
df1.index = pd.MultiIndex.from_tuples(df1.index)
Concatenate with original df:
new_df = pd.concat([df, df1], axis=1).sort_values('A')
Do some clean up:
new_df.index.rename(['chrom','strand','abs_pos'], inplace=True)
new_df = new_df.set_index('gene', append=True)  # set_index returns a new frame, so assign it back
new_df
                             A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA  1  1  1
       +      5678    geneB  2  2  2
              9876    geneC  3  3  3
chrom2 +      13579   geneD  4  4  4
              8497    geneE  5  5  5
       -      98765   geneF  6  6  6
              76856   geneG  7  7  7
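Because pd.Series builds a MultiIndex from tuple keys on its own, the same align-on-the-index idea can probably be written more compactly with join. The sketch below is a variation on this answer, not the answer's own code, and the sample frame is again reconstructed from the output above.

```python
import pandas as pd

# Sample data reconstructed from the output table (an assumption).
index = pd.MultiIndex.from_tuples(
    [
        ("chrom1", "-", 1234),
        ("chrom1", "+", 5678),
        ("chrom1", "+", 9876),
        ("chrom2", "+", 13579),
        ("chrom2", "+", 8497),
        ("chrom2", "-", 98765),
        ("chrom2", "-", 76856),
    ],
    names=["chrom", "strand", "abs_pos"],
)
values = list(range(1, 8))
df = pd.DataFrame({"A": values, "B": values, "C": values}, index=index)

gene_d = {
    ("chrom1", "-", 1234): "geneA",
    ("chrom1", "+", 5678): "geneB",
    ("chrom1", "+", 9876): "geneC",
    ("chrom2", "+", 13579): "geneD",
    ("chrom2", "+", 8497): "geneE",
    ("chrom2", "-", 98765): "geneF",
    ("chrom2", "-", 76856): "geneG",
}

# A Series built from tuple keys gets a MultiIndex automatically.
# Give its levels the same names as df's index so the join lines up.
gene_s = pd.Series(gene_d, name="gene").rename_axis(list(df.index.names))

# Left-join on the shared index, then move 'gene' into the index.
new_df = df.join(gene_s).set_index("gene", append=True)
print(new_df)
```

A left join keeps the row order of df, so no sort_values call is needed afterwards.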
Solution 3:
A non-vectorized approach, but maybe useful for people who are really struggling with this.
In my example, I have a df called bb_df whose MultiIndex is structured as [customer, months], with each customer having multiple months beneath it. Internally, a MultiIndex stores levels = [level_1, level_2] (the unique values of each level) and codes = [level_1, level_2] (integer positions into those levels; this attribute was called labels before pandas 0.24). As such, you can get the full list of level-2 values, in row order, for mapping with the following list comprehension:
[bb_df.index.levels[1][x] for x in bb_df.index.codes[1]]
Hope this helps somebody.
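To make the relationship between levels and codes concrete, here is a small sketch. The customers, months, and the usage column are hypothetical stand-ins for bb_df, which is not shown in the answer. It also compares the list comprehension with get_level_values, which should give the same values in one call.

```python
import pandas as pd

# Hypothetical stand-in for bb_df: two customers, three months each.
bb_index = pd.MultiIndex.from_product(
    [["cust_a", "cust_b"], ["2020-01", "2020-02", "2020-03"]],
    names=["customer", "month"],
)
bb_df = pd.DataFrame({"usage": [10, 20, 30, 40, 50, 60]}, index=bb_index)

# The list comprehension: look up each row's code in the level's values.
via_codes = [bb_df.index.levels[1][code] for code in bb_df.index.codes[1]]

# The built-in accessor for the same per-row values.
via_accessor = bb_df.index.get_level_values(1).tolist()

# Once the per-row values are available, mapping is an ordinary Index.map.
month_names = {"2020-01": "Jan", "2020-02": "Feb", "2020-03": "Mar"}
bb_df["month_name"] = bb_df.index.get_level_values("month").map(month_names)
print(bb_df)
```

One caveat with the comprehension: a missing value in a level is stored as code -1, which would silently pick the last entry of levels[1], whereas get_level_values returns NaN for it.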
Solution 4:
I ran into a similar issue and found that using map was not straightforward. Instead, I rewrote my code to get the intended answer with a for loop.
It isn't as clean as using map, but assigning each value by key avoids creating extra holding dataframes and copes with keys that are missing from your dictionary: rows whose key is absent are left untouched, which matters if, say, ('chrom1', '+', 9876) already had a value you didn't want to replace.
df['gene'] = '' # Add a column for replacement strings if not present
# Loop over the dictionary and assign each gene by its index key
for gnk, gnv in gene_d.items():
    df.loc[gnk, 'gene'] = gnv
df.set_index('gene', append=True, inplace=True)
I understand that this may not be the fastest option, but I have not timed either approach on a larger data set.
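If the goal is to protect values that are already filled in, the loop can check each cell before writing. The following is a sketch under assumptions: the sample frame is reconstructed from the question's output, the pre-filled geneC value and the chrom3 key are made up for illustration, and the skipped list is a hypothetical helper.

```python
import pandas as pd

# Sample data reconstructed from the question's output (an assumption).
index = pd.MultiIndex.from_tuples(
    [
        ("chrom1", "-", 1234),
        ("chrom1", "+", 5678),
        ("chrom1", "+", 9876),
        ("chrom2", "+", 13579),
        ("chrom2", "+", 8497),
        ("chrom2", "-", 98765),
        ("chrom2", "-", 76856),
    ],
    names=["chrom", "strand", "abs_pos"],
)
values = list(range(1, 8))
df = pd.DataFrame({"A": values, "B": values, "C": values}, index=index)

df["gene"] = None  # object column, so strings can be assigned later
df.loc[("chrom1", "+", 9876), "gene"] = "geneC"  # pre-existing value to keep

gene_test = {
    ("chrom1", "+", 9876): "geneQ",   # row already has a gene: keep geneC
    ("chrom2", "+", 13579): "geneP",  # row is empty: fill it
    ("chrom3", "+", 1): "geneX",      # hypothetical key that is not in the index
}

skipped = []  # keys that do not match any row
for key, gene in gene_test.items():
    if key not in df.index:
        skipped.append(key)
        continue
    # Only write when the cell is still empty.
    if pd.isna(df.loc[key, "gene"]):
        df.loc[key, "gene"] = gene

print(df)
print("skipped:", skipped)
```

Checking membership first keeps the loop from touching keys that have no matching row, and the isna test is what preserves the existing geneC entry.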
Here is the code and the output for the problem I ran into (gene_make() simply reads in df as the question states):
import numpy as np
gene_test = {('chrom1', '+', 9876): 'geneQ', ('chrom2', '+', 13579): 'geneP'}
gene_d = {('chrom1', '-', 1234): 'geneA', ('chrom1', '+', 5678): 'geneB',
          # ('chrom1', '+', 9876): 'geneC', ('chrom2', '+', 13579): 'geneD',
          ('chrom2', '+', 8497): 'geneE', ('chrom2', '-', 98765): 'geneF',
          ('chrom2', '-', 76856): 'geneG'}
# Run 1: for-loop only, using gene_test
df = gene_make()
df['gene'] = np.nan
for gnk, gnv in gene_test.items():
    df.loc[gnk, 'gene'] = gnv
df.set_index('gene', append=True, inplace=True)
display(df)
# Run 2: for-loop with gene_test first, then map with gene_d
df = gene_make()
df['gene'] = df.index
for gnk, gnv in gene_test.items():
    df.loc[gnk, 'gene'] = gnv
df['gene'] = df['gene'].map(gene_d)
df = df.set_index('gene', append=True)
display(df)
Output:
                             A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    NaN    1  1  1
       +      5678    NaN    2  2  2
              9876    geneQ  3  3  3
chrom2 +      13579   geneP  4  4  4
              8497    NaN    5  5  5
       -      98765   NaN    6  6  6
              76856   NaN    7  7  7
                             A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA  1  1  1
       +      5678    geneB  2  2  2
              9876    NaN    3  3  3
chrom2 +      13579   NaN    4  4  4
              8497    geneE  5  5  5
       -      98765   geneF  6  6  6
              76856   geneG  7  7  7
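In the second run, the loop replaced two of the index tuples with 'geneQ' and 'geneP', and the map that followed could not find those strings as keys in gene_d, so both rows ended up as NaN. Granted, changing the order so that the map runs before the for-loop solves this problem.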
df = gene_make()
df['gene'] = df.index
df['gene'] = df['gene'].map(gene_d)
for gnk, gnv in gene_test.items():
    df.loc[gnk, 'gene'] = gnv
df.set_index('gene', append=True, inplace=True)
display(df)
Output:
                             A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA  1  1  1
       +      5678    geneB  2  2  2
              9876    geneQ  3  3  3
chrom2 +      13579   geneP  4  4  4
              8497    geneE  5  5  5
       -      98765   geneF  6  6  6
              76856   geneG  7  7  7
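When the override dictionary should simply win over the base dictionary, the two can be merged up front and applied with a single map, which should give the same result without a loop. This is a sketch rather than part of the original answer; the frame is reconstructed from the output above, since gene_make() is not shown.

```python
import pandas as pd

# Sample data reconstructed from the output table (an assumption,
# standing in for the unseen gene_make()).
index = pd.MultiIndex.from_tuples(
    [
        ("chrom1", "-", 1234),
        ("chrom1", "+", 5678),
        ("chrom1", "+", 9876),
        ("chrom2", "+", 13579),
        ("chrom2", "+", 8497),
        ("chrom2", "-", 98765),
        ("chrom2", "-", 76856),
    ],
    names=["chrom", "strand", "abs_pos"],
)
values = list(range(1, 8))
df = pd.DataFrame({"A": values, "B": values, "C": values}, index=index)

gene_d = {
    ("chrom1", "-", 1234): "geneA",
    ("chrom1", "+", 5678): "geneB",
    ("chrom2", "+", 8497): "geneE",
    ("chrom2", "-", 98765): "geneF",
    ("chrom2", "-", 76856): "geneG",
}
gene_test = {("chrom1", "+", 9876): "geneQ", ("chrom2", "+", 13579): "geneP"}

# Later dictionaries win on duplicate keys, so gene_test takes priority.
combined = {**gene_d, **gene_test}

df["gene"] = df.index
df["gene"] = df["gene"].map(combined)
df = df.set_index("gene", append=True)
print(df)
```

The merge order decides which dictionary takes priority: swapping it to {**gene_test, **gene_d} would let gene_d override instead.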