How Can I Use Map With Multi-index In Pandas?
Solution 1:
A vectorized approach:
df['gene'] = df.index  # each row's index becomes a (chrom, strand, abs_pos) tuple
df['gene'] = df['gene'].map(gene_d)  # look each tuple up in gene_d; unmatched tuples become NaN
df = df.set_index('gene', append=True)
Resulting df:
                              A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA   1  1  1
       +      5678    geneB   2  2  2
              9876    geneC   3  3  3
chrom2 +      13579   geneD   4  4  4
              8497    geneE   5  5  5
       -      98765   geneF   6  6  6
              76856   geneG   7  7  7
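The question's df and gene_d are not reproduced in this answer, so here is a self-contained sketch that rebuilds them from the printed output (that reconstruction is an assumption). Instead of copying the index into a temporary column, it looks each index tuple up in gene_d directly: iterating over a MultiIndex yields one tuple per row, and dict.get returns None rather than raising for a tuple with no gene.

```python
import pandas as pd

# Assumed reconstruction of the question's frame, inferred from the printed output
rows = [
    ("chrom1", "-", 1234),
    ("chrom1", "+", 5678),
    ("chrom1", "+", 9876),
    ("chrom2", "+", 13579),
    ("chrom2", "+", 8497),
    ("chrom2", "-", 98765),
    ("chrom2", "-", 76856),
]
index = pd.MultiIndex.from_tuples(rows, names=["chrom", "strand", "abs_pos"])
df = pd.DataFrame({"A": range(1, 8), "B": range(1, 8), "C": range(1, 8)}, index=index)

# Assumed gene lookup, keyed by the full (chrom, strand, abs_pos) tuple
gene_d = dict(zip(rows, ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG"]))

# Iterating a MultiIndex yields one tuple per row; dict.get returns None
# instead of raising when a tuple has no entry in gene_d
genes = [gene_d.get(key) for key in df.index]

# Store the names as a column, then append that column as a fourth index level
df = df.assign(gene=genes).set_index("gene", append=True)
print(df)
```

This is still a per-row Python lookup, so it is not faster than map; its advantage is that it skips the intermediate column of tuples.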
Solution 2:
Turn gene_d into a DataFrame:
df1 = pd.DataFrame.from_dict(gene_d, orient='index').rename(columns={0:'gene'})
Give it a MultiIndex:
df1.index = pd.MultiIndex.from_tuples(df1.index)
Concatenate it with the original df, then sort by A to put the rows back in their original order:
new_df = pd.concat([df, df1], axis=1).sort_values('A')
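Clean up: restore the index level names and move gene into the index (set_index returns a new frame, so assign the result):
new_df.index.rename(['chrom','strand','abs_pos'], inplace=True)
new_df = new_df.set_index('gene', append=True)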
new_df
                              A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA   1  1  1
       +      5678    geneB   2  2  2
              9876    geneC   3  3  3
chrom2 +      13579   geneD   4  4  4
              8497    geneE   5  5  5
       -      98765   geneF   6  6  6
              76856   geneG   7  7  7
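A shorter variant that swaps the concat for a reindex (a different technique from the one above, not a restatement of it): pd.Series(gene_d) turns the tuple keys into a MultiIndex, and reindex(df.index) lines those keys up with the rows of df. The sample frame is reconstructed from the printed output, which is an assumption.

```python
import pandas as pd

# Assumed reconstruction of the question's frame, inferred from the printed output
rows = [
    ("chrom1", "-", 1234),
    ("chrom1", "+", 5678),
    ("chrom1", "+", 9876),
    ("chrom2", "+", 13579),
    ("chrom2", "+", 8497),
    ("chrom2", "-", 98765),
    ("chrom2", "-", 76856),
]
index = pd.MultiIndex.from_tuples(rows, names=["chrom", "strand", "abs_pos"])
df = pd.DataFrame({"A": range(1, 8), "B": range(1, 8), "C": range(1, 8)}, index=index)

# Assumed gene lookup, keyed by the full (chrom, strand, abs_pos) tuple
gene_d = dict(zip(rows, ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "geneG"]))

# A dict with tuple keys becomes a Series indexed by a three-level MultiIndex
gene_s = pd.Series(gene_d)

# reindex aligns the names to df's rows by tuple and keeps df's row order,
# so no sort is needed afterwards; rows absent from gene_d would get NaN
df["gene"] = gene_s.reindex(df.index).to_numpy()
df = df.set_index("gene", append=True)
print(df)
```

Because reindex follows df's own index, the index names chrom, strand and abs_pos are never lost, so the rename step is not needed either.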
Solution 3:
A non-vectorized approach, but maybe useful for people who are really struggling with this.
In my example, I have a df called bb_df, which has a MultiIndex with [customer, months] as the structure, each customer having multiple months beneath it. A MultiIndex stores the unique values of each level in levels and, for every row, the position of that row's value within the level in codes (called labels before pandas 0.24; labels was removed in pandas 1.0). As such, you can get the full, in-order list of level-2 values for mapping with the following list comprehension:
[bb_df.index.levels[1][x] for x in bb_df.index.codes[1]]  # use .labels[1] on pandas < 0.24
Hope this helps somebody.
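A small illustration of the level/code lookup, using a hypothetical bb_df (the customer names, months, the usage column and the month_names dict are all made up for the example). get_level_values returns the same per-row list without the comprehension. One caveat: a missing value in a level is stored with code -1, which the comprehension would silently turn into the last entry of the level, whereas get_level_values returns NaN for it.

```python
import pandas as pd

# Hypothetical bb_df: each customer has several months beneath it
index = pd.MultiIndex.from_tuples(
    [("cust1", 1), ("cust1", 2), ("cust2", 1), ("cust2", 3)],
    names=["customer", "months"],
)
bb_df = pd.DataFrame({"usage": [10, 20, 30, 40]}, index=index)

# levels[1] holds the unique month values; codes[1] holds, for each row,
# the position of that row's month inside levels[1]
months = [bb_df.index.levels[1][x] for x in bb_df.index.codes[1]]

# get_level_values gives the same per-row values directly
assert months == bb_df.index.get_level_values("months").tolist()

# The per-row values are a flat Index, so they can be mapped with a dict
month_names = {1: "Jan", 2: "Feb", 3: "Mar"}  # hypothetical lookup table
bb_df["month_name"] = bb_df.index.get_level_values("months").map(month_names)
print(bb_df)
```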
Solution 4:
I ran into a similar issue and found that using map was not straightforward. Instead, I rewrote my code to get the intended answer with a for loop.
It isn't as clean as using map, but assigning each value by key avoids creating extra intermediate dataframes, and it leaves rows that are missing from your dictionary untouched, say if ('chrom1', '+', 9876) already had a value you didn't want to replace.
df['gene'] = ''  # add a column for the gene names if it is not already present
# loop over the dictionary's keys and values, assigning each name to its row
for gnk, gnv in gene_d.items():
    df.loc[gnk, 'gene'] = gnv
df.set_index('gene', append=True, inplace=True)
I understand that this may not be the best choice for speed, since it assigns one key at a time in a Python loop, but I have not tested either approach on a larger data set.
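One detail of the loop worth guarding against: it trusts every key in gene_d to be a row of df, and an unknown key passed to .loc for assignment may add a new row rather than update an existing one. This sketch skips keys that are not in the index; the sample frame is reconstructed from the question's output (an assumption), and the ("chrom3", "+", 1) key is hypothetical. A MultiIndex supports the in operator with a full (chrom, strand, abs_pos) tuple.

```python
import pandas as pd

# Assumed reconstruction of the question's frame, inferred from the printed output
rows = [
    ("chrom1", "-", 1234),
    ("chrom1", "+", 5678),
    ("chrom1", "+", 9876),
    ("chrom2", "+", 13579),
    ("chrom2", "+", 8497),
    ("chrom2", "-", 98765),
    ("chrom2", "-", 76856),
]
index = pd.MultiIndex.from_tuples(rows, names=["chrom", "strand", "abs_pos"])
df = pd.DataFrame({"A": range(1, 8), "B": range(1, 8), "C": range(1, 8)}, index=index)

gene_d = {
    ("chrom1", "-", 1234): "geneA",
    ("chrom1", "+", 5678): "geneB",
    ("chrom3", "+", 1): "geneZ",  # hypothetical key with no matching row
}

# An object column of None, so rows without a gene stay missing
df["gene"] = None
for gnk, gnv in gene_d.items():
    # Only touch rows that already exist in the index
    if gnk in df.index:
        df.loc[gnk, "gene"] = gnv
df = df.set_index("gene", append=True)
print(df)
```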
Here is the code and the output for the problem I ran into (gene_make() simply reads in df as the question states):
gene_test = {('chrom1', '+', 9876): 'geneQ', ('chrom2', '+', 13579): 'geneP'}
gene_d = {('chrom1', '-', 1234) : 'geneA', ('chrom1', '+', 5678): 'geneB',
# ('chrom1', '+', 9876): 'geneC', ('chrom2', '+', 13579): 'geneD',
('chrom2', '+', 8497): 'geneE', ('chrom2', '-', 98765): 'geneF',
('chrom2', '-', 76856): 'geneG'}
df = gene_make()
df['gene'] = np.nan
for gnk, gnv in gene_test.items(): df.loc[gnk, 'gene'] = gnv
df.set_index('gene', append=True, inplace=True)
display(df)
df = gene_make()
df['gene'] = df.index
for gnk, gnv in gene_test.items(): df.loc[gnk, 'gene'] = gnv
df['gene'] = df['gene'].map(gene_d)
df = df.set_index('gene', append=True)
display(df)
Output:
                              A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    NaN     1  1  1
       +      5678    NaN     2  2  2
              9876    geneQ   3  3  3
chrom2 +      13579   geneP   4  4  4
              8497    NaN     5  5  5
       -      98765   NaN     6  6  6
              76856   NaN     7  7  7

                              A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA   1  1  1
       +      5678    geneB   2  2  2
              9876    NaN     3  3  3
chrom2 +      13579   NaN     4  4  4
              8497    geneE   5  5  5
       -      98765   geneF   6  6  6
              76856   geneG   7  7  7
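In the second case, map replaces the whole column, so geneQ and geneP (which are not keys of gene_d) become NaN. Changing the order of the map and the for-loop solves this: the map fills in every row from gene_d first, and the loop then overwrites only the rows listed in gene_test.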
df = gene_make()
df['gene'] = df.index
df['gene'] = df['gene'].map(gene_d)
for gnk, gnv in gene_test.items(): df.loc[gnk, 'gene'] = gnv
df.set_index('gene', append=True, inplace=True)
display(df)
Output:
                              A  B  C
chrom  strand abs_pos gene
chrom1 -      1234    geneA   1  1  1
       +      5678    geneB   2  2  2
              9876    geneQ   3  3  3
chrom2 +      13579   geneP   4  4  4
              8497    geneE   5  5  5
       -      98765   geneF   6  6  6
              76856   geneG   7  7  7
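The map-then-override order can also be expressed as a single lookup by merging the two dicts first: in {**gene_d, **gene_test} the later dict wins for any shared key, so the gene_test entries take precedence. A sketch, with the sample frame reconstructed from the question's output (an assumption) and the tuples looked up directly rather than through a temporary column:

```python
import pandas as pd

# Assumed reconstruction of the question's frame, inferred from the printed output
rows = [
    ("chrom1", "-", 1234),
    ("chrom1", "+", 5678),
    ("chrom1", "+", 9876),
    ("chrom2", "+", 13579),
    ("chrom2", "+", 8497),
    ("chrom2", "-", 98765),
    ("chrom2", "-", 76856),
]
index = pd.MultiIndex.from_tuples(rows, names=["chrom", "strand", "abs_pos"])
df = pd.DataFrame({"A": range(1, 8), "B": range(1, 8), "C": range(1, 8)}, index=index)

# gene_d as in the answer above, with the geneC and geneD entries commented out
gene_d = {
    ("chrom1", "-", 1234): "geneA",
    ("chrom1", "+", 5678): "geneB",
    ("chrom2", "+", 8497): "geneE",
    ("chrom2", "-", 98765): "geneF",
    ("chrom2", "-", 76856): "geneG",
}
gene_test = {("chrom1", "+", 9876): "geneQ", ("chrom2", "+", 13579): "geneP"}

# Later dicts win on shared keys, so gene_test entries override gene_d
merged = {**gene_d, **gene_test}

# One lookup per row tuple; None would mark a row found in neither dict
genes = [merged.get(key) for key in df.index]
df = df.assign(gene=genes).set_index("gene", append=True)
print(df)
```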