Reverse Complement of a DNA Strand Using Python

January 22, 2024

I have a DNA sequence and would like to get the reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column.

Solution 1:

The other answers are perfectly fine, but if you plan to deal with real DNA sequences I suggest using Biopython. What if you encounter a character like "-", "*" or an ambiguity code? What if you want to do further manipulations of your sequences? Do you want to create a parser for each file format out there?

The code you ask for is as easy as:

from Bio.Seq import Seq

seq = Seq("TCGGGCCC")
print(seq.reverse_complement())
# GGGCCCGA

Now if you want to do other transformations:

print(seq.complement())
print(seq.transcribe())
print(seq.translate())

Outputs

AGCCCGGG
UCGGGCCC
SG

And if you run into strange chars, no need to keep adding code to your program. Biopython deals with it:

seq = Seq("TCGGGCCCX")
print(seq.reverse_complement())
# XGGGCCCGA

Solution 2:

In general, a generator expression is simpler than the original code and avoids creating extra list objects. If there can be multiple-character insertions, go with the other answers.

complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
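
As a rough sketch of how the complement.get(base, base) fallback behaves, the snippet below wraps the expression in a function and runs it on input containing characters outside A, C, G and T. The function name reverse_complement and the sample inputs are only for illustration. Unknown characters are copied through unchanged; only their position is reversed.

```python
# Lookup table for the four standard DNA bases.
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(seq):
    # Illustrative helper name, not part of the original answer.
    # dict.get(base, base) returns the base itself when it has no entry,
    # so gaps, N, or lowercase letters pass through uncomplemented.
    return "".join(complement.get(base, base) for base in reversed(seq))

print(reverse_complement("TCGGGCCC"))   # GGGCCCGA
print(reverse_complement("TCGG-NCCC"))  # GGGN-CCGA
```

Lowercase input is also passed through uncomplemented, so it may be worth calling seq.upper() first if the data mixes cases.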

Solution 3:

old_chars = "ACGT"
replace_chars = "TGCA"
# str.maketrans builds the translation table in Python 3 (Python 2 used string.maketrans)
tab = str.maketrans(old_chars, replace_chars)
print("AAAACCCGGT".translate(tab)[::-1])

That will give you the reverse complement: ACCGGGTTTT
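
Since the question is about a CSV column, here is a minimal sketch that applies the same kind of translation table row by row with the standard csv module. The column names sequence and reverse_complement, the helper name add_reverse_complement, and the in-memory sample data are all assumptions made for the example. With real files you would pass file objects opened with newline='' instead of io.StringIO.

```python
import csv
import io

# Same base mapping as in the snippet above.
TABLE = str.maketrans("ACGT", "TGCA")

def add_reverse_complement(in_file, out_file, src="sequence", dst="reverse_complement"):
    # src and dst are hypothetical column names; adjust them to the real header.
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames + [dst])
    writer.writeheader()
    for row in reader:
        # Complement every base, then reverse the string with a slice.
        row[dst] = row[src].translate(TABLE)[::-1]
        writer.writerow(row)

# Small in-memory stand-in for the input CSV file.
sample = io.StringIO("id,sequence\n1,AAAACCCGGT\n2,TCGGGCCC\n")
result = io.StringIO()
add_reverse_complement(sample, result)
print(result.getvalue())
```

Characters that are not in the table, such as N or "-", are left as they are by str.translate, so rows with unusual symbols do not raise an error.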

Solution 4:

The get method of a dictionary allows you to specify a default value if the key is not in the dictionary. As a preconditioning step, I would map all your non-'ATGC' bases to single characters (letters, punctuation, digits, or anything else that won't show up in your sequence), then reverse the sequence, then replace the single-character placeholders with their originals. Alternatively, you could reverse it first and then search and replace things like sni with ins.

alt_map = {'ins': '0'}
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement(seq):
    # Swap each multi-character token for its single-character placeholder.
    for k, v in alt_map.items():
        seq = seq.replace(k, v)
    bases = list(seq)
    bases = reversed([complement.get(base, base) for base in bases])
    bases = ''.join(bases)
    # Restore the original tokens.
    for k, v in alt_map.items():
        bases = bases.replace(v, k)
    return bases

>>> seq = "TCGGinsGCCC"
>>> print("Reverse Complement:")
Reverse Complement:
>>> print(reverse_complement(seq))
GGGCinsCCGA
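
The alternative mentioned above (reverse first, then repair the reversed tokens) can be sketched as follows. The function name reverse_complement_fix_tokens and the TOKENS tuple are hypothetical, and the sketch assumes the tokens are lowercase so they never collide with the uppercase bases.

```python
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

# Hypothetical list of multi-character tokens that may appear in a sequence.
TOKENS = ('ins',)

def reverse_complement_fix_tokens(seq):
    # Reverse and complement character by character; token letters have no
    # entry in COMPLEMENT, so they pass through but end up spelled backwards.
    flipped = ''.join(COMPLEMENT.get(base, base) for base in reversed(seq))
    # Turn each backwards token (e.g. 'sni') back into its original spelling.
    for token in TOKENS:
        flipped = flipped.replace(token[::-1], token)
    return flipped

print(reverse_complement_fix_tokens("TCGGinsGCCC"))  # GGGCinsCCGA
```

This avoids picking placeholder characters, at the cost of assuming that a reversed token never occurs naturally in the data.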

Solution 5:

A short one-liner for the reverse complement is the following (note that it raises a KeyError on any character other than A, C, G or T):

def rev_compl(st):
    nn = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return "".join(nn[n] for n in reversed(st))
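
Which variant is quickest depends on the Python version and the sequence length, so it is worth measuring rather than guessing. The sketch below uses the standard timeit module to compare the dictionary-join approach with the translation-table approach from Solution 3. The function names, the test sequence and the repetition count are arbitrary choices for illustration, and no timings are quoted because they vary by machine.

```python
import timeit

COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
TABLE = str.maketrans("ACGT", "TGCA")

def rev_compl_join(st):
    # Dictionary lookup per base, joined from a generator expression.
    return "".join(COMPLEMENT[n] for n in reversed(st))

def rev_compl_translate(st):
    # Translation table plus slice reversal, as in Solution 3.
    return st.translate(TABLE)[::-1]

seq = "TCGGGCCC" * 500  # arbitrary 4000-base test sequence

# Both variants must agree before their timings are worth comparing.
assert rev_compl_join(seq) == rev_compl_translate(seq)

for func in (rev_compl_join, rev_compl_translate):
    seconds = timeit.timeit(lambda: func(seq), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 calls")
```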
