
Python Email Quoted-printable Encoding Problem

I am extracting emails from Gmail using the following: def getMsgs(): try: conn = imaplib.IMAP4_SSL('imap.gmail.com', 993) except: print 'Failed to connect' print 'I

Solution 1:

You could/should use the email.parser module to decode mail messages, for example (quick and dirty example!):

from email.parser import FeedParser
f = FeedParser()
f.feed("<insert mail message here, including all headers>")
rootMessage = f.close()

# Now you can access the message and its submessages (if it's multipart)
print(rootMessage.is_multipart())

# Or check for errors
print(rootMessage.defects)

# If it's a multipart message, you can get the first submessage and then its
# payload (i.e. content) like so:
rootMessage.get_payload(0).get_payload(decode=True)

Using the "decode" parameter of Message.get_payload, the module automatically decodes the content according to its Content-Transfer-Encoding (e.g. quoted-printable, as in your question).
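To make that concrete, here is a minimal sketch run against a made-up one-part multipart message (the sample text and the "BOUNDARY" string are my own, not from the question). Note that on Python 3, get_payload(decode=True) gives you bytes, so you still have to decode them with the part's charset to get text:

```python
from email.parser import FeedParser

# Hypothetical sample message: one text part, quoted-printable, UTF-8.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "d=C3=A9jeuner\n"
    "--BOUNDARY--\n"
)

parser = FeedParser()
parser.feed(raw)
root = parser.close()

part = root.get_payload(0)                       # first submessage
payload = part.get_payload(decode=True)          # bytes, quoted-printable undone
charset = part.get_content_charset() or "utf-8"  # fall back to UTF-8 (assumption)
text = payload.decode(charset)
print(text)
```

The UTF-8 fallback is only a guess for messages that declare no charset; pick whatever default suits your mail.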

Solution 2:

If you are using Python 3.6 or later, you can use the email.message.EmailMessage.get_content() method to decode the text automatically. This method supersedes get_payload(), though get_payload() is still available.

Say you have a string s containing this email message (based on the examples in the docs):

Subject: Ayons asperges pour le =?utf-8?q?d=C3=A9jeuner?=
From: =?utf-8?q?Pep=C3=A9?= Le Pew <pepe@example.com>
To: Penelope Pussycat <penelope@example.com>,
 Fabrette Pussycat <fabrette@example.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Salut!

Cela ressemble =C3=A0 un excellent recipie[1] d=C3=A9jeuner.

[1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718

--Pep=C3=A9
=20

Non-ASCII characters in the string have been encoded with the quoted-printable encoding, as specified in the Content-Transfer-Encoding header.

Create an email object:

import email
from email import policy

msg = email.message_from_string(s, policy=policy.default)

Setting the policy is required here; otherwise policy.compat32 is used, which returns a legacy Message instance that doesn't have the get_content method. policy.default will eventually become the default policy, but as of Python 3.7 it's still policy.compat32.
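If you want to see the difference the policy makes, a quick check along these lines should show it. The tiny message in raw is a made-up sample, not the one above:

```python
import email
from email import policy
from email.message import EmailMessage

# Hypothetical sample message, quoted-printable encoded UTF-8 body.
raw = (
    "Subject: test\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "d=C3=A9jeuner\n"
)

legacy = email.message_from_string(raw)                         # compat32 policy
modern = email.message_from_string(raw, policy=policy.default)  # modern API

# The legacy object is a plain Message and lacks get_content();
# the modern one is an EmailMessage and has it.
print(type(legacy).__name__, hasattr(legacy, "get_content"))
print(type(modern).__name__, hasattr(modern, "get_content"))
```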

The get_content() method handles decoding automatically:

print(msg.get_content())

Salut!

Cela ressemble à un excellent recipie[1] déjeuner.

[1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718

--Pepé

If you have a multipart message, get_content() needs to be called on the individual parts, like this:

for part in msg.iter_parts():
    print(part.get_content())
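One caveat: iter_parts() only yields the direct subparts, and as far as I can tell get_content() raises an error when called on a multipart container itself. For nested messages, a sketch like the following may be more robust. It uses walk() and skips the containers; the sample message and the "BOUNDARY" string are invented for illustration:

```python
import email
from email import policy

# Hypothetical multipart/alternative message with a plain and an HTML part.
raw = (
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "d=C3=A9jeuner\n"
    "--BOUNDARY\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "<p>d=C3=A9jeuner</p>\n"
    "--BOUNDARY--\n"
)

msg = email.message_from_string(raw, policy=policy.default)

# Collect the decoded text of every text/* leaf part, keyed by content type.
texts = {}
for part in msg.walk():
    if part.is_multipart():
        continue  # containers have no content of their own
    if part.get_content_maintype() == "text":
        texts[part.get_content_type()] = part.get_content()

# Or let the library pick the preferred body part for you.
body = msg.get_body(preferencelist=("plain", "html"))
print(body.get_content())
```

get_body() is handy when you only want the "main" text of the mail and don't care about attachments.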

Solution 3:

That's known as quoted-printable encoding. You probably want to use something like quopri.decodestring - http://docs.python.org/library/quopri.html
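A minimal sketch of that approach, assuming you have already pulled the raw body out of the message and know that its charset is UTF-8 (quopri only undoes the quoted-printable layer and knows nothing about charsets). The sample bytes are made up:

```python
import quopri

# Hypothetical quoted-printable body, as raw bytes.
encoded = b"Cela ressemble =C3=A0 un excellent d=C3=A9jeuner."

decoded_bytes = quopri.decodestring(encoded)  # still bytes
text = decoded_bytes.decode("utf-8")          # charset is an assumption here
print(text)
```

This is fine for a quick fix on a single body, but for whole messages with headers and multiple parts the email package is probably less work.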
