Python Unicode Error: Why Do I Keep Getting These Characters Although I Used encode('utf-8')?
for p in articles2:
    url = p.find('a')['href']
    title = p.find('h3').get_text().strip().encode('utf-8')
    print(title)
OUTPUT:
c3\xa9gie de d\xc3\xa9fense active et pr\xc
Solution 1:
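Try a different encoding; these characters look as if they might be Latin-1.
That said, \xc3\xa9 is the UTF-8 byte sequence for é, so the escapes may simply be the printed form of the bytes object that encode('utf-8') returns in Python 3.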
You can find more encodings here
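One way to check which encoding the bytes are in is to decode them both ways and compare. The following is a minimal sketch, assuming Python 3; the sample title is the example string quoted in Solution 2, and the variable names are illustrative only.

```python
# Sample title (taken from the example in Solution 2) as a normal Python 3 string.
title = "Zoom sur la course effrénée pour trouver un vaccin"

# encode() returns a bytes object; print() shows it with \x escapes.
encoded = title.encode("utf-8")
print(encoded)

# Decoding the same bytes as UTF-8 restores the accented letters.
print(encoded.decode("utf-8"))

# Decoding them as Latin-1 yields garbled text ("Ã©" in place of "é").
print(encoded.decode("latin-1"))
```

If the UTF-8 decode gives readable text, the data was UTF-8 all along and only the printed bytes form looked wrong.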
Solution 2:
Use strip() and join to tidy the text, and leave out the encode('utf-8') call so that the title stays a string.
For example, the title that printed as "Zoom sur la course effr\xc3\xa9n\xc3\xa9e pour trouver un vaccin" reads 'Zoom sur la course effrénée pour trouver un vaccin' once it is a plain string.
Then encode it to ASCII with errors set to 'ignore' and decode it back with utf-8. This removes special characters such as é from the text.
It should look like:
"".join(the_text_to_clean.strip()).encode('ascii', 'ignore').decode("utf-8")
How it applies in your code:
for p in articles2:
    url = p.find('a')['href']
    title = p.find('h3').get_text()
    title = "".join(title.strip()).encode('ascii', 'ignore').decode("utf-8")  # clean title
    print(title)
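To see what the cleaning line does without running the scraper, here is a minimal, self-contained sketch. It assumes Python 3; the helper name clean_title and the sample string are hypothetical stand-ins for the real scraped data.

```python
def clean_title(text):
    """Hypothetical helper: strip whitespace and drop every non-ASCII character."""
    return "".join(text.strip()).encode("ascii", "ignore").decode("utf-8")


# Sample input standing in for p.find('h3').get_text()
raw = "  Zoom sur la course effrénée pour trouver un vaccin \n"

print(raw.strip())       # accents kept: Zoom sur la course effrénée pour trouver un vaccin
print(clean_title(raw))  # accents dropped: Zoom sur la course effrne pour trouver un vaccin
```

Note that the accented letters are deleted rather than replaced, so this suits cases where plain ASCII output is required. Printing the stripped string directly keeps the accents.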