
Python Unicode Error: Why Do I Keep Getting These Characters Although I Used encode('utf-8')?

for p in articles2:
    url = p.find('a')['href']
    title = p.find('h3').get_text().strip().encode('utf-8')
    print(title)

OUTPUT: c3\xa9gie de d\xc3\xa9fense active et pr\xc

Solution 1:

Try a different encoding; these characters seem to be Latin-1.

You can find more encodings here
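As a rough way to compare encodings, the sketch below encodes the same text with two codecs and decodes the result again. The sample string is an assumption, taken from the "d\xc3\xa9fense active" fragment in the question's output, and the variable names are only illustrative.

```python
# Sample text (an assumption based on the fragment in the question's output).
# \u00e9 is the letter e with an acute accent.
text = "d\u00e9fense active"

utf8_bytes = text.encode("utf-8")      # the accented letter becomes two bytes: \xc3\xa9
latin1_bytes = text.encode("latin-1")  # the accented letter becomes one byte: \xe9

print(utf8_bytes)
print(latin1_bytes)

# Decoding with the codec that produced the bytes restores the original text.
round_trip = utf8_bytes.decode("utf-8")

# Decoding UTF-8 bytes with a different codec gives garbled characters instead.
mismatched = utf8_bytes.decode("latin-1")

print(round_trip)
print(mismatched)
```

Assuming Python 3, printing a bytes object shows escapes such as \xc3\xa9, while printing the str itself shows the accented character. If that is the case here, printing the title without calling encode() may already give readable output.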

Solution 2:

Use strip() and join() to tidy the text, then drop the characters that ASCII cannot represent.

For example, "Zoom sur la course effr\xc3\xa9n\xc3\xa9e pour trouver un vaccin" is the UTF-8 byte form of 'Zoom sur la course effrénée pour trouver un vaccin'. Note that strip() and join() do not change the accented letters themselves.

Then encode the text to ASCII with errors set to 'ignore' and decode it back to a string. This removes special characters such as é, so 'effrénée' becomes 'effrne'.

Should look like:

"".join(the_text_to_clean.strip()).encode('ascii', 'ignore').decode("utf-8")

How it applies in your code

for p in articles2:
    url = p.find('a')['href']
    title = p.find('h3').get_text()
    # clean title: strip whitespace, then drop non-ASCII characters
    title = "".join(title.strip()).encode('ascii', 'ignore').decode("utf-8")
    print(title)
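To see what the cleaning step does to a concrete title, here is a minimal sketch that wraps the one-liner above in a function and runs it on the example sentence. The function names clean_title and strip_accents are hypothetical, and strip_accents is not part of the original answer: it is an alternative that uses unicodedata.normalize to keep the base letter instead of deleting the accented one.

```python
import unicodedata


def clean_title(text):
    """Approach from the answer: strip whitespace, drop non-ASCII characters."""
    return "".join(text.strip()).encode("ascii", "ignore").decode("utf-8")


def strip_accents(text):
    """Hypothetical alternative: decompose accented letters, then drop the accents."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text.strip())
    # The combining marks are non-ASCII, so 'ignore' removes only them.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# \u00e9 is the letter e with an acute accent.
sample = "  Zoom sur la course effr\u00e9n\u00e9e pour trouver un vaccin  "

print(clean_title(sample))    # accented letters are removed entirely
print(strip_accents(sample))  # accented letters are reduced to their base letter
```

Both variants lose information, so they only fit cases where plain ASCII output is actually required.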
