Python Unicode Error: Why Do I Keep Getting These Characters Although I Used encode('utf-8')?

October 02, 2024

for p in articles2:
   url = p.find('a')['href']
   title = p.find('h3').get_text().strip().encode('utf-8')
   print(title)

OUTPUT:

c3\xa9gie de d\xc3\xa9fense active et pr\xc

Solution 1:

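Try a different encoding; it seems these characters are being read as Latin-1. The bytes \xc3\xa9 are the UTF-8 encoding of é, and they display as Ã© when they are decoded as Latin-1.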

You can find more encodings in the Python documentation for the codecs module, under "Standard Encodings".
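As a rough illustration of what is going on, here is a minimal sketch assuming Python 3. The sample title is a made-up stand-in for the value returned by get_text(); it is not taken from the scraped page. It shows that encode('utf-8') returns bytes (which print displays with \x escapes), and how the same bytes look when decoded as UTF-8 versus Latin-1.

```python
# Hypothetical sample title, standing in for p.find('h3').get_text()
title = "stratégie de défense active"

# encode() returns a bytes object, not a str.
encoded = title.encode("utf-8")
print(type(encoded))
print(encoded)  # each é appears as the two-byte escape \xc3\xa9

# Decoding with the codec that produced the bytes restores the text.
as_utf8 = encoded.decode("utf-8")
print(as_utf8)

# Decoding the same bytes as Latin-1 turns each é into the pair Ã©.
as_latin1 = encoded.decode("latin-1")
print(as_latin1)
```

If the goal is only to print readable text, this suggests the str returned by get_text() can probably be printed directly, without calling encode() at all.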

Solution 2:

Use strip() and join() to tidy the text.

For example, "Zoom sur la course effr\xc3\xa9n\xc3\xa9e pour trouver un vaccin" is the same string as 'Zoom sur la course effrÃ©nÃ©e pour trouver un vaccin': the escapes \xc3\xa9 are just the two characters Ã©.

Then encode it to ASCII with the 'ignore' error handler and decode the result as UTF-8. This removes the special characters such as Ã©. Note that the accented letters are dropped, not restored.
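To make the effect of the encode/decode step concrete, here is a small sketch (Python 3 assumed) applied to the sample sentence above. The variable names are illustrative only.

```python
# The sample sentence from above; \xc3 and \xa9 are the characters Ã and ©.
text = "Zoom sur la course effr\xc3\xa9n\xc3\xa9e pour trouver un vaccin"

# Encoding to ASCII with errors='ignore' silently drops every non-ASCII
# character; decoding the remaining pure-ASCII bytes as UTF-8 is then safe.
cleaned = text.encode("ascii", "ignore").decode("utf-8")

print(cleaned)  # Zoom sur la course effrne pour trouver un vaccin
```

The word "effrénée" comes out as "effrne", so this approach suits cases where losing the accented letters is acceptable.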

Should look like:

"".join(the_text_to_clean.strip()).encode('ascii', 'ignore').decode("utf-8")

How it applies in your code

for p in articles2:
   url = p.find('a')['href']
   title = p.find('h3').get_text()
   title = "".join(title.strip()).encode('ascii', 'ignore').decode("utf-8") #clean title
   print(title)
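If the accents should be kept rather than dropped, one possible alternative (a different technique from the one in this answer) is to reverse the wrong decoding: encode the garbled text as Latin-1 to get the original bytes back, then decode those bytes as UTF-8. The sketch below assumes Python 3 and assumes the text really is UTF-8 that was mis-read as Latin-1; repair_mojibake is a hypothetical helper name, and the sample titles are made up for illustration.

```python
def repair_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 text that was mis-decoded as Latin-1.

    Falls back to the input unchanged when the round trip is not possible,
    for example when the text is already correct.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Illustrative inputs: one garbled title, one plain ASCII, one already correct.
titles = [
    "Zoom sur la course effrÃ©nÃ©e pour trouver un vaccin",
    "Plain ASCII title",
    "déjà correct",
]

repaired = [repair_mojibake(t) for t in titles]
for title in repaired:
    print(title)
```

The try/except is there because the round trip fails on text that was never garbled, in which case the original string is returned as-is.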

