Skip to content Skip to sidebar Skip to footer

Can The [a-zA-Z] Python Regex Pattern Be Made To Match And Replace Non-ASCII Unicode Characters?

In the following regular expression, I would like each character in the string replaced with an 'X', but it isn't working. In Python 2.7: >>> import re >>> re.sub

Solution 1:

You may use

(?![\d_])\w
[^\W\d_]

If used in Python 2.x, the re.U / re.UNICODE modifier is necessary. The (?![\d_]) look-ahead is restricting the \w shorthand class so as it could not match any digits (\d) or underscores. The [^\W\d_] pattern matches any word char other than digits and underscores.

See regex demo.

A Python 3 demo:

import re
print (re.sub(r"(?![\d_])\w","X","dfäg"))
# => XXXX

print (re.sub(r"[^\W\d_]","X","dfäg"))
# => XXXX

As for Python 2:

# -*- coding: utf-8 -*-
import re
s = "dfäg"
w = re.sub(ur'(?![\d_])\w', u'X', s.decode('utf8'), 0, re.UNICODE).encode("utf8")
print(w)

Post a Comment for "Can The [a-zA-Z] Python Regex Pattern Be Made To Match And Replace Non-ASCII Unicode Characters?"