Can The [a-zA-Z] Python Regex Pattern Be Made To Match And Replace Non-ASCII Unicode Characters?
In the following regular expression, I would like each character in the string replaced with an 'X', but it isn't working. In Python 2.7: >>> import re >>> re.sub
Solution 1:
You may use
(?![\d_])\w
[^\W\d_]
If used in Python 2.x, the re.U
/ re.UNICODE
modifier is necessary. The (?![\d_])
look-ahead is restricting the \w
shorthand class so as it could not match any digits (\d
) or underscores. The [^\W\d_]
pattern matches any word char other than digits and underscores.
See regex demo.
import re
print (re.sub(r"(?![\d_])\w","X","dfäg"))
# => XXXX
print (re.sub(r"[^\W\d_]","X","dfäg"))
# => XXXX
As for Python 2:
# -*- coding: utf-8 -*-
import re
s = "dfäg"
w = re.sub(ur'(?![\d_])\w', u'X', s.decode('utf8'), 0, re.UNICODE).encode("utf8")
print(w)
Post a Comment for "Can The [a-zA-Z] Python Regex Pattern Be Made To Match And Replace Non-ASCII Unicode Characters?"