Collapsing Whitespace In A String
Solution 1:
Here's a single-step approach (but the uppercasing actually uses a string method -- much simpler!):
import re
rex = re.compile(r'\W+')  # one or more non-word characters
result = rex.sub(' ', strarg).upper()
where strarg is the string argument (please don't use names that shadow builtins or standard library modules).
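For reference, the same idea as a self-contained sketch. The function name `normalize` and the sample input are made up for illustration; they are not from the question.

```python
import re

# Compile once so the pattern can be reused across calls.
NON_WORD_RUN = re.compile(r'\W+')

def normalize(strarg):
    """Replace each run of non-word characters with one space, then uppercase."""
    return NON_WORD_RUN.sub(' ', strarg).upper()

print(normalize("more-stuff .. ...$%$% stuff -> DD"))
# MORE STUFF STUFF DD
```

Note that `\W` treats the underscore as a word character, so `ee_ff` would come through as `EE_FF`.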
Solution 2:
s = "$$$aa1bb2 cc-dd ee_ff ggg."
re.sub(r'\W+', ' ', s).upper()
# ' AA1BB2 CC DD EE_FF GGG '
Is _ punctuation?
re.sub(r'[_\W]+', ' ', s).upper()
# ' AA1BB2 CC DD EE FF GGG '
Don't want the leading and trailing space?
re.sub(r'[_\W]+', ' ', s).strip().upper()
# 'AA1BB2 CC DD EE FF GGG'
Solution 3:
result = rex.sub(' ', string)  # this produces a string with tons of whitespace padding
result = rex.sub('', result)  # this reduces all those spaces
That doesn't work because of a typo: the second call uses rex where it should use rex_s. Also, you need to substitute at least one space back in; otherwise every multiple-space gap becomes no gap at all instead of a single-space gap.
result = rex.sub(' ', string)  # this produces a string with tons of whitespace padding
result = rex_s.sub(' ', result)  # this reduces all those spaces
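A runnable sketch of the corrected two-step version. The question's actual rex and rex_s patterns are not shown in this answer, so the two patterns below are assumptions, and the variable is called `text` rather than `string` to avoid shadowing the standard library module.

```python
import re

# Assumed patterns; the originals from the question are not shown here.
rex = re.compile(r'[^\w\s]')  # any single punctuation character
rex_s = re.compile(r'\s+')    # any run of whitespace

text = "stuff . // : /// more-stuff"
result = rex.sub(' ', text)      # punctuation becomes padding spaces
result = rex_s.sub(' ', result)  # each run of whitespace shrinks to one space
print(repr(result))
# 'stuff more stuff'
```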
Solution 4:
Do you have to use regular expressions? Do you feel you must do it in one line?
>>> import string
>>> s = "stuff . // : /// more-stuff .. .. ...$%$% stuff -> DD"
>>> # string.letters exists only in Python 2; Python 3 uses string.ascii_letters
>>> s2 = ''.join(c for c in s if c in string.ascii_letters + ' ')
>>> ' '.join(s2.split())
'stuff morestuff stuff DD'
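Notice that dropping punctuation outright glues "more-stuff" into "morestuff". If punctuation should instead act as a separator, as in the regex answers, a regex-free variant might look like the sketch below. The function name `collapse_no_regex` is hypothetical, and it assumes digits should be kept and the result uppercased.

```python
def collapse_no_regex(s):
    # Turn every non-alphanumeric character into a space...
    spaced = ''.join(c if c.isalnum() else ' ' for c in s)
    # ...then let split() swallow the runs of spaces and rejoin with one.
    return ' '.join(spaced.split()).upper()

s = "stuff . // : /// more-stuff .. .. ...$%$% stuff -> DD"
print(collapse_no_regex(s))
# STUFF MORE STUFF STUFF DD
```

Because `split()` with no arguments ignores leading and trailing whitespace, no extra `strip()` is needed.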
Solution 5:
This works in Python 3. The first version keeps each distinct whitespace character and only collapses repeats of the same character, so a tab next to a space won't collapse into a single character.
def collapse_whitespace_characters(raw_text):
    ret = ''
    if len(raw_text) > 1:
        prev_char = raw_text[0]
        ret += prev_char
        for cur_char in raw_text[1:]:
            # skip a whitespace character only if it repeats the previous one
            if not cur_char.isspace() or cur_char != prev_char:
                ret += cur_char
            prev_char = cur_char
    else:
        ret = raw_text
    return ret
This one collapses each run of mixed whitespace into the first whitespace character it sees:
def collapse_whitespace(raw_text):
    ret = ''
    if len(raw_text) > 1:
        prev_char = raw_text[0]
        ret += prev_char
        for cur_char in raw_text[1:]:
            # skip any whitespace character that follows another whitespace character
            if not cur_char.isspace() or \
                    (cur_char.isspace() and not prev_char.isspace()):
                ret += cur_char
            prev_char = cur_char
    else:
        ret = raw_text
    return ret
>>> collapse_whitespace_characters('we like spaces and\t\t TABS\tAND WHATEVER\xa0\xa0IS')
'we like spaces and\t TABS\tAND WHATEVER\xa0IS'
>>> collapse_whitespace('we like spaces and\t\t TABS\tAND WHATEVER\xa0\xa0IS')
'we like spaces and\tTABS\tAND WHATEVER\xa0IS'
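Both behaviours can also be sketched with a backreference in `re.sub`, which may be easier to read than the character loop. The function names and sample string below are made up for this illustration, and this is an alternative technique, not the code from the answer above.

```python
import re

def collapse_same_whitespace(raw_text):
    # (\s)\1+ matches a whitespace character followed by repeats of itself.
    return re.sub(r'(\s)\1+', r'\1', raw_text)

def collapse_any_whitespace(raw_text):
    # (\s)\s+ matches a run of mixed whitespace; only the first character is kept.
    return re.sub(r'(\s)\s+', r'\1', raw_text)

sample = 'a  b\t\t c\xa0\xa0d'
print(repr(collapse_same_whitespace(sample)))  # 'a b\t c\xa0d'
print(repr(collapse_any_whitespace(sample)))   # 'a b\tc\xa0d'
```

For `str` patterns in Python 3, `\s` matches Unicode whitespace, so the non-breaking space `\xa0` is handled as well.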
The same idea for punctuation (repeats of the same non-alphanumeric character collapse into one):
def collapse_punctuation(raw_text):
    ret = ''
    if len(raw_text) > 1:
        prev_char = raw_text[0]
        ret += prev_char
        for cur_char in raw_text[1:]:
            # skip a non-alphanumeric character only if it repeats the previous one
            if cur_char.isalnum() or cur_char != prev_char:
                ret += cur_char
            prev_char = cur_char
    else:
        ret = raw_text
    return ret
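A rough regex counterpart for the punctuation case, as a sketch: `[\W_]` is used here as an approximation of "not alphanumeric", which should agree with `str.isalnum()` for ordinary ASCII text but is not guaranteed to match it for every Unicode character. The function name is hypothetical.

```python
import re

def collapse_repeated_punctuation(raw_text):
    # ([\W_])\1+ matches a non-alphanumeric character followed by repeats of itself.
    return re.sub(r'([\W_])\1+', r'\1', raw_text)

print(collapse_repeated_punctuation('wait...  what??'))
# wait. what?
```

Repeated letters and digits are left alone, so `'aaa'` stays `'aaa'`.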
To actually answer the question:
orig = 'stuff . // : /// more-stuff .. .. ...$%$% stuff -> DD'
collapse_whitespace(''.join([(c.upper() if c.isalnum() else ' ') for c in orig]))
# 'STUFF MORE STUFF STUFF DD'
As said, the regexp would be something like:
re.sub(r'\W+', ' ', orig).upper()