
Extract Emails From Html Using Regex

I'm trying to extract any jabber accounts (emails) using regex from this page. I've tried using regex: \w+@[\w.-]+|\{(?:\w+, *)+\w+\}@[\w.-]+ ...but it's not producing the desired

Solution 1:

This might work:

[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+

import re

# The re.MULTILINE and re.IGNORECASE flags are dropped: the pattern has no ^/$ anchors or letters.
p = re.compile(r'[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+')
test_str = r'...'
print(p.findall(test_str))

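To see how permissive this pattern is, here is a minimal sketch run against a small, made-up HTML fragment (the addresses and markup are hypothetical, not taken from the page in question). Because the character class only excludes whitespace, @, < and >, a match inside an attribute drags the surrounding attribute text along:

```python
import re

# Solution 1's pattern: any run of characters other than whitespace, '@', '<', '>'
# on both sides of an '@', with at least one dot in the domain part.
pattern = re.compile(r'[^\s@<>]+@[^\s@<>]+\.[^\s@<>]+')

# Hypothetical sample markup, for illustration only.
sample_html = '<p>Jabber: alice@example.org</p><a href="mailto:bob@example.net">mail</a>'

matches = pattern.findall(sample_html)
print(matches)
# ['alice@example.org', 'href="mailto:bob@example.net"']
```

Addresses in plain text come out clean, while ones inside attributes may need extra trimming or a stricter character class.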

Solution 2:

import re

s = '''
...YOUR HTML page source code HERE..........

'''
# Case-insensitive match; the top-level domain is limited to 2-6 letters.
reobj = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)
# In Python 3, str is already Unicode, so no .decode('utf-8') is needed.
print(reobj.findall(s))

Result

['skypeman@jabbim.cz', 'sonics@creep.im', 'voxis_team@lsd-25.ru', 'voxis_team@lsd-25.ru', 'adhrann@jabbim.cz', 'jabberwocky@jabber.systemli.org']
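The result above lists one address twice, because findall reports every occurrence. If only distinct addresses are wanted, one possible sketch is to deduplicate while keeping first-seen order. The helper name unique_emails and the sample text are hypothetical, introduced here for illustration:

```python
import re

# Same pattern as in Solution 2.
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b", re.IGNORECASE)

def unique_emails(text):
    """Return matched addresses in first-seen order, without duplicates."""
    # dict.fromkeys keeps insertion order (Python 3.7+) and drops repeats.
    return list(dict.fromkeys(EMAIL_RE.findall(text)))

# Hypothetical sample text, for illustration only.
sample = "Contact: alice@example.org, bob@example.net and again alice@example.org."
print(unique_emails(sample))
# ['alice@example.org', 'bob@example.net']
```

This comparison is case-sensitive, so Alice@example.org and alice@example.org would count as two entries unless the matches are lower-cased first.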

Solution 3:

Try this one:

reg_emails=r'^((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))@((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))\.((([0-9a-zA-Z]+)[\_\.\-])*([0-9a-zA-Z]+))$'
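Note that this pattern is anchored with ^ and $, so as written it only matches when the whole string is a single address, and its capturing groups make re.findall return tuples rather than full matches. A minimal sketch of one way to adapt it for searching inside a page follows. The names PART, ANCHORED and UNANCHORED and the sample text are assumptions for illustration. The sub-pattern is the same as above, rewritten with non-capturing groups:

```python
import re

# One run of alphanumerics separated by '_', '.' or '-', as in Solution 3,
# but with non-capturing groups so findall returns whole matches.
PART = r'(?:[0-9a-zA-Z]+[_.-])*[0-9a-zA-Z]+'

# Anchored form (validates a whole string) and unanchored form (searches inside text).
ANCHORED = re.compile(r'^' + PART + r'@' + PART + r'\.' + PART + r'$')
UNANCHORED = re.compile(PART + r'@' + PART + r'\.' + PART)

# Hypothetical sample text, for illustration only.
text = 'Write to alice@example.org or bob_smith@mail-1.example.net today'

print(ANCHORED.search(text))                             # None
print(ANCHORED.search('alice@example.org') is not None)  # True
found = UNANCHORED.findall(text)
print(found)
# ['alice@example.org', 'bob_smith@mail-1.example.net']
```

The anchored form is still useful for validating a single candidate string. The unanchored form is the one suited to extraction.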
