Regular Expression Extracting Unwanted IP Address From Log File
I have a sever.log file. My regular expression is extracting every sequence of up to 3 digits separated by dots. My code, output and desired output are below: 192.168.10.20 - - [18/Jul/20
Solution 1:
If each line starts with an IP, you can simply read the file line by line, split each line on whitespace, and take the first item:
#line1=r'''192.168.10.20 - - [18/Jul/2017:08:41:37 +0000] "PUT /search/tag/list HTTP/1.0" 200 5042 "http://cooper.com/homepage/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/5342 (KHTML, like Gecko) Chrome/14.0.870.0 Safari/5342"
#10.30.24.3 - - [18/Jul/2017:08:45:15 +0000] "POST /search/tag/list HTTP/1.0" 200 4939 "http://www.cole-brown.net/category/main/list/privacy/" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/5322 (KHTML, like Gecko) Chrome/14.0.843.0 Safari/5322"
#98.5.45.3 - - [18/Jul/2017:08:45:49 +0000] "GET /apps/cart.jsp?appID=8471 HTTP/1.0" 200 4958 "http://knight-chase.com/post.jsp" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_7_3; rv:1.9.6.20) Gecko/2013-11-03 17:44:01 Firefox/3.8"
#98.5.45.3 - - [18/Jul/2017:08:45:49 +0000] "GET /apps/cart.jsp?appID=8471 HTTP/1.0" 200 4958 "http://knight-chase.com/post.jsp" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_7_3; rv:1.9.6.20) Gecko/2013-11-03 17:44:01 Firefox/3.8"'''
result = {}
with open(r'C:\Users\ubuntu\Desktop\Tests\apache.log', 'r') as fr1:
    for line in fr1:
        # The IP is the first whitespace-separated field on the line
        ip = line.split()[0]
        if ip in result:
            result[ip] += 1
        else:
            result[ip] = 1
print(result)
# => {'192.168.10.20': 1, '10.30.24.3': 1, '98.5.45.3': 2}
See the Python demo.
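As a variation on the loop above, the counting can be sketched with collections.Counter from the standard library. The sample_lines list and the count_first_fields helper are hypothetical names used only for illustration (the lines are shortened copies of the sample log), and the blank-line guard is an addition that the original loop does not have:

```python
from collections import Counter

# Hypothetical, shortened sample lines standing in for the log file contents.
sample_lines = [
    '192.168.10.20 - - [18/Jul/2017:08:41:37 +0000] "PUT /search/tag/list HTTP/1.0" 200 5042',
    '10.30.24.3 - - [18/Jul/2017:08:45:15 +0000] "POST /search/tag/list HTTP/1.0" 200 4939',
    '98.5.45.3 - - [18/Jul/2017:08:45:49 +0000] "GET /apps/cart.jsp?appID=8471 HTTP/1.0" 200 4958',
    '98.5.45.3 - - [18/Jul/2017:08:45:49 +0000] "GET /apps/cart.jsp?appID=8471 HTTP/1.0" 200 4958',
    '',  # a blank line, skipped by the guard below
]


def count_first_fields(lines):
    """Count the first whitespace-separated field of each non-blank line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # on a blank line, fields is empty and fields[0] would raise IndexError
            counts[fields[0]] += 1
    return counts


ip_counts = count_first_fields(sample_lines)
print(dict(ip_counts))
# => {'192.168.10.20': 1, '10.30.24.3': 1, '98.5.45.3': 2}
```

An open file object can be passed in place of sample_lines, since iterating over a file also yields one line at a time.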
To only get the IP at the start of the line with regex you may use
r'(?m)^\d{1,3}(?:\.\d{1,3}){3}'
See the regex demo.
Note that a stricter IP regex (see this reference), which limits each octet to 0-255 and matches at the start of a line, is
r'^(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}'
Or even this one, considering you have a space after each IP:
r'^(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}(?!\S)'
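To see how the stricter pattern differs from the simple \d{1,3} one, here is a small comparison sketch. The candidates strings are made-up test inputs, not lines from the actual log, and the octet variable is just a convenience for building the same pattern shown above:

```python
import re

# The simple pattern: any four groups of 1 to 3 digits.
loose = re.compile(r'^\d{1,3}(?:\.\d{1,3}){3}')

# The stricter pattern from above, assembled from a reusable octet fragment.
octet = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
strict = re.compile(r'^' + octet + r'(?:\.' + octet + r'){3}(?!\S)')

# Made-up inputs: a valid IP, out-of-range octets, and an IP glued to a letter.
candidates = ['192.168.10.20 - -', '999.300.1.1 - -', '10.30.24.3x - -']

for text in candidates:
    print(text, bool(loose.match(text)), bool(strict.match(text)))
# => 192.168.10.20 - - True True
# => 999.300.1.1 - - True False
# => 10.30.24.3x - - True False
```

The (?!\S) lookahead is what rejects the third input: it requires the match to be followed by whitespace or the end of the string.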
Details
(?m)^ - start of a line
\d{1,3} - 1 to 3 digits
(?:\.\d{1,3}){3} - three occurrences of . and 1 to 3 digits.
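The ^ anchor matters because other dotted numbers appear later in each line. As a rough illustration (assuming the unwanted matches in the question came from an unanchored pattern, which the truncated question does not show), compare the two on the first sample line:

```python
import re

# First sample line from the log, split across string literals for readability.
log_line = (
    '192.168.10.20 - - [18/Jul/2017:08:41:37 +0000] '
    '"PUT /search/tag/list HTTP/1.0" 200 5042 "http://cooper.com/homepage/" '
    '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/5342 (KHTML, like Gecko) '
    'Chrome/14.0.870.0 Safari/5342"'
)

# Without an anchor, the Chrome version number also looks like an IP.
unanchored = re.findall(r'\d{1,3}(?:\.\d{1,3}){3}', log_line)
print(unanchored)
# => ['192.168.10.20', '14.0.870.0']

# With (?m)^ only the value at the start of a line is returned.
anchored = re.findall(r'(?m)^\d{1,3}(?:\.\d{1,3}){3}', log_line)
print(anchored)
# => ['192.168.10.20']
```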
See the Python demo:
import re
line1=r'''192.168.10.20 - - [18/Jul/2017:08:41:37 +0000] "PUT /search/tag/list HTTP/1.0" 200 5042 "http://cooper.com/homepage/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/5342 (KHTML, like Gecko) Chrome/14.0.870.0 Safari/5342"
10.30.24.3 - - [18/Jul/2017:08:45:15 +0000] "POST /search/tag/list HTTP/1.0" 200 4939 "http://www.cole-brown.net/category/main/list/privacy/" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/5322 (KHTML, like Gecko) Chrome/14.0.843.0 Safari/5322"
98.5.45.3 - - [18/Jul/2017:08:45:49 +0000] "GET /apps/cart.jsp?appID=8471 HTTP/1.0" 200 4958 "http://knight-chase.com/post.jsp" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_7_3; rv:1.9.6.20) Gecko/2013-11-03 17:44:01 Firefox/3.8"'''
rx = r"^\d{1,3}(?:\.\d{1,3}){3}\b"
listofip = re.findall(rx, line1, re.M)
result = {}
# Count how many times each matched IP occurs
for ip in listofip:
    if ip in result:
        result[ip] += 1
    else:
        result[ip] = 1
print(result)
# => {'192.168.10.20': 1, '10.30.24.3': 1, '98.5.45.3': 1}
Solution 2:
Your log file can be parsed as a space-delimited CSV file, with the IP address in the first column, so there is no need for regex here.
import csv
with open('apache.log', encoding='utf8') as logfile:
    # Treat each line as a space-delimited record
    reader = csv.reader(logfile, delimiter=' ')
    for row in reader:
        print(row[0])
Output:
192.168.10.20
10.30.24.3
98.5.45.3
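One side benefit of the csv approach, sketched below, is that the double-quoted parts of a line (request, referrer, user agent) come back as single fields even though they contain spaces. The sample_log object is a hypothetical in-memory stand-in for apache.log, and its user-agent string is shortened for the example:

```python
import csv
import io

# Hypothetical in-memory stand-in for apache.log (user agent shortened).
sample_log = io.StringIO(
    '192.168.10.20 - - [18/Jul/2017:08:41:37 +0000] "PUT /search/tag/list HTTP/1.0" '
    '200 5042 "http://cooper.com/homepage/" "Mozilla/5.0 (X11; Linux x86_64)"\n'
)

rows = list(csv.reader(sample_log, delimiter=' '))
first = rows[0]

print(first[0])  # the IP address in the first column
# => 192.168.10.20
print(first[5])  # the quoted request is kept together as one field
# => PUT /search/tag/list HTTP/1.0
```

The bracketed timestamp is not quoted, so it is still split into two fields at its space.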