Error In Splitting A Block Of Data Using Python
I have parsed a file and I need to split the data according to LogType. Below is my data:
================================================================================
Solution 1:
We can easily split the logs using regular expressions in Python. The following code splits the text wherever either of two conditions is met:
Condition 1: Multiple occurrences of = followed by a \n
Condition 2: Two consecutive occurrences of \n
Wherever either condition matches, the text is cut at that point. filter(None, ...) removes any empty strings returned by the split and returns a filter object, which is then converted to a list.
import re
text = """===================================================================================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
Log Contents:

LogType:stderr
Log Upload Time :Thu Jun 25 12:24:52 +0100 2020
LogLength:3000
Log Contents:
20/06/25 12:19:33 INFO datasources.FileScanRDD
20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.
20/06/21 12:19:40 INFO eas
20/06/25 12:20:41 WARN Warning as the node is accessed without started

===================================================================================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
"""
# Split on a run of '=' ending a line, or on a blank line; drop empty pieces
output = list(filter(None, re.split(r'=+\n|\n\n', text)))
print(output)
OUTPUT:
['LogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\nLog Contents:', 'LogType:stderr\nLog Upload Time :Thu Jun 25 12:24:52 +0100 2020\nLogLength:3000\nLog Contents:\n20/06/25 12:19:33 INFO datasources.FileScanRDD\n20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.\n20/06/21 12:19:40 INFO eas\n20/06/25 12:20:41 WARN Warning as the node is accessed without started', 'LogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\n']
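Since the goal is to split by LogType, another option is to cut the text just before every line that begins with "LogType:", which does not rely on blank lines being present. This is only a sketch: the shortened sample and the helper name split_by_logtype are made up for illustration, and it assumes every record starts with a "LogType:" line.

```python
import re

# Shortened sample in the same shape as the data above (assumption:
# every record begins with a line that starts with "LogType:").
sample = (
    "=====\n"
    "LogType:container-localizer-syslog\n"
    "LogLength:0\n"
    "Log Contents:\n"
    "LogType:stderr\n"
    "LogLength:3000\n"
    "Log Contents:\n"
    "20/06/25 12:19:33 INFO datasources.FileScanRDD\n"
    "=====\n"
    "LogType:container-localizer-syslog\n"
    "LogLength:0\n"
)


def split_by_logtype(text):
    # Drop separator lines that consist only of '=' characters.
    cleaned = re.sub(r"^=+[ \t]*$\n?", "", text, flags=re.MULTILINE)
    # Zero-width split just before each line that starts with "LogType:".
    # Splitting on an empty match needs Python 3.7 or later.
    parts = re.split(r"(?m)^(?=LogType:)", cleaned)
    return [p.strip() for p in parts if p.strip()]


blocks = split_by_logtype(sample)
for block in blocks:
    print(repr(block))
```

Each element of blocks is one LogType record with the separator lines removed.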
Solution 2:
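If you have multiple logs in one file, try this:
import re
# text is the log string defined in Solution 1
# Split at every line that begins with '='
logs = re.split('^=', text, flags=re.MULTILINE)
for log in logs:
    if len(log) > 0:
        # Separate the remainder of the '=' run from the log body
        first, rest = log.split('=\n', 1)
        print('first', first)
        print('rest', rest)
        print("\n\n")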
Output:
first =================================================================================
rest LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
Log Contents:
LogType:stderr
Log Upload Time :Thu Jun 25 12:24:52 +0100 2020
LogLength:3000
Log Contents:
20/06/25 12:19:33 INFO datasources.FileScanRDD
20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.
20/06/21 12:19:40 INFO eas
20/06/25 12:20:41 WARN Warning as the node is accessed without started
first =================================================================================
rest LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
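Once the text has been cut into blocks, each block can be turned into a record and grouped by its LogType. The following is a rough sketch, not a definitive parser: it assumes blocks shaped like the strings in the Solution 1 output, and parse_block and the "contents" key are hypothetical names chosen for this example.

```python
from collections import defaultdict


def parse_block(block):
    """Turn one 'LogType:...' block into a dict of header fields plus contents."""
    record = {"contents": []}
    in_contents = False
    for line in block.splitlines():
        if in_contents:
            # Everything after "Log Contents:" is kept as a raw log line.
            record["contents"].append(line)
        elif line.startswith("Log Contents:"):
            in_contents = True
        elif ":" in line:
            # Header lines look like "Key:Value"; split on the first colon only.
            key, _, value = line.partition(":")
            record[key.strip()] = value.strip()
    return record


# Two blocks in the same shape as the Solution 1 output (shortened).
blocks = [
    "LogType:container-localizer-syslog\n"
    "Log Upload Time :Thu Jun 25 12:24:45 +0100 2020\n"
    "LogLength:0\n"
    "Log Contents:",
    "LogType:stderr\n"
    "Log Upload Time :Thu Jun 25 12:24:52 +0100 2020\n"
    "LogLength:3000\n"
    "Log Contents:\n"
    "20/06/25 12:19:33 INFO datasources.FileScanRDD",
]

# Group the parsed records by their LogType value.
results = defaultdict(list)
for block in blocks:
    record = parse_block(block)
    results[record.get("LogType", "unknown")].append(record)

for log_type, records in results.items():
    print(log_type, len(records))
```

Partitioning on the first colon keeps the timestamp intact, even though it contains colons itself.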
Solution 3:
You can use this, as per your question:
text = text.replace('=', '')
all_log_types = text.split('\n\n')  # split on empty lines
print(all_log_types)
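One thing to watch with replace('=', '') is that it removes every = in the text, including any that appear inside the log lines themselves. A possible workaround, sketched below, is to remove only the lines that consist entirely of = characters. The sample string, including the key=value log line, is made up purely to show the difference.

```python
import re

# Made-up sample: one log line contains an '=' that should be preserved.
sample = (
    "=====\n"
    "LogType:stderr\n"
    "Log Contents:\n"
    "spark.executor.memory=4g\n"
    "\n"
    "=====\n"
    "LogType:stdout\n"
)

# str.replace strips every '=', including the one inside the log line.
naive = sample.replace("=", "")

# Removing only lines made up entirely of '=' leaves the log line intact.
careful = re.sub(r"^=+$\n?", "", sample, flags=re.MULTILINE)

print([b for b in naive.split("\n\n") if b.strip()])
print([b for b in careful.split("\n\n") if b.strip()])
```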