
Error In Splitting A Block Of Data Using Python

I have parsed a file and need to split the data according to LogType. Below is my data:
================================================================================

Solution 1:

The logs can be split with a regular expression in Python. The pattern in the following code splits on either of two conditions.

Condition 1: a run of = characters followed by a \n

Condition 2: two consecutive \n characters (an empty line)

The text is cut wherever either condition matches. filter(None, ...) then drops any empty strings returned by the split and returns an iterator, which list() converts to a list.

import re

text = """===================================================================================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
Log Contents:

LogType:stderr
Log Upload Time :Thu Jun 25 12:24:52 +0100 2020
LogLength:3000
Log Contents:
20/06/25 12:19:33 INFO datasources.FileScanRDD
20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.
20/06/21 12:19:40 INFO eas
20/06/25 12:20:41 WARN Warning as the node is accessed without started

===================================================================================
LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
"""


output = list(filter(None, re.compile('[=]+.\n|\n\n').split(text)))

print(output)

OUTPUT:

['LogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\nLog Contents:', 'LogType:stderr\nLog Upload Time :Thu Jun 25 12:24:52 +0100 2020\nLogLength:3000\nLog Contents:\n20/06/25 12:19:33 INFO datasources.FileScanRDD\n20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.\n20/06/21 12:19:40 INFO eas\n20/06/25 12:20:41 WARN Warning as the node is accessed without started', 'LogType:container-localizer-syslog\nLog Upload Time :Thu Jun 25 12:24:45 +0100 2020\nLogLength:0\n']
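Since the question asks for the data to be split according to LogType, the chunks can also be grouped by that value. The following is only a sketch: the function name group_by_log_type and the shortened sample string are made up for illustration, and it assumes every chunk starts with a LogType: line and that log contents contain no empty lines.

```python
import re

# Shortened, hypothetical sample in the same layout as the log text above.
sample = (
    "=====\n"
    "LogType:container-localizer-syslog\n"
    "LogLength:0\n"
    "Log Contents:\n"
    "\n"
    "LogType:stderr\n"
    "LogLength:3000\n"
    "Log Contents:\n"
    "20/06/25 12:19:33 INFO datasources.FileScanRDD\n"
    "\n"
    "=====\n"
    "LogType:container-localizer-syslog\n"
    "LogLength:0\n"
)


def group_by_log_type(text):
    """Split text into chunks and collect them in a dict keyed by LogType."""
    # Same idea as above: cut at '=' separator lines and at empty lines.
    chunks = filter(None, re.split(r'=+\n|\n\n', text))
    grouped = {}
    for chunk in chunks:
        match = re.match(r'LogType:(\S+)', chunk)
        if match is None:
            continue  # skip chunks that do not start with a LogType header
        grouped.setdefault(match.group(1), []).append(chunk.strip())
    return grouped


grouped = group_by_log_type(sample)
for log_type, chunks in grouped.items():
    print(log_type, len(chunks))
```

Each key maps to a list because the same LogType can appear more than once in the file.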

Solution 2:

If you have multiple logs in one file, try this:

import re

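results = {}
# text holds the log data (the same string as in Solution 1).
# Split at every line that starts with '='; flags must be passed by keyword.
logs = re.split('^=', text, flags=re.MULTILINE)

for log in logs:
    if len(log) > 0:
        # Separate the rest of the '=' line from the log body below it.
        first, rest = log.split('=\n', 1)
        print('first', first)
        print('rest', rest)
        print("\n\n")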

Output:

first =================================================================================
rest LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0
Log Contents:

LogType:stderr
Log Upload Time :Thu Jun 25 12:24:52 +0100 2020
LogLength:3000
Log Contents:
20/06/25 12:19:33 INFO datasources.FileScanRDD
20/06/25 12:19:40 INFO executor.EXECUTOR: Finished task 18.0 in stage 0.0 (TID 18),18994 bytes result sent to driver.
20/06/21 12:19:40 INFO eas
20/06/25 12:20:41 WARN Warning as the node is accessed without started



first =================================================================================
rest LogType:container-localizer-syslog
Log Upload Time :Thu Jun 25 12:24:45 +0100 2020
LogLength:0

Solution 3:

You can use this for the data in your question.

text = text.replace('=', '')
all_log_types = text.split('\n\n')  # split on empty lines
print(all_log_types)
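Note that replace('=', '') removes every = in the text, including any that occur inside a log message, and the resulting pieces can keep stray leading newlines. A possible variant, sketched here under the assumption that separator lines consist only of = characters, removes just those lines and strips each block. The name split_blocks and the sample string are hypothetical.

```python
import re

# Hypothetical sample; the 'key=value' line stands in for a log message containing '='.
sample = (
    "====\n"
    "LogType:a\n"
    "LogLength:0\n"
    "Log Contents:\n"
    "\n"
    "LogType:b\n"
    "Log Contents:\n"
    "key=value\n"
    "\n"
    "====\n"
    "LogType:a\n"
    "LogLength:0\n"
)


def split_blocks(text):
    """Drop separator lines, then split the remaining text on empty lines."""
    # (?m) makes ^ match at the start of every line, so only lines made up
    # entirely of '=' characters are removed.
    without_rules = re.sub(r'(?m)^=+\n', '', text)
    # Strip surrounding whitespace and discard blocks that end up empty.
    return [block.strip() for block in without_rules.split('\n\n') if block.strip()]


blocks = split_blocks(sample)
print(blocks)
```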
