Parse A Plain Text File Into A CSV File Using Python

I have a series of HTML files that are parsed into a single text file using Beautiful Soup. The HTML files are formatted such that their output is always three lines within the tex

Solution 1:

I'm not entirely sure what CSV library you're using, but it doesn't look like Python's built-in one. Anyway, here's how I'd do it:

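import csv

with open('extracted.txt', 'r') as in_file:
    # Strip leading and trailing whitespace from each line (lazily)
    stripped = (line.strip() for line in in_file)
    # Drop lines that are empty after stripping
    lines = (line for line in stripped if line)
    # Group every three consecutive lines into one tuple
    # (on Python 2, use itertools.izip instead of zip)
    grouped = zip(*[lines] * 3)
    # newline='' lets the csv module control line endings (Python 3)
    with open('extracted.csv', 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('title', 'intro', 'tagline'))
        writer.writerows(grouped)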

This forms a pipeline of sorts. It reads lines from the file, strips the leading and trailing whitespace from each line, drops any lines that are now empty, groups the remaining lines into threes, and then (after writing the CSV header) writes those groups to the CSV file.
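The grouping step is the least obvious part of that pipeline, so here is a minimal sketch of it on in-memory data. The function name `group_in_threes` and the sample lines are made up for illustration; only the strip/filter/group logic comes from the answer above.

```python
def group_in_threes(raw_lines):
    """Strip lines, drop blank ones, and group the rest into tuples of three."""
    stripped = (line.strip() for line in raw_lines)
    non_empty = (line for line in stripped if line)
    # [non_empty] * 3 is a list holding the *same* iterator three times,
    # so zip pulls three consecutive items from it for each tuple.
    return list(zip(*[non_empty] * 3))


# Hypothetical sample input, including a blank line and a stray trailing line.
sample = [
    "Title A\n", "\n", "  Intro A\n", "Tagline A\n",
    "Title B\n", "Intro B\n", "Tagline B\n",
    "leftover\n",
]
groups = group_in_threes(sample)
print(groups)
```

Note that zip stops as soon as it cannot fill a complete tuple, so a trailing group with fewer than three lines (like `leftover` here) is silently dropped.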

To combine the last two columns as you mentioned in the comments, you could change the writerow call so it writes a two-column header, and change the writerows call to:

writer.writerows((title, intro + tagline) for title, intro, tagline in grouped)
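As a rough, self-contained illustration of the combined-column variant, the sketch below writes to an in-memory buffer instead of a file. The header name `intro_tagline`, the sample rows, and the space inserted between intro and tagline are all assumptions for the example; the answer itself concatenates the two strings with no separator.

```python
import csv
import io

# Hypothetical groups of (title, intro, tagline), as produced by the grouping step.
grouped = [
    ("Title A", "Intro A", "Tagline A"),
    ("Title B", "Intro B", "Tagline B"),
]

out = io.StringIO()
writer = csv.writer(out)
# Two-column header; "intro_tagline" is an illustrative name.
writer.writerow(("title", "intro_tagline"))
# Merge the last two fields of each group into one column.
writer.writerows(
    (title, intro + " " + tagline) for title, intro, tagline in grouped
)
csv_text = out.getvalue()
print(csv_text)
```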

Solution 2:

Perhaps I didn't understand you correctly, but you can do:

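# `csv` below is presumably the CSV object from the question's code,
# not Python's built-in csv module.
with open("extracted.txt") as in_file:
    # If you don't want to call .strip() twice, create a list of the
    # stripped lines first and filter that.
    lines = [line.strip() for line in in_file if line.strip()]

for i, line in enumerate(lines):
    # i % 3 cycles through 0, 1, 2: the column within each group of three
    csv.SetCell(i % 3, line)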
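Since `SetCell` belongs to whatever library the question used, here is a hedged sketch of the same `i % 3` idea using plain lists only. The sample lines are made up, and building a `rows` list is a substitution for the `SetCell` call, not that library's API.

```python
# Hypothetical, already stripped and filtered lines.
lines = [
    "Title A", "Intro A", "Tagline A",
    "Title B", "Intro B", "Tagline B",
]

rows = []
for i, line in enumerate(lines):
    if i % 3 == 0:
        # Column 0 means a new record begins here.
        rows.append([])
    # Columns 0, 1 and 2 all land in the current record.
    rows[-1].append(line)

print(rows)
```

Each inner list can then be passed to a CSV writer's row-writing call, such as `csv.writer(...).writerow` in the standard library.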