Making A Text File Which Will Contain My List Items And Applying Regular Expression To It

I am supposed to make a code which will read a text file containing some words with some common linguistic features. Apply some regular expression to all of the words and write one

Solution 1:

I think your approach of putting one word per line is better, since you don't have to deal with delimiters and stripping.

With a file like this:

king
sing
ping
cling
booked
looked
cooked
packed

And code like this, which uses re.sub to replace a pattern:

import re

# Each line keeps its trailing newline; "$" still matches just before it.
with open("new_abcd.txt", "w") as new, open("abcd.txt") as original:
    for word in original:
        new_word = re.sub("ing$", "xyz", word)
        new_word = re.sub("ed$", "abcd", new_word)
        new.write(new_word)

It produces this output file:

kxyz
sxyz
pxyz
clxyz
bookabcd
lookabcd
cookabcd
packabcd
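
The loop above never strips the trailing newline from each line, yet "ing$" still matches. Here is a minimal sketch of why, assuming the default (non-MULTILINE) behaviour of Python's re module, where "$" matches at the end of the string and also just before a newline that is the last character. The \Z anchor is shown only for contrast and is not part of the answer's code.

```python
import re

# Lines read from a file keep their trailing "\n".
line = "king\n"

# "$" matches just before a trailing newline, so the newline survives the substitution.
with_dollar = re.sub("ing$", "xyz", line)

# "\Z" matches only at the very end of the string, so nothing is replaced here.
with_z = re.sub(r"ing\Z", "xyz", line)

print(repr(with_dollar))  # 'kxyz\n'
print(repr(with_z))       # 'king\n'
```

This is why the output file still has one word per line even though the code never adds "\n" itself.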

I tried it with the diacritic you gave us, and it seems to work fine:

>>> print(re.sub("ा$", "ing", "का"))
कing

EDIT: added multiple replacements. You can put your replacements in a list and iterate over it, calling re.sub for each one, as follows.

import re

# Each tuple is (pattern, replacement string)
replacements = [("ing$", "xyz"), ("ed$", "abcd")]

with open("new_abcd.txt", "w") as new, open("abcd.txt") as original:
    for word in original:
        new_word = word
        for pattern, replacement in replacements:
            new_word = re.sub(pattern, replacement, word)
            if new_word != word:
                break  # stop at the first pattern that changes the word
        new.write(new_word)

This limits each word to one modification: only the first pattern that changes the word is applied.
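
As an alternative sketch of the same "one modification per word" idea, the suffixes can be combined into a single pattern with alternation, with the replacement looked up in a dictionary. The names suffixes and replace_suffix are hypothetical and not part of the answer above; the suffixes are stored without the "$" anchor, which is added once when the pattern is built.

```python
import re

# Hypothetical mapping from suffix to replacement, mirroring the list above.
suffixes = {"ing": "xyz", "ed": "abcd"}

# Build one pattern such as "(ing|ed)$"; re.escape guards against special characters.
pattern = re.compile("(" + "|".join(re.escape(s) for s in suffixes) + ")$")

def replace_suffix(word):
    """Replace at most one suffix at the end of the word."""
    return pattern.sub(lambda m: suffixes[m.group(1)], word)

for word in ["king", "booked", "ringed", "cat"]:
    print(word, "->", replace_suffix(word))
```

Because the pattern is anchored with "$", it can match at most once per word, so no explicit break is needed. Whether this reads better than the loop is a matter of taste; the loop is easier to extend with patterns that are not plain suffixes.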

Solution 2:

To start with, I recommend using the with context manager to open your file, so that you do not need to close the file explicitly once you are done with it.

Another advantage is that you can process the file line by line, which is very useful when working with larger data sets. Whether you write the results on a single line or in CSV format depends on what your output requires and how you want to process it further.

As an example, to read from a file and, say, substitute a substring, you can use re.sub.

import re

with open('abcd.txt', 'r') as f:
    for line in f:
        # do something here
        print(re.sub("ing$", 'ring', line.strip()))

>>
kring
sring
pring
clring

Another nifty trick is to manage both the input and output files with the same with statement:

import re

with open('abcd.txt', 'r') as f, open('out_abcd.txt', 'w') as o:
    for line in f:
        # notice that we add '\n' so each result is written on its own line
        o.write(re.sub("ing$", 'ring', line.strip()) + '\n')

This creates an output file with your new contents in a memory-efficient way, since the file is processed one line at a time rather than loaded whole.

If you'd like to write to a CSV file or any other specific format, I highly suggest you spend some time understanding Python's input and output functions. If the linguistics of text is what you are after, then learn how different languages are encoded and study Python's regex operations further.
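
On the encoding point, here is a minimal sketch that passes encoding="utf-8" explicitly to open, so that non-ASCII words are read and written the same way on every platform. It assumes the word list is stored as UTF-8. The helper name convert_file is hypothetical, and the demo writes to a temporary directory instead of touching real files.

```python
import os
import re
import tempfile

def convert_file(src_path, dst_path, pattern, replacement):
    """Apply one regex substitution to every line, reading and writing UTF-8."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(re.sub(pattern, replacement, line.strip()) + "\n")

# Demo in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "abcd.txt")
    dst_path = os.path.join(tmp, "out_abcd.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("का\nking\n")
    convert_file(src_path, dst_path, "ा$", "ing")
    with open(dst_path, encoding="utf-8") as f:
        result = f.read().splitlines()

print(result)
```

Without an explicit encoding, open falls back to a platform-dependent default, which may not be UTF-8 on some systems.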
