
Python Ordered Dict Issue

If I have a CSV file that has a dictionary value for each line (with columns being ['Location'], ['MovieDate'], ['Formatted_Address'], ['Lat'], ['Lng']), I have been told to use Or

Solution 1:

You only need a couple of changes: join the movie date and address into one string, and put the lat and long into the key alongside the location, so that the duplicate lat and long values are written only once:

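import csv
from collections import OrderedDict

od = OrderedDict()

with open("data.csv", newline="") as f, open("new.csv", "w", newline="") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        # Key on (Location, Lat, Lng); collect "MovieDate Formatted_Address" strings
        od.setdefault((row[0], row[-2], row[-1]), []).append(" ".join(row[1:-2]))
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc[0]] + vals + list(loc[1:]))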

Output:

Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA","Jun-9 It's a Mad, Mad, Mad, Mad World Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

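A League of Their Own comes first because its row appears before the It's a Mad, Mad, Mad, Mad World row in the input, and the OrderedDict keeps insertion order. row[1:-2] takes everything except the location, lat and long. The lat and long are stored in the key tuple, so they are written once at the end of each output row instead of being repeated for every entry.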
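The grouping step can be tried without any files. This is a minimal sketch, assuming a few made-up rows in the same column order; `group_rows` and the sample values are hypothetical names used only for illustration:

```python
from collections import OrderedDict

# Hypothetical sample rows: Location, MovieDate, Formatted_Address, Lat, Lng
rows = [
    ["Park A", "Jun-7 Film One", "1 Main St", "41.0", "-87.0"],
    ["Park B", "Jun-8 Film Two", "2 Oak Ave", "42.0", "-88.0"],
    ["Park A", "Jun-9 Film Three", "1 Main St", "41.0", "-87.0"],
]


def group_rows(rows):
    """Group rows by (location, lat, lng), keeping first-seen key order."""
    grouped = OrderedDict()
    for row in rows:
        key = (row[0], row[-2], row[-1])
        # setdefault returns the existing list, or stores and returns a new one
        grouped.setdefault(key, []).append(" ".join(row[1:-2]))
    return grouped


grouped = group_rows(rows)
for key, vals in grouped.items():
    # Location first, then the collected strings, then lat and long once
    print([key[0]] + vals + list(key[1:]))
```

Each location ends up with a single key, so its lat and long appear only once however many rows it had in the input.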

Using names and unpacking might make it a little easier to follow:

import csv
from collections import OrderedDict

od = OrderedDict()

with open("data.csv", newline="") as f, open("new.csv", "w", newline="") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        # Unpack the five columns into named variables
        loc, mov, form, lat, lng = row
        od.setdefault((loc, lat, lng), []).append("{} {}".format(mov, form))
    wr.writerow(header)
    for key, vals in od.items():
        wr.writerow([key[0]] + vals + list(key[1:]))

Using csv.DictWriter to keep five columns:

import csv
from collections import OrderedDict

od = OrderedDict()

with open("data.csv", newline="") as f, open("new.csv", "w", newline="") as out:
    # DictReader takes the field names from the header row of data.csv
    r = csv.DictReader(f)
    wr = csv.DictWriter(out, fieldnames=r.fieldnames)
    for row in r:
        # The first row seen for a location stores its fixed columns and an empty MovieDate list
        od.setdefault(row["Location"], dict(Location=row["Location"], Lat=row["Lat"], Lng=row["Lng"],
                                            MovieDate=[], Formatted_Address=row["Formatted_Address"]))
        od[row["Location"]]["MovieDate"].append(row["MovieDate"])
    wr.writeheader()
    for loc, vals in od.items():
        # Join the collected dates into one string before writing the row
        vals["MovieDate"] = ", ".join(vals["MovieDate"])
        wr.writerow(vals)

Output (header row omitted):

"Edgebrook Park, Chicago ","Jun-7 A League of Their Own, Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

So the five columns remain intact: the "MovieDate" values are joined into a single string, and Formatted_Address is the same for every row of a given location, so we don't need to update it.

It turns out that, to match what you wanted, all we needed to do was concatenate the MovieDate values and remove the duplicate entries for Location, Lat, Lng and Formatted_Address.
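To see the merge end to end without touching the disk, here is a minimal sketch that runs the same idea on in-memory text. `SAMPLE` and `merge_movie_dates` are hypothetical names, the rows are made up, and `io.StringIO` stands in for data.csv and new.csv:

```python
import csv
import io
from collections import OrderedDict

# Hypothetical input, standing in for data.csv
SAMPLE = (
    "Location,MovieDate,Formatted_Address,Lat,Lng\n"
    "Park A,Jun-7 Film One,1 Main St,41.0,-87.0\n"
    "Park B,Jun-8 Film Two,2 Oak Ave,42.0,-88.0\n"
    "Park A,Jun-9 Film Three,1 Main St,41.0,-87.0\n"
)


def merge_movie_dates(text):
    """Return CSV text with one row per Location and MovieDate values joined."""
    reader = csv.DictReader(io.StringIO(text))
    merged = OrderedDict()
    for row in reader:
        # The first row for a location supplies the fixed columns;
        # MovieDate starts as an empty list and collects every date
        entry = merged.setdefault(row["Location"], dict(row, MovieDate=[]))
        entry["MovieDate"].append(row["MovieDate"])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for entry in merged.values():
        writer.writerow(dict(entry, MovieDate=", ".join(entry["MovieDate"])))
    return out.getvalue()


result = merge_movie_dates(SAMPLE)
print(result)
```

The joined MovieDate value contains a comma, so the csv module quotes that field when writing it, as in the output shown earlier.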

Solution 2:

Assuming location is the first item of the row:

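import csv

rows_by_location = {}
with open("data.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for line in reader:
        if line[0] not in rows_by_location:
            rows_by_location[line[0]] = []
        rows_by_location[line[0]].append(line[1:])

For every location, you then have the rest of each of its rows, which can be flattened and written out as a single row:

with open("new.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for key, value in rows_by_location.items():
        writer.writerow([key] + [field for rest in value for field in rest])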
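As a quick check of this group-then-write idea, here is a minimal sketch on in-memory rows. It assumes Python 3.7 or later, where a plain dict keeps insertion order; `write_grouped` and the sample rows are hypothetical, and each location's remaining columns are flattened into one output row:

```python
import csv
import io

# Hypothetical rows, already split into fields (header removed)
rows = [
    ["Park A", "Jun-7 Film One", "1 Main St", "41.0", "-87.0"],
    ["Park B", "Jun-8 Film Two", "2 Oak Ave", "42.0", "-88.0"],
    ["Park A", "Jun-9 Film Three", "1 Main St", "41.0", "-87.0"],
]


def write_grouped(rows):
    """Group rows by their first field and return one CSV line per group."""
    grouped = {}
    for line in rows:
        grouped.setdefault(line[0], []).append(line[1:])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for key, value in grouped.items():
        # value is a list of lists; flatten it so the row holds plain strings
        writer.writerow([key] + [field for rest in value for field in rest])
    return out.getvalue()


text = write_grouped(rows)
print(text)
```

Unlike Solution 1, this keeps every column of every row, so the address, lat and long are repeated for each entry of a location.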
