Group Data From A Csv File By Field Value
I have a CSV file with duplicate values in the first column. I want to collect all the values of the second column into a list, one list per distinct value of the first column:
column1 column2
a       54.2
s       78.5
k       89.62
a       77.2
a       65.56
Solution 1:
This seems fairly straightforward using Python's csv reader.
data.csv
a,54.2
s,78.5
k,89.62
a,77.2
a,65.56
script.py
import csv
result = {}
with open('data.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in csvreader:
        if row[0] in result:
            result[row[0]].append(row[1])
        else:
            result[row[0]] = [row[1]]
print(result)
output
{
'a': ['54.2', '77.2', '65.56'],
's': ['78.5'],
'k': ['89.62']
}
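The if/else check can also be collapsed with dict.setdefault, which inserts an empty list the first time a key is seen. A minimal self-contained sketch (the file contents are simulated in memory with io.StringIO so the snippet runs as-is):

```python
import csv
import io

# in-memory stand-in for data.csv, so the example is self-contained
csvfile = io.StringIO("a,54.2\ns,78.5\nk,89.62\na,77.2\na,65.56\n")

result = {}
for row in csv.reader(csvfile):
    # setdefault returns the existing list for row[0], or inserts a new empty one
    result.setdefault(row[0], []).append(row[1])

print(result)
# → {'a': ['54.2', '77.2', '65.56'], 's': ['78.5'], 'k': ['89.62']}
```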
As @Pete pointed out, you can tidy this up using defaultdict:
script.py
import csv
from collections import defaultdict
result = defaultdict(list)  # each entry of the dict is, by default, an empty list
with open('data.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in csvreader:
        result[row[0]].append(row[1])
print(result)
Solution 2:
One way of doing this is with pandas: populate a DataFrame, group by the first column, and then apply list to each group:
import pandas as pd
df = pd.DataFrame({'column1':['a','s','k','a','a'],'column2':
[54.2,78.5,89.62,77.2,65.56]})
print(df.groupby('column1')['column2'].apply(list))
output:
column1
a    [54.2, 77.2, 65.56]
k                [89.62]
s                 [78.5]
Name: column2, dtype: object
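If the data lives in the CSV file rather than a hand-built DataFrame, the same groupby works after pd.read_csv. A sketch, assuming the file has no header row (here the file contents are simulated with io.StringIO so the example is self-contained):

```python
import io
import pandas as pd

# in-memory stand-in for data.csv; the real file has no header row
csv_text = "a,54.2\ns,78.5\nk,89.62\na,77.2\na,65.56\n"
df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=['column1', 'column2'])

# group by the first column and collect the second column into lists
grouped = df.groupby('column1')['column2'].apply(list)
print(grouped)
```

Note that read_csv parses the second column as floats, so the lists hold numbers rather than the strings the csv-module solution produces.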
Solution 3:
I tried a similar approach: use groupby with apply, and finally convert the resulting Series to JSON with Series.to_json.
Input
df = pd.DataFrame({'column1':['a','s','k','a','a'],'column2':[54.2,78.5,89.62,77.2,65.56]})
Input data:
  column1  column2
0       a    54.20
1       s    78.50
2       k    89.62
3       a    77.20
4       a    65.56
Answer:
jsonData = df.groupby('column1')['column2'].apply(list)
print(jsonData.to_json())
# if you want to write the JSON to a file
jsonData.to_json(r"D:/abc/def/xyz.json")
Desired output
{"a":[54.2,77.2,65.56],"k":[89.62],"s":[78.5]}