Unpickle A Data Structure Vs. Build By Calling Readlines()
Solution 1:
Would there be any performance increase if I did this?
Test it and see!
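try:
    import cPickle as pickle  # Python 2: C implementation of pickle
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically
import timeit

def lines():
    # Build the list by reading every line of the text file.
    with open('lotsalines.txt') as f:
        return f.readlines()

def pickles():
    # Build the same list by unpickling it.
    with open('lotsalines.pickle', 'rb') as f:
        return pickle.load(f)

ds = lines()
with open('lotsalines.pickle', 'wb') as f:
    # protocol=-1 selects the highest protocol the interpreter supports.
    t = timeit.timeit(lambda: pickle.dump(ds, file=f, protocol=-1), number=1)
print('pickle.dump: {}'.format(t))
print('readlines: {}'.format(timeit.timeit(lines, number=10)))
print('pickle.load: {}'.format(timeit.timeit(pickles, number=10)))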
My 'lotsalines.txt' file is just that source duplicated until it's 655360 lines long, or 15532032 bytes.
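A test file like this can be produced in several ways; the following is only a rough sketch of one of them. The helper name `build_test_file` and its parameters are hypothetical, not part of the original benchmark.

```python
def build_test_file(source_lines, path, target_lines):
    """Write source_lines to path over and over until target_lines lines exist.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    if not source_lines:
        raise ValueError('source_lines must not be empty')
    written = 0
    with open(path, 'w') as f:
        while written < target_lines:
            for line in source_lines:
                if written >= target_lines:
                    break
                f.write(line)
                written += 1
    return written
```

Calling it with the benchmark script's own lines and a target of 655360 would give a file of the same line count as the one described above, though the byte size depends on the exact source text.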
Apple Python 2.7.2:
readlines: 0.640027999878
pickle.load: 2.67698192596
And the pickle file is 19464748 bytes.
Python.org 3.3.0:
readlines: 1.5357899703085423
pickle.load: 1.5975534357130527
And it's 20906546 bytes.
So, Python 3 has sped up pickle quite a bit over Python 2, at least if you use pickle protocol 3, but it's still no faster than a simple readlines: slightly slower in 3.3.0, and about four times slower in 2.7.2. (And readlines has gotten a lot slower in 3.x, as well as no longer being the recommended way to read lines.)
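Since the result depends on the pickle protocol, it can help to check which protocol `protocol=-1` actually resolves to on a given interpreter. A minimal sketch, assuming Python 3 and a small made-up list in place of the real data:

```python
import pickle

# Stand-in data; the benchmark above pickles a list of file lines.
data = ['first line\n', 'second line\n']

# A negative protocol means "use the highest protocol available".
newest = pickle.dumps(data, protocol=-1)
explicit = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Protocol 0 is the original text-based format, shown for comparison.
oldest = pickle.dumps(data, protocol=0)

print('highest protocol:', pickle.HIGHEST_PROTOCOL)
print('-1 matches HIGHEST_PROTOCOL:', newest == explicit)
print('sizes (newest, oldest):', len(newest), len(oldest))
```

The value of `pickle.HIGHEST_PROTOCOL` differs between Python versions, so timings and file sizes from one interpreter do not necessarily carry over to another.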
But really, if you've got performance concerns, you should consider whether you need the list in the first place. A quick test shows that building a list of this size is almost half the cost of the readlines (timing list(range(655360)) in 3.x, list(xrange(655360)) in 2.x). And it uses a ton of memory (which is probably actually why it's slow, too). If you don't actually need the list (and usually you don't), just iterate over the file, getting lines as you need them.
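Iterating over the file directly might look something like the sketch below. The function name `count_matching_lines` and the `needle` parameter are hypothetical, chosen only to give the loop something to do; the point is that no list of all lines is ever built.

```python
def count_matching_lines(path, needle):
    """Count the lines containing needle, reading one line at a time."""
    count = 0
    with open(path) as f:
        for line in f:  # the file object yields lines lazily
            if needle in line:
                count += 1
    return count
```

Only one line is held at a time here, so memory use stays roughly constant no matter how long the file is.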