Skip to content Skip to sidebar Skip to footer

Dumping A Dictionary To A Yaml File While Preserving Order

I've been trying to dump a dictionary to a YAML file. The problem is that the program that imports the YAML file needs the keywords in a specific order. This order is not alphabeti

Solution 1:

3 Years later - yaml.dump has a sort_keys kwarg that is set to True by default. Set it to False to not reorder:

withopen(CaseFile, 'w') as f:
    yaml.dump(lyml, f, default_flow_style=False, sort_keys=False)

Solution 2:

Use an OrderedDict instead of dict. Run the below setup code at the start. Now yaml.dump, should preserve the order. More details here and here

defsetup_yaml():
  """ https://stackoverflow.com/a/8661021 """
  represent_dict_order = lambda self, data:  self.represent_mapping('tag:yaml.org,2002:map', data.items())
  yaml.add_representer(OrderedDict, represent_dict_order)    
setup_yaml()

Example: https://pastebin.com/raw.php?i=NpcT6Yc4

Solution 3:

PyYAML supports representer to serialize a class instance to a YAML node.

yaml.YAMLObject uses metaclass magic to register a constructor, which transforms a YAML node to a class instance, and a representer, which serializes a class instance to a YAML node.

Add following lines above your code:

defrepresent_dictionary_order(self, dict_data):
    returnself.represent_mapping('tag:yaml.org,2002:map', dict_data.items())

defsetup_yaml():
    yaml.add_representer(OrderedDict, represent_dictionary_order)

setup_yaml()

Then you can use OrderedDict to preserve the order in yaml.dump():

import yaml
from collections import OrderedDict

def represent_dictionary_order(self, dict_data):
    return self.represent_mapping('tag:yaml.org,2002:map', dict_data.items())

def setup_yaml():
    yaml.add_representer(OrderedDict, represent_dictionary_order)

setup_yaml()    

dic = OrderedDict()

dic['a'] = 1
dic['b'] = 2
dic['c'] = 3print(yaml.dump(dic))
# {a: 1, b: 2, c: 3}

Solution 4:

Your difficulties are a result of assumptions on multiple levels that are incorrect and, depending on your YAML parser, might not be transparently resolvable.

In Python's dict the keys are unordered (at least for Python < 3.6). And even though the keys have some order in the source file, as soon as they are in the dict they aren't:

d = {'WaterDepth':0.,'WaveDirection':0.,'WaveGamma':0.,'WaveAlpha':0.}
for key in d:
    print key

gives:

WaterDepth
WaveGamma
WaveAlpha
WaveDirection

If you want your keys ordered you can use the collections.OrderedDict type (or my own ruamel.ordereddict type which is in C and more than an order of magnitude faster), and you have to add the keys ordered, either as a list of tuples:

from ruamel.ordereddict import ordereddict
# from collections import OrderedDict as ordereddict  # < this will work as well
d = ordereddict([('WaterDepth', 0.), ('WaveDirection', 0.), ('WaveGamma', 0.), ('WaveAlpha', 0.)])
for key in d:
    print key

which will print the keys in the order they were specified in the source.

The second problem is that even if a Python dict has some key ordering that happens to be what you want, the YAML specification does explicitly say that mappings are unordered and that is the way e.g. PyYAML implements the dumping of Python dict to YAML mapping (And the other way around). Also, if you dump an ordereddict or OrderedDict you normally don't get the plain YAML mapping that you indicate you want, but some tagged YAML entry.

As losing the order is often undesirable, in your case because your reader assumes some order, in my case because that made it difficult to compare versions because key ordering would not be consistent after insertion/deletion, I implemented round-trip consistency in ruamel.yaml so you can do:

import sys
import ruamel.yaml as yaml

yaml_str = """\
- BaseFile: myfile.dat
- Environment:
    WaterDepth: 0.0
    WaveDirection: 0.0
    WaveGamma: 0.0
    WaveAlpha: 0.0
"""

data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
print(data)
yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper)

which gives you exactly your output result. data works as a dict (and so does `data['Environment'], but underneath they are smarter constructs that preserve order, comments, YAML anchor names etc). You can of course change these (adding/deleting key-value pairs), which is easy, but you can also build these from scratch:

import sys
import ruamel.yamlas yaml
from ruamel.yaml.commentsimportCommentedMap

baseFile = 'myfile.dat'
lyml = [{'BaseFile': baseFile}]
lyml.append({'Environment': CommentedMap([('WaterDepth', 0.), ('WaveDirection', 0.), ('WaveGamma', 0.), ('WaveAlpha', 0.)])})
yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper)

Which again prints the contents with keys in the order you want them. I find the later less readable, than when starting from a YAML string, but it does construct the lyml data structure somewhat faster.

Post a Comment for "Dumping A Dictionary To A Yaml File While Preserving Order"