YAML Starter - The missing manuals
This page describes the very basics of YAML, as a quick reference.
It doesn’t take much to be able to use the YAML format, but subtleties and corner cases appear very quickly. A few gotchas are listed below, along with ways to avoid them.
YAML is a generic serialization format similar to JSON: it describes an arbitrary data structure in a “human-friendly” Unicode format. The meaning of the data structure is up to the program reading the file. YAML has a schema specification, though it looks like it’s rarely used (or only used implicitly).
YAML specification: https://yaml.org/spec/1.2.2
Overview
YAML defines three types of structures:
- Scalars: strings, possibly interpreted differently by the parser (e.g. as a number)
- Sequences: ordered lists
- Mappings: keys (usually strings) mapped to arbitrary values, including another sequence or mapping
Comments start with a hash (#) and run to the end of the line. A # placed after a value must be preceded by a space.
By default YAML 1.2 parsers are supposed to use the Core schema, an extension of the JSON schema, which recognizes a few plain (unquoted) scalars: true, false, null, integers/floats. Everything else is treated as a string. Well, except when it’s not: parsers that follow the older YAML 1.1 rules, such as Python’s PyYAML, treat ‘yes’ the same as ‘true’ (see section Scalars below).
YAML files can contain multiple “documents” (i.e. distinct data structures). A document can start with a triple dash (---) and end with a triple dot (...), but both markers are optional when the file holds a single document; the --- marker is what separates documents in a multi-document file. That’s why parsers (including Python’s) accept files without those markers.
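To illustrate how the markers delimit documents, here is a toy splitter in Python. It is a simplified sketch, not how a real parser works: split_documents is a made-up name, it only recognizes marker lines that are exactly --- or ..., and it skips empty documents (a real parser would return null for them). With PyYAML, yaml.safe_load_all yields each document of a stream, whereas yaml.safe_load raises an error if the stream contains more than one.

```python
def split_documents(stream: str) -> list[str]:
    """Toy splitter: cut a YAML stream on bare '---' and '...' lines.

    Hypothetical helper for illustration only. It ignores markers followed
    by content (e.g. '--- text') and drops empty documents.
    """
    documents = []
    current = []
    for line in stream.splitlines():
        if line.rstrip() in ("---", "..."):
            # Either marker ends the document in progress, if it has content.
            if any(text.strip() for text in current):
                documents.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # A stream without any markers is simply one bare document.
    if any(text.strip() for text in current):
        documents.append("\n".join(current))
    return documents


print(split_documents("key: value"))                   # ['key: value']
print(split_documents("---\na: 1\n...\n---\nb: 2\n"))  # ['a: 1', 'b: 2']
```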
Scalars
- number literals: 1, 1.2, 0x12d4
- boolean: true, false. The Python parser seems to accept yes, no, on, off, and also True, TRUE, Yes, YES, but not TRue or yeS (these are strings)
- null/none: null, ~
- strings: written between single or double quotes, and a lot of other weird ways (see dedicated section below)
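Which plain (unquoted) scalars turn into numbers, booleans or null is decided by the schema. The sketch below mimics the YAML 1.2 Core schema rules with regular expressions; resolve_plain_scalar is a hypothetical name, and quoted scalars (always strings) are out of scope. It shows why TRue and yes stay strings under YAML 1.2 rules, unlike with the YAML 1.1 rules the Python parser follows.

```python
import math
import re

# Simplified sketch of YAML 1.2 Core schema resolution for plain (unquoted)
# scalars. Rules are tried in order; the first full match wins.
_CORE_RULES = [
    (r"null|Null|NULL|~|", lambda s: None),  # the empty value is null too
    (r"true|True|TRUE", lambda s: True),
    (r"false|False|FALSE", lambda s: False),
    (r"[-+]?[0-9]+", lambda s: int(s, 10)),
    (r"0o[0-7]+", lambda s: int(s[2:], 8)),
    (r"0x[0-9a-fA-F]+", lambda s: int(s[2:], 16)),
    (r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?", float),
    (r"[-+]?(\.inf|\.Inf|\.INF)", lambda s: -math.inf if s[0] == "-" else math.inf),
    (r"\.nan|\.NaN|\.NAN", lambda s: math.nan),
]


def resolve_plain_scalar(text: str):
    """Return the Python value a Core-schema parser would give `text`."""
    for pattern, convert in _CORE_RULES:
        if re.fullmatch(pattern, text):
            return convert(text)
    return text  # everything else is a string


for sample in ["1", "1.2", "0x12d4", "~", "True", "TRue", "yes"]:
    print(repr(sample), "->", repr(resolve_plain_scalar(sample)))
```

Under these rules 0x12d4 becomes the integer 4820, while TRue and yes come back unchanged as strings.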
Mappings
aka dictionary, hash table.
Example YAML file:
key1: value1
key2: value2
Corresponding JSON:
{
  "key1": "value1",
  "key2": "value2"
}
The same can be written “inline”: {key1: value1, key2: value2}
It doesn’t have to be on a single line, but splitting it is probably not a good idea.
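To make the equivalence concrete, here is a toy Python sketch that reads both forms of the flat example above and prints the JSON. The two parser functions are hypothetical and deliberately naive: no nesting, no quoting, and no colons or commas inside values.

```python
import json


def parse_block_mapping(text: str) -> dict:
    """Toy parser for one 'key: value' pair per line."""
    pairs = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")  # split on the first colon
            pairs[key.strip()] = value.strip()
    return pairs


def parse_flow_mapping(text: str) -> dict:
    """Toy parser for an inline '{key: value, key: value}' mapping."""
    inner = text.strip().removeprefix("{").removesuffix("}")
    pairs = {}
    for item in inner.split(","):
        if item.strip():
            key, _, value = item.partition(":")
            pairs[key.strip()] = value.strip()
    return pairs


block = "key1: value1\nkey2: value2\n"
flow = "{key1: value1, key2: value2}"
assert parse_block_mapping(block) == parse_flow_mapping(flow)
print(json.dumps(parse_block_mapping(block)))  # {"key1": "value1", "key2": "value2"}
```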
Sequences
aka lists
Example YAML file with a list:
- string 1
- 2
- 3 this is a string too
This is the same list of strings and numbers:
[string 1, 2, 3 this is a string too]
Though it’s probably better to always add double quotes to avoid issues with inline commas: ["string 1", 2, "3 this is a string too"]
Equivalent in JSON:
["string 1", 2, "3 this is a string too"]
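The comma issue comes from the fact that a comma separates items in the inline (flow) form. The toy splitter below, a hypothetical helper that ignores escapes, nesting and single quotes, shows how double quotes keep a comma inside a single item.

```python
def split_flow_sequence(text: str) -> list[str]:
    """Toy splitter for '[a, "b, c", d]': split on commas outside double quotes."""
    inner = text.strip()[1:-1]  # drop the surrounding [ and ]
    items, current, in_quotes = [], "", False
    for ch in inner:
        if ch == '"':
            in_quotes = not in_quotes  # entering or leaving a quoted item
        if ch == "," and not in_quotes:
            items.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        items.append(current.strip())
    return items


print(split_flow_sequence('[hello, world, 2]'))    # ['hello', 'world', '2']
print(split_flow_sequence('["hello, world", 2]'))  # ['"hello, world"', '2']
```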
Nesting
You can of course nest mappings and sequences.
key1: value1
key2:
- string 1
- 2
- 3 this is a string too
Equivalent JSON:
{
  "key1": "value1",
  "key2": ["string 1", 2, "3 this is a string too"]
}
Strings
TL;DR: It’s madness. See https://brettweir.com/blog/yaml-strings/ for all the madness glory.
This section shows robust ways of writing strings. Some information here may be specific to the Python parser.
Single lines
# YAML (all equivalent)
# Good practice: always quote unless it's a single word
key: hello world
key: 'hello world'
key: "hello world"
# JSON
{'key': 'hello world'}
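If you’d rather know when quotes are actually required, here is a rough heuristic in Python. needs_quotes is a hypothetical helper, not part of any library, and it is deliberately conservative: it flags values that start with an indicator character, contain ': ' or ' #', or look like a boolean, null or number (case-insensitively, so it also flags TRue). It does not cover the full YAML grammar.

```python
import re

# Values that a parser might turn into a boolean, null or number.
# Case-insensitive on purpose, so it errs on the side of quoting.
_LOOKS_TYPED = re.compile(
    r"yes|no|on|off|true|false|null|~"
    r"|[-+]?(?:\d+|\d*\.\d+|\.inf|\.nan)(?:e[-+]?\d+)?"
    r"|0x[0-9a-f]+|0o[0-7]+",
    re.IGNORECASE,
)
# Characters that mean something special at the start of a plain scalar.
_INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def needs_quotes(value: str) -> bool:
    """Heuristic: would writing `value` unquoted risk changing its meaning?"""
    if value == "" or value != value.strip():
        return True  # empty, or leading/trailing whitespace would be lost
    if value[0] in _INDICATORS:
        return True  # could start a sequence, flow collection, comment, ...
    if ": " in value or " #" in value or value.endswith(":"):
        return True  # could start a nested mapping or an inline comment
    return _LOOKS_TYPED.fullmatch(value) is not None  # bool/null/number


for v in ["hello world", "mambo #5", "yes", "0x12d4", "- hello"]:
    print(repr(v), needs_quotes(v))
```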
Multiple lines
The folded style (>) concatenates all the lines, putting a single space between them.
# Note the \n at the end of the string
# JSON: {'key': 'hello world\n'}
key: >
  hello
  world
# Note the LACK of \n at the end of the string
# JSON: {'key': 'hello world'}
key: >-
  hello
  world
The literal style (|) concatenates all the lines, putting a single \n between them after removing the common indentation, similar to Python’s textwrap.dedent.
As with >-, the dash in |- controls the presence of the final \n.
# JSON: {'key': 'hello\nworld\n'} Note both \n
key: |
  hello
  world
# JSON: {'key': 'hello\nworld'}
key: |-
  hello
  world
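The four block headers above can be summed up in a few lines of Python. block_scalar is a hypothetical name and this is a simplified sketch: it assumes the body has no blank lines and no extra-indented lines, and it ignores the keep indicator (+), which is where the real rules get more intricate. textwrap.dedent plays the role of removing the common indentation.

```python
import textwrap


def block_scalar(header: str, body: str) -> str:
    """Simplified sketch of YAML block scalars for '>', '>-', '|' and '|-'."""
    style, chomping = header[0], header[1:]
    lines = textwrap.dedent(body).splitlines()  # strip the common indentation
    joiner = "\n" if style == "|" else " "      # literal keeps breaks, folded uses spaces
    text = joiner.join(lines)
    # Default ("clip") chomping keeps one final line break; "-" ("strip") drops it.
    return text if chomping == "-" else text + "\n"


body = "  hello\n  world\n"
for header in (">", ">-", "|", "|-"):
    print(header, repr(block_scalar(header, body)))
```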
Strings Gotchas
Now for some madness.
TL;DR:
- always quote strings containing a space
- avoid splitting a string when it starts on the same line as a key.
On the pitfall of not quoting a string (beware of inline comments):
# JSON: {'key': 'mambo'}
key: mambo #5
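What happens here: in an unquoted value, a # preceded by whitespace starts a comment, so the parser only sees mambo. The hypothetical strip_inline_comment helper below makes the rule explicit (a sketch for single-line plain values only). Quoting the value, as in key: "mambo #5", keeps the whole string.

```python
def strip_inline_comment(plain: str) -> str:
    """Drop a trailing comment from an unquoted value (the text after 'key: ').

    A '#' starts a comment only at the start of the value or right after a
    space or tab, so 'mambo#5' keeps its '#'.
    """
    for i, ch in enumerate(plain):
        if ch == "#" and (i == 0 or plain[i - 1] in " \t"):
            return plain[:i].rstrip()
    return plain


print(repr(strip_inline_comment("mambo #5")))  # 'mambo'
print(repr(strip_inline_comment("mambo#5")))   # 'mambo#5'
```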
If there are no colons or leading dashes, the lines are folded into a single one-line string (similar to the > case above):
# JSON: {'key': 'this is a single line of text'}
key: this
  is
  a single
  line of text
# JSON: {'key': 'this is - a single line of text'}
key: this
  is
  - a single
  line of text
But watch out:
# JSON: {'key': '-is - a single - line of text'}
key:
  -is
  - a single
  - line of text
# JSON: {'key': ['is', 'a single', 'line of text']}
key:
  - is
  - a single
  - line of text
# Syntax error
key:
  - is
  -a single
  - line of text
# Syntax error
key: - hello
Beware of implicit conversions, like yes -> true.
# May be specific to the Python parser
# JSON: {'key': [true, 'we', 'can']}
key:
- yes
- we
- can
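The Python parser (PyYAML) follows YAML 1.1, where yes, no, on and off are booleans too; with PyYAML installed, yaml.safe_load on the snippet above returns {'key': [True, 'we', 'can']}. The sketch below reproduces just that boolean rule with a regular expression modeled on PyYAML’s resolver; resolve_yaml11_item and its quoted flag are hypothetical, for illustration. Quoting the item ("yes") keeps it a string.

```python
import re

# Boolean spellings recognized by a YAML 1.1 resolver such as PyYAML's.
# Mixed-case forms like "TRue" or "yeS" are not listed, so they stay strings.
_YAML11_BOOL = re.compile(
    r"yes|Yes|YES|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def resolve_yaml11_item(text: str, quoted: bool = False):
    """Return True/False for YAML 1.1 booleans, otherwise the string itself."""
    if not quoted and _YAML11_BOOL.fullmatch(text):
        return text.lower() in ("yes", "true", "on")
    return text


print([resolve_yaml11_item(s) for s in ["yes", "we", "can"]])  # [True, 'we', 'can']
print(resolve_yaml11_item("yes", quoted=True))                  # yes
```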
More corner cases
# JSON: {'key': 'hello "world"'}
key: hello
  "world"
# But: syntax error
key: "hello"
  "world"
Yoyonax