Data formats — JSON · YAML · TOML · XML
When programs exchange data with each other they need an agreed-upon notation. The four we meet most often in config files, API responses, CI workflows, and environment variables are JSON, YAML, TOML, and XML.
1. About the four formats
| Format | Origin | Standards | Common uses |
|---|---|---|---|
| JSON | Douglas Crockford extracted it from JS object syntax in 2001 | RFC 4627 (2006), current RFC 8259 (2017), ECMA-404 (2013) | API responses, configuration, data serialization. |
| YAML | Clark Evans, Oren Ben-Kiki, Ingy döt Net, 2001 | YAML 1.0 (2004), 1.2 (2009) | CI workflows, Kubernetes, configuration. |
| TOML | Tom Preston-Werner (GitHub co-founder), 2013 | TOML 1.0 (2021) | Rust Cargo, Python pyproject, static-site config. |
| XML | W3C, 1998 | XML 1.0 (1998) | SOAP, RSS, some configs (Maven), documents. |
JSON is the simplest. YAML is human-friendly but riddled with traps. TOML was created to dodge those traps. XML has the richest expressiveness but is heavy.
2. JSON
{
  "name": "lee",
  "age": 30,
  "tags": ["dev", "ko"],
  "active": true,
  "address": null,
  "profile": {
    "bio": "안녕"
  }
}
Six data types — string, number, boolean, null, array, object. No comments, no trailing commas (,]).
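As a small illustration of the six types and the strictness rules, the sketch below parses the sample document with Python's standard `json` module and then probes two inputs that strict JSON rejects. The helper name `is_valid_json` is made up for this example.

```python
import json

doc = """
{
  "name": "lee",
  "age": 30,
  "tags": ["dev", "ko"],
  "active": true,
  "address": null,
  "profile": {"bio": "안녕"}
}
"""

data = json.loads(doc)

# Each JSON type maps onto one Python type:
# string -> str, number -> int/float, boolean -> bool,
# null -> None, array -> list, object -> dict.
type_names = {key: type(value).__name__ for key, value in data.items()}


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as strict JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(type_names)
print(is_valid_json('["dev", "ko"]'))     # True
print(is_valid_json('["dev", "ko",]'))    # False: trailing comma
print(is_valid_json('{"a": 1} // note'))  # False: comments are not JSON
```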
| Strengths | Weaknesses |
|---|---|
| Simple and universal standard. | No comments. |
| Almost every language supports it via the standard library. | Quotes and braces feel heavy when typed by hand. |
| Machine-friendly. | Large integers can lose precision: many parsers store numbers as IEEE 754 doubles, which are exact only up to 2^53. |
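The large-integer weakness can be seen without leaving Python. This is a minimal sketch: Python's own `json` module keeps integers exact, so the double-based consumer is simulated with `float()`. Sending IDs as strings is a common workaround, not something the JSON standard prescribes.

```python
import json

big = 2**53 + 1  # 9007199254740993, one past the exactly representable range

# Python's json module round-trips the integer without loss.
round_trip = json.loads(json.dumps({"id": big}))["id"]

# A consumer that stores every number as a double cannot represent it.
as_double = float(big)
lost_precision = int(as_double) != big

# Workaround (an assumption about your API design): ship the ID as a string.
safe_payload = json.dumps({"id": str(big)})

print(round_trip == big)  # True
print(lost_precision)     # True
print(safe_payload)       # {"id": "9007199254740993"}
```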
JSON5 (a 2012 variant allowing comments and trailing commas) and JSON Lines (one object per line, used in logs and streams) are close relatives.
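JSON Lines needs no special library, since every line is an ordinary JSON document. A minimal reader might look like this; the function name `iter_json_lines` and the sample log records are invented for illustration.

```python
import json
from typing import Any, Iterator


def iter_json_lines(text: str) -> Iterator[Any]:
    """Yield one parsed value per non-blank line (hypothetical helper)."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


log = (
    '{"level": "info", "msg": "start"}\n'
    '{"level": "error", "msg": "boom"}\n'
)

records = list(iter_json_lines(log))
errors = [r["msg"] for r in records if r["level"] == "error"]

print(len(records))  # 2
print(errors)        # ['boom']
```

Because each line stands alone, a reader can process a growing log or a stream without loading the whole file first.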
3. YAML
name: lee
age: 30
tags:
  - dev
  - ko
active: true
address: null
profile:
  bio: 안녕
Indentation defines structure. Spaces only (no tabs). Comments start with #. Data types are a superset of JSON's plus multi-document, anchors, and tags.
# Anchors and references (DRY)
defaults: &defaults
  retries: 3
  timeout: 30
dev:
  <<: *defaults
  host: localhost
prod:
  <<: *defaults
  host: example.com
# Multi-line strings
folded: >
  Multiple lines
  collapsed onto
  a single line
literal: |
  Multiple lines
  preserved
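To make the result of the anchor example concrete, here is the data a parser that supports the `<<` merge key would be expected to produce, written out by hand as Python. This is not a YAML parser, only a sketch of the equivalent structure.

```python
# Hand-written Python equivalent of the anchor/merge example above.
defaults = {"retries": 3, "timeout": 30}

config = {
    "defaults": defaults,
    # `<<: *defaults` pulls in the anchored keys; keys written locally
    # are added on top and would win on a name clash.
    "dev": {**defaults, "host": "localhost"},
    "prod": {**defaults, "host": "example.com"},
}

# Folded (>) joins the lines with spaces; literal (|) keeps line breaks.
# With default chomping, both keep exactly one trailing newline.
folded = "Multiple lines collapsed onto a single line\n"
literal = "Multiple lines\npreserved\n"

print(config["dev"])
print(repr(folded))
print(repr(literal))
```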
4. YAML's Norway problem
In YAML 1.1 the Norwegian country code NO is parsed as boolean false.
countries:
- NO # ← becomes boolean false
- SE
- DK
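The YAML 1.1 spec also treats yes, no, on, off, y, and n (in several capitalizations) as booleans, although some parsers skip the single-letter forms. YAML 1.2 dropped this interpretation, but many libraries still default to 1.1 rules. To stay safe, quote such values: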
countries:
- "NO"
- "SE"
Another trap is octal interpretation: YAML 1.1 parsers read 010 as octal, that is, 8. YAML 1.2 writes octal as 0o10, so 010 stays decimal 10 there.
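One way to guard against both traps when generating YAML by hand is to check each plain scalar before writing it. The sketch below uses only the standard library. The regular expressions follow the boolean and octal forms listed in the YAML 1.1 type definitions, and the helper names `needs_quotes` and `quote_if_risky` are hypothetical.

```python
import re

# Plain scalars that the YAML 1.1 bool type resolves to true/false.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
# Leading-zero digit runs that YAML 1.1 reads as octal (010 -> 8).
YAML11_OCTAL = re.compile(r"[-+]?0[0-7_]+")


def needs_quotes(scalar: str) -> bool:
    """Return True if an unquoted `scalar` could change type under YAML 1.1."""
    return bool(YAML11_BOOL.fullmatch(scalar) or YAML11_OCTAL.fullmatch(scalar))


def quote_if_risky(scalar: str) -> str:
    """Wrap risky scalars in double quotes, leave the rest untouched."""
    return f'"{scalar}"' if needs_quotes(scalar) else scalar


codes = ["NO", "SE", "DK", "010"]
print([quote_if_risky(c) for c in codes])  # ['"NO"', 'SE', 'DK', '"010"']
```

This only covers the two traps described here. A real emitter would also have to handle nulls, floats, timestamps, and characters that need escaping.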
5. TOML
name = "lee"
age = 30
active = true
address = ""
tags = ["dev", "ko"]
[profile]
bio = "안녕"
[servers.dev]
host = "localhost"
port = 8080
[servers.prod]
host = "example.com"
port = 443
[[items]]
id = 1
[[items]]
id = 2
Clear key=value syntax. Comments start with #. Data types are richer than JSON, with first-class dates and times.
| Strengths | Weaknesses |
|---|---|
| Little ambiguity. | Deep nesting gets verbose. |
| Comments allowed. | Not as expressive as YAML. |
| Comfortable to type by hand. | Readability drops in complex structures. |
Cargo (Cargo.toml), Python pyproject.toml, and static-site generators like Hugo and Zola use it as a standard.
6. XML
<?xml version="1.0" encoding="UTF-8"?>
<user id="42">
  <name>lee</name>
  <age>30</age>
  <tags>
    <tag>dev</tag>
    <tag>ko</tag>
  </tags>
</user>
Tags, attributes, namespaces, DTD/XSD schemas — expressive in many directions. Once dominant in SOAP, RSS, Atom, and Office Open XML (.docx), it has lost ground to JSON in recent years.
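Reading the sample document takes a few lines with Python's standard `xml.etree.ElementTree`. This sketch omits the XML declaration for brevity and converts values by hand, because XML without a schema carries everything as text.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<user id="42">
  <name>lee</name>
  <age>30</age>
  <tags>
    <tag>dev</tag>
    <tag>ko</tag>
  </tags>
</user>"""

root = ET.fromstring(XML_TEXT)

user = {
    "id": int(root.get("id")),        # attributes are always strings
    "name": root.findtext("name"),    # text of the first <name> child
    "age": int(root.findtext("age")),
    "tags": [tag.text for tag in root.iter("tag")],
}

print(user)  # {'id': 42, 'name': 'lee', 'age': 30, 'tags': ['dev', 'ko']}
```

Note the design choice XML forces on every schema author: `id` lives in an attribute while `name` lives in a child element, and nothing in the format says which is right.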
7. At a glance
| Item | JSON | YAML | TOML | XML |
|---|---|---|---|---|
| Comments | None | Yes | Yes | Yes |
| Indent-sensitive | No | Yes | No | No |
| Human friendliness | Medium | High | High | Low |
| Ambiguity | Low | High | Low | Low |
| Schema | JSON Schema | Borrows JSON Schema | Sparse | Rich (XSD, DTD) |
8. Other paths
Formats we run into in particular niches:
- Protocol Buffers (protobuf): Google, open-sourced in 2008. Binary and schema-first.
- MessagePack: a binary encoding of the JSON data model, typically more compact than JSON text.
- CBOR: RFC 8949. A compact binary format suited to IoT.
- HOCON: the format of Typesafe (now Lightbend) Config. A human-friendly superset of JSON.
- EDN: Clojure's data notation.
- CSV · TSV: the simplest tabular formats, with plenty of comma and quote escaping pitfalls.
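The CSV escaping pitfalls mentioned in the last item show up as soon as a field contains a comma or a quote. A short sketch with Python's standard `csv` module, using invented sample values:

```python
import csv
import io

row = ["lee", "Seoul, KR", 'says "hi"']

# The writer quotes fields that contain commas or quotes
# and doubles any embedded quote characters.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()

# Splitting on commas by hand breaks the second field in two.
naive_fields = encoded.rstrip("\r\n").split(",")

# A real CSV reader restores the original three fields.
parsed = next(csv.reader(io.StringIO(encoded)))

print(encoded.rstrip("\r\n"))  # lee,"Seoul, KR","says ""hi"""
print(len(naive_fields))       # 4
print(parsed == row)           # True
```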
9. Standard tools per language
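// JS: only JSON ships in the standard library
import yaml from "js-yaml";                     // YAML and TOML need libraries
import { parse as parseToml } from "smol-toml";

const obj = JSON.parse('{"a":1}');
const s = JSON.stringify(obj, null, 2);         // pretty-print with 2-space indent

const data = yaml.load("name: lee\nage: 30");   // parse a YAML string
const config = parseToml('name = "lee"');       // parse a TOML string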
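# Python
import json
import tomllib  # standard library since 3.11

import yaml  # PyYAML (third-party)

s = '{"name": "lee", "bio": "안녕"}'
data = json.loads(s)
s = json.dumps(data, indent=2, ensure_ascii=False)  # keep non-ASCII text readable

data = yaml.safe_load("name: lee\nage: 30")  # safe_load refuses arbitrary object tags
data = tomllib.loads('name = "lee"\nage = 30')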
Command-line conversion:
jq . data.json             # pretty-print JSON (install jq with your package manager, e.g. choco install jq on Windows)
yq -o=json . config.yaml   # YAML → JSON (flags as in Mike Farah's yq v4)
yq -P . data.json          # JSON → YAML
10. Common pitfalls
JSON: no trailing commas, keys must use double quotes, no comments.
YAML: never indent with tabs (spaces only). Watch for boolean traps like the Norway problem. An empty value, ~, and null all parse as null, while "" is an empty string.
TOML: defining the same key in multiple places is an error. Array of tables ([[items]]) vs regular tables ([items]) can be confusing at first.
XML: namespaces make parser code more verbose. XXE (XML external entity) attacks are a known risk, so check that your parser disables external entity resolution.
Encoding: almost always UTF-8. A leading BOM can make some parsers fail on the first token.
Formats without comments: to leave a note in a JSON config, people sometimes add a _comment key as a workaround. JSON5 or JSONC (JSON with comments, as used by VS Code) are alternatives.
Versions: YAML 1.1 and 1.2 behave differently. Check your library's docs to see which one it implements.
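The BOM pitfall is easy to reproduce. In this minimal sketch the bytes stand in for a file saved with a UTF-8 BOM, and the two helper names are hypothetical. Decoding with `utf-8-sig` strips a leading BOM if one is present and is harmless otherwise.

```python
import json

raw = b'\xef\xbb\xbf{"name": "lee"}'  # UTF-8 BOM followed by a JSON object


def parse_plain(data: bytes):
    """Decode as plain UTF-8: the BOM survives as U+FEFF in the string."""
    return json.loads(data.decode("utf-8"))


def parse_bom_safe(data: bytes):
    """Decode with utf-8-sig, which drops a leading BOM if present."""
    return json.loads(data.decode("utf-8-sig"))


try:
    parse_plain(raw)
    plain_ok = True
except json.JSONDecodeError:
    plain_ok = False  # the parser stops at the BOM before the first key

result = parse_bom_safe(raw)

print(plain_ok)  # False
print(result)    # {'name': 'lee'}
```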
Closing thoughts
Each data format has settled into a role and is hard to swap out. JSON is the standard for APIs and machine-written config, YAML for CI, Kubernetes, and Docker Compose, TOML for language package managers (Rust, Python), and XML for legacy systems. Once this matrix clicks, even an unfamiliar file reads quickly.
References: RFC 8259 (JSON) · ECMA-404 · YAML 1.2 · TOML 1.0 · XML 1.0 · Norway Problem · JSON5 · jq · yq · JSON Schema