What is YAML: Understanding the Basics
YAML (YAML Ain’t Markup Language) is a human-readable data serialization format commonly used for configuration files and data exchange between programming languages.
YAML, which stands for “YAML Ain't Markup Language” (originally “Yet Another Markup Language”), is a flexible and accessible data serialization language often used in configuration files, data exchanges, and automated workflows. Its minimalist style and structure make it easily readable for humans (like developers and system administrators) while still being structured enough for machine processing.
What is a YAML file?
YAML files are plain text files that store data using YAML’s indentation-based structure. Because YAML mimics natural language and uses structured whitespace, these files often look like organized lists or outlines.
Common characteristics of YAML files include:
- Using the file extension
.ymlor.yaml - Optionally beginning a document with three dashes (
---) - Using indentation and whitespace to define parent-child hierarchy
- Representing most data as key-value pairs (
key: value) - Optionally ending a document with three periods (...)
Below is a minimal GitLab CI configuration that defines a single job and stage. Even with just a few lines of YAML, you can automate useful tasks like running tests whenever code is pushed. As your workflow grows, you can easily expand this structure to include builds, deployments, and more advanced automation logic.
stages:
- build
- test
build:
stage: build
script:
- echo "Building app"
artifacts:
paths:
- dist/
test:
stage: test
script:
- echo "Running tests"
- ./run-tests.sh
dependencies:
- build
In modern DevOps, YAML has become the go-to standard for configuration files. It’s easy to write, read, and modify, enabling teams to iterate quickly and collaborate effectively across tools and environments.
Specifically, YAML is commonly used in:
- Infrastructure as code (IaC): IaC uses configuration files to define and manage computing infrastructure (virtual machines, networks, cloud services) in a consistent, scalable way. Tools like Ansible and Kubernetes rely on YAML to describe system state and automate tasks.
- Continuous integration and continuous delivery (CI/CD) pipeline configuration: Many CI/CD tools use YAML files to define pipeline jobs, execution stages, triggers, and environment rules. Because these configurations live alongside application code, they are fully version-controlled and easy to automate.
- Deployments: YAML plays a critical role in application deployment, as manifest files define the desired state of applications, guide update processes, and preserve historical configurations to support rollbacks when needed.
YAML has many uses, such as defining configuration settings, the desired state of an application, or the steps of an automated workflow. It plays a crucial role in DevOps, enabling developers to integrate critical tools like Ansible and Kubernetes.
YAML in Ansible
Ansible is an open-source automation tool used to automate a wide range of IT tasks, including configuration management, application deployment, and system provisioning.
An Ansible playbook is essentially a configuration file written in YAML that defines an automated task or process. An Ansible module uses these playbooks to assess the current state of managed nodes and execute the prescribed task.
Together, YAML and Ansible make infrastructure automation more accessible by providing clear, version-controlled instructions that teams can easily audit and maintain.
YAML for Kubernetes
Kubernetes is an open-source orchestration platform used to deploy, scale, and manage containerized applications. Every Kubernetes resource (Pods, Deployments, Services, ConfigMaps) in a cluster is defined using a manifest file, typically written in YAML, that specifies the desired state of a resource, such as:
- How many replicas to run
- What container images to use
- How networking and storage should behave
- Which configuration values or secrets to apply
Kubernetes then continuously compares the cluster’s actual state to the state described in the YAML file and reconciles any differences. Because Kubernetes can become highly complex, YAML plays a crucial role in making its configuration more approachable and maintainable.
YAML’s popularity stems from a combination of readability, flexibility, and broad compatibility. Its core advantages include:
Simplicity: Natural language structure
YAML is user-friendly for both technical and non-technical users because it mirrors natural language more closely than other data formats, such as JavaScript Object Notation (JSON) or Extensible Markup Language (XML). IT professionals don’t need to learn an entirely new programming language or do extensive training to use YAML. Instead, YAML’s minimalist syntax and hierarchical indentation, similar to Python, make it easier for humans to read and modify.
Compatibility: Platform-independent
As a data serialization language, YAML’s purpose is to map data structures to a standardized plain-text format. YAML files can then be translated or interpreted by any other programming language, regardless of operating system or application. This allows YAML to serve as a bridge among tools, platforms, and the humans managing them.
Structure: Ideal for configuration files
YAML is widely used for configuration files because developers value its readability when working with large configurations. Its simple syntax and clear hierarchy make relationships between elements easy to understand, which helps teams parse data and spot mistakes or inconsistencies.
While YAML shares some similarities with other data serialization languages, it has several key characteristics that make it unique:
Structure and syntax
YAML files contain nodes, or individual data elements, that are nested based on a hierarchical structure. Instead of relying on structural characters like brackets or quotation marks, YAML uses whitespace and indentation to express relationships between elements.
Indentation rules
Indentation is the key to achieving the nested structure of a YAML file. Using a consistent system of spaces (tabs are not allowed and will cause an error), developers can indicate hierarchy by indenting child elements more than their parent.
Note: The number of spaces used to indent doesn’t matter as long as it is consistent throughout the entire file. Therefore, all items on the same hierarchical level must be indented with the same number of spaces.
Mapping and sequences
In YAML, lists or collections appear in one of two forms: mappings or sequences:
- Mappings, similar to dictionaries or objects, contain an unordered set of key-value pairs (
key: value) - Sequences, or lists/arrays, are ordered collections, with each item marked by a hyphen and a space (
- item)
Incorporating comments
A useful feature of YAML is its support for comments, indicated with a hash symbol (#). This allows users to leave explanations and contextual information that may be relevant to teammates or future maintainers.
YAML and JSON are both data serialization formats that humans and computers can easily interpret and parse. YAML is considered a superset of JSON, meaning any valid JSON file is also valid YAML.
XML is another popular data serialization format that is less human-readable. Developed in the 1990s, XML predates both JSON and YAML and is widely used in legacy and enterprise applications.
| YAML | JSON | XML | |
|---|---|---|---|
| Characteristics | Prioritizes human readability, mimics natural language, and uses indentation to create structure rather than symbols | Designed for simple, predictable parsing using braces, brackets, and commas; considered readable but slightly more rigid | A highly structured, tag-based format that resembles HTML and can be less human-friendly |
| Unique features | Allows comments, supports multi-document files, includes anchors and aliases for reusing data blocks | Lightweight and minimal with universal support across programming languages | Designed for strict, formal structures and supports schema validation |
| Structure | Indentation-based hierarchy, flexible and expressive | Symbol-based structure using {}, [], and , | Tag-based structure using opening and closing elements |
| Common uses | Configuration files, automation workflows, and infrastructure-as-code | Data exchange between web clients, servers, and APIs | Data exchange in enterprise systems, document formats, and legacy applications |
While YAML is known for its simple structure and syntax, it also has several advanced features that make it powerful enough to handle complex configurations and workflows.
Anchors and aliases
Anchors and aliases allow developers to define a block of data once and reference it multiple times throughout the file. When you update the anchor, every alias automatically reflects the change when parsed. This avoids duplication, reduces file size, and ensures consistency.
- Anchor: A reusable block marked with an ampersand (&)
- Alias: A reference to that block, marked with an asterisk (*)
Tags and data types
YAML infers many basic data types automatically, including integers, floats, booleans, and timestamps. However, when you need more control or are working with complex structures, tags let you explicitly declare a value’s type.
- Tag: Marked with a double exclamation point (!!) followed by the type name
Bundling related documents
Another advanced concept that sets YAML apart is multi-document support. This capability is useful for bundling, transporting, and managing related configurations for a single component. Multi-document support is also useful when developers need to define variations for different contexts.
- Multi-document delineation: Distinct documents or data structures within a single file are separated with three hyphens (
---)
While YAML itself doesn’t support file-to-file linking natively, leading developer platforms such as GitLab have built custom mechanisms that allow developers to link and reuse separate files, scripts, and configuration components.
Writing effective YAML requires attention to detail. The following best practices help ensure your files remain readable and error-free.
Promoting efficiency and simplicity
YAML’s strengths lie in its ease of use, adaptability, and simplicity. Make the most of these qualities with these best practices:
- Reduce parsing errors through consistent indentation, always using spaces instead of tabs
- Minimize repetition and streamline updates by using anchors and aliases instead of duplicating blocks of data
- Enhance readability and simplify version comparison by limiting line length to 80-120 characters per line
- Maintain clarity by using flow style (
{},[]) only for short, simple collections
Writing error-free files
While YAML files are quick to write and modify, configuration errors can cause serious application failures or security issues. To reduce risks, implement these practices:
- Add comments to explain complex or unusual configuration settings and provide context for teammates
- Avoid the most common YAML error by configuring your text editor to convert tabs to spaces
- Quote strings when necessary, including any string that:
- Looks like a number, boolean, or null ("
123", "true", "on") - Contains special characters (
:,{},[],#) - Begins with an alias (*) or sequence indicator (-)
- Looks like a number, boolean, or null ("
- Integrate linters and validators into your workflows and CI/CD pipelines to catch syntax issues and enforce style guidelines
- For platform-specific files, validate YAML against that platform’s schema using tools like kubeconform to ensure both syntactic correctness and proper field usage
YAML’s blend of readability and structure makes it an essential tool in modern DevOps, powering everything from CI/CD pipelines to Kubernetes deployments. By mastering its core concepts, teams can write clean, dependable YAML that supports automation and scales with their infrastructure.
Explore GitLab’s full YAML syntax guide to learn how to build powerful, flexible CI/CD pipelines.
Start shipping better software faster
See what your team can do with the intelligent
DevSecOps platform.