Playbooks get unwieldy fast. What starts as a clean 30-line YAML file turns into a 400-line monster with duplicated tasks, inconsistent variable names, and no hope of reuse across projects. Ansible roles fix this by giving your automation a standardized structure that you can share, version, and compose like building blocks.
This guide walks through building two real roles from scratch: a base_system role that handles OS-level setup across Rocky Linux and Ubuntu, and an nginx_app role that depends on it. Along the way, we cover the directory structure, variable precedence, import_role vs include_role, dependency chains, and a real cross-platform gotcha that will bite you if you don’t plan for it. If you’ve been writing Ansible playbooks and want to level up, roles are the next step.
Current as of April 2026. Verified on Rocky Linux 10.1 and Ubuntu 24.04 with ansible-core 2.16.14 and community.general 12.5.0.
Prerequisites
Before starting, you need:
- Ansible installed on a control node (tested with ansible-core 2.16.14 on Rocky Linux 10.1)
- Two or more managed nodes with SSH access configured. This guide uses Rocky Linux 10.1 (10.0.1.11) and Ubuntu 24.04 (10.0.1.12)
- Working knowledge of Ansible playbooks and ad-hoc commands
- The `community.general` and `ansible.posix` collections installed (`ansible-galaxy collection install community.general ansible.posix`)
If you haven’t set up Ansible yet, follow Install and Configure Ansible on Linux first. Keep the Ansible cheat sheet handy as a quick reference while working through this guide.
What Is an Ansible Role?
A role is a standardized directory structure that packages related tasks, handlers, variables, templates, and files into a single reusable unit. Instead of cramming everything into one playbook, you break your automation into roles such as base_system, nginx_app, and postgresql, each handling one concern.
The practical benefits are significant. Roles enforce the DRY principle: write your base system setup once, use it in every project. Teams can own specific roles independently. You can version them with Git tags and publish them to Ansible Galaxy for the community. Most importantly, roles make your automation testable because each role has a clear boundary and a predictable interface through variables.
Role Directory Structure
The ansible-galaxy command scaffolds the standard role layout for you:
ansible-galaxy role init roles/base_system
The output confirms the role was created:
- Role roles/base_system was created successfully
List the generated files to see the full structure:
find roles/base_system -type f | sort
The layout looks like this:
roles/base_system/defaults/main.yml
roles/base_system/handlers/main.yml
roles/base_system/meta/main.yml
roles/base_system/README.md
roles/base_system/tasks/main.yml
roles/base_system/tests/inventory
roles/base_system/tests/test.yml
roles/base_system/vars/main.yml
Each directory serves a specific purpose. Here’s what goes where:
| Directory | Purpose | When You Need It |
|---|---|---|
| `defaults/` | Default variable values (lowest precedence) | Always. This is the role’s public API |
| `tasks/` | The main task list executed by the role | Always. The core of the role |
| `handlers/` | Handlers triggered by `notify` in tasks | When tasks need service restarts or reloads |
| `vars/` | High-precedence variables (overrides defaults) | For values that should rarely be changed |
| `templates/` | Jinja2 templates deployed with the `template` module | Config files that need variable substitution |
| `files/` | Static files deployed with the `copy` module | Scripts, certificates, anything deployed as-is |
| `meta/` | Role metadata and dependency declarations | When publishing to Galaxy or chaining roles |
| `tests/` | Test playbook and inventory for CI/CD | When testing the role in isolation |
You won’t always need every directory. A simple role might only have tasks/ and defaults/. Delete what you don’t use.
Build Your First Role: base_system
This role handles the common ground that every server needs: base packages, an admin user, timezone, and firewall ports. The key challenge is making it work across both RHEL and Debian families without separate roles.
Define Defaults
Open the defaults file. These values serve as the role’s public interface, and consumers can override any of them:
vi roles/base_system/defaults/main.yml
Add the following variable definitions:
---
base_packages_rhel:
  - vim-enhanced
  - tmux
  - curl
  - wget

base_packages_debian:
  - vim
  - tmux
  - curl
  - wget

firewall_allowed_ports:
  - "22/tcp"

admin_user: deployer
Notice the separate package lists for each OS family. Package names differ between distributions (vim-enhanced on RHEL, vim on Debian), and trying to unify them into one list leads to failures.
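Because these values live in defaults/, any playbook that applies the role can replace them. As a sketch (the extra package and the opsadmin username are illustrative, not part of the role built above), an override at include time looks like this:

```yaml
# Hypothetical playbook: override base_system defaults at include time.
- name: Base setup with custom values
  hosts: all
  become: true
  roles:
    - role: base_system
      vars:
        admin_user: opsadmin       # replaces the default "deployer"
        base_packages_debian:      # lists are replaced wholesale, not merged
          - vim
          - tmux
          - curl
          - wget
          - htop
```

Note that overriding a list variable replaces the entire list; if you want callers to append packages rather than redefine the set, expose a separate `extra` list in defaults and concatenate the two in the task.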
Write the Tasks
The tasks file is where the actual work happens. Open it:
vi roles/base_system/tasks/main.yml
Add the OS-aware task definitions:
---
- name: Install base packages (RHEL)
  ansible.builtin.dnf:
    name: "{{ base_packages_rhel }}"
    state: present
  when: ansible_os_family == "RedHat"

- name: Install base packages (Debian)
  ansible.builtin.apt:
    name: "{{ base_packages_debian }}"
    state: present
    update_cache: true
  when: ansible_os_family == "Debian"

- name: Create admin user
  ansible.builtin.user:
    name: "{{ admin_user }}"
    groups: "{{ (ansible_os_family == 'RedHat') | ternary('wheel', 'sudo') }}"
    append: true
    shell: /bin/bash
    create_home: true

- name: Set timezone to UTC
  community.general.timezone:
    name: UTC
  notify: Restart cron

- name: Configure firewall ports (RHEL)
  ansible.posix.firewalld:
    port: "{{ item }}"
    permanent: true
    state: enabled
    immediate: true
  loop: "{{ firewall_allowed_ports }}"
  when: ansible_os_family == "RedHat"
Two patterns are worth noting here. The `when: ansible_os_family` conditionals let the same role run on both RHEL and Debian without branching into separate roles, and the `ternary` filter on the user task picks `wheel` on RHEL or `sudo` on Debian for the admin group. We’ll come back to why that ternary is critical in the troubleshooting section.
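If the per-OS install tasks start multiplying, one alternative worth knowing is a single task driven by a lookup table. This is a sketch, not part of the role as built above; `os_packages` is a hypothetical variable introduced here for illustration:

```yaml
# Sketch: collapse the two install tasks into one. The generic
# ansible.builtin.package module delegates to dnf or apt automatically,
# and the os_packages dictionary maps each OS family to its list.
- name: Install base packages (any OS family)
  ansible.builtin.package:
    name: "{{ os_packages[ansible_os_family] }}"
    state: present
  vars:
    os_packages:
      RedHat: "{{ base_packages_rhel }}"
      Debian: "{{ base_packages_debian }}"
```

The trade-off: you lose apt-specific options like `update_cache`, so the explicit two-task form used in this role is the better fit when those options matter.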
Add Handlers
Handlers run only when triggered by a notify directive, which keeps them from executing unnecessarily. Open the handlers file:
vi roles/base_system/handlers/main.yml
The cron service has different names across OS families, so the handler accounts for that:
---
- name: Restart cron
  ansible.builtin.service:
    name: "{{ (ansible_os_family == 'RedHat') | ternary('crond', 'cron') }}"
    state: restarted
When the timezone task reports a change, Ansible triggers this handler at the end of the play. If the timezone was already set to UTC, the handler never fires.
Build a Dependent Role: nginx_app
Real infrastructure involves layers. A web server role shouldn’t reinstall base packages or recreate the admin user. It should declare a dependency on base_system and focus on its own job. That’s what meta/main.yml dependencies do.
Scaffold the second role:
ansible-galaxy role init roles/nginx_app
Declare the Dependency
Edit the metadata file to declare base_system as a dependency and pass custom variables (opening HTTP/HTTPS ports in addition to SSH):
vi roles/nginx_app/meta/main.yml
Add the following:
---
galaxy_info:
  author: John Kibet
  description: Deploy and configure Nginx web server
  license: MIT
  min_ansible_version: "2.16"
  platforms:
    - name: EL
      versions: ["10"]
    - name: Ubuntu
      versions: [noble]

dependencies:
  - role: base_system
    vars:
      firewall_allowed_ports:
        - "22/tcp"
        - "80/tcp"
        - "443/tcp"
When Ansible applies nginx_app, it first resolves and executes base_system with the overridden firewall_allowed_ports list. The dependency runs before any tasks in nginx_app itself.
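Dependency entries accept the same keywords as role entries in a play, so a dependency can also be conditional. A sketch, where harden_ssh is a hypothetical role used only for illustration:

```yaml
# Sketch: a conditional dependency. harden_ssh (hypothetical) would
# only run ahead of this role on RHEL-family hosts.
dependencies:
  - role: base_system
    vars:
      firewall_allowed_ports:
        - "22/tcp"
        - "80/tcp"
        - "443/tcp"
  - role: harden_ssh
    when: ansible_os_family == "RedHat"
```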
Role Defaults and Tasks
Set the Nginx defaults:
vi roles/nginx_app/defaults/main.yml
These are the variables consumers can override when including the role:
---
nginx_port: 80
nginx_server_name: _
nginx_root: /usr/share/nginx/html
Now write the tasks. Open the tasks file:
vi roles/nginx_app/tasks/main.yml
The tasks handle installation, configuration, and service management across both OS families:
---
- name: Install Nginx (RHEL)
  ansible.builtin.dnf:
    name: nginx
    state: present
  when: ansible_os_family == "RedHat"

- name: Install Nginx (Debian)
  ansible.builtin.apt:
    name: nginx
    state: present
  when: ansible_os_family == "Debian"

- name: Deploy Nginx configuration
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: "{{ (ansible_os_family == 'RedHat') | ternary('/etc/nginx/conf.d/app.conf', '/etc/nginx/sites-enabled/app.conf') }}"
    mode: "0644"
  notify: Reload Nginx

- name: Remove default site (Debian)
  ansible.builtin.file:
    path: /etc/nginx/sites-enabled/default
    state: absent
  when: ansible_os_family == "Debian"
  notify: Reload Nginx

- name: Start and enable Nginx
  ansible.builtin.service:
    name: nginx
    state: started
    enabled: true
The config path differs between distributions: RHEL uses /etc/nginx/conf.d/ while Debian uses /etc/nginx/sites-enabled/. The ternary filter handles this cleanly in one task instead of duplicating it with when conditionals.
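If you want the role to verify its own work, a smoke-test task can be appended after the service task. This is an optional addition, not part of the role as written above:

```yaml
# Optional smoke test: fail the play if Nginx doesn't answer on the
# configured port. Any HTTP response proves the server is up, so a
# handful of status codes are accepted.
- name: Verify Nginx responds
  ansible.builtin.uri:
    url: "http://127.0.0.1:{{ nginx_port }}/"
    status_code: [200, 403, 404]
  register: nginx_check
  until: nginx_check.status in [200, 403, 404]
  retries: 3
  delay: 2
```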
Create the Jinja2 Template
Create the templates directory and add the Nginx config template:
mkdir -p roles/nginx_app/templates
Edit the template file:
vi roles/nginx_app/templates/nginx.conf.j2
The template uses role variables for all configurable values:
server {
    listen {{ nginx_port }};
    server_name {{ nginx_server_name }};
    root {{ nginx_root }};
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}
Don’t forget the handler. Create roles/nginx_app/handlers/main.yml:
vi roles/nginx_app/handlers/main.yml
Add the reload handler:
---
- name: Reload Nginx
  ansible.builtin.service:
    name: nginx
    state: reloaded
Run the Roles
Create a playbook that applies the nginx_app role (which automatically pulls in base_system as a dependency):
vi site.yml
The playbook is minimal because all the logic lives in the roles:
---
- name: Configure servers with roles
  hosts: all
  become: true
  roles:
    - nginx_app
Execute it:
ansible-playbook -i inventory site.yml
The output shows the full dependency chain in action. Notice how base_system tasks run first on both hosts, then nginx_app tasks follow:
PLAY [Configure servers with roles] ********************************************
TASK [Gathering Facts] *********************************************************
ok: [managed-rocky]
ok: [managed-ubuntu]
TASK [base_system : Install base packages (RHEL)] ******************************
skipping: [managed-ubuntu]
ok: [managed-rocky]
TASK [base_system : Install base packages (Debian)] ****************************
skipping: [managed-rocky]
ok: [managed-ubuntu]
TASK [base_system : Create admin user] *****************************************
ok: [managed-rocky]
changed: [managed-ubuntu]
TASK [base_system : Set timezone to UTC] ***************************************
ok: [managed-rocky]
changed: [managed-ubuntu]
TASK [base_system : Configure firewall ports (RHEL)] ***************************
skipping: [managed-ubuntu]
ok: [managed-rocky] => (item=22/tcp)
ok: [managed-rocky] => (item=80/tcp)
ok: [managed-rocky] => (item=443/tcp)
TASK [nginx_app : Install Nginx (RHEL)] ****************************************
skipping: [managed-ubuntu]
ok: [managed-rocky]
TASK [nginx_app : Install Nginx (Debian)] **************************************
skipping: [managed-rocky]
changed: [managed-ubuntu]
TASK [nginx_app : Deploy Nginx configuration] **********************************
ok: [managed-rocky]
changed: [managed-ubuntu]
TASK [nginx_app : Remove default site (Debian)] ********************************
skipping: [managed-rocky]
changed: [managed-ubuntu]
TASK [nginx_app : Start and enable Nginx] **************************************
ok: [managed-rocky]
ok: [managed-ubuntu]
RUNNING HANDLER [base_system : Restart cron] ***********************************
changed: [managed-ubuntu]
RUNNING HANDLER [nginx_app : Reload Nginx] *************************************
changed: [managed-ubuntu]
PLAY RECAP *********************************************************************
managed-rocky : ok=8 changed=0 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0
managed-ubuntu : ok=10 changed=7 unreachable=0 failed=0 skipped=3 rescued=0 ignored=0
The Rocky node shows changed=0 because the roles were already applied during testing. The Ubuntu node shows changed=7 for the first run. This is idempotency at work: run it again and both will show zero changes.
Variable Precedence in Roles
Understanding where to put variables is one of the trickiest parts of roles. Ansible has over 20 levels of variable precedence, but for roles, four levels matter most:
- Role defaults (`defaults/main.yml`) have the lowest precedence. They’re meant to be overridden
- Role vars (`vars/main.yml`) have higher precedence. Use these for values that shouldn’t change often
- Role parameters (passed when including the role in a playbook) override both defaults and vars
- Extra vars (`--extra-vars` on the command line) override everything, always
In production, you’ll want to keep most values in defaults/ so consumers can customize the role without forking it. Reserve vars/ for internal constants that callers should not change (like OS-specific paths).
Here’s a practical example. The base_system role defines admin_user: deployer in defaults. Running the playbook normally creates that user:
id deployer
The system confirms the user exists with the expected group membership:
uid=1001(deployer) gid=1001(deployer) groups=1001(deployer),10(wheel)
Override it at runtime with --extra-vars:
ansible-playbook -i inventory site.yml --extra-vars "admin_user=superadmin"
Now a different user gets created instead:
uid=1002(superadmin) gid=1002(superadmin) groups=1002(superadmin),10(wheel)
Extra vars always win. This is useful for CI/CD pipelines where you want to inject environment-specific values without touching role files. For a complete variable reference, check the Ansible cheat sheet which covers all precedence levels.
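Between role defaults and extra vars sits the inventory layer: a group_vars file overrides role defaults for every host in a group without touching the role itself. A sketch, assuming a webservers group exists in your inventory:

```yaml
# group_vars/webservers.yml (hypothetical group name)
# Inventory group vars beat role defaults, so every host in the
# webservers group gets these values with no changes to base_system.
admin_user: webadmin
firewall_allowed_ports:
  - "22/tcp"
  - "80/tcp"
  - "443/tcp"
```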
import_role vs include_role
Ansible gives you two ways to use roles inside tasks: import_role (static) and include_role (dynamic). The difference matters more than you’d expect.
With import_role, Ansible parses the role at playbook load time. All tasks are visible upfront. Run --list-tasks on a playbook that uses import_role:
ansible-playbook test-import.yml --list-tasks
Every task from the role appears in the listing:
playbook: test-import.yml
play #1 (all): Test import_role TAGS: []
tasks:
base_system : Install base packages (RHEL) TAGS: []
base_system : Install base packages (Debian) TAGS: []
base_system : Create admin user TAGS: []
base_system : Set timezone to UTC TAGS: []
base_system : Configure firewall ports (RHEL) TAGS: []
Now try the same with include_role:
ansible-playbook test-include.yml --list-tasks
The individual tasks are hidden because they’re resolved at runtime:
playbook: test-include.yml
play #1 (all): Test include_role TAGS: []
tasks:
Include base_system TAGS: []
Here’s when to use each:
| Feature | import_role (static) | include_role (dynamic) |
|---|---|---|
| Parsing time | Playbook load | Runtime (when reached) |
| Tasks visible in `--list-tasks` | Yes | No |
| Works with `when` | Yes (applied to every task) | Yes (applied only to the include) |
| Can loop over | No | Yes |
| Tags apply to | All tasks inside the role | Only the include statement |
| Best for | Standard role application | Conditional or looped roles |
The rule of thumb: use import_role (or the roles: keyword in a play) by default. Switch to include_role only when you need to loop over a role or conditionally include it based on runtime facts. If you’re working with Ansible Vault for sensitive variables, both import and include methods handle encrypted variables the same way.
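Looping is the clearest case for include_role, since import_role cannot be looped at all. A sketch, where extra_roles is a hypothetical list you might build at runtime:

```yaml
# Sketch: apply a runtime-selected list of roles. import_role cannot
# do this; include_role resolves each name when the loop iteration runs.
- name: Apply a runtime-selected list of roles
  ansible.builtin.include_role:
    name: "{{ item }}"
  loop: "{{ extra_roles | default(['base_system']) }}"
```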
The Wheel Group Error: Writing OS-Aware Roles
This is the kind of gotcha that costs you an hour if you don’t know about it. When the base_system role was first tested with a hardcoded groups: wheel on the admin user task, it worked fine on Rocky Linux. Then it hit Ubuntu:
fatal: [managed-ubuntu]: FAILED! => {"changed": false, "msg": "Group wheel does not exist"}
Ubuntu (and all Debian-based systems) use the sudo group instead of wheel for administrative access. The fix uses Ansible’s ternary filter to pick the right group based on the OS family:
groups: "{{ (ansible_os_family == 'RedHat') | ternary('wheel', 'sudo') }}"
This pattern applies broadly. Any time you write a role that targets multiple OS families, watch for these differences:
| Item | RHEL/Rocky | Ubuntu/Debian |
|---|---|---|
| Admin group | wheel | sudo |
| Cron service | crond | cron |
| Package manager | dnf | apt |
| Nginx config path | /etc/nginx/conf.d/ | /etc/nginx/sites-enabled/ |
| Firewall tool | firewalld | ufw |
| SELinux/AppArmor | SELinux enforcing | AppArmor (usually permissive) |
In production, you’ll encounter this with database roles, LEMP stack roles, and basically any role that touches system-level resources. Build the OS-awareness in from day one. Retrofitting it later means rewriting and retesting everything.
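One common way to build that OS-awareness in from the start is to keep per-family values in separate vars files and load the right one at the top of tasks/main.yml. A sketch, assuming you create vars/RedHat.yml, vars/Debian.yml, and a vars/default.yml fallback inside the role:

```yaml
# Sketch: load vars/RedHat.yml or vars/Debian.yml depending on the
# target host, falling back to vars/default.yml if neither exists.
- name: Load OS-family variables
  ansible.builtin.include_vars: "{{ lookup('ansible.builtin.first_found', params) }}"
  vars:
    params:
      files:
        - "{{ ansible_os_family }}.yml"
        - default.yml
      paths:
        - vars
```

After this task, every subsequent task can reference a single variable name (say, a unified package list) instead of sprinkling ternary filters through the role.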
Organizing a Multi-Role Project
Once you have several roles, project layout matters. Here’s the structure that scales well:
project/
├── ansible.cfg
├── inventory/
│ ├── production
│ └── staging
├── group_vars/
│ ├── all.yml
│ └── webservers.yml
├── host_vars/
│ └── managed-rocky.yml
├── roles/
│ ├── base_system/
│ └── nginx_app/
├── site.yml
├── webservers.yml
└── dbservers.yml
Split your playbooks by function: site.yml applies everything, webservers.yml targets only web servers. Group vars let you set variables per inventory group without cluttering role defaults. This separation becomes essential when managing larger deployments like Kubernetes clusters where dozens of roles interact.
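The ansible.cfg at the project root ties the layout together. A minimal sketch, with illustrative values:

```ini
; Sketch of a project-local ansible.cfg. roles_path tells Ansible where
; to resolve role names; the inventory default saves typing -i each run.
[defaults]
inventory = inventory/staging
roles_path = roles
```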
Sharing Roles with Ansible Galaxy
Ansible Galaxy is both a public registry and a CLI tool. You can pull community roles or publish your own. To install a role from Galaxy:
ansible-galaxy role install geerlingguy.docker
For production use, pin roles to specific versions in a requirements.yml file:
---
roles:
  - name: geerlingguy.docker
    version: "7.4.1"
  - name: geerlingguy.nginx
    version: "3.2.0"
Install all pinned roles at once:
ansible-galaxy install -r requirements.yml
Pinning versions prevents surprises. An unpinned role that auto-updates to a breaking version at 2 AM is not a fun way to start your morning. For managing Docker containers with Ansible, Galaxy roles can save significant setup time.
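The same requirements.yml can also pull private roles straight from Git, pinned to a tag. A sketch with an illustrative repository URL:

```yaml
# Sketch: a Git-sourced role in requirements.yml, pinned to a tag.
# The URL and role name are placeholders for your own repository.
roles:
  - name: internal_base
    src: https://git.example.com/ops/ansible-role-base.git
    scm: git
    version: "v1.2.0"
```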
Troubleshooting
Error: “Group wheel does not exist”
This occurs on Debian/Ubuntu when a task hardcodes `groups: wheel`. The admin group on Debian systems is `sudo`, not `wheel`. Use the ternary filter as shown above to handle both families.
Role dependency runs twice
If two roles both depend on base_system, Ansible runs the dependency only once by default (this is called deduplication). But if the two declarations pass different variables, Ansible treats them as distinct invocations and runs the role once per variable set. That’s usually what you want; if the double run causes problems, make the declarations identical (for example, move the differing values into group_vars) so deduplication applies. The opposite behavior, forcing a dependency to run every time it’s listed, requires `allow_duplicates: true` in the dependency role’s own meta/main.yml.
Handler not firing after template change
Handlers only run when the notifying task reports changed. If the template content hasn’t actually changed (same variables, same template), the task reports ok and the handler won’t fire. This is correct behavior. If you need to force a handler, use ansible.builtin.meta: flush_handlers or run with --force-handlers.
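Flushing handlers mid-play is occasionally useful when a later task depends on a restarted service. As a sketch, insert the meta task at the point where pending handlers must have run:

```yaml
# Sketch: run all pending handlers immediately instead of waiting
# for the end of the play.
- name: Flush pending handlers now
  ansible.builtin.meta: flush_handlers
```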
Production Hardening Tips
Before using roles in production, consider these practices from real-world deployments:
- Tag everything. Add tags to tasks so you can run subsets: `ansible-playbook site.yml --tags "nginx"` skips base_system entirely when you only need to update the web config
- Use `ansible-lint`. It catches common mistakes like deprecated modules, missing FQCNs, and incorrect `mode` formats before they reach production
- Test roles in isolation. The `tests/` directory exists for a reason. Create a test playbook that applies only one role against a throwaway VM
- Version your roles with Git tags. When something breaks in production, you need to know which role version caused it and roll back to the previous tag
- Keep defaults minimal. Every variable in `defaults/main.yml` is part of the role’s public API. Once published, changing a default’s name is a breaking change for every consumer
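Tagging is the cheapest of these to adopt. As a sketch, the nginx_app configuration task could carry tags like this:

```yaml
# Sketch: tags on a task make it targetable with --tags / --skip-tags.
# Paths and names mirror the nginx_app role built earlier.
- name: Deploy Nginx configuration
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/conf.d/app.conf
    mode: "0644"
  notify: Reload Nginx
  tags:
    - nginx
    - config
```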
For more complex automation patterns, explore the official Ansible roles documentation which covers advanced features like role argument validation and conditional imports.