Test Ansible Roles with Molecule: Real Pitfalls and Fixes

You wrote a slick Ansible role on your laptop. It converged in 12 seconds. Three weeks later a colleague applies it to a fresh Rocky 10 box at 11 PM, the play stops on a missing package, the on-call shift turns into a debugging marathon, and the postmortem politely calls it “scope drift.” Roles fail in production for the same reason any code fails in production: they were never run anywhere except the author’s machine.

Original content from computingforgeeks.com - post 167151

Molecule is the standard test harness for Ansible roles. It boots a container or VM, runs your role against it, asserts the result, and tears the world down before you’ve had a chance to leave a stale resource lying around. This guide builds a real role, runs it through every Molecule phase on Rocky Linux 10, then proves the same role still passes when the matrix expands to Ubuntu 24.04. The pitfalls along the way are not made up. Each one was hit during the testing pass that produced this article.

Tested April 2026 on Rocky Linux 10.1 with Ansible 2.18.16, Molecule 26.4.0, Podman 5.6.0, and the molecule_plugins podman driver 25.8.12.

What you need

  • A control host running Rocky Linux 10 or another modern Linux. Two CPUs and 4 GB of RAM is plenty.
  • Network access to quay.io and docker.io for the test container images.
  • Familiarity with writing Ansible roles and the basic playbook flow. If you are brand new to Ansible, work through the automation guide first.

The full role and Molecule scaffolding ships in the companion repo at c4geeks/ansible/intermediate/ansible-molecule-testing. Clone it if you want to skip ahead and read tested code first.

Step 1: Set reusable shell variables

Two paths show up in nearly every command below. Pin them once at the top of the session so the rest of the guide works as a copy-paste:

export LAB_DIR="$HOME/molecule-lab"
export ROLE_NAME="cfg_nginx_site"

Confirm the variables hold and aren’t blank:

echo "Lab:  ${LAB_DIR}"
echo "Role: ${ROLE_NAME}"

Re-run the two export lines if you reconnect or open a new shell.
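A small guard can catch an empty variable before a later mkdir or cd quietly lands in the wrong place. The sketch below leans on the POSIX ${VAR:?message} expansion; the function name require_lab_vars is made up for this example and is not part of the companion repo.

```shell
# Hypothetical helper: fail with a message if either variable is unset or empty.
require_lab_vars() {
  # ${VAR:?msg} prints msg and fails when VAR is unset or empty.
  # The subshell keeps that failure from ending the calling script.
  ( : "${LAB_DIR:?LAB_DIR is not set, re-run the export lines}" \
      "${ROLE_NAME:?ROLE_NAME is not set, re-run the export lines}" ) 2>&1
}

export LAB_DIR="$HOME/molecule-lab"
export ROLE_NAME="cfg_nginx_site"
require_lab_vars && echo "variables OK"
```

Calling require_lab_vars at the top of any helper script you write for this lab gives you a readable error instead of a path that starts with a bare slash.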

Step 2: Bootstrap the controller

Molecule, Ansible, and the driver plugins live in a single Python virtual environment. Using a venv keeps the toolchain isolated from the system Python and lets you upgrade Molecule without breaking other Ansible work on the same host. Install the bare minimum from the OS, then use uv for the Python side because it resolves dependencies in seconds rather than minutes:

sudo dnf -y install epel-release
sudo dnf -y install python3 git podman curl
curl -LsSf https://astral.sh/uv/install.sh | sh
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
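Before creating the venv, it can help to confirm that every tool actually landed on PATH, since the uv installer writes to ~/.local/bin and a stale shell will not see it. This is a minimal sketch; missing_tools is a hypothetical helper name.

```shell
# Hypothetical helper: print the name of every command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || printf '%s\n' "${tool}"
  done
}

missing="$(missing_tools python3 git podman curl uv)"
if [ -n "${missing}" ]; then
  echo "Missing from PATH:" ${missing}
else
  echo "All controller tools found"
fi
```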

Create the lab and the venv, then install Ansible plus Molecule and the podman driver:

mkdir -p "${LAB_DIR}" && cd "${LAB_DIR}"
uv venv .venv --python 3.12
source .venv/bin/activate
uv pip install "ansible-core==2.18.*" molecule "molecule-plugins[podman,docker]" ansible-lint pytest-testinfra

You should now have working ansible and molecule binaries:

ansible --version | head -2
molecule --version

The output reports the exact versions used in this guide:

ansible [core 2.18.16]
  config file = None
molecule 26.4.0 using python 3.12 
    ansible:2.18.16
    podman:25.8.12 from molecule_plugins requiring collections: containers.podman>=1.7.0 ansible.posix>=1.3.0

Those numbers matter. Future Molecule releases sometimes change defaults, and pinning your CI image to known-good versions saves debugging time when an unrelated PR turns the build red.
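One way to act on that is to freeze the versions reported above into a requirements file that both the laptop and CI install from. The file name requirements-molecule.txt and its location are assumptions for this sketch; the version numbers are the ones this guide was tested with.

```shell
# Write the tested versions to a requirements file (name and path are illustrative).
REQ_FILE="${TMPDIR:-/tmp}/requirements-molecule.txt"
cat > "${REQ_FILE}" <<'EOF'
ansible-core==2.18.16
molecule==26.4.0
molecule-plugins[podman]==25.8.12
EOF
cat "${REQ_FILE}"
# Then, inside the activated venv: uv pip install -r "${REQ_FILE}"
```

Bumping a version then becomes a one-line diff that reviewers can see, rather than something that changes silently the next time the CI image is rebuilt.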

Step 3: The role under test

Real testing needs a real role. Anything trivial like “create a file” hides the failure modes that Molecule catches. The role used here, cfg_nginx_site, installs nginx, renders a virtual host with a Jinja2 template, opens the firewall when running on bare metal, and refuses to proceed if nginx -t rejects the rendered config. It supports both RHEL and Debian families through OS-conditional includes.

Scaffold the role with ansible-galaxy so the directory layout matches the convention every contributor expects:

mkdir -p "${LAB_DIR}/roles" && cd "${LAB_DIR}/roles"
ansible-galaxy role init "${ROLE_NAME}"
cd "${ROLE_NAME}"

Replace the generated stubs with the role content from the companion repo. The full source lives at roles/cfg_nginx_site. Three pieces are worth highlighting because they shape what Molecule will be asked to verify.

The OS dispatcher in tasks/main.yml keeps the entry point thin and routes installation through OS-specific include files:

- name: Include OS-family specific install tasks
  ansible.builtin.include_tasks: "install_{{ ansible_os_family | lower }}.yml"

The vhost template in templates/site.conf.j2 renders with whatever values the consumer passed in defaults/main.yml or overrode at play time. The verifier later asserts both the server name and the /healthz location are present in the rendered file. For a refresher on the templating language, see the Jinja2 templates guide.

The firewall block in tasks/main.yml is gated to skip when the play runs inside a container, because stripped container images do not ship a running firewalld:

- name: Configure firewall (skipped when running inside a container)
  when:
    - nginx_site_manage_firewall
    - ansible_virtualization_type not in ['container', 'docker', 'podman']
  block:
    - name: Open HTTP port (firewalld, RHEL family)
      ansible.posix.firewalld:
        port: "{{ nginx_site_listen_port }}/tcp"
        zone: "{{ nginx_site_open_firewall_zone }}"
        permanent: true
        immediate: true
        state: enabled
      when: ansible_os_family == 'RedHat'

That guard is non-negotiable for container-based testing. Without it the role explodes the moment Molecule tries to talk to a non-existent firewalld inside a stripped image.

Step 4: Anatomy of a Molecule scenario

Each scenario is a directory under molecule/ inside the role. Molecule auto-discovers them and runs whichever one you name. For the default scenario, six files do the work:

| File | Role in the test cycle |
|------|------------------------|
| molecule.yml | Driver, platforms, provisioner config, test sequence |
| converge.yml | The play that applies the role under test |
| prepare.yml | One-time setup the image needs before the role runs |
| side_effect.yml | Optional: simulate drift, then re-converge to prove self-healing |
| verify.yml | Assertions over the converged state |
| cleanup.yml | Optional: trim before destroy so teardown is fast |

Drop the working scenario in place:

mkdir -p molecule/default
# Then create the six files from the companion repo

The molecule.yml for the default scenario is short. It uses the podman driver, runs a privileged centos:stream10 container (named cfg-rocky10) with systemd, and points the role lookup at the parent directory so include_role can find cfg_nginx_site:

---
role_name_check: 1

dependency:
  name: galaxy
  enabled: true

driver:
  name: podman

platforms:
  - name: cfg-rocky10
    image: quay.io/centos/centos:stream10
    pre_build_image: true
    privileged: true
    systemd: always
    command: /usr/sbin/init
    capabilities:
      - SYS_ADMIN

provisioner:
  name: ansible
  config_options:
    defaults:
      callback_result_format: yaml
      gathering: smart
    ssh_connection:
      pipelining: true
  inventory:
    host_vars:
      cfg-rocky10:
        nginx_site_server_name: rocky.test
        nginx_site_index_title: "Rocky 10 nginx, deployed by Ansible + tested by Molecule"
  env:
    ANSIBLE_ROLES_PATH: ../../../

verifier:
  name: ansible

scenario:
  name: default
  test_sequence:
    - dependency
    - syntax
    - create
    - prepare
    - converge
    - idempotence
    - side_effect
    - verify
    - cleanup
    - destroy

A few of those keys are easy to miss the first time:

  • privileged: true with systemd: always and the SYS_ADMIN capability are what let nginx start under systemd inside the container. Without them, systemctl start nginx errors out with “System has not been booted with systemd”.
  • ANSIBLE_ROLES_PATH: ../../../ climbs from molecule/default/ back up to the directory that contains the role, so include_role: name: cfg_nginx_site resolves cleanly.
  • callback_result_format: yaml turns the result blocks into multi-line YAML instead of inline JSON. Do not also set stdout_callback: yaml. That option referenced the community.general.yaml callback, which was removed in community.general 12.0.0.
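The ANSIBLE_ROLES_PATH bullet is the one that trips people most often, and the path arithmetic is easy to check by hand. The sketch below rebuilds the directory layout in a scratch directory (the scratch location is an assumption, not the real lab) and resolves ../../../ from the scenario directory.

```shell
# Recreate the role layout in a throwaway directory (paths are illustrative).
scratch="$(mktemp -d)"
mkdir -p "${scratch}/roles/cfg_nginx_site/molecule/default"

# Resolve ../../../ starting from the scenario directory.
resolved="$(cd "${scratch}/roles/cfg_nginx_site/molecule/default/../../../" && pwd -P)"
echo "roles path resolves to: ${resolved}"

# The role must be a direct child of the resolved directory to be found by name.
[ -d "${resolved}/cfg_nginx_site" ] && echo "cfg_nginx_site is discoverable"
```

If you nest scenarios one level deeper, or keep the role at the repository root instead of under roles/, the number of .. segments changes accordingly.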

The converge.yml is just the play that exercises the role. Note become: false because the container runs as root already and the minimal centos:stream10 image does not ship sudo:

---
- name: Converge
  hosts: all
  become: false
  tasks:
    - name: Apply cfg_nginx_site role
      ansible.builtin.include_role:
        name: cfg_nginx_site

The prepare.yml absorbs the cross-distro differences. It refreshes the apt cache on Debian-family hosts, then installs the diagnostic tools under whichever package names the OS family uses:

---
- name: Prepare
  hosts: all
  become: false
  gather_facts: true
  tasks:
    - name: Refresh apt cache (Debian family)
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600
      when: ansible_os_family == 'Debian'

    - name: Install diagnostics tools used by the verifier
      ansible.builtin.package:
        name:
          - curl
          - "{{ 'procps-ng' if ansible_os_family == 'RedHat' else 'procps' }}"
          - "{{ 'iproute' if ansible_os_family == 'RedHat' else 'iproute2' }}"
        state: present

The verify.yml is where the test earns its keep. It collects package facts, slurps the rendered vhost, hits both the index and a health endpoint over HTTP, and asserts everything came out right. Failures here are real failures, not flapping infrastructure:

---
- name: Verify
  hosts: all
  become: false
  gather_facts: true
  tasks:
    - name: Collect package facts
      ansible.builtin.package_facts:
        manager: auto

    - name: nginx package is installed
      ansible.builtin.assert:
        that:
          - "'nginx' in ansible_facts.packages"

    - name: Read rendered vhost config
      ansible.builtin.slurp:
        src: /etc/nginx/conf.d/example.conf
      register: vhost

    - name: vhost contains the expected directives
      ansible.builtin.assert:
        that:
          - "'rocky.test' in (vhost.content | b64decode)"
          - "'/healthz' in (vhost.content | b64decode)"

    - name: nginx is active and running
      ansible.builtin.systemd:
        name: nginx
      register: svc

    - name: Assert nginx state
      ansible.builtin.assert:
        that:
          - svc.status.ActiveState == 'active'
          - svc.status.SubState == 'running'

    - name: Hit the index page
      ansible.builtin.uri:
        url: "http://127.0.0.1/"
        return_content: true
        status_code: 200
      register: idx

    - name: Index page contains site title
      ansible.builtin.assert:
        that:
          - "'tested by Molecule' in idx.content"

    - name: Hit the health endpoint
      ansible.builtin.uri:
        url: "http://127.0.0.1/healthz"
        status_code: 200
        return_content: true
      register: hz

    - name: Health endpoint returns ok
      ansible.builtin.assert:
        that:
          - hz.content == "ok\n"

Confirm Molecule sees all the scenarios you have set up:

molecule list

The output is a clean ASCII table that summarises every scenario alongside its driver and current state:

[Screenshot: output of molecule list showing the default, ci, and multi-os scenarios]

Three scenarios show up in the table because the default plus the optional CI and multi-OS scenarios from the companion repo are already present. Each one targets a different test profile.

Step 5: Run the default scenario phase by phase

Molecule has eleven possible phases. Most articles run molecule test and call it a day, but watching each phase in isolation makes the failure modes obvious. Start with a syntax check. It catches typos before any container is even pulled:

molecule syntax -s default

A clean run looks like this:

INFO     [default > discovery] scenario test matrix: syntax
INFO     [default > prerun] Performing prerun with role_name_check=1...
INFO     [default > syntax] Executing
INFO     Sanity checks: 'podman'

playbook: /root/molecule-lab/roles/cfg_nginx_site/molecule/default/converge.yml
INFO     [default > syntax] Executed: Successful

Next, ask Molecule to create the test instance. The container is built from the image declared under platforms and registered as a managed host:

molecule create -s default

Now run prepare and converge in one shot. converge implicitly runs prepare first if it has not been run for the current instance:

molecule converge -s default

The output stream tells the story of the role applying itself task by task. The PLAY RECAP at the end is what to scan for at a glance:

PLAY RECAP *********************************************************************
cfg-rocky10                : ok=12   changed=7    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

Twelve tasks succeeded, seven changed system state, the firewall and Debian-only tasks were correctly skipped, and zero failed. The idempotence phase is simply converge run a second time against the same instance:

molecule idempotence -s default

If anything reports changed on the second run, the role is not idempotent. The first time you write a real role this almost always fails. The first idempotence run for cfg_nginx_site failed too, on the index template. The original Jinja2 template embedded ansible_date_time.iso8601, which is naturally different each second. The fix is to drop volatile values from the rendered output:

<!-- DON'T render this -->
<p>Test fingerprint: <code>{{ ansible_date_time.iso8601 }}</code></p>

With that gone, the second converge produces zero changes and idempotence passes:

PLAY RECAP *********************************************************************
cfg-rocky10                : ok=11   changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
INFO     [default > idempotence] Executed: Successful
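The timestamp failure is easier to reason about once you remember that Ansible's template module reports changed when the checksum of the freshly rendered content differs from the file already on disk. The sketch below imitates that comparison with plain shell; render_volatile and render_stable are hypothetical stand-ins for two template bodies, not code from the role.

```shell
# Stand-ins for two template bodies (both function names are hypothetical).
render_volatile() { printf '<p>Test fingerprint: %s</p>\n' "$(date +%s)"; }
render_stable()   { printf '<p>Server: %s</p>\n' "rocky.test"; }

# Render each one twice, a second apart, and compare checksums.
first_volatile="$(render_volatile | cksum)"
first_stable="$(render_stable | cksum)"
sleep 1
second_volatile="$(render_volatile | cksum)"
second_stable="$(render_stable | cksum)"

[ "${first_volatile}" != "${second_volatile}" ] && echo "volatile: changed"
[ "${first_stable}" = "${second_stable}" ] && echo "stable: unchanged"
```

Any value that differs between two renders, whether a timestamp, a random ID, or an unordered dictionary dump, produces the same result as the volatile case.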

Now the verifier asserts the actual end state, not just “Ansible thinks it converged”:

molecule verify -s default

Each assert task announces its result. A passing verify reads as a wall of All assertions passed:

TASK [Hit the index page] ******************************************************
ok: [cfg-rocky10]

TASK [Index page contains site title] ******************************************
ok: [cfg-rocky10] =>
    changed: false
    msg: All assertions passed

TASK [Health endpoint returns ok] **********************************************
ok: [cfg-rocky10] =>
    changed: false
    msg: All assertions passed

PLAY RECAP *********************************************************************
cfg-rocky10                : ok=11   changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Combine everything into the canonical CI command, which runs the entire sequence in one shot, then destroys the instance whether the run passed or failed:

molecule test -s default

The summary at the bottom is the one signal a CI job needs:

[Screenshot: molecule test on the default scenario, every phase marked Executed: Successful]

Every phase reports Executed: Successful, including the optional side_effect and cleanup phases that we will look at next.
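If you want the phase-by-phase view from this step without retyping each command, a small wrapper can run the phases in order and stop at the first failure, which leaves the instance up for inspection. The function name run_phases and the MOLECULE_BIN override are assumptions for this sketch; the subcommands and the -s flag are the ones used above.

```shell
# Hypothetical wrapper: run Molecule phases in order, stop at the first failure.
# MOLECULE_BIN is an illustrative override so the loop can be dry-run with echo.
run_phases() {
  scenario="$1"; shift
  for phase in "$@"; do
    echo ">>> ${phase}"
    "${MOLECULE_BIN:-molecule}" "${phase}" -s "${scenario}" || {
      echo "phase '${phase}' failed; instance left in place for debugging" >&2
      return 1
    }
  done
}

# Dry run: print the commands instead of executing them.
MOLECULE_BIN=echo run_phases default syntax create converge idempotence verify
```

Drop the MOLECULE_BIN=echo prefix to run the real phases, and finish with molecule destroy -s default once you are done poking at the container.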

Step 6: Side-effect tests prove self-healing

Idempotence proves the role does not flap on a clean host. It says nothing about what happens after a human SSHs in at 3 AM and edits a config by hand. Side-effect playbooks model exactly that scenario. The one used here stops nginx and overwrites the rendered vhost with garbage, then re-converges and asserts the role notices and fixes it:

---
- name: "Side effect: simulate a manual config drift"
  hosts: all
  become: false
  gather_facts: false
  tasks:
    - name: Stop nginx (simulating an operator typing the wrong systemctl)
      ansible.builtin.systemd:
        name: nginx
        state: stopped

    - name: Vandalise the vhost (simulating a manual edit gone wrong)
      ansible.builtin.copy:
        content: "# I broke this on purpose\n"
        dest: /etc/nginx/conf.d/example.conf
        mode: "0644"

- name: Re-converge to prove the role self-heals the drift
  hosts: all
  become: false
  tasks:
    - name: Re-apply cfg_nginx_site role
      ansible.builtin.include_role:
        name: cfg_nginx_site

When this play runs as part of molecule test, the next thing you see is the role re-rendering the vhost (because the contents differ from the template) and reloading nginx. The PLAY RECAP shows the recovery in action:

PLAY RECAP *********************************************************************
cfg-rocky10                : ok=14   changed=5    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
INFO     [default > side_effect] Executed: Successful

If a future change to the role broke its drift-recovery story, this scenario would catch it before it shipped. That is the exact failure mode that makes operations engineers grumpy at 3 AM, and it is the kind of bug a static lint cannot catch.

Step 7: Multi-OS scenario in parallel

One Molecule instance is a sanity check. Real roles run on more than one distro, so the test surface should too. Add a second scenario at molecule/multi-os/molecule.yml that spins up both Rocky 10 and Ubuntu 24.04 in the same run:

platforms:
  - name: cfg-rocky10
    image: quay.io/centos/centos:stream10
    pre_build_image: true
    privileged: true
    systemd: always
    command: /usr/sbin/init
    capabilities:
      - SYS_ADMIN
  - name: cfg-ubuntu2404
    image: docker.io/geerlingguy/docker-ubuntu2404-ansible:latest
    pre_build_image: true
    privileged: true
    systemd: always
    command: /lib/systemd/systemd
    capabilities:
      - SYS_ADMIN

The Ubuntu image from Jeff Geerling is purpose-built for Ansible role testing. It ships systemd configured to run as PID 1, which is exactly what you need for service tasks. Symlink the shared playbooks so Rocky and Ubuntu both run the same converge, prepare, and verify code paths:

cd molecule/multi-os
ln -sf ../default/converge.yml converge.yml
ln -sf ../default/prepare.yml  prepare.yml
ln -sf ../default/verify.yml   verify.yml
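Because ln -sf succeeds even when the target does not exist, a typo in a relative path leaves a dangling link that is easy to miss. The sketch below reproduces the situation in a scratch directory (an assumption for the demo, where verify.yml is deliberately absent) and uses [ -e ], which follows the link, to flag it.

```shell
# Build a throwaway scenario tree (scratch paths are illustrative).
scratch="$(mktemp -d)"
mkdir -p "${scratch}/molecule/default" "${scratch}/molecule/multi-os"
touch "${scratch}/molecule/default/converge.yml" "${scratch}/molecule/default/prepare.yml"

cd "${scratch}/molecule/multi-os"
ln -sf ../default/converge.yml converge.yml
ln -sf ../default/prepare.yml  prepare.yml
ln -sf ../default/verify.yml   verify.yml   # target is missing in this demo

# [ -e ] follows the link, so it is false for a dangling symlink.
for link in converge.yml prepare.yml verify.yml; do
  if [ -e "${link}" ]; then echo "ok       ${link}"; else echo "DANGLING ${link}"; fi
done
```

Running the same loop inside the real molecule/multi-os directory is a quick check before the first molecule test.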

Run the full cycle on both hosts at once:

molecule test -s multi-os

Molecule executes each task on every host in parallel. The PLAY RECAP at the bottom shows both passing cleanly, with the RedHat-family conditional running on the Rocky host and the Debian-family one running on the Ubuntu host:

[Screenshot: molecule converge and verify recap on the Rocky 10 and Ubuntu 24.04 hosts]

Verify ran the same assertions on each host. The role passed on both. That is the green light to merge.

Step 8: ansible-lint with the production profile

Molecule does not run the linter for you anymore. The molecule lint subcommand was removed in Molecule 5.0 and never came back. Run ansible-lint separately, which is exactly what your CI should do too:

cd "${LAB_DIR}/roles/${ROLE_NAME}"
ansible-lint .

The lint config in .ansible-lint selects the production profile, which is the strictest preset. It catches role-name violations, missing handlers, modules invoked without their FQCN, and dozens of other issues that have bitten production roles in the past. With the role in this guide it passes cleanly:

[Screenshot: ansible-lint passed with 0 failures and 0 warnings under the production profile]

Skip rules sparingly and document the reason in the config when you do. A skip with no comment is one of the most common review-feedback items on Ansible PRs.
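For reference, a lint config along those lines can be very small. The sketch below writes one into a scratch directory; the contents are an assumption about what such a file might look like, not a copy of the one in the companion repo, and the commented-out skip is only there to show a reason sitting next to the rule.

```shell
# Write a minimal .ansible-lint into a scratch directory (contents are illustrative).
lint_dir="$(mktemp -d)"
cat > "${lint_dir}/.ansible-lint" <<'EOF'
---
profile: production

# Every skip carries its reason. Example only, left commented out:
# skip_list:
#   - yaml[line-length]  # long image references are clearer on one line
EOF
cat "${lint_dir}/.ansible-lint"
```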

Step 9: Wire Molecule into GitHub Actions

Manual testing only catches the bugs your laptop bothers to run into. The whole point of Molecule is that your CI runs the same scenario every push. The minimal workflow at .github/workflows/molecule.yml looks like this:

name: molecule

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        scenario: [default, multi-os]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install Molecule and friends
        run: |
          pip install ansible-core==2.18.* molecule molecule-plugins[podman] ansible-lint
      - name: Lint
        run: ansible-lint roles/cfg_nginx_site/
      - name: Test
        working-directory: roles/cfg_nginx_site
        run: molecule test -s ${{ matrix.scenario }}

Two things make that workflow honest. First, the matrix runs both the default and multi-os scenarios, so a regression on either distro fails the build. Second, the lint step runs before molecule test, which short-circuits long test runs when something obvious is broken. If you are also using Event-Driven Ansible in the same repo, point a third matrix entry at its scenario.

Real pitfalls hit during testing

The fixes above did not show up by inspiration. Each one was a real failure during the writing of this article. They will hit you too if you skip the same defaults. Treat this as a triage checklist when Molecule throws a tantrum.

| Symptom | What it means | Fix |
|---------|---------------|-----|
| community.general.yaml has been removed | stdout_callback: yaml in molecule.yml points at a callback that ships in community.general 11.x and earlier. Removed in 12.0.0. | Use callback_result_format: yaml only. Drop stdout_callback: yaml. |
| tmpfs is of type list and we were unable to convert to dict | The molecule_plugins podman driver expects tmpfs as a dict whose values are mount-option strings. | Either omit tmpfs entirely (systemd: always handles the mounts) or use tmpfs: {/run: 'size=64m'} with a non-empty option. |
| executable file 'sudo' not found in $PATH | The play set become: true, but the minimal centos:stream10 image has no sudo binary. | Set become: false on every play that targets a container. You are root inside it already. |
| executable file 'set' not found in $PATH | An ansible.builtin.raw task started with shell builtins like set -e, but the image has no shell on PATH. | Drop the raw bootstrap if Python is already in the image. If not, prefix the command with /bin/bash -lc. |
| the role 'X' was not found | Molecule did not add the role’s parent directory to ANSIBLE_ROLES_PATH, so include_role: name: X failed. | Set provisioner.env.ANSIBLE_ROLES_PATH: '../../../' in molecule.yml. Climbs from molecule/default/ to the dir containing the role. |
| Idempotence fails on a template task | The template renders something that changes between runs, usually a timestamp or a random ID. | Strip volatile values from the rendered output. If you must include them, store them in a stable file outside the template body. |
| No package matching 'curl' is available | The package module ran on a Debian-family host without a refreshed apt cache. | Run apt update first, scoped with when: ansible_os_family == 'Debian'. Use OS-aware package names too: procps on Debian, procps-ng on RHEL. |

None of these are documented next to each other anywhere in the official docs. Most of them are written up as one-line GitHub issues that took an afternoon each to triage. Bookmark this table.
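If you would rather grep than scroll, the error-message rows of the table fold into a small lookup. This is a sketch only: suggest_fix is a hypothetical helper, and the patterns are the symptom strings from the table matched as substrings.

```shell
# Hypothetical triage helper: map a Molecule error line to the fix from the table.
suggest_fix() {
  case "$1" in
    *"community.general.yaml has been removed"*)
      echo "Drop stdout_callback: yaml; keep callback_result_format: yaml." ;;
    *"tmpfs is of type list"*)
      echo "Omit tmpfs, or pass it as a dict such as {/run: 'size=64m'}." ;;
    *"executable file 'sudo' not found"*)
      echo "Set become: false on plays that target the container." ;;
    *"executable file 'set' not found"*)
      echo "Drop the raw bootstrap, or prefix the command with /bin/bash -lc." ;;
    *"the role '"*"' was not found"*)
      echo "Set provisioner.env.ANSIBLE_ROLES_PATH to '../../../'." ;;
    *"No package matching"*)
      echo "Refresh the apt cache in prepare.yml and use OS-aware package names." ;;
    *)
      echo "No match in the triage table." ;;
  esac
}

suggest_fix "fatal: executable file 'sudo' not found in \$PATH"
```

Piping the last failing line of a CI log into a helper like this is a cheap first pass before anyone opens the full output.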

Source code and the wider series

Everything in this article is in the companion repo at c4geeks/ansible/intermediate/ansible-molecule-testing. Clone the repo, cd into the role directory, run molecule test, and you should see the same green output that produced the screenshots above. If anything in your environment makes it red, the troubleshooting table is the place to start.

Once your role tests well, the next pieces in the toolbox are usually Ansible Vault for encrypting role secrets and a dynamic inventory source so production runs do not depend on a hand-edited hosts file. The full series index lives in the Ansible automation guide, and the cheat sheet is handy when you forget which flag controls forks.
