AI coding agents aren’t just for web developers cranking out React components. If you spend your days writing Terraform modules, Ansible playbooks, Kubernetes manifests, and bash scripts, these tools fit right into your workflow. OpenCode, paired with the Oh-My-OpenAgent plugin, turns a terminal into a multi-agent system that can generate, review, and refactor infrastructure code across any LLM provider you configure.
This guide walks through practical, real-world examples of using OpenCode to produce DevOps infrastructure code. We cover Terraform, Ansible, Kubernetes YAML, and shell scripting, with honest assessments of what the tool gets right and where you still need to apply your own judgment. The install guide for OpenCode and Oh-My-OpenAgent on Linux covers the setup process if you haven’t done that yet.
Tested April 2026 with OpenCode 1.4.0, Oh-My-OpenAgent 2.1.0 on Rocky Linux 9.5
What You’ll Learn
- How to generate production-quality Terraform modules with OpenCode prompts
- Generating and reviewing Ansible playbooks with SELinux and handler considerations
- Creating Kubernetes deployments, services, and ingress manifests from natural language
- Writing shell scripts with proper error handling and logging
- Using Oh-My-OpenAgent’s multi-agent mode to tackle complex infrastructure tasks
- What AI agents consistently get wrong with infrastructure code (and how to catch it)
Prerequisites
- OpenCode installed and configured with an API key (see the installation guide)
- Oh-My-OpenAgent plugin installed for multi-agent orchestration
- Basic familiarity with Terraform, Ansible, Kubernetes, and shell scripting
- Tested on: Rocky Linux 9.5, OpenCode 1.4.0, Oh-My-OpenAgent 2.1.0
- At least one LLM provider configured (Claude, GPT-4o, DeepSeek, or a local model via Ollama)
How AI Agents Handle Infrastructure Code
AI coding agents work best when they have clear constraints: a specific cloud provider, a target OS, a defined architecture. Vague prompts like “set up my infrastructure” produce vague results. Specific prompts like “create a Terraform module for an AWS VPC with two public subnets, two private subnets, and a NAT gateway in us-east-1” produce code you can actually use.
What works well: generating boilerplate (variable definitions, output blocks, resource scaffolding), following established patterns (three-tier architectures, standard Kubernetes deployment specs), and producing syntactically correct HCL, YAML, and bash. Where it falls short: understanding your specific environment’s constraints, getting provider version pinning right, and handling complex state dependencies. Treat AI-generated infrastructure code the same way you’d treat a pull request from a junior engineer. Review everything, run terraform plan, test in staging.
Generating Terraform Code with OpenCode
Start OpenCode in your Terraform project directory. The agent works best when it can see your existing module structure and provider configuration.
cd ~/terraform-projects/aws-infrastructure
opencode
Once inside the OpenCode TUI, give it a specific prompt for the VPC module:
opencode run "Create a Terraform module for an AWS VPC with 2 public subnets, 2 private subnets, a NAT gateway, and outputs for all subnet IDs. Use variables for CIDR ranges and region. Target Terraform 1.9+ and AWS provider 5.x."
OpenCode generates the module as separate files. Here’s what a typical variables.tf looks like:
variable "vpc_cidr" {
description = "CIDR block for the VPC"
type = string
default = "10.0.0.0/16"
}
variable "region" {
description = "AWS region"
type = string
default = "us-east-1"
}
variable "public_subnet_cidrs" {
description = "CIDR blocks for public subnets"
type = list(string)
default = ["10.0.1.0/24", "10.0.2.0/24"]
}
variable "private_subnet_cidrs" {
description = "CIDR blocks for private subnets"
type = list(string)
default = ["10.0.10.0/24", "10.0.11.0/24"]
}
variable "environment" {
description = "Environment name for tagging"
type = string
default = "production"
}
variable "availability_zones" {
description = "List of availability zones"
type = list(string)
default = ["us-east-1a", "us-east-1b"]
}
The generated main.tf creates the VPC, subnets, internet gateway, NAT gateway, and route tables:
terraform {
required_version = ">= 1.9.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_support = true
enable_dns_hostnames = true
tags = {
Name = "${var.environment}-vpc"
Environment = var.environment
ManagedBy = "terraform"
}
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-igw"
}
}
resource "aws_subnet" "public" {
count = length(var.public_subnet_cidrs)
vpc_id = aws_vpc.main.id
cidr_block = var.public_subnet_cidrs[count.index]
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-${count.index + 1}"
Tier = "public"
}
}
resource "aws_subnet" "private" {
count = length(var.private_subnet_cidrs)
vpc_id = aws_vpc.main.id
cidr_block = var.private_subnet_cidrs[count.index]
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.environment}-private-${count.index + 1}"
Tier = "private"
}
}
resource "aws_eip" "nat" {
domain = "vpc"
tags = {
Name = "${var.environment}-nat-eip"
}
}
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public[0].id
tags = {
Name = "${var.environment}-nat-gw"
}
depends_on = [aws_internet_gateway.main]
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.environment}-public-rt"
}
}
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main.id
}
tags = {
Name = "${var.environment}-private-rt"
}
}
resource "aws_route_table_association" "public" {
count = length(var.public_subnet_cidrs)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = length(var.private_subnet_cidrs)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private.id
}
And the outputs.tf to expose the IDs:
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.main.id
}
output "public_subnet_ids" {
description = "IDs of public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of private subnets"
value = aws_subnet.private[*].id
}
output "nat_gateway_id" {
description = "ID of the NAT Gateway"
value = aws_nat_gateway.main.id
}
Reviewing the Generated Terraform
The agent produced clean, well-structured code. A few things it got right: proper tagging, depends_on for the NAT gateway, DNS support enabled on the VPC, and separate route tables for public and private subnets. Things to verify before running terraform apply:
- Provider version: the ~> 5.0 constraint is broad. Pin it tighter in production, something like ~> 5.82, based on what you're actually running
- CIDR ranges: make sure they don't overlap with existing VPCs or on-premises networks
- Single NAT gateway: fine for dev, but production workloads need one per AZ for high availability
- No VPC flow logs: you’ll want those for compliance and debugging
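For the high-availability point, here is one way the single-gateway section could be reworked into per-AZ NAT gateways. This is a sketch, not verbatim OpenCode output; it reuses the variable and resource names from the generated module:

```hcl
# One EIP and NAT gateway per AZ instead of a single shared gateway
resource "aws_eip" "nat" {
  count  = length(var.public_subnet_cidrs)
  domain = "vpc"

  tags = {
    Name = "${var.environment}-nat-eip-${count.index + 1}"
  }
}

resource "aws_nat_gateway" "main" {
  count         = length(var.public_subnet_cidrs)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  depends_on = [aws_internet_gateway.main]
}

# Each private subnet gets its own route table pointing at its AZ's gateway,
# so losing one AZ doesn't take out egress for the others
resource "aws_route_table" "private" {
  count  = length(var.private_subnet_cidrs)
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }
}
```

The private route table associations then need to reference aws_route_table.private[count.index].id instead of a single shared table.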
Iterating on the Module
The real power shows up when you iterate. Ask OpenCode to extend the module:
opencode run "Add a bastion host in the first public subnet with a security group allowing SSH from a configurable CIDR. Use Amazon Linux 2023 AMI. Include a variable for the SSH key name."
OpenCode reads the existing module files, understands the VPC structure, and adds the bastion resources that reference the subnets already defined. This context awareness is what separates an AI agent from a simple code snippet generator.
Always validate iterative changes:
terraform fmt -recursive
terraform validate
terraform plan
The plan output shows exactly what resources will be created, modified, or destroyed before you commit anything.
Generating Ansible Playbooks
Ansible playbook generation is where AI agents shine because playbooks follow predictable patterns: install packages, template configs, manage services, set firewall rules. Give OpenCode a clear target:
opencode run "Write an Ansible playbook that installs Nginx on Rocky Linux 9, configures a reverse proxy to a backend on port 8080, opens firewall ports 80 and 443, sets up Let's Encrypt SSL with certbot, and handles SELinux booleans. Use handlers for service restarts."
Here’s the kind of playbook OpenCode produces:
---
- name: Configure Nginx reverse proxy with SSL
hosts: webservers
become: true
vars:
backend_port: 8080
domain_name: app.example.com
certbot_email: [email protected]
tasks:
- name: Install Nginx and certbot
ansible.builtin.dnf:
name:
- nginx
- certbot
- python3-certbot-nginx
state: present
- name: Start and enable Nginx
ansible.builtin.systemd:
name: nginx
state: started
enabled: true
- name: Allow SELinux httpd network connections
ansible.posix.seboolean:
name: httpd_can_network_connect
state: true
persistent: true
- name: Open HTTP and HTTPS in firewalld
ansible.posix.firewalld:
service: "{{ item }}"
permanent: true
immediate: true
state: enabled
loop:
- http
- https
- name: Deploy Nginx reverse proxy config
ansible.builtin.template:
src: templates/nginx-reverse-proxy.conf.j2
dest: /etc/nginx/conf.d/{{ domain_name }}.conf
owner: root
group: root
mode: "0644"
notify: Reload Nginx
- name: Remove default server block
ansible.builtin.file:
path: /etc/nginx/conf.d/default.conf
state: absent
notify: Reload Nginx
- name: Obtain SSL certificate
ansible.builtin.command:
cmd: >
certbot certonly --nginx
-d {{ domain_name }}
--non-interactive
--agree-tos
-m {{ certbot_email }}
creates: /etc/letsencrypt/live/{{ domain_name }}/fullchain.pem
- name: Deploy SSL Nginx config
ansible.builtin.template:
src: templates/nginx-ssl-proxy.conf.j2
dest: /etc/nginx/conf.d/{{ domain_name }}.conf
owner: root
group: root
mode: "0644"
notify: Reload Nginx
- name: Set up certbot renewal cron
ansible.builtin.cron:
name: "certbot renewal"
job: "certbot renew --quiet --post-hook 'systemctl reload nginx'"
hour: "3"
minute: "30"
weekday: "1"
handlers:
- name: Reload Nginx
ansible.builtin.systemd:
name: nginx
state: reloaded
What the Agent Got Right
The playbook uses FQCNs (fully qualified collection names) like ansible.builtin.dnf instead of the short form. It includes the httpd_can_network_connect SELinux boolean, which is essential for Nginx to proxy to a backend port on RHEL-family systems. Handlers are used correctly to reload Nginx only when configuration changes, not on every run. The creates parameter on the certbot task makes it idempotent.
What to Fix Before Running
A few adjustments for production use:
- The playbook assumes the domain already resolves to the server. Add a verification task or document that as a prerequisite
- The Jinja2 templates (nginx-reverse-proxy.conf.j2 and nginx-ssl-proxy.conf.j2) are referenced but not generated. Ask OpenCode for those separately
- The two template tasks both write to /etc/nginx/conf.d/{{ domain_name }}.conf, so every run rewrites the file twice and fires the handler. Guard the first task with a condition on the certificate's existence to keep the play idempotent
- The certbot task uses the --nginx plugin, which works well, but some teams prefer --standalone or --webroot for more control
- Consider adding an nginx -t validation task before reloading to catch syntax errors in generated configs
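One way to wire in the nginx -t check is a validation handler that fires on the same notification as the reload. This is a sketch using Ansible's listen keyword; handlers run in definition order, so a failed syntax check aborts the play before the reload happens:

```yaml
handlers:
  - name: Validate Nginx config
    ansible.builtin.command:
      cmd: nginx -t
    changed_when: false
    listen: Reload Nginx

  - name: Reload Nginx
    ansible.builtin.systemd:
      name: nginx
      state: reloaded
```

Any task that notifies Reload Nginx now triggers the validation first, with no changes needed to the tasks themselves.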
Use the Oracle agent in Oh-My-OpenAgent for an architecture review:
opencode run "@oracle Review this Ansible playbook for production readiness. Check idempotency, error handling, and security."
The Oracle agent analyzes the playbook structure and flags issues like missing validate parameters on template tasks and the absence of a rollback strategy if certbot fails.
Kubernetes Manifests
Kubernetes YAML is notoriously verbose, which makes it an ideal candidate for AI generation. Most deployments follow the same pattern: Deployment, Service, maybe an Ingress or HPA.
opencode run "Create Kubernetes manifests for a Python Flask app: a Deployment with 3 replicas, resource limits, health checks, and a non-root security context. Add a ClusterIP Service and an Ingress with TLS. Use the image registry.example.com/flask-app:1.2.0."
The generated manifests cover all three resources. Here’s the Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: flask-app
labels:
app: flask-app
version: "1.2.0"
spec:
replicas: 3
selector:
matchLabels:
app: flask-app
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
template:
metadata:
labels:
app: flask-app
version: "1.2.0"
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: flask-app
image: registry.example.com/flask-app:1.2.0
ports:
- containerPort: 5000
protocol: TCP
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 256Mi
livenessProbe:
httpGet:
path: /healthz
port: 5000
initialDelaySeconds: 10
periodSeconds: 15
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 5000
initialDelaySeconds: 5
periodSeconds: 10
env:
- name: FLASK_ENV
value: "production"
- name: PORT
value: "5000"
The Service and Ingress:
---
apiVersion: v1
kind: Service
metadata:
name: flask-app
labels:
app: flask-app
spec:
type: ClusterIP
selector:
app: flask-app
ports:
- port: 80
targetPort: 5000
protocol: TCP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: flask-app
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- flask.example.com
secretName: flask-app-tls
rules:
- host: flask.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: flask-app
port:
number: 80
Review Notes
Solid output overall. The security context with runAsNonRoot and a specific UID is a good practice many engineers skip. The rolling update strategy with maxUnavailable: 0 ensures zero downtime during deploys. A few things to check:
- The health check endpoints (/healthz and /ready) must actually exist in your Flask app. If they don't, the pods will crashloop
- The Ingress assumes you have cert-manager and an nginx ingress controller installed. Adjust annotations for your specific setup (Traefik, AWS ALB, etc.)
- Resource limits depend entirely on your application’s actual usage. Run load tests and adjust based on real metrics
- No PodDisruptionBudget was generated, which you'll want for production clusters
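A minimal PodDisruptionBudget to pair with the Deployment would look like this (the label selector matches the manifests above):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: flask-app
spec:
  # Keep at least two of the three replicas running during voluntary
  # disruptions such as node drains and cluster upgrades
  minAvailable: 2
  selector:
    matchLabels:
      app: flask-app
```

Apply it alongside the other manifests; without it, a node drain can evict all three replicas at once.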
Preview what would change in your cluster before applying:
kubectl diff -f flask-app/
This shows a diff of what Kubernetes would modify without actually applying anything.
Shell Scripts and Automation
Backup scripts, deployment wrappers, log rotation, health checks. Every DevOps team has a collection of shell scripts that grew organically over the years. AI agents produce scripts with better structure and error handling than most hand-written ones.
opencode run "Write a bash script that dumps all PostgreSQL databases, compresses each with zstd, uploads to an S3 bucket with a date prefix, removes local dumps older than 7 days, and logs everything to syslog. Include error handling and exit on failure."
The output is a well-structured script with proper error handling:
#!/usr/bin/env bash
set -euo pipefail
# Configuration
S3_BUCKET="s3://company-backups/postgresql"
BACKUP_DIR="/var/backups/postgresql"
RETENTION_DAYS=7
DATE_PREFIX=$(date +%Y-%m-%d)
LOG_TAG="pg-backup"
PG_USER="postgres"
log() {
logger -t "$LOG_TAG" "$1"
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}
cleanup() {
local exit_code=$?
if [[ $exit_code -ne 0 ]]; then
log "ERROR: Backup failed with exit code $exit_code"
fi
exit $exit_code
}
trap cleanup EXIT
mkdir -p "$BACKUP_DIR"
log "Starting PostgreSQL backup"
# Get list of databases, excluding templates
DATABASES=$(sudo -u "$PG_USER" psql -At -c \
"SELECT datname FROM pg_database WHERE datistemplate = false AND datname != 'postgres';")
if [[ -z "$DATABASES" ]]; then
log "WARNING: No databases found to back up"
exit 0
fi
BACKUP_COUNT=0
FAILED_COUNT=0
for db in $DATABASES; do
DUMP_FILE="$BACKUP_DIR/${db}_${DATE_PREFIX}.sql"
COMPRESSED_FILE="${DUMP_FILE}.zst"
log "Dumping database: $db"
if sudo -u "$PG_USER" pg_dump --format=plain "$db" > "$DUMP_FILE" 2>/dev/null; then
log "Compressing $db with zstd"
zstd --rm -q -T0 "$DUMP_FILE"
FILESIZE=$(stat --format=%s "$COMPRESSED_FILE" 2>/dev/null || stat -f%z "$COMPRESSED_FILE")
log "Uploading $db ($FILESIZE bytes) to S3"
if aws s3 cp "$COMPRESSED_FILE" "$S3_BUCKET/$DATE_PREFIX/${db}.sql.zst" --quiet; then
log "Successfully uploaded $db to S3"
BACKUP_COUNT=$((BACKUP_COUNT + 1))  # arithmetic assignment; ((VAR++)) exits nonzero under set -e when VAR is 0
else
log "ERROR: Failed to upload $db to S3"
FAILED_COUNT=$((FAILED_COUNT + 1))
fi
else
log "ERROR: Failed to dump database $db"
FAILED_COUNT=$((FAILED_COUNT + 1))
fi
done
# Clean up old local backups
log "Removing local backups older than $RETENTION_DAYS days"
find "$BACKUP_DIR" -name "*.sql.zst" -mtime +$RETENTION_DAYS -delete
log "Backup complete: $BACKUP_COUNT succeeded, $FAILED_COUNT failed"
if [[ $FAILED_COUNT -gt 0 ]]; then
exit 1
fi
Script Quality Assessment
This is production-ready with minor tweaks. The set -euo pipefail at the top catches errors early. The trap ensures cleanup logging happens even on failure. Dual logging to both syslog and stdout means you get output in cron emails and can search with journalctl -t pg-backup. The -T0 flag on zstd uses all available CPU cores for compression.
What to adjust: the pg_dump format is plain, which works for smaller databases. For anything over a few gigabytes, switch to --format=custom or --format=directory with --jobs for parallel dumping. Also consider adding a .pgpass file or environment variable for authentication instead of relying on peer auth.
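For the larger-database case, the dump line can be swapped for a parallel directory-format dump. This is a sketch assuming four dump jobs; pg_restore reads the directory format directly:

```shell
# Directory format supports parallel dumping via --jobs; plain format does not
sudo -u "$PG_USER" pg_dump --format=directory --jobs=4 \
  --file="$BACKUP_DIR/${db}_${DATE_PREFIX}.dir" "$db"
```

The output is a directory rather than a single file, so the compression and upload steps would need to tar it first or sync the directory to S3.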
Using Multi-Agent Mode for Complex Tasks
Oh-My-OpenAgent introduces specialized agents that collaborate on larger tasks. When you invoke the ultrawork command, three agents coordinate: Prometheus (the planner), Sisyphus (the orchestrator), and Hephaestus (the executor).
opencode run "@ultrawork Set up a complete CI/CD pipeline with GitHub Actions that builds a Docker image, pushes to ECR, runs Trivy security scan, deploys to EKS staging with Helm, runs integration tests, and promotes to production on approval."
Prometheus breaks the task into discrete components: the Dockerfile, GitHub Actions workflow, Helm chart, and deployment scripts. Sisyphus determines the execution order and dependencies between them. Hephaestus generates each file.
The result is typically five or six files:
- .github/workflows/deploy.yml with the full pipeline including build, scan, stage, test, and promote jobs
- Dockerfile with multi-stage build and non-root user
- helm/values-staging.yaml and helm/values-production.yaml
- scripts/integration-test.sh for post-deploy verification
Multi-agent mode produces more cohesive results than generating each file independently because the planner ensures all pieces reference each other correctly. The GitHub Actions workflow references the exact Helm values files, the integration test script hits the correct staging URL, and the Docker image tag propagates through every step.
You can also use the Momus agent specifically for code review:
opencode run "@momus Review the generated CI/CD pipeline for security issues, missing error handling, and production readiness."
Momus typically catches things like missing --immutable tags, absence of OIDC for ECR authentication (instead of long-lived access keys), and missing timeout values on GitHub Actions jobs.
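The OIDC point is worth showing concretely. Instead of storing long-lived AWS keys as repository secrets, the workflow can assume a role through GitHub's OIDC provider. This is a sketch using the aws-actions official actions; the role ARN is a placeholder you'd replace with a role trusted for your repository:

```yaml
permissions:
  id-token: write   # required for the OIDC token exchange
  contents: read

steps:
  - name: Configure AWS credentials via OIDC
    uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/github-actions-ecr  # placeholder ARN
      aws-region: us-east-1

  - name: Log in to Amazon ECR
    uses: aws-actions/amazon-ecr-login@v2
```

The temporary credentials expire when the job ends, so there is nothing long-lived to leak or rotate.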
Best Practices for AI-Generated Infrastructure Code
After testing OpenCode extensively with DevOps workflows, these practices consistently prevent issues.
Always dry-run before applying. Every tool in the DevOps ecosystem has a preview mode. Use it.
terraform plan -out=tfplan
ansible-playbook site.yml --check --diff
kubectl diff -f manifests/
shellcheck backup-script.sh
These commands should become muscle memory after every AI-generated code session.
Pin versions explicitly. AI agents tend to use loose version constraints or skip pinning entirely. Lock down provider versions in Terraform, collection versions in Ansible, and image tags in Kubernetes. A latest tag in a Deployment manifest is a ticking time bomb.
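In Terraform, tightening the generated constraint looks like this (the version number is illustrative; match it to what you actually run):

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.82"   # >= 5.82.0 and < 6.0.0; use "= 5.82.2" for an exact pin
    }
  }
}
```

The same discipline applies to a collections/requirements.yml in Ansible and to image tags in Kubernetes manifests.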
Review IAM policies and RBAC carefully. AI agents err on the side of permissiveness because overly restrictive permissions break things during testing. An Action: "*" in a generated IAM policy is functional but violates least privilege. Narrow it down to the specific actions your workload needs.
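As a sketch, narrowing a generated wildcard policy for a workload that only reads one S3 bucket might look like this (bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadAppBucketOnly",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-app-bucket",
        "arn:aws:s3:::example-app-bucket/*"
      ]
    }
  ]
}
```

Note that s3:ListBucket applies to the bucket ARN while s3:GetObject applies to the object ARNs, which is why both Resource entries are needed.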
Test in isolation first. Create a throwaway environment (a dedicated Terraform workspace, a Kind cluster, a Vagrant box) and deploy the AI-generated code there before touching staging or production.
Use the review agents. Oh-My-OpenAgent includes Momus for code review and Oracle for architecture review. Running both on generated code catches issues that a single pass misses, because each agent evaluates from a different perspective.
What AI Agents Get Wrong
Honesty about limitations matters more than hype. After months of using AI agents for infrastructure code, these are the patterns where they consistently need correction.
Outdated provider and module versions. AI models have training data cutoffs. The agent might generate Terraform code with AWS provider 4.x syntax when 5.x changed the API. Always check the provider changelog and run terraform init -upgrade to catch incompatibilities.
SELinux and AppArmor are an afterthought. Most generated playbooks and scripts assume permissive mode or ignore mandatory access controls entirely. On RHEL-family systems with SELinux enforcing (which is every properly configured production server), missing setsebool or semanage commands cause silent failures that are painful to debug. Always check ausearch -m avc -ts recent after deploying AI-generated configurations.
Generic security groups and firewall rules. AI agents often open wider ranges than necessary. A generated security group allowing 0.0.0.0/0 on port 22 is technically correct but terrible practice. Restrict source CIDRs to your bastion network or VPN ranges.
Complex state and dependencies. Terraform state management, Ansible inventory patterns for multi-tier deployments, and Kubernetes operators with CRDs are areas where AI-generated code needs significant human review. The agent can scaffold the structure, but the business logic of “deploy database before the app” or “drain node before upgrading” requires understanding your specific architecture.
Secrets in plain text. Generated code sometimes puts passwords or API keys directly in YAML files or shell scripts. Always move secrets to Vault, AWS Secrets Manager, Kubernetes Secrets (or better, External Secrets Operator), or encrypted Ansible Vault files.
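In the Kubernetes case, the fix is usually a small change in the Deployment: reference a Secret instead of inlining the value. The secret name and key here are placeholders:

```yaml
env:
  - name: DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: flask-app-secrets   # created out of band, or synced by External Secrets Operator
        key: database-password
```

The manifest can then live in version control without the credential ever appearing in it.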
Frequently Asked Questions
Can AI agents replace DevOps engineers?
No. AI agents accelerate the parts of DevOps that are repetitive and pattern-based: writing boilerplate, scaffolding standard architectures, generating initial manifests. The judgment calls (which architecture to use, how to handle failure modes, what the security boundary should be) still require a human who understands the production environment. Think of AI agents as a faster way to get a first draft that you then refine.
Which LLM model works best for infrastructure code?
Claude Sonnet 4 and GPT-4o produce the most accurate Terraform and Kubernetes code in our testing. DeepSeek V3 is surprisingly good for Ansible playbooks and shell scripts, especially when running locally via Ollama for air-gapped environments. The model matters less than the specificity of your prompt. A detailed prompt with constraints produces better code on any model than a vague prompt on the best model.
Is AI-generated infrastructure code safe for production?
With review, yes. The generated code needs the same scrutiny you’d apply to any pull request: check for overly permissive IAM, validate resource limits, verify version pins, and test in staging. The tools exist to catch issues (terraform plan, ansible --check, kubectl diff, checkov, tfsec). Use them. Skip the review step and you’ll learn why the hard way.