Set Up Claude Code for DevOps Engineers [Hands-On]

Most AI coding tools generate snippets you paste into an editor. Claude Code operates at a different level. It runs real commands on your machine, SSHes into servers, executes Terraform plans, runs Ansible playbooks, and builds Docker images. It inherits your SSH keys, cloud credentials, and kubeconfig. For DevOps engineers, this turns “write me a playbook” into “write it, run it, show me the output” in a single conversation.

Original content from computingforgeeks.com - post 164869

Below are six demos you can reproduce on your own servers, plus the CLAUDE.md rules, permission lockdowns, and hooks that keep Claude Code from wrecking production. Each prompt is copy-pasteable. For the full command reference, see the Claude Code cheat sheet.

Tested March 2026 | Claude Code with Opus 4.6 (80.9% SWE-bench), macOS, Rocky Linux 10 VMs

Prerequisites

A Claude subscription (Pro at $20/mo for light use, Max at $100-200/mo for heavy infrastructure sessions) or an Anthropic Console API key
macOS, Linux, or Windows with WSL
SSH access to at least one Linux server (for demos 1, 2, and 4)
Docker installed locally (for demo 3)
kubectl with a cluster or minikube (for demo 5, optional)

Opus 4.6 handles complex tasks well: multi-file Terraform refactoring, debugging subtle Ansible failures, designing Kubernetes architectures. Sonnet 4.6 is roughly 5x cheaper and covers routine work: boilerplate generation, Dockerfiles, health checks. Use Sonnet daily, switch to Opus for deep reasoning. The /model command switches mid-session.

Install Claude Code

Claude Code runs on macOS, Linux, and Windows. The native installer is the fastest method:

curl -fsSL https://claude.ai/install.sh | bash

On macOS, Homebrew also works:

brew install --cask claude-code

After installation, navigate to any project directory and run claude to start a session. You’ll be prompted to log in on first launch.

How Claude Code Works for Infrastructure

Claude Code executes bash commands as your user. Whatever you can do in a terminal, Claude Code can do. It picks up your SSH agent, AWS credentials, kubeconfig, Docker socket, and any environment variables in your shell. This makes it immediately useful for infrastructure work without extra configuration.

DevOps Task	Works?	How	Watch Out For
SSH to servers	Yes	`ssh user@host "cmd"` via Bash	Non-interactive only (no vi/nano)
Terraform	Yes	Runs CLI directly, inherits cloud creds	Can run apply/destroy if allowed
Ansible	Yes	Runs `ansible-playbook` directly	Use vault password files, not prompts
Docker	Yes	Uses your local Docker socket	Needs sandbox exclusion
kubectl / Helm	Yes	Reads your `~/.kube/config`	Needs `~/.kube` access in sandbox
Cloud CLIs (aws, gcloud, az)	Yes	Inherits your authenticated session	Whatever you can do, Claude can do
CI/CD headless mode	Yes	`claude -p "task" --allowedTools`	Needs `ANTHROPIC_API_KEY`

The 1M token context window (Opus 4.6) means Claude Code can ingest entire infrastructure repositories, including Terraform modules, Ansible roles, and Kubernetes manifests, in a single session without losing context halfway through a refactor.

Demo 1: Server Health Check via SSH

The most relatable demo for any sysadmin. Point Claude Code at a server and ask for a health report. One prompt, multiple SSH commands, formatted results.

Open Claude Code and type this prompt:

SSH into 192.168.1.50 and give me a full health report: OS version,
uptime, disk usage, memory, top 5 CPU processes, any failed systemd
services, and pending security updates

Claude Code connects over SSH and runs the relevant commands on its own: hostnamectl for the OS, uptime, df -h for disk, free -h for memory, ps aux --sort=-%cpu for top processes, systemctl --failed for broken services, and dnf check-update --security for pending patches. The output comes back as a structured report:

Server Health Report: 192.168.1.50
═══════════════════════════════════
OS:        Rocky Linux 10.1 (Red Quartz)
Kernel:    6.12.8-1.el10.x86_64
Uptime:    14 days, 3:22

Disk Usage:
  /        42% of 50G
  /var     67% of 100G  ⚠ approaching 80%
  /tmp     12% of 10G

Memory:    3.2G used / 8.0G total (40%)
Swap:      0B used / 2.0G total

Top 5 CPU Processes:
  PID   %CPU  COMMAND
  1842  12.3  prometheus
  2104   8.1  grafana-server
  1567   4.2  node_exporter
  1233   2.1  nginx
  1001   0.8  sshd

Failed Services: none

Pending Security Updates: 3 packages
  kernel-6.12.8-1.el10
  openssl-3.2.3-2.el10
  systemd-256.10-1.el10

This replaces the ad-hoc shell script you’d normally write. Claude Code notices that /var is at 67% and flags it proactively, something a raw df -h won’t do for you. You can follow up with “clean old logs on that server” and Claude Code handles it without you writing another command. For a deeper reference on SSH operations, see the SSH commands cheat sheet.
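The disk warning in the report above can also be reproduced deterministically, which is handy when you want the same check in a cron job. The sketch below is a hypothetical helper (flag_disks is an illustrative name, not part of Claude Code) that reads POSIX df -P output and prints every mount at or above a threshold. It assumes mount points contain no spaces, and the 60% threshold in the usage line is an arbitrary choice.

```shell
# flag_disks THRESHOLD: read `df -P` output on stdin and print every
# mount point whose usage is at or above THRESHOLD percent.
# Hypothetical helper for illustration; not part of Claude Code.
flag_disks() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    NR > 1 {                      # skip the df header line
      use = $5; sub(/%/, "", use) # column 5 is Capacity, e.g. "67%"
      if (use + 0 >= t) printf "%s %s%%\n", $6, use
    }'
}

# Usage against the demo server (assumes key-based SSH already works):
#   ssh user@192.168.1.50 df -P | flag_disks 60
```

Feeding it the same numbers as the report would list /var and skip / and /tmp.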

Demo 2: Generate and Run an Ansible Playbook

Ansible playbooks are structured YAML with predictable patterns, which makes them an excellent target for AI generation. The difference with Claude Code is that it doesn’t just write the playbook. It runs it.

Prompt Claude Code with a specific deployment request:

Write an Ansible playbook that installs Nginx on the Rocky Linux 10
server at 192.168.1.50, creates a custom index page, opens firewall
ports 80 and 443, and enables the service. Dry-run first, then apply.

Claude Code generates the playbook with proper module usage:

---
- name: Deploy Nginx on Rocky Linux 10
  hosts: all
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.dnf:
        name: nginx
        state: present

    - name: Deploy custom index page
      ansible.builtin.copy:
        dest: /usr/share/nginx/html/index.html
        content: |
          Deployed by Ansible via Claude Code
        owner: nginx
        group: nginx
        mode: '0644'

    - name: Start and enable nginx
      ansible.builtin.systemd:
        name: nginx
        state: started
        enabled: true

    - name: Open HTTP and HTTPS in firewalld
      ansible.posix.firewalld:
        service: "{{ item }}"
        permanent: true
        immediate: true
        state: enabled
      loop:
        - http
        - https

It runs the dry run (--check) first to show what would change:

PLAY [Deploy Nginx on Rocky Linux 10] *****************************************

TASK [Install nginx] **********************************************************
changed: [192.168.1.50]

TASK [Deploy custom index page] ***********************************************
changed: [192.168.1.50]

TASK [Start and enable nginx] *************************************************
changed: [192.168.1.50]

TASK [Open HTTP and HTTPS in firewalld] ***************************************
changed: [192.168.1.50] => (item=http)
changed: [192.168.1.50] => (item=https)

PLAY RECAP ********************************************************************
192.168.1.50    : ok=4    changed=4    unreachable=0    failed=0

After confirming the dry run looks correct, Claude Code runs the playbook for real and verifies with curl http://192.168.1.50 to confirm Nginx responds. The entire loop happens in one conversation. If you need Ansible installed on your control machine, handle that first.
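If you want to reproduce the same dry-run-then-apply sequence by hand, a small wrapper keeps the two steps together. This is a minimal sketch under assumptions: nginx.yml is a hypothetical file name for the playbook above, check_then_apply is an illustrative function name, and the ansible.posix collection (needed for the firewalld task) is already installed on the control machine.

```shell
# check_then_apply HOST PLAYBOOK: run the playbook in check mode first and
# apply it for real only if the dry run succeeds.
# Hypothetical wrapper for illustration.
check_then_apply() {
  local host="$1" playbook="$2"
  # The trailing comma makes Ansible treat the value as an inline host
  # list instead of a path to an inventory file.
  ansible-playbook -i "${host}," "$playbook" --check --diff || return 1
  ansible-playbook -i "${host}," "$playbook"
}

# Usage (nginx.yml is an assumed file name):
#   check_then_apply 192.168.1.50 nginx.yml && curl -sI http://192.168.1.50
```

A failed dry run stops the function before anything changes on the server.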

Demo 3: Build a Production Dockerfile

Claude Code doesn’t just generate Dockerfiles. It writes the file, builds the image, and reports the result.

Create a multi-stage Dockerfile for the Python Flask app in ./app/.
Alpine base, non-root user, proper layer caching. Build it and compare
the size against a naive single-stage build.

Claude Code generates a production-quality Dockerfile:

# Build stage
FROM python:3.12-alpine AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Runtime stage
FROM python:3.12-alpine
RUN adduser -D appuser
WORKDIR /app
COPY --from=builder /install /usr/local
COPY ./app/ .
USER appuser
EXPOSE 5000
CMD ["gunicorn", "-b", "0.0.0.0:5000", "app:app"]

It builds both versions and shows the size comparison:

REPOSITORY    TAG            SIZE
myapp         multi-stage    87MB
myapp         naive          1.2GB

The multi-stage build is 93% smaller. Security best practices are applied by default: non-root user, no cache directories in the image, minimal base. If you’re running Docker on RHEL systems, the Docker Engine installation guide for Rocky Linux 10 covers the prerequisites.
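The 93% figure follows directly from the two sizes in the listing. As a quick check, the sketch below computes the reduction with awk; pct_smaller is a hypothetical helper name, and 1.2GB is treated as 1200MB for the arithmetic.

```shell
# pct_smaller SMALL LARGE: percentage by which SMALL undercuts LARGE,
# rounded to a whole number. Both sizes must use the same unit.
pct_smaller() {
  awk -v s="$1" -v l="$2" 'BEGIN { printf "%.0f\n", (1 - s / l) * 100 }'
}

pct_smaller 87 1200   # prints 93
```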

Demo 4: Diagnose and Fix a Broken Service

This is the demo that sells Claude Code to sysadmins. Break something on purpose, then watch Claude Code trace the problem and fix it without hand-holding.

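Introduce a syntax error in the Nginx config by stripping the trailing semicolon from a single directive (line 84 in this walkthrough; pick any line that ends in a semicolon on your server). Here user stands in for an account with root privileges:

ssh user@192.168.1.50 "sed -i '84s/;$//' /etc/nginx/nginx.conf"
ssh user@192.168.1.50 "systemctl restart nginx"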

Nginx fails to restart. Now ask Claude Code to figure it out:

The web server on 192.168.1.50 is down. Figure out why and fix it.

Claude Code runs through its own diagnostic sequence without being told what to check:

Step 1: Check service status
$ ssh user@192.168.1.50 "systemctl status nginx"
→ Active: failed (Result: exit-code)

Step 2: Read the error log
$ ssh user@192.168.1.50 "journalctl -u nginx --no-pager -n 20"
→ nginx: [emerg] unexpected "}" in /etc/nginx/nginx.conf:84

Step 3: Validate the config
$ ssh user@192.168.1.50 "nginx -t"
→ nginx: configuration file test failed

Step 4: Inspect the broken line
$ ssh user@192.168.1.50 "sed -n '80,90p' /etc/nginx/nginx.conf"
→ Found a missing semicolon on line 84

Step 5: Fix the syntax error
$ ssh user@192.168.1.50 "sed -i '84s/$/;/' /etc/nginx/nginx.conf"

Step 6: Re-validate
$ ssh user@192.168.1.50 "nginx -t"
→ configuration file test is successful

Step 7: Restart and verify
$ ssh user@192.168.1.50 "systemctl restart nginx && systemctl is-active nginx"
→ active

The full debugging loop took about 30 seconds. Claude Code followed the same sequence an experienced sysadmin would: check status, read the journal, validate config, find the line, fix it, re-validate, restart. It does it faster because it doesn’t pause between commands. For more on managing services, the systemctl commands reference covers the essentials.
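The fix in Step 5 edits a live config in place. A more defensive variant keeps a backup and restores it when validation fails. The sketch below is one way to do that, not what Claude Code runs internally; edit_with_rollback is a hypothetical helper name, and the validator is whatever command you pass after the sed expression.

```shell
# edit_with_rollback FILE SED_EXPR VALIDATOR...
# Back up FILE, apply SED_EXPR, then run VALIDATOR. If validation fails,
# restore the backup and return non-zero. Hypothetical helper.
edit_with_rollback() {
  local file="$1" expr="$2"
  shift 2
  cp -p "$file" "$file.bak" || return 1
  # Rewrite the original from the backup so the file keeps its inode,
  # permissions, and SELinux context.
  sed "$expr" "$file.bak" > "$file" || return 1
  if "$@"; then
    echo "validated: $file"
  else
    cp "$file.bak" "$file"
    echo "validation failed, restored: $file" >&2
    return 1
  fi
}

# Usage on the server, as root:
#   edit_with_rollback /etc/nginx/nginx.conf '84s/$/;/' nginx -t \
#     && systemctl restart nginx
```

The restart only happens when the validator passes, so a bad edit never reaches the running service.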

Demo 5: Generate Kubernetes Manifests

Kubernetes YAML is verbose and easy to get wrong. Claude Code generates manifests with production defaults that most engineers forget to include.

Generate a Kubernetes deployment for a Node.js app: 3 replicas,
resource limits, liveness probe on /health, readiness probe on /ready,
ClusterIP service on port 3000. Validate with --dry-run=client.

Claude Code produces the deployment and service manifests with best-practice defaults:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodeapp
  labels:
    app: nodeapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nodeapp
  template:
    metadata:
      labels:
        app: nodeapp
    spec:
      containers:
      - name: nodeapp
        image: nodeapp:1.0.0
        ports:
        - containerPort: 3000
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 15
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: nodeapp
spec:
  type: ClusterIP
  selector:
    app: nodeapp
  ports:
  - port: 3000
    targetPort: 3000

Resource requests, limits, and both probe types are included by default. The image tag uses a specific version (not latest). Claude Code adds these because it understands Kubernetes best practices, not because you asked for each one explicitly. After the dry-run validates, apply to a real cluster. The kubectl cheat sheet covers the commands you’ll use to inspect the deployment.
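A pinned image tag is easy to verify mechanically before the dry run. The sketch below is a hypothetical lint (find_floating_tags is an illustrative name) that prints manifest lines whose image is untagged or uses latest. It assumes one unquoted image key per line and will miss untagged images pulled from a registry address that includes a port.

```shell
# find_floating_tags FILE...: print manifest lines whose image has no tag
# or uses :latest. Exits non-zero when nothing is found (grep semantics).
# Hypothetical lint for illustration.
find_floating_tags() {
  grep -nE '^[[:space:]-]*image:[[:space:]]*[^[:space:]:]+(:latest)?[[:space:]]*$' "$@"
}

# Usage (nodeapp.yaml is an assumed file name for the manifest above):
#   find_floating_tags nodeapp.yaml && echo "pin these tags before applying"
#   kubectl apply --dry-run=client -f nodeapp.yaml
```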

Demo 6: Headless Mode in CI/CD

Claude Code runs non-interactively with the -p flag. Pipe a Terraform plan into it for automated security review:

terraform plan -out=tfplan
terraform show -json tfplan | claude -p "Review this plan. Flag security issues, unexpected destroys, or missing tags. Output as markdown."

In GitHub Actions, the official anthropics/claude-code-action posts review comments directly on pull requests. This catches misconfigurations before they reach production.

Bulk operations work the same way:

git diff main --name-only -- '*.tf' | claude -p "Review these changed Terraform files for security issues and best practice violations"

Pipe changed files, log output, or test results into Claude Code and it processes them in a single pass. If you need to install Terraform on your Linux workstation first, handle that before trying the pipeline demos. The Claude Code in GitHub Actions guide covers the full CI/CD integration.
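In a pipeline it can help to put a cheap deterministic gate in front of the model, so a review only runs when the plan actually removes something. A minimal sketch, assuming the summary line that terraform plan prints (Plan: N to add, N to change, N to destroy.); destroy_count and plan.txt are hypothetical names.

```shell
# destroy_count: read `terraform plan -no-color` output on stdin and print
# how many resources the plan would destroy (0 if no summary line exists).
# Hypothetical helper for illustration.
destroy_count() {
  sed -n 's/^Plan:.* \([0-9][0-9]*\) to destroy\..*/\1/p' | tail -n 1 | grep . || echo 0
}

# Usage in a CI step:
#   terraform plan -no-color -out=tfplan | tee plan.txt
#   if [ "$(destroy_count < plan.txt)" -gt 0 ]; then
#     terraform show -json tfplan | claude -p "Explain every destroy in this plan."
#   fi
```

The count comes from Terraform itself, so the gate does not depend on how the model words its review.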

Setting Up CLAUDE.md for Infrastructure Projects

This section separates “tried it once” from “my team uses it safely every day.” The CLAUDE.md file sits in your project root and tells Claude Code what it can and cannot do. For infrastructure repos, the rules must be explicit. For a full breakdown of the directory structure, see the .claude directory guide.

Here is a production-grade CLAUDE.md for a DevOps repository:

# Infrastructure Repo Rules

## Safety (NON-NEGOTIABLE)
- NEVER run `terraform apply` without `terraform plan` first
- NEVER run `terraform destroy` without explicit user confirmation
- NEVER run `terraform state rm` or `terraform state mv` without confirmation
- NEVER run `kubectl delete namespace` or `kubectl delete pv` without confirmation
- NEVER disable SELinux (`setenforce 0`) on any server
- ALWAYS run `ansible-playbook --check` before the real run on first execution
- ALWAYS use `--dry-run=client` with `kubectl apply` on first attempt
- NEVER hardcode credentials, API keys, or passwords in any file

## Terraform
- State files are in remote backend. Never modify state directly
- Run `terraform validate` after every .tf change
- Tag all resources: environment, project, managed-by=terraform
- Always plan with `-out=tfplan` and apply from the saved plan

## Ansible
- Hosts defined in inventory files, not hardcoded
- Use `ansible-vault` for sensitive variables
- Playbooks must be idempotent (use built-in modules, not shell/command)
- Include a verification task at the end of each play

## Kubernetes
- Always specify resource requests and limits
- Never deploy to default namespace
- Include liveness and readiness probes
- Never use `latest` tag in production manifests

## Docker
- Multi-stage builds to minimize image size
- Run as non-root user
- Never copy secrets into images

Claude Code reads this file at the start of every session. If you ask it to run terraform destroy, it stops and asks for explicit confirmation first. If you ask it to write an Ansible playbook, it favors built-in modules over shell and command. These are strong guidelines, not OS-level enforcement. For hard enforcement, use permission rules.

Permission Lockdown with settings.json

Claude Code reads permissions from several settings files. This guide uses two of them: global settings protect every project, and project settings fine-tune what’s allowed in a specific repo.

File	Scope	What Goes Here	Shared?
`~/.claude/settings.json`	All projects	Universal safety: `rm -rf /`, `mkfs`, piping curl to bash	No (personal)
`.claude/settings.json`	This repo only	Project-specific Terraform, Ansible, kubectl rules	Yes (commit to Git)

Global (~/.claude/settings.json). Set once, protects every project:

{
  "permissions": {
    "deny": [
      "Bash(rm -rf /*)",
      "Bash(mkfs*)",
      "Bash(dd if=*of=/dev/*)",
      "Bash(chmod -R 777*)",
      "Bash(curl*|*bash*)",
      "Bash(wget*|*bash*)"
    ]
  }
}

Project (.claude/settings.json). Committed to Git, inherited by every team member who clones:

{
  "permissions": {
    "allow": [
      "Bash(terraform init*)",
      "Bash(terraform validate*)",
      "Bash(terraform plan*)",
      "Bash(terraform fmt*)",
      "Bash(terraform show*)",
      "Bash(ansible-playbook --check*)",
      "Bash(ansible-lint*)",
      "Bash(kubectl get*)",
      "Bash(kubectl describe*)",
      "Bash(kubectl logs*)",
      "Bash(docker build*)",
      "Bash(docker ps*)",
      "Bash(helm list*)",
      "Bash(helm template*)"
    ],
    "deny": [
      "Bash(terraform destroy*)",
      "Bash(terraform apply -auto-approve*)",
      "Bash(terraform state rm*)",
      "Bash(kubectl delete namespace*)",
      "Bash(kubectl delete pv *)",
      "Bash(helm uninstall*)"
    ]
  }
}

Allowed operations (read-only and low-risk commands) run without prompting. Denied operations are blocked entirely. Everything in between asks for approval. Deny rules from both files stack, so the global file catches universally dangerous commands while project rules handle tool-specific restrictions.
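The three outcomes are easier to reason about with a toy model. The sketch below mimics a handful of the project rules with shell glob patterns; it is a mental model only, since Claude Code's real matcher is not a shell case statement, and classify is a hypothetical function name.

```shell
# classify CMD: report which bucket a command would land in under a few
# of the project rules. Mental model only, not Claude Code's matcher.
classify() {
  case "$1" in
    "terraform destroy"*|"terraform apply -auto-approve"*|"helm uninstall"*)
      echo deny ;;    # deny rules win over everything else
    "terraform plan"*|"kubectl get"*|"docker build"*)
      echo allow ;;   # runs without a prompt
    *)
      echo ask ;;     # anything unlisted needs approval
  esac
}

classify "terraform plan -out=tfplan"    # prints allow
classify "terraform apply tfplan"        # prints ask
classify "terraform destroy -target=x"   # prints deny
```

Note the middle case: applying a saved plan is neither allowed nor denied, so it still needs a human to approve it.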

Validation Hooks

Hooks run shell commands before or after Claude Code actions. For infrastructure, the most valuable hook auto-validates Terraform files after every edit. Add this to your project .claude/settings.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "if echo \"$CLAUDE_FILE_PATHS\" | grep -q '\\.tf$'; then terraform validate 2>&1 || true; fi"
          }
        ]
      },
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "if echo \"$CLAUDE_FILE_PATHS\" | grep -qE '\\.ya?ml$'; then yamllint -d relaxed \"$CLAUDE_FILE_PATHS\" 2>&1 || true; fi"
          }
        ]
      }
    ]
  }
}

Every time Claude Code edits a .tf file, the hook runs terraform validate automatically. YAML files (Ansible playbooks, Kubernetes manifests) get linted. If validation fails, Claude Code sees the error and fixes it immediately. For the full hooks API, see the official hooks documentation.
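Hook behavior has changed between Claude Code releases, so it is worth checking how your version passes the edited path and how it surfaces failures. The sketch below is an alternative hook body under two assumptions taken from the hooks documentation as understood at the time of writing: the hook receives a JSON payload on stdin with the path under tool_input.file_path, and an exit status of 2 with output on stderr is what gets reported back to the model. The function names are hypothetical, and sed is used only to avoid a jq dependency.

```shell
# hook_file_path: extract tool_input.file_path from the hook JSON on stdin.
# Assumes the payload shape described above; jq is the sturdier choice.
hook_file_path() {
  sed -n 's/.*"file_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# run_hook: validate the edited file. On failure, send the validator
# output to stderr and return 2 (assumed to be the "report back" status).
run_hook() {
  local path
  path="$(hook_file_path)"
  case "$path" in
    *.tf)          terraform validate >&2 || return 2 ;;
    *.yml|*.yaml)  yamllint -d relaxed "$path" >&2 || return 2 ;;
  esac
}

# When saved as a script (for example a hypothetical
# .claude/hooks/validate.sh), end the file with:
#   run_hook; exit $?
```

Files with other extensions fall through the case statement and the hook returns 0 without running anything.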

Security: What You Must Know

Claude Code inherits every credential in your shell environment. This is what makes it powerful for DevOps, and also what makes it risky without guardrails.

Use credential managers, not environment variables. Instead of exporting AWS_ACCESS_KEY_ID in your shell profile, use aws-vault to inject temporary credentials into the Claude Code session:

aws-vault exec staging -- claude

This starts Claude Code with temporary AWS credentials scoped to the staging environment. They expire automatically when the session ends.

Audit MCP servers before installing. MCP (Model Context Protocol) extends Claude Code with external tool integrations. A February 2026 study by Snyk found that 13.4% of public MCP skills had critical vulnerabilities, with 76 confirmed malicious payloads. Only install skills from trusted sources and review their code first.

Know when NOT to use Claude Code:

Production Terraform state manipulation: state corruption is painful and Claude does not fully understand complex dependency graphs
Secrets rotation: use Vault, AWS Secrets Manager, or sealed-secrets instead
Compliance environments: SOC 2, HIPAA, PCI DSS need audit trails that terminal history doesn’t satisfy
Multi-tenant credential access: if one credential set touches multiple customer environments, the blast radius is unacceptable
Disaster recovery procedures: one wrong command in a cascade scenario can make things worse

What Claude Code Gets Wrong

No honest guide skips the limitations. After extensive testing with infrastructure tasks, these are the areas where Claude Code struggles most:

Complex Terraform module composition across environments with shared state. Single modules work well, but wiring many modules with cross-references sometimes produces circular dependencies
Provider-specific edge cases. IAM policies with complex conditions, advanced VPC peering, and multi-region setups sometimes produce configs that validate but don’t behave as intended
Ansible idempotency. Generated playbooks occasionally use command or shell modules where a proper built-in module exists. This works once but isn’t idempotent on re-runs
Kubernetes networking. Network policies, service mesh configurations, and cross-namespace communication rules need careful review
SELinux contexts. On RHEL/Rocky systems, Claude Code sometimes forgets to set proper SELinux contexts after creating files or changing ports. Always verify with ausearch -m avc -ts recent

The rule: always review the plan, always test in staging, never auto-apply to production. Claude Code accelerates the work, but the engineer makes the final call.

Quick Reference: When to Use It

Task	Use Claude Code?	Safety Approach
Generate a new Terraform module	Yes	Review plan before applying
Debug a failing Ansible playbook	Yes	Run with `--check` first
Write a Dockerfile from scratch	Yes	Review image, scan for vulnerabilities
Create Kubernetes manifests	Yes	`--dry-run=client` then review
SSH diagnostics across servers	Yes	Read-only commands by default
Set up a CI/CD pipeline	Yes	Review YAML before merge
Rotate production secrets	No	Vault or Secrets Manager
Modify Terraform state	No	Manual with backup
Deploy without review	No	Always review first
Multi-tenant credential work	No	Isolate per tenant

The Hands-On Series

This guide gives you the foundation and safety configuration. Each tool gets a dedicated deep-dive with full demos on real infrastructure:

Guide	What You Will Build	Tested On
Claude Code + SSH	Provision servers, diagnose failures, multi-server health checks	OpenStack VMs
Claude Code + Terraform	Generate modules, deploy real infra, import resources, safety hooks	OpenStack / AWS
Claude Code + Ansible	Generate playbooks, convert scripts to roles, debug SELinux denials	Rocky Linux 10 + Ubuntu 24.04
Claude Code + Docker	Multi-stage builds, Compose stacks, optimization, container debugging	Docker on Linux
Claude Code + Kubernetes	Manifests, Helm charts, pod log analysis, debugging CrashLoopBackOff	Minikube / real cluster
Claude Code + GitHub Actions	Automated PR review, Terraform validation pipeline	GitHub repository

Every guide in the series follows the same pattern: real prompt, real generated code, real execution, verified result. When commands differ between Rocky Linux and Ubuntu, both variants are shown.

Frequently Asked Questions

Can Claude Code manage multiple servers at once?

Yes. Claude Code SSHes into servers sequentially, runs commands on each, and aggregates the output into a single report. It can also spawn subagents that work on different servers in parallel. For inventory gathering, health checks, and log collection across a fleet, it replaces opening multiple terminal tabs. The limitation is that each SSH command must be non-interactive (no vi, no interactive prompts).

Does Claude Code work with private Git repos and internal tools?

Yes. It inherits your Git configuration, SSH keys, and any CLI tools in your PATH. Private repositories work if your SSH agent or credential helper is configured. Internal CLIs and APIs work as long as they have non-interactive modes. MCP servers can extend Claude Code’s reach to internal services like Jira, Confluence, or custom monitoring dashboards.

How do I prevent Claude Code from accessing production?

Three layers work together. First, use separate kubeconfig files and AWS profiles per environment, and point KUBECONFIG and AWS_PROFILE at the staging ones before starting Claude Code. Second, add production hostnames and cluster contexts to your deny rules in ~/.claude/settings.json. Third, run Claude Code in a devcontainer or isolated shell that only mounts staging credentials. The global deny rules persist across every session and every project.
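The first layer can be wrapped in a shell function so the staging scope is never forgotten. A minimal sketch, assuming a kubeconfig at ~/.kube/staging.yaml and an AWS profile named staging; both names, and the function name claude_staging, are hypothetical.

```shell
# claude_staging: start Claude Code with staging-only credentials.
# The kubeconfig path and AWS profile name are hypothetical; adjust both.
claude_staging() {
  # Variables set on the command line apply to this one invocation only,
  # so the parent shell keeps whatever it had before.
  KUBECONFIG="$HOME/.kube/staging.yaml" AWS_PROFILE="staging" claude "$@"
}

# Usage:
#   claude_staging
```

This only narrows what the session sees by default; production credentials that are still readable on disk need the other two layers.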

What happens if Claude Code breaks something?

Recovery depends on the tool. Terraform: the saved plan file and state history protect you. Ansible: idempotent playbooks can re-run safely. Kubernetes: kubectl rollout undo reverts deployments. File edits: Claude Code’s session history shows every change and you can ask it to revert. The key is always having a rollback path before letting Claude Code execute anything destructive.