OpenStack instances sometimes get stuck in unexpected states – “resized,” “error,” or other conditions where normal operations like stop, start, or delete refuse to work. This guide covers the root causes, the correct fix commands, and additional troubleshooting techniques including rescue mode and console log analysis.

Understanding the “vm_state resized” Problem

When you resize an instance in OpenStack, the process involves multiple steps: the scheduler picks a host, the instance migrates, and the system waits for you to confirm or revert. If something goes wrong during this process – network interruption, compute node failure, or a timeout – the instance can get stuck with vm_state set to resized and task_state that never clears.

In this state, you will see errors like:

Cannot 'stop' instance while it is in vm_state resized

The instance is effectively locked. Normal actions through Horizon or the CLI will fail because Nova enforces state machine rules – you cannot stop, reboot, or delete an instance that is mid-resize.

Diagnosing Instance State

Before applying any fix, check the actual state of the instance:

openstack server show <instance-id> -f value -c status -c vm_state -c task_state -c power_state

This gives you the four key state fields. A healthy running instance shows:

status: ACTIVE
vm_state: active
task_state: None
power_state: Running

A stuck resize typically shows:

status: VERIFY_RESIZE (or RESIZE)
vm_state: resized
task_state: None (or resize_finish)
power_state: Running
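If you manage many instances, sweeping the whole project for stuck resizes can be sketched as a small helper. The function name find_stuck_resizes is ours, not an OpenStack command, and it assumes CLI credentials are already loaded:

```shell
# List every server the API reports as mid-resize.
# Hypothetical helper; assumes a configured OpenStack CLI.
find_stuck_resizes() {
  for status in VERIFY_RESIZE RESIZE; do
    openstack server list --status "$status" -f value -c ID -c Name
  done
}

# Usage (requires a real cloud):
#   find_stuck_resizes
```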

Fix 1: Confirm or Revert the Resize

If the instance is in VERIFY_RESIZE status, the resize actually completed and is waiting for your confirmation. Try confirming it:

openstack server resize confirm <instance-id>

Or revert to the original size:

openstack server resize revert <instance-id>

Verify the state returned to ACTIVE:

openstack server show <instance-id> -c status

If confirm or revert also fails, move to the next approach.
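The confirm-then-revert fallback can be scripted in one step. A minimal sketch, where fix_resize is our name and the failed confirm's error output is suppressed for brevity:

```shell
# Try to confirm the resize; if that fails, revert instead, then
# report the final status. Hypothetical wrapper around the commands
# above; assumes a configured OpenStack CLI.
fix_resize() {
  id=$1
  if ! openstack server resize confirm "$id" 2>/dev/null; then
    openstack server resize revert "$id"
  fi
  openstack server show "$id" -f value -c status
}

# Usage (requires a real cloud):
#   fix_resize <instance-id>
```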

Fix 2: Reset the Instance State with Nova

When the instance is truly stuck and normal operations fail, you can force-reset the state. This tells the Nova database to change the instance state without actually performing any operation on the hypervisor.

openstack server set --state active <instance-id>

Or using the older nova CLI (still available on many deployments):

nova reset-state --active <instance-id>

After resetting the state, verify:

openstack server show <instance-id> -c status -c vm_state -c task_state

The instance should now show ACTIVE status. Try stopping or rebooting it to confirm operations work again:

openstack server stop <instance-id>

Keep in mind that resetting the state does not fix any underlying issue. If the instance was mid-migration, the actual VM process might be on a different host than what the database records. Check that the instance is reachable and functioning correctly after the reset.
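Since operations may take a moment to settle after a reset, polling for the expected status is worth scripting. A sketch, where wait_for_status is our helper name and the retry interval is arbitrary:

```shell
# Poll until the server reaches the expected status, or give up.
# Hypothetical helper; assumes a configured OpenStack CLI.
wait_for_status() {
  id=$1 want=${2:-ACTIVE}
  for attempt in 1 2 3 4 5 6; do
    status=$(openstack server show "$id" -f value -c status)
    if [ "$status" = "$want" ]; then
      echo "$status"
      return 0
    fi
    sleep 10
  done
  echo "gave up; last status: $status" >&2
  return 1
}

# Usage (requires a real cloud):
#   wait_for_status <instance-id> ACTIVE
```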

Fix 3: Reset to Error and Then Delete

If the instance is beyond repair and you just need it gone, reset it to error state first, then delete:

openstack server set --state error <instance-id>
openstack server delete <instance-id>

If a normal delete still fails, use force-delete:

nova force-delete <instance-id>

Or with the OpenStack CLI:

openstack server delete --force <instance-id>

Verify the instance is removed:

openstack server list --all-projects | grep <instance-id>
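When several instances are wedged at once, the reset-and-delete sequence can be looped over every server already in ERROR state. A destructive sketch; cleanup_error_servers is our name, and you should review the server list before running anything like this:

```shell
# Reset and force-delete every server already in ERROR state.
# DESTRUCTIVE sketch -- hypothetical helper; assumes admin credentials.
cleanup_error_servers() {
  openstack server list --status ERROR -f value -c ID |
  while read -r id; do
    openstack server set --state error "$id"
    openstack server delete --force "$id" && echo "deleted $id"
  done
}
```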

Common OpenStack Instance States and Fixes

The resize issue is just one of many state problems. Here are the most common ones and how to handle them.

Instance Stuck in BUILD State

This happens when the scheduler assigns the instance but the compute node never finishes spawning it. Common causes: image download failure, insufficient resources on the compute node, or a networking issue during port binding.

openstack server set --state error <instance-id>
openstack server delete <instance-id>

Check the nova-compute log on the target host for the root cause:

grep <instance-id> /var/log/nova/nova-compute.log | tail -50

Instance Stuck in DELETING State

Occurs when the compute node is unreachable or Cinder cannot detach a volume. Resetting the state and force-deleting usually clears it:

openstack server set --state error <instance-id>
openstack server delete --force <instance-id>

Instance in ERROR State After Spawn

When an instance lands in ERROR immediately after creation, check the fault message:

openstack server show <instance-id> -c fault

Common faults include “No valid host was found” (scheduler cannot place the instance) and “Build of instance aborted” (resource exhaustion). Fix the underlying issue, delete the failed instance, and rebuild.
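Reading the fault and flagging the common scheduler failure can be wrapped up in one helper. A sketch; show_fault is our name and the hint text is ours:

```shell
# Print the fault message and add a hint for the common scheduler
# failure. Hypothetical helper; assumes a configured OpenStack CLI.
show_fault() {
  fault=$(openstack server show "$1" -f value -c fault)
  echo "$fault"
  case $fault in
    *"No valid host"*)
      echo "hint: check compute capacity, quotas, and flavor constraints" >&2 ;;
  esac
}
```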

Instance Stuck in SHUTOFF but Won’t Start

If openstack server start fails on a SHUTOFF instance, the compute host may be down or the instance’s backing storage may be missing. Check:

openstack server show <instance-id> -c OS-EXT-SRV-ATTR:host
openstack compute service list

If the compute host is down, you can evacuate the instance to a different host. Note that evacuation rebuilds the instance on another node: without shared storage, the disk is recreated from the base image and any local data is lost.

nova evacuate <instance-id>
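Because evacuation is only safe when the source host is genuinely down, a guard is worth scripting. A sketch; evacuate_if_host_down is our name, and it requires admin credentials:

```shell
# Evacuate only when Nova reports the instance's compute service down.
# Hypothetical wrapper around the commands above; requires admin.
evacuate_if_host_down() {
  id=$1
  host=$(openstack server show "$id" -f value -c OS-EXT-SRV-ATTR:host)
  state=$(openstack compute service list --host "$host" \
            --service nova-compute -f value -c State)
  if [ "$state" = "down" ]; then
    nova evacuate "$id"
  else
    echo "host $host is '$state'; refusing to evacuate" >&2
    return 1
  fi
}
```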

Using Instance Rescue Mode

Rescue mode boots an instance from a temporary image while keeping its original volumes attached. This is similar to booting a physical server from a live CD – useful when the instance’s OS is broken and you need to repair filesystem issues, fix configuration files, or recover data.

Put an instance into rescue mode:

openstack server rescue <instance-id>

By default, it uses the same image the instance was created with. To use a different rescue image:

openstack server rescue --image <rescue-image-id> <instance-id>

The rescue command outputs a temporary password. Use it to access the instance via console. The original root disk is attached as a secondary device – typically /dev/vdb. Mount it, make your fixes, then unrescue:

openstack server unrescue <instance-id>

The instance reboots from its original disk with your changes in place.
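Inside the rescued instance (via the console), the repair itself is ordinary disk surgery. A sketch of commenting out a bad fstab entry; the device path, partition number, and the /dev/sdc1 example are assumptions, so check lsblk on your instance first:

```shell
# Mount the original root disk and comment out a broken fstab entry.
# Run inside the rescue environment; device names are assumptions.
repair_fstab() {
  dev=${1:-/dev/vdb1}    # original root partition (check lsblk)
  bad=${2:-/dev/sdc1}    # example offending device in fstab
  mount "$dev" /mnt
  sed -i "s|^\($bad .*\)|#\1|" /mnt/etc/fstab   # comment out the line
  umount /mnt
}

# Usage inside rescue:
#   lsblk
#   repair_fstab /dev/vdb1 /dev/sdc1
```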

Console Log Debugging

When an instance is unreachable via SSH and you need to see what happened during boot, the console log is your best tool:

openstack console log show <instance-id>

This outputs the serial console log, including kernel boot messages, systemd service startup, cloud-init output, and any errors. To see only the most recent output, limit the number of lines:

openstack console log show <instance-id> --lines 100

For interactive debugging, get a VNC console URL:

openstack console url show <instance-id>

Open the URL in a browser to get a live console session. This works even when networking is misconfigured inside the instance.

Common things to look for in console logs:

  • Kernel panic – Usually indicates a corrupted image or incompatible kernel. Rescue the instance or rebuild from a known-good image.
  • cloud-init errors – Metadata service unreachable, bad user-data scripts. Check the Neutron networking and metadata agent.
  • Filesystem mount failures – Corrupted volume or wrong fstab entry. Use rescue mode to fix.
  • DHCP timeout – The instance cannot get an IP. Check the Neutron DHCP agent and the port status.
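The signatures above can be scanned for in one pass. A grep sketch; console_triage is our name, and the pattern list is ours rather than exhaustive:

```shell
# Scan the console log for the common failure signatures discussed
# above. Hypothetical helper; assumes a configured OpenStack CLI.
console_triage() {
  openstack console log show "$1" |
    grep -iE 'panic|cloud-init.*(fail|error)|mount.*fail|dhcp' ||
    echo "no obvious failure signature found"
}
```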

Conclusion

Stuck instance states in OpenStack are frustrating but fixable. In most cases, openstack server set --state active gets you out of trouble. For instances beyond recovery, resetting to error and force-deleting clears them out. Rescue mode and console logs give you the diagnostic access you need when SSH is not an option. The key is understanding the state machine – once you know what state the instance is in and why it got there, the fix is usually straightforward.
