Recover oVirt disk snapshots marked as "illegal"

Kuko Armas <kuko@canarytek.com>

How to recover an oVirt virtual machine with disk snapshots marked in illegal status

NOTE: BE EXTREMELY CAREFUL doing what I describe here. Messing with the underlying storage systems without oVirt’s control can make you lose all your data. I only did it because the situation was critical: we had already lost one disk, and we had a lot of critical VMs that probably wouldn’t boot if we stopped them (or if they crashed)

You have been warned, don’t blame me…

Introduction

Recently we have been experiencing problems with our oVirt Storage Domains. Every time we try to create a disk snapshot we get an error, and if we then try to delete the snapshot, the deletion fails and the disk snapshot gets marked as illegal

With these snapshots in “illegal” status, if we stop the VM it will not boot again, and if it’s a critical machine we could be in deep trouble

Why it happens

Well, I still don’t know why all snapshot operations are failing in all our storage domains. When I find out, I will document it in another post. UPDATE: I finally found out. This is why it happened

What I will explain in this section is why the disk snapshot is marked as illegal and why the VM will not boot.

Since a “live removal” of a disk snapshot is a critical operation, the first thing oVirt does when it starts one is mark the snapshot as illegal. If the operation completes, oVirt marks it as OK and removes the snapshot from the engine database. But if something goes wrong and the removal does not finish, the disk snapshot remains in “illegal” status, and you have to manually check its real status and fix it if needed. The fact that an oVirt VM with a disk snapshot marked as illegal will not boot is a feature (not a bug): it makes sure the VM will not corrupt the data, because oVirt is not sure what the real status of that disk is
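If you want a quick overview of every disk image the engine currently considers illegal, a query like the following can be run on the engine host (just a sketch: it assumes the default database name “engine” and local access as the postgres user):

# List all disk images marked as illegal (imagestatus = 4) in the engine database
sudo -u postgres psql engine -c "select image_guid, image_group_id, vm_snapshot_id, imagestatus from images where imagestatus = 4;"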

Checking the status of a snapshot in the engine database

To fix the situation, first we need to check the real status of the snapshot

In oVirt, a given VM may have several VM snapshots, and each VM snapshot has a disk snapshot for every disk

First, we check the snapshots for our vm (vm_id 343db85c-64bc-4f0c-b9a0-4ca8d129e0c3)
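If you don’t know the vm_id yet, it can be looked up by the VM’s name first (a sketch: “myvm” is a placeholder, and it assumes the usual vm_static table in the engine schema):

# Find the vm_id (vm_guid) from the VM's name
sudo -u postgres psql engine -c "select vm_guid, vm_name from vm_static where vm_name='myvm';"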

engine=# select snapshot_id,snapshot_type,status,description from snapshots where vm_id='343db85c-64bc-4f0c-b9a0-4ca8d129e0c3';
             snapshot_id              | snapshot_type | status |                description
--------------------------------------+---------------+--------+-------------------------------------------
 3d1eaf0a-49b3-45be-a104-f5ceebe52540 | ACTIVE        | OK     | Active VM
 cb8672bb-38d3-47ee-a498-4b403fc7d8db | REGULAR       | OK     | Auto-generated for Live Storage Migration
(2 rows)

As we can see, we have two snapshots. Well, we really only have one, but oVirt shows the “current state” of the VM as a snapshot too; that’s why we always have at least one snapshot named “Active VM” for any VM

What we need to do now is try to find out what oVirt “thinks” about the affected VM’s disk snapshots.

The first thing we need to do is look for the affected disks and write down their ids (in my case, it’s only one disk, with id “6cf2c490-784b-437f-8305-1bed40dc9c9d”)
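If you prefer to get those ids from the database itself, you can list every image attached to the VM’s snapshots and look for the ones with imagestatus 4; a sketch using only the tables we query in this post (image_group_id is the disk id):

# List every image attached to this VM's snapshots; image_group_id is the disk id
sudo -u postgres psql engine -c "select image_group_id, image_guid, imagestatus from images where vm_snapshot_id in (select snapshot_id from snapshots where vm_id='343db85c-64bc-4f0c-b9a0-4ca8d129e0c3');"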

Now, in the “engine” postgres database (on the engine host), we query for all disk images associated with that disk

engine=# select image_guid,parentid,imagestatus,vm_snapshot_id,volume_type,volume_format,active from images where image_group_id='6cf2c490-784b-437f-8305-1bed40dc9c9d';
              image_guid              |               parentid               | imagestatus |            vm_snapshot_id            | volume_type | volume_format | active 
--------------------------------------+--------------------------------------+-------------+--------------------------------------+-------------+---------------+--------
 b7af66ad-d27b-4087-9c33-11625912a45f | 00000000-0000-0000-0000-000000000000 |           4 | cb8672bb-38d3-47ee-a498-4b403fc7d8db |           1 |             5 | f
 7f14ae53-feac-4088-9560-c77a16dcd5e3 | b7af66ad-d27b-4087-9c33-11625912a45f |           1 | 3d1eaf0a-49b3-45be-a104-f5ceebe52540 |           2 |             4 | t
(2 rows)

As we can see, we have 2 images for that disk:

  1. The first one is the main disk (parentid zero). It’s marked as illegal (imagestatus 4) and is not active
  2. The second one is a COW snapshot that uses the previous disk as base image (parentid is the previous image’s image_guid). This image is marked as “active”

The fact that the second image is the one marked as “active” means that oVirt will use the second image as disk for the VM when it’s booted

Note that the full (base) disk is assigned to the “Auto-generated for Live Storage Migration” snapshot, and the COW layer is assigned to the “Active VM” snapshot. This is so because snapshots are implemented by creating a COW layer on top of the previous “snapshot” to store the differences.
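Purely as an illustration of that layering (with hypothetical local file names, not oVirt volumes), this is how a COW layer relates to its base image:

# Hypothetical example: a snapshot is just a qcow2 overlay whose backing file
# is the previous image in the chain; writes go to the overlay, the base stays frozen
qemu-img create -f qcow2 -b base-image.raw -F raw cow-layer.qcow2
qemu-img info cow-layer.qcow2   # shows "backing file: base-image.raw"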

Checking the status of a snapshot in the storage domain

Now we need to check if the REAL status of the disks is the one we saw in the database

The procedure shown here is for LVM-based block storage (iSCSI and Fibre Channel). For NFS it’s much simpler: just look for files with the image id as their name
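For reference, on a file-based (NFS) domain the volumes are plain files in a directory named after the disk id, under the mounted domain; something like this on a host that has the domain mounted (the mount point layout may vary between versions):

# Hedged example: list the volume files for the disk used in this post
find /rhev/data-center/mnt/*/*/images/6cf2c490-784b-437f-8305-1bed40dc9c9d/ -type f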

The LVM backend stores the data in Logical Volumes (LV). oVirt tags each LV with the id of the disk that uses it.

So, to find out the LV used for my disk “6cf2c490-784b-437f-8305-1bed40dc9c9d” we need to find all LV tagged with that id.

We connect to the host that is currently running the SPM role and force LVM to rescan all PVs, to make sure it can see all the LVs

pvscan --cache

Now we can scan for LV with the disk tag:

[root@blade9 ~]# lvs -o+lv_tags | grep 6cf2c490-784b-437f-8305-1bed40dc9c9d
  b7af66ad-d27b-4087-9c33-11625912a45f 37996bcb-853b-4a3f-b273-094a116e6318 -wi-------  30.00g                                                     IU_6cf2c490-784b-437f-8305-1bed40dc9c9d,MD_10,PU_00000000-0000-0000-0000-000000000000          

As we can see, we have just one LV with that tag. The LV name is b7af66ad-d27b-4087-9c33-11625912a45f (the same as the first image’s image_guid in the database), and the size is 30 GB. The VG that contains this LV is “37996bcb-853b-4a3f-b273-094a116e6318”. Note also the tags: IU_ carries the disk id, PU_ the parent image’s id, and MD_10 the slot where this volume’s metadata lives in the domain’s metadata volume (this will become relevant later).

So, oVirt (the database) thinks it has 2 images (LVs) for that disk: a base image and a COW layer that uses the base disk as its base image. But in the storage domain we only have the base disk.

To make sure the LV we have is a whole disk (not a COW layer) we can run qemu-img on it:

[root@blade9 ~]# qemu-img info /dev/37996bcb-853b-4a3f-b273-094a116e6318/b7af66ad-d27b-4087-9c33-11625912a45f
image: /dev/37996bcb-853b-4a3f-b273-094a116e6318/b7af66ad-d27b-4087-9c33-11625912a45f
file format: raw
virtual size: 30G (32212254720 bytes)
disk size: 0
[root@blade9 ~]# lvchange -an /dev/37996bcb-853b-4a3f-b273-094a116e6318/b7af66ad-d27b-4087-9c33-11625912a45f

As we can see, it’s a raw disk with a size of 30GB
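One note on that listing: on block storage the LV is usually not active on the host, so it normally has to be activated before qemu-img can open it, and deactivated again afterwards; that is what the lvchange -an above is for. The full sequence looks roughly like this (same VG/LV as above):

# Activate the LV, inspect it, then deactivate it again
lvchange -ay /dev/37996bcb-853b-4a3f-b273-094a116e6318/b7af66ad-d27b-4087-9c33-11625912a45f
qemu-img info /dev/37996bcb-853b-4a3f-b273-094a116e6318/b7af66ad-d27b-4087-9c33-11625912a45f
lvchange -an /dev/37996bcb-853b-4a3f-b273-094a116e6318/b7af66ad-d27b-4087-9c33-11625912a45f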

Solving the problem

From the previous sections, we can see that the problem is an inconsistency between what oVirt has in the database and what we really have in the storage domain

To fix the problem, we need to change the database to let oVirt know that we only have one image and it should use it as disk. So, we need to:

  • Remove snapshot_id cb8672bb-38d3-47ee-a498-4b403fc7d8db
  • Remove the COW layer image (image_guid 7f14ae53-feac-4088-9560-c77a16dcd5e3)
  • Link the full disk image (image_guid b7af66ad-d27b-4087-9c33-11625912a45f) to the “Active VM” snapshot (3d1eaf0a-49b3-45be-a104-f5ceebe52540)
  • Mark the remaining image as active and set its status back to OK

Since we will be making changes to the engine database, first we make a backup and stop the engine process

pg_dump engine > engine.dump
systemctl stop ovirt-engine
  • Remove snapshot_id cb8672bb-38d3-47ee-a498-4b403fc7d8db
engine=# delete  from snapshots where snapshot_id='cb8672bb-38d3-47ee-a498-4b403fc7d8db';
DELETE 1
  • Remove the images linked to the deleted snapshot (image_guid 7f14ae53-feac-4088-9560-c77a16dcd5e3)
engine=# delete from images where image_guid='7f14ae53-feac-4088-9560-c77a16dcd5e3';
DELETE 1
  • Link the correct image (image_guid b7af66ad-d27b-4087-9c33-11625912a45f) to snapshot 3d1eaf0a-49b3-45be-a104-f5ceebe52540
engine=# update images set vm_snapshot_id='3d1eaf0a-49b3-45be-a104-f5ceebe52540' where image_guid='b7af66ad-d27b-4087-9c33-11625912a45f';
UPDATE 1
  • Mark the remaining image as active and set its status back to OK (imagestatus 1)
engine=# update images set active='t' where image_guid='b7af66ad-d27b-4087-9c33-11625912a45f';
UPDATE 1
engine=# update images set imagestatus='1' where image_guid='b7af66ad-d27b-4087-9c33-11625912a45f';
UPDATE 1 
  • Make sure everything is ok:
engine=# select snapshot_id,snapshot_type,status,description from snapshots where vm_id='343db85c-64bc-4f0c-b9a0-4ca8d129e0c3';
             snapshot_id              | snapshot_type | status | description 
--------------------------------------+---------------+--------+-------------
 3d1eaf0a-49b3-45be-a104-f5ceebe52540 | ACTIVE        | OK     | Active VM
(1 row)

engine=# select image_guid,parentid,imagestatus,vm_snapshot_id,volume_type,volume_format,active from images where image_group_id='6cf2c490-784b-437f-8305-1bed40dc9c9d';
              image_guid              |               parentid               | imagestatus |            vm_snapshot_id            | volume_type | volume_format | active 
--------------------------------------+--------------------------------------+-------------+--------------------------------------+-------------+---------------+--------
 b7af66ad-d27b-4087-9c33-11625912a45f | 00000000-0000-0000-0000-000000000000 |           1 | 3d1eaf0a-49b3-45be-a104-f5ceebe52540 |           1 |             5 | t
(1 row)
  • Check that there is only one image, that it points to the only remaining snapshot (“Active VM”), and that it is active with imagestatus=1

  • If everything is OK, start ovirt-engine and you should be able to boot the VM… GOOD LUCK!

Other things that could be wrong

In the previous example we had a very simple problem, but other things can go wrong too. In the following sections we will analyze some of them

Make backups of your disks!!!

Well, this is not really a problem, but I want to mention it first because you should always make a backup before trying the following actions

To create a new disk image from an existing one, find the last COW layer and use “qemu-img convert”

qemu-img convert -O raw -p /dev/146dca57-05fd-4b3f-af8d-b253a7ca6f6e/ee733323-308a-40c8-95d4-b33ca6307362 /mnt/BACKUP_VM/full-disk1.raw

Wrong reference to base image in a COW layer

When we create a snapshot, oVirt “freezes” the base disk and creates a copy on write (COW) layer on top to store the changes.

The COW layer is implemented as a qcow snapshot that stores a reference to the base image containing the original disk (or the previous layer). If, for any reason, this reference doesn’t point to the correct image, the virtual disk will not work and will be marked as illegal

I’ve seen this problem happen with live migration snapshots. When a disk is migrated to a different storage domain while the VM is running (storage live migration), oVirt creates a snapshot. I’ve seen COW layers pointing to the base disk in the wrong (old) storage domain

In this case, when you check the COW layer with qemu-img, you may see something like this:

[root@blade6 ~]# qemu-img info /dev/308cefe0-a09f-404c-a4e8-8d6294274143/e3c34b8d-b787-48ee-9432-65467ff41406
image: /dev/308cefe0-a09f-404c-a4e8-8d6294274143/e3c34b8d-b787-48ee-9432-65467ff41406
file format: qcow2
virtual size: 50G (53687091200 bytes)
disk size: 0
cluster_size: 65536
backing file: ../a9d47b31-6e5b-4bfb-a887-0b6f07442a6b/../a9d47b31-6e5b-4bfb-a887-0b6f07442a6b/../a9d47b31-6e5b-4bfb-a887-0b6f07442a6b/3feb3905-f956-43c9-aace-90ab572e40cd (actual path: /dev/308cefe0-a09f-404c-a4e8-8d6294274143/../a9d47b31-6e5b-4bfb-a887-0b6f07442a6b/../a9d47b31-6e5b-4bfb-a887-0b6f07442a6b/../a9d47b31-6e5b-4bfb-a887-0b6f07442a6b/3feb3905-f956-43c9-aace-90ab572e40cd)
backing file format: raw
Format specific information:
    compat: 0.10
    refcount bits: 16

As you can see, the “backing file” parameter contains a “strange” path. If you check the disk, you will see an error because it cannot find the backing store

[root@blade6 ~]# qemu-img check /dev/308cefe0-a09f-404c-a4e8-8d6294274143/e3c34b8d-b787-48ee-9432-65467ff41406
qemu-img: Could not open '/dev/308cefe0-a09f-404c-a4e8-8d6294274143/e3c34b8d-b787-48ee-9432-65467ff41406': Could not open backing file: Could not open '/dev/308cefe0-a09f-404c-a4e8-8d6294274143/../a9d47b31-6e5b-4bfb-a887-0b6f07442a6b/../a9d47b31-6e5b-4bfb-a887-0b6f07442a6b/../a9d47b31-6e5b-4bfb-a887-0b6f07442a6b/3feb3905-f956-43c9-aace-90ab572e40cd': No such file or directory

The problem in this case was that the backing store was already moved to storage domain 308cefe0-a09f-404c-a4e8-8d6294274143, so the correct path should be /dev/308cefe0-a09f-404c-a4e8-8d6294274143/3feb3905-f956-43c9-aace-90ab572e40cd

If you are completely sure that you know the correct backing store, you can change it with “qemu-img rebase -u”. The -u (unsafe) flag only rewrites the backing file reference in the qcow2 header and does not touch any data, so make sure the new backing file really is the image this layer was created on top of. For example, in this case, I fixed it with

qemu-img rebase -u -b /dev/308cefe0-a09f-404c-a4e8-8d6294274143/3feb3905-f956-43c9-aace-90ab572e40cd /dev/308cefe0-a09f-404c-a4e8-8d6294274143/e3c34b8d-b787-48ee-9432-65467ff41406

After doing this change, the previous “qemu-img check” should work
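To confirm the rebase took effect, just re-run the same commands as before; “backing file” should now show the local path, and the check should no longer complain about a missing backing file:

qemu-img info /dev/308cefe0-a09f-404c-a4e8-8d6294274143/e3c34b8d-b787-48ee-9432-65467ff41406
qemu-img check /dev/308cefe0-a09f-404c-a4e8-8d6294274143/e3c34b8d-b787-48ee-9432-65467ff41406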

Wrong metadata for disk

On some VMs, even after fixing the data in the engine database and marking the disk as OK, oVirt didn’t boot the VM, and in the logs I saw that the disk was still marked as illegal. I double-checked the database and the status was OK.

In order to have additional information on the storage domains, oVirt caches the disks’ metadata in a “metadata” volume on the storage domain. This allows oVirt to import a storage domain even if it does not have any previous information about it.

I guessed that oVirt was probably reading the cached metadata from the metadata volume, and the disk was still marked as illegal in this cache. I looked in the logs, and several lines before the failure I saw this operation:

/usr/bin/dd iflag=direct skip=10 bs=512 if=/dev/37996bcb-853b-4a3f-b273-094a116e6318/metadata count=1

Just to test, I ran it and got something similar to this:

DOMAIN=37996bcb-853b-4a3f-b273-094a116e6318
CTIME=1500417613
FORMAT=RAW
DISKTYPE=2
LEGALITY=ILLEGAL
SIZE=10485760
VOLTYPE=LEAF
DESCRIPTION={"DiskAlias":"somehost_Disk1","DiskDescription":"somehost-disk1"}
IMAGE=1f90f5a2-6fbb-4474-868a-313c6288633a
PUUID=00000000-0000-0000-0000-000000000000
MTIME=0
POOL_UUID=
TYPE=PREALLOCATED
EOF

It seems oVirt stores the disk information in 512-byte blocks on the metadata volume, and my disk happened to be in slot 10. I tried to read other slots (11, 12, etc.), and in all of them I got similar information for other virtual disks on that storage domain.
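If you want to eyeball the neighbouring slots, a small read-only loop over the same metadata volume does the job:

# Dump a few metadata slots (512 bytes each) to see which disk lives in each one
for slot in 10 11 12; do
  echo "=== slot $slot ==="
  /usr/bin/dd iflag=direct skip=$slot bs=512 if=/dev/37996bcb-853b-4a3f-b273-094a116e6318/metadata count=1 2>/dev/null
done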

I didn’t find an operation to force oVirt to refresh the cached information in the metadata volume, so I decided to play it safe: I dumped that block to a file, changed ILLEGAL to LEGAL with vim, and wrote the fixed information back to “slot 10” with these commands:

/usr/bin/dd iflag=direct skip=10 bs=512 if=/dev/37996bcb-853b-4a3f-b273-094a116e6318/metadata count=1 of=wrong.dd
# Fix wrong info and save as fixed.dd
/usr/bin/dd oflag=direct seek=10 bs=512 if=fixed.dd of=/dev/37996bcb-853b-4a3f-b273-094a116e6318/metadata count=1
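After writing the block back, re-reading the slot should now show LEGALITY=LEGAL:

/usr/bin/dd iflag=direct skip=10 bs=512 if=/dev/37996bcb-853b-4a3f-b273-094a116e6318/metadata count=1 2>/dev/null | grep LEGALITY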

After this change, all VMs booted…