Cannot import diskgroup - No Configuration Copies

I came across a rare occurrence the other day with a customer where all copies of the Veritas configuration database were corrupted.

Why did this happen?

The Veritas kernel accesses the private regions frequently, as it is always updating status information in them. We cannot say exactly what caused the corruption, nor can we definitively state that all the private regions were totally bad.

As a member of Sun's Data Storage team I have come across many issues, but what I can say is that this one is very rare.

We have not been able to reproduce it in the lab, and after talking with engineering and with Veritas directly, the cause is not easy to pinpoint, so there is no complete fix that would prevent it from ever happening again.

The only way we can reproduce it is to physically overwrite all the private regions on all the disks.

If even one disk is left intact, the diskgroup can and will still be imported from it.

How do we overcome the problem?

The following procedure needs an existing rootdg and volume manager running, so it is not applicable for a lost rootdg.

To reconstruct the configuration

To get the configuration back, we would ideally use a copy of the configuration that was taken while the system was running OK, saved with the following commands.

# vxprint -g <dg> -hmQqspv > /<directory path>/<filename>
# vxdisk list

This last command gives you a list of access name and media name pairs.

If this happens and the customer does not have a copy of the configuration from when the system was good, we have in the past run the following command. If this method is used, the customer has to be able to check the configuration and data, as there is no guarantee that this is the latest configuration.

# /etc/vx/diag.d/vxprivutil dumpconfig /dev/rdsk/cXtXdXs2 | \
vxprint  -hmQqspv -D - > /<directory path>/<filename>

NB. You may well have to grep out the access name and media name pairs from the output file.
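Rather than grepping by eye, a small helper can pull one field out of every record in the saved file. This is only a sketch: it assumes the file has each record header ("type name") starting in column 0 with indented field=value lines beneath it, and the function name record_field is made up for illustration. Which field actually carries the media name or access name (for example something like dm_name on subdisk records) is also an assumption, so check it against your own file first.

```shell
#!/bin/sh
# record_field FILE FIELD   (hypothetical helper, not part of VxVM)
# For every record in FILE that has FIELD, print "<type> <name> <value>".
# ASSUMPTION: record headers start in column 0, fields are indented below.
record_field() {
    awk -v field="$2" '
        /^[^ \t]/ { type = $1; name = $2; next }   # record header line
        {
            line = $0
            sub(/^[ \t]+/, "", line)                # drop the indentation
            eq = index(line, "=")
            if (eq > 0 && substr(line, 1, eq - 1) == field)
                print type, name, substr(line, eq + 1)
        }
    ' "$1"
}

# Example (the field name is an assumption, verify it in your file):
#   record_field /<directory path>/<filename> dm_name
```

The output still needs to be matched by hand against what the customer knows about the physical disks.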

Once you have the configuration from above you can create the group as follows.

Create the new disk group

NB. Disk names MUST correspond to exactly the same disks as they did originally (see your backup copy of the vxdisk list output). We also need a list of the names that equate to the physical disks; this can be obtained by keeping the output from vxdisk list, or it will have to be grepped out of the temporary file.

Initialise the diskgroup, using one of the disks from the lost disk group.

# vxdg init <dg> <media name>=<access name>

NB. Substitute the disk group, disk access name and media name from the saved vxdisk list output. For example:

# vxdg init datadg disk01=c2t0d0s2

Add in the rest of the disks.

# vxdg -g <dg> adddisk <media name>=<access name> \
     [other disks]
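With a lot of disks it is easy to mistype a pair, so it can help to generate the two command lines from the saved list and review them before running anything. The sketch below does only that. The function name print_dg_commands is made up for illustration, and each pair is expected in the same form as the example above (disk01=c2t0d0s2).

```shell
#!/bin/sh
# print_dg_commands DG PAIR [PAIR...]   (hypothetical helper, not part of VxVM)
# Each PAIR is written as in the example above, e.g. disk01=c2t0d0s2.
# The commands are only printed for review; nothing is executed.
print_dg_commands() {
    if [ $# -lt 2 ]; then
        echo "usage: print_dg_commands DG PAIR [PAIR...]" >&2
        return 1
    fi
    dg=$1
    first=$2
    shift 2
    echo "vxdg init $dg $first"            # first disk creates the group
    if [ $# -gt 0 ]; then
        echo "vxdg -g $dg adddisk $*"      # remaining disks in one command
    fi
}

# Example:
#   print_dg_commands datadg disk01=c2t0d0s2 disk02=c2t1d0s2
```

Check the printed lines against the saved vxdisk list output, then paste them in at the root prompt.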

Recreate the volume(s) configuration from the configuration file.

# vxmake -g <dg> -d /<directory path>/<filename>

NB. If this fails saying the input file is too large, split the file up and run the above command for each piece. In most cases it works as is; the failure only occurs with very large configurations, and even then we have only had to split the file into two pieces.

You can run the above command with the -V option, which goes through the motions without actually changing anything.
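If the file does have to be split, the cut must not land in the middle of a record. Here is one way it might be done, as a sketch only: it assumes every record header starts in column 0, that volume records begin with "vol", and that each volume is followed by its own plexes and subdisks, so cutting just before a "vol" line keeps each hierarchy in one piece. The function name split_config is made up for illustration; check both pieces by eye before feeding them to vxmake.

```shell
#!/bin/sh
# split_config FILE PREFIX   (hypothetical helper, not part of VxVM)
# Write the first half of the volume hierarchies in FILE to PREFIX.1 and the
# rest to PREFIX.2, cutting only where a new "vol" record begins.
# ASSUMPTION: "vol" headers start in column 0 and each volume is followed by
# its plexes and subdisks.
split_config() {
    : > "$2.1"                     # start with empty output files
    : > "$2.2"
    awk -v prefix="$2" '
        NR == FNR { if (/^vol[ \t]/) total++; next }   # pass 1: count volumes
        /^vol[ \t]/ { seen++ }                          # pass 2: track position
        {
            half = int((total + 1) / 2)
            out = (seen > half) ? prefix ".2" : prefix ".1"
            print > out
        }
    ' "$1" "$1"
}

# Example:
#   split_config /<directory path>/<filename> /<directory path>/<filename>
# then run vxmake once for the .1 file and once for the .2 file.
```

The file is read twice (once to count, once to write), which is fine for a description file of this size.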

Now bring the volumes back online.

# vxvol -g <dg> startall