Sun StorEdge T3 DMP Failover FAQ

This FAQ answers some basic questions about how failover works on the StorEdge T3 array, and how Veritas DMP (Dynamic Multipathing) manages paths during a failover.

What amount of controller failover exists in a single-T3 array configuration?

None. If a controller fails, access to the array is lost. To get any type of failover (manual or automatic), you must connect the arrays in a 'partner group' configuration.


To answer the next few questions, let's set up an example of two T3 arrays as a partner group. After doing this, there is a 'master' array and a 'slave' array. The 'master' array is on the bottom, and the 'slave' array is on top. Each array has 1 LUN, giving us 2 LUNs total. We will refer to the LUN on the master array as LUN-M, and the LUN on the slave array as LUN-S.

There are two fibre-channel cables connecting this partner group to the host computer; one fibre cable comes from the master array and the other comes from the slave array.

On the host, when I run 'format' or look in /dev/dsk, it appears that there are 4 LUNs. Why?

We already know that we really have only 2 LUNs. Each LUN has 2 paths to the host: a primary and a secondary (or 'failover') path. Solaris creates a separate device entry for each path, so the 2 paths to each of the 2 LUNs show up as 4 devices, and it looks as though there are 4 LUNs.

Which path is the primary path, and which one is the secondary path?

It depends on which LUN you are talking about. There is not one single primary path and one secondary path.

The primary path to LUN-M goes through the fibre cable which is connected to the master array, and all data going to LUN-M goes through that cable. The other cable, the one connected to the slave (top) array, is only there as a 'failover' path, meaning that it is not used unless something happens to the primary path. Normally, no data going to or from LUN-M goes over the secondary (failover) path.

The primary path to LUN-S goes through the fibre cable which is connected to the slave array. All data going to LUN-S goes through that cable. The other cable, the one connected to the master (bottom) array, is only there as a 'failover' path, meaning that it is not used unless something happens to the primary path. Normally, no data going to or from LUN-S goes over the secondary (failover) path.

As you can see, each LUN has its own primary path. The primary path is defined as the path going into the array where the LUN lives. All data travels on the LUN's primary path. The secondary path is not used for data transfer for that particular LUN. For our example, the 'top' cable (path) acts as primary path for LUN-S and secondary path for LUN-M. The 'bottom' cable (path) acts as primary path for LUN-M and secondary path for LUN-S.

Without DMP, I find that I can access a LUN using either of the two paths it presents to Solaris. I thought you just said that you only have one primary path.

You DO have only one primary path. The 'tricky' part, though, is that you CAN access the LUN through the secondary path if you want. However, if you do this, you will unintentionally cause a LUN failover. Let me explain...

LUN-M, which we defined above as the LUN on the master (bottom) array, uses the bottom path as its primary path and the top path as its secondary path. From the Solaris level, the primary path is c3t2d0 and the secondary path is c4t1d0. You MUST access the LUN using its primary path (c3t2d0). That means you must select THAT path when running 'format', or when running 'newfs' or 'fsck' or 'mount', or even when running 'prtvtoc'.

Without DMP, what happens if I inadvertently use the secondary (c4t1d0) path when running 'format' or 'mount' or something?

The expectation is that this secondary path will only be used if the primary path is no longer accessible. If the array detects I/O for a LUN traveling on its secondary path, it assumes that something bad must have happened to the primary path, so the primary path is disabled and a controller failover occurs. Yes... believe it or not, simply running a 'prtvtoc' on the secondary path will cause a LUN failover!

As you can see, if you are not using DMP, it is essential for you to access the LUN *ONLY* through its primary path.

Given two paths (i.e., c3t2d0 and c4t1d0) to a LUN, how can you determine which is the primary and which is the secondary?

It's a little tricky, but not TOO hard... The function of the path (primary or secondary) may be found by using the 'format -e' command. The '-e' option gives you access to the "scsi" command. By selecting 'scsi' and then doing an 'inquiry', you will get a list of bytes. If you look at the byte at offset 6 (counting from 0, so it is the seventh byte on the first line), you will see either a "10" or a "30". A "10" indicates this is the primary path; a "30" indicates this is the secondary path.

For example, run format -e on one of the two paths:

# format -e c3t2d0

Once in 'format', select the "scsi" command, and then, from the 'scsi' prompt, select the 'inquiry' command. This produces the following:

                              ||
	Inquiry:              vv
	    00 00 03 12 5b 00 10 02 53 55 4e 20 20 20 20 20
	    54 33 30 30 20 20 20 20 20 20 20 20 20 20 20 20
	    30 31 30 31 30 30 30 30 31 36 32 30 31 30 00 00
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The arrows above point to the byte at offset 6 (counting from 0), which is a "10". This indicates that this path (c3t2d0) is the primary path.

Doing the same set of commands on "c4t1d0" yields the following:

	                      ||
	Inquiry:              vv
	    00 00 03 12 5b 00 30 02 53 55 4e 20 20 20 20 20
	    54 33 30 30 20 20 20 20 20 20 20 20 20 20 20 20
	    30 31 30 31 30 30 30 30 31 36 32 30 31 31 00 00
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

which indicates that this is the secondary path.
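
The check described above (look at the byte at offset 6, counting from 0) can be sketched as a small shell function. This is only an illustration: the function name 'path_role' is made up here, and it assumes you have copied the first line of the inquiry dump out of 'format -e' as text. It does not run 'format' itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris or the T3 tools): classify a
# path from the first line of the 'inquiry' dump printed by 'format -e'.
path_role() {
    # $1: first line of inquiry bytes, e.g. "00 00 03 12 5b 00 10 02 ..."
    # Word-split the line into positional parameters. The byte at
    # offset 6 (counting from 0) becomes $7.
    set -- $1
    case "$7" in
        10) echo primary ;;
        30) echo secondary ;;
        *)  echo unknown ;;
    esac
}

# The two dumps shown above:
path_role "00 00 03 12 5b 00 10 02 53 55 4e 20 20 20 20 20"   # c3t2d0: prints "primary"
path_role "00 00 03 12 5b 00 30 02 53 55 4e 20 20 20 20 20"   # c4t1d0: prints "secondary"
```

Any value other than "10" or "30" is reported as "unknown" rather than guessed at, since this FAQ only describes those two values.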

How does DMP help with managing the primary/secondary paths?

DMP is smart enough to look at the 2 paths to a particular LUN and determine which is the primary path. DMP is also smart enough to use ONLY that primary path for I/O destined for that LUN. Therefore, it is impossible for data to travel down the secondary path and cause an inadvertent LUN failover (as was described above).

DMP presents only ONE path to the user. If you run 'vxdisk list', you will see only one path to each LUN, unlike Solaris, which shows you both paths. You can list out the path information for a particular LUN using the command "vxdisk list <c#t#d#s2>". For example, we run:

# vxdisk list c3t2d0

The last few lines show the following:

	numpaths:   2
	c3t2d0s2  	state=enabled	type=primary
	c4t1d0s2	state=enabled	type=secondary

As you can see here, there are 2 paths, and it shows you which is primary and which is secondary.
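
If you want to pick the primary or secondary path out of that output in a script, something like the following sketch may work. The function name 'path_of_type' is made up for this example, and it assumes the path lines look exactly like the ones shown above (device, state=, type=). The sample text is fed in with a here-document so the sketch can be tried without VxVM installed.

```shell
#!/bin/sh
# Hypothetical helper: read 'vxdisk list <device>' output on stdin and
# print the device name(s) whose third column is type=<role>.
path_of_type() {
    # $1: "primary" or "secondary"
    while read -r dev state type rest; do
        if [ "$type" = "type=$1" ]; then
            echo "$dev"
        fi
    done
    return 0
}

# Sample input copied from the output shown above.
path_of_type primary <<'EOF'
numpaths:   2
c3t2d0s2    state=enabled   type=primary
c4t1d0s2    state=enabled   type=secondary
EOF
```

On a live system the input would presumably come from piping 'vxdisk list' for the device into the function instead of the here-document.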

Without DMP, if a controller or path REALLY fails, how do I get the paths switched around?

When a controller or primary path fails, you will have to manually attempt to use the secondary path. For example, this means that if you can no longer access your data using the c3t2d0 path, you can try accessing it using c4t1d0. Obviously, this is a manual procedure, since you might have to 'umount' filesystems currently using c3t2d0 and 'mount' them again using the c4t1d0 path.
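
As a rough sketch of that manual procedure, the function below only PRINTS the commands you would review and then run by hand; it does not unmount or mount anything. The function name, the mount point /data, and the slice s6 are all made-up values for illustration. Substitute the mount point and slice your filesystem really uses.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the commands for moving a mounted
# filesystem from the failed primary path to the secondary path.
manual_failover_cmds() {
    # $1: mount point (assumed example: /data)
    # $2: device name on the secondary path (assumed example: c4t1d0s6)
    echo "umount $1"
    echo "mount /dev/dsk/$2 $1"
}

manual_failover_cmds /data c4t1d0s6
# prints:
#   umount /data
#   mount /dev/dsk/c4t1d0s6 /data
```

Printing the commands first, rather than running them, leaves the decision with the administrator, which seems appropriate since any I/O on the wrong path triggers a LUN failover.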

With DMP, if a controller or path REALLY fails, how do I get the paths switched around?

It's done automatically. There is no need for you to umount and mount your data using the secondary path. DMP will automatically (and internally) switch all I/O over to the secondary path. You may experience a slight delay during this time, but no data will be lost.

I thought DMP did load balancing? What do I have to do to get DMP to perform load-balancing between the two paths?

DMP does NOT load balance on the StorEdge T3 array. DMP only performs the tasks necessary to fail over from the primary to the secondary path. Load balancing is not possible on this array because of the way it handles its paths: remember, any I/O found going over the secondary path will cause an undesirable LUN failover. DMP can and does perform load balancing on other arrays, just not on this one.