Maximising NetWorker data throughput

A question I'm asked time and again in NetWorker land boils down to one statement:

How can I push enough data to my tape drives to keep them constantly writing data (streaming) without the drives going into a start/stop state?

When a tape drive enters a constant start/stop state (commonly known as shoeshining), the data transfer rate to that drive drops drastically.

Therefore, the right question to ask would be:

How can I maximise data throughput to my Backup Server utilising all my tape drives?

The answer to this question is not a simple one, as there are several areas within NetWorker that need reviewing:

  • Server Parallelism
  • Client Parallelism
  • Target Sessions
  • Network Bandwidth
  • Server Capacity

In the following sections we look at each of these areas in a little more detail, and finish with some tuning guidelines to aid your decision making.


Server Parallelism

One of the main tunable parameters in NetWorker is "server parallelism". Server parallelism allows an administrator to specify the maximum number of backup jobs (streams) that the NetWorker server will allow to run concurrently. If you specify a value that is too low, you will not be able to start enough concurrent backup sessions, which prevents your drives from streaming. On the other hand, if you specify a value that is too high, you could overload the CPU resources of your NetWorker server.

To view or change the current server parallelism value, perform the following:

  • Open the NetWorker Administration GUI (nwadmin),
  • Click on Server -> Server Setup,
  • Scroll down the fields, if necessary, to find the current Server Parallelism value.
Figure 1 - NetWorker Server Parallelism settings

Client Parallelism

Another important NetWorker tunable parameter is "client parallelism". This setting is defined within each client resource on the NetWorker server.

Let's assume you have a NetWorker client with 6 file systems (/ /usr /opt /export/home /usr/openwin /var). NetWorker will create a saveset for each of the file systems when the client is backed up. Client parallelism determines how many of this client's savesets will run concurrently.

For example, if we set the client parallelism to 6 for this client, all six savesets would start concurrently when the NetWorker server initiates this client's backup, assuming that we had at least 6 server parallelism slots open. If, on the other hand, we set the client parallelism to 4, the NetWorker server would start 4 savesets for this client. When one of the savesets completes, the NetWorker server would start another saveset for the client. This would continue until all 6 savesets have completed. In other words, all 6 savesets would still complete, but they would only run 4 at a time.
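The "4 at a time" behaviour above can be illustrated with a small simulation. This is only a sketch: the function name and the saveset durations (in minutes) are made up for illustration, and it assumes enough server parallelism slots are free for this client.

```python
import heapq

def simulate_client_backup(saveset_minutes, client_parallelism):
    """Return (start, end, name) tuples for savesets run at most N at a time."""
    if client_parallelism < 1:
        raise ValueError("client parallelism must be at least 1")
    running = []    # min-heap of (end_time, name) for savesets in progress
    schedule = []
    now = 0
    for name, minutes in saveset_minutes.items():
        if len(running) == client_parallelism:
            # Every slot is busy: wait until the earliest saveset completes.
            now, _ = heapq.heappop(running)
        heapq.heappush(running, (now + minutes, name))
        schedule.append((now, now + minutes, name))
    return schedule

# Hypothetical durations for the six file systems in the example.
durations = {"/": 10, "/usr": 30, "/opt": 20,
             "/export/home": 60, "/usr/openwin": 15, "/var": 25}

for start, end, name in simulate_client_backup(durations, 4):
    print(f"{name:14} starts at {start:3} min, ends at {end:3} min")
```

With a client parallelism of 4, the first four savesets start immediately and the last two start only as earlier ones complete. With a value of 6, all six start at time zero.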

Note: The general rule of thumb when setting the client parallelism is to set this value to the number of physical file systems on the client under UNIX, or the number of physical drives in a Windows environment.

To view or change the client parallelism for a given client, perform the following:

  • From within nwadmin, click on Client,
  • Click Client Setup,
  • Click on View -> Details
  • Select a client
  • Scroll down to the Parallelism field
Figure 2 - Viewing the client parallelism

Target Sessions

Target sessions is another tunable parameter within NetWorker, but this parameter is a little different from the other settings discussed earlier. To understand target sessions, you must understand the way NetWorker multiplexes data to a single tape device.

By default, the NetWorker architecture sends multiple savesets to each tape drive so that the attached tape drives receive a constant stream of data. The architects designed it this way in light of potential bottlenecks such as network bandwidth and slow clients.

Here's an example. Let's say that you have a NetWorker server with a DLT7000 drive attached. Your server parallelism is set to 4, and all of your clients have a client parallelism of 1. You start 4 client backups, and the NetWorker server then starts one saveset on each client. As the clients transmit their blocks of data over the network to the NetWorker server, the server writes these blocks into memory. It may get three blocks from clientA, then two from clientB, one block from clientC, then one block from clientD, and another block from clientB. At this point the NetWorker server will amalgamate all of these blocks, in the order they were received, and send them to the tape drive. The physical layout of the blocks on the tape will look similar to the diagram shown in Figure 3.

Figure 3 - Physical layout of a NetWorker tape

Multiplexing is both good and bad. The good thing about multiplexing is that it gives the NetWorker server a much better chance of keeping the tape drive streaming. The bad news comes when it's time to use the tape media for a restore or even a cloning operation.

During a restore or cloning operation, the NetWorker server must untangle (de-multiplex) the tape volume. For example, when recovering the saveset for clientA, the NetWorker server has to "pick out" all the blocks for clientA. If the tape volume is multiplexed, this could greatly increase the time required to recover savesets.
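As a toy model of the example above, the sketch below lays blocks out on a "tape" in arrival order and then picks out one client's blocks again. The function names are hypothetical, and a Python list stands in for the tape; real NetWorker media handling is considerably more involved.

```python
def build_tape(bursts):
    """Lay blocks out in the order received.

    bursts is a list of (client, block_count) pairs in arrival order.
    """
    tape = []
    next_block = {}
    for client, count in bursts:
        for _ in range(count):
            number = next_block.get(client, 0) + 1
            next_block[client] = number
            tape.append((client, number))
    return tape

def recover(tape, client):
    """Return one client's blocks, plus how many foreign blocks sit
    between that client's first and last block on the tape."""
    positions = [i for i, (owner, _) in enumerate(tape) if owner == client]
    blocks = [tape[i][1] for i in positions]
    if not positions:
        return blocks, 0
    span = positions[-1] - positions[0] + 1
    return blocks, span - len(positions)

# Arrival order taken from the DLT7000 example.
tape = build_tape([("clientA", 3), ("clientB", 2), ("clientC", 1),
                   ("clientD", 1), ("clientB", 1)])
print(tape)
print(recover(tape, "clientB"))   # ([1, 2, 3], 2)
```

Recovering clientB means passing over two blocks that belong to other clients. The more savesets are interleaved on a volume, the more foreign blocks a recovery has to pass over.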

If at this stage you're thinking that target sessions are the answer to this dilemma, you'd be wrong. Target sessions, as its name implies, is a target. It is not a hard and fast limit.

For example,

Let's assume you have a NetWorker server with a server parallelism value of 8. We have four clients, each with a client parallelism of 2. The NetWorker server has two tape drives with their target sessions set to 2. If we start all four clients concurrently, the NetWorker server will start 8 concurrent save streams, two from each client. The NetWorker server will direct the first two save streams to the first tape drive and the next two save streams to the second tape drive. We now have four sessions running, but we still have 4 server parallelism slots to fill. The NetWorker server will accept more save streams from the clients, assigning one to the first tape drive, the next to the second tape drive, and so on.

The server continues to accept the client save streams and add them in a round robin fashion to the tape drives until either the server parallelism or the client parallelism limit is reached. In the scenario above we are running 4 sessions on each tape drive, even though our target sessions are set at 2.

To view or change the target sessions parameter, perform the following:

  • From the NetWorker administration utility, click on Media,
  • Click Devices and select a drive,
  • View or set the Target Sessions field
Figure 4 - Viewing/Setting Target Sessions
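
The two-drive scenario described in this section can be sketched as a simple allocation routine. This is a simplified model of the behaviour described here, not NetWorker's actual scheduler, and the function and parameter names are invented for illustration.

```python
def assign_sessions(stream_count, drive_count, target_sessions, server_parallelism):
    """Return the number of sessions that end up on each tape drive."""
    sessions = [0] * drive_count
    # The server never runs more streams than server parallelism allows.
    remaining = min(stream_count, server_parallelism)

    # Phase 1: fill each drive up to its target sessions value.
    for drive in range(drive_count):
        take = min(target_sessions, remaining)
        sessions[drive] += take
        remaining -= take

    # Phase 2: the target is not a ceiling, so the remaining streams
    # are spread over the drives in round robin fashion.
    drive = 0
    while remaining > 0:
        sessions[drive] += 1
        remaining -= 1
        drive = (drive + 1) % drive_count
    return sessions

# Four clients with client parallelism 2 give 8 save streams.
print(assign_sessions(8, drive_count=2, target_sessions=2, server_parallelism=8))  # [4, 4]
```

Each drive ends up with 4 sessions even though the target is 2, which is the outcome described in the scenario.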

Network Bandwidth

The general consensus is that you must have a robust network environment in order to back up clients quickly. Most large scale networks are carved into segments, as shown in Figure 5. In the example there are four network segments, including the backbone network. In order to achieve optimum backup performance, four network adapters are installed in the NetWorker server, one for each network segment. The backups are scheduled so that several clients on each network segment run concurrently. This allows more data to be pushed through the network in a given period of time. The more data we can push through the network, the better our chances are of keeping the tape drives streaming.

Figure 5 - A typical segmented network
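
A rough back-of-the-envelope check of whether the network can feed the tape drives might look like the sketch below. The 5 MB/sec per-drive figure is the target rate used in the tuning guidelines in this document. The 100 Mbit/sec segment speed and the 70% usable fraction are assumptions chosen purely for illustration, as is the function name.

```python
def network_can_feed_drives(segment_mbit_per_sec, tape_drives,
                            drive_mb_per_sec=5.0, usable_fraction=0.7):
    """Compare usable network throughput with what the drives need.

    segment_mbit_per_sec: link speed of each network adapter, in Mbit/sec.
    usable_fraction: assumed share of raw bandwidth available to backups.
    Returns (usable MB/sec, needed MB/sec, enough?).
    """
    usable = sum(segment_mbit_per_sec) / 8 * usable_fraction
    needed = tape_drives * drive_mb_per_sec
    return usable, needed, usable >= needed

# Four adapters, one per segment as in Figure 5, feeding four tape drives.
usable, needed, enough = network_can_feed_drives([100, 100, 100, 100], tape_drives=4)
print(f"usable {usable:.1f} MB/sec, needed {needed:.1f} MB/sec, enough: {enough}")
```

Measure the real throughput of your own segments before relying on figures like these.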

Server Capacity

Another factor that must be taken into consideration when tuning a NetWorker server is the physical server's capacity. Connecting network adapters and tape drives to your server consumes CPU resources. Remember, the primary goal is to keep all of the tape drives streaming all of the time. If your NetWorker server's CPUs are overloaded, your tape drives will not be able to keep streaming, as they will compete for CPU cycles.

You can monitor CPU utilisation using the sar(1m) command under Solaris, or other publicly available tools such as proctool or top.


Tuning Guidelines

In this section, we will explore some general tuning guidelines. Every backup environment is different, so you should experiment with these guidelines until you find the settings that best suit your environment. The aim is to keep the tape drives moving to maintain maximum throughput. If your drives are not performing at a high data rate, you must find the bottleneck that is preventing this. Most bottlenecks lie in the NetWorker server's SCSI configuration, overloaded server CPUs, insufficient network bandwidth, network performance, client problems or improper server and/or client parallelism settings. Resolving bottlenecks can be very difficult and time consuming, but the payback is well worth the effort.

  • Experiment with your target sessions for a single tape drive until you find a setting that keeps the drive running at 5 MB/sec or a bit higher. Once you achieve this optimal setting, apply the same value to all your tape drives.
  • Set your Server Parallelism using the following equation: Server Parallelism = Number of Tape Drives * Target Sessions
  • Once the above settings have been made, monitor the server's CPU cycles. If your CPUs are over-committed and you notice that your tape drives fall below your target rate, then try disabling one or more tape drives to see if this relieves the burden on your CPUs, yet keeps all of the enabled tape drives streaming.
  • Set your client parallelism equal to the number of file systems (UNIX) or physical drives (Windows) connected to your client.
  • Implement multiple network adapters in your NetWorker server, preferably one network interface for each network segment. Schedule client backups so that several clients run on each network segment concurrently.
  • Consider the implementation of Storage Nodes to distribute the load of your backups. For more information on Storage nodes, refer to the NetWorker Administration and Install guides.
  • NetWorker can use a substantial amount of memory. Use sar(1m) to obtain a benchmark of the memory utilisation of your NetWorker server.
  • Implement one SCSI adapter for every two tape drives. Better still, consider a dedicated SCSI adapter for each tape drive.
  • Consider using a staging environment (file type devices) for your backups. File type devices do not have the streaming problem associated with tape drives. Be aware that this type of setup can complicate your environment, as you must then move the data from disk to your tape volumes.
  • If your network is a bottleneck, consider the implementation of a Storage Area Network (SAN). This will move the data traffic off your server.
  • If a SAN is too costly for your organisation, consider implementing a dedicated Ethernet network for your backup traffic.
  • Consider upgrading to faster tape devices.
  • Remember, as you increase the multiplexing to your tape drives you are increasing backup performance, but also decreasing your recovery and cloning performance.
  • Use network monitors to track your network bandwidth utilisation.
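
The server parallelism equation from the guidelines above is easy to work through in code. The sketch below also checks whether the clients can supply enough concurrent save streams to fill those slots. The function name, the client names and the sample values are hypothetical.

```python
def suggest_settings(tape_drives, target_sessions, client_parallelism):
    """Apply: Server Parallelism = Number of Tape Drives * Target Sessions.

    client_parallelism maps each client name to its parallelism value.
    """
    server_parallelism = tape_drives * target_sessions
    client_streams = sum(client_parallelism.values())
    return {
        "server_parallelism": server_parallelism,
        "client_streams": client_streams,
        # True when the clients can fill every server parallelism slot.
        "enough_streams": client_streams >= server_parallelism,
    }

clients = {"clientA": 2, "clientB": 2, "clientC": 2, "clientD": 2}
print(suggest_settings(tape_drives=2, target_sessions=4, client_parallelism=clients))
```

Two drives with target sessions of 4 give a server parallelism of 8, which the four clients in this example can just fill. Treat the result as a starting point and keep monitoring CPU and drive throughput as described above.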

Conclusion

Throughout this document, we have suggested that you try to maintain a throughput rate of 5 MB/sec on your tape drives. The drives in your environment may be able to sustain a higher rate, or may not be capable of reaching 5 MB/sec at all.

The NetWorker Power Edition Performance Tuning Guide, available on the NetWorker media CD or in the online documents at http://legato.com, provides a method that you can use to determine the optimal drive throughput rate in your environment. This guide also covers other techniques for tuning your SCSI adapters and network performance.