NetWorker checkpoint restart backups

The checkpoint restart feature allows a failed backup operation to restart at a known good point before the point of failure. A known good point is a point in the backup data stream where the data was successfully written to tape and can be located and accessed by subsequent recovery operations.

This feature allows client backups that are part of a scheduled backup to be restarted if they fail while running. Files and directories that have already been backed up are not backed up again.
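The restart behavior described above can be illustrated with a minimal conceptual sketch. This is not NetWorker code: the `SaveSet` class, the `run_backup` function, and the file names are hypothetical and exist only to show how a restarted backup can skip data that an earlier partial save set already captured.

```python
from dataclasses import dataclass, field


@dataclass
class SaveSet:
    """One backup attempt (hypothetical model); status is 'complete' or 'partial'."""
    files: list = field(default_factory=list)
    status: str = "complete"


def run_backup(all_files, already_saved, fail_at=None):
    """Back up every file not yet saved; optionally simulate a failure.

    fail_at names the file on which the simulated failure occurs.
    """
    save_set = SaveSet()
    for name in all_files:
        if name in already_saved:
            continue  # already captured by an earlier partial save set
        if name == fail_at:
            save_set.status = "partial"  # keep what was written so far
            break
        save_set.files.append(name)
    return save_set


files = ["a.txt", "b.txt", "c.txt", "d.txt"]

# First attempt fails while processing c.txt and leaves a partial save set.
first = run_backup(files, already_saved=set(), fail_at="c.txt")

# The restart skips the files the partial save set already holds.
second = run_backup(files, already_saved=set(first.files))
```

In this sketch the first attempt ends as a partial save set that holds a.txt and b.txt, and the restarted attempt produces a separate save set that holds only c.txt and d.txt.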

IMPORTANT: Checkpoint restart is supported only when both the NetWorker server and the NetWorker client are running NetWorker 7.6 SP1 or later. If either the server or the client is running an earlier version, the checkpoint restart feature is not supported.

Backup failures can occur for various reasons. The most common reasons include hardware failures, loss of network connectivity, and primary storage software failures. The NetWorker server and storage node components must remain running to manage the client failure and to create a partial save set. If the NetWorker server or storage node components fail during a backup, partial save sets are not created. In this case, the backup for the checkpoint-enabled client must be restarted from the beginning.

If the checkpoint restart feature has not been enabled, a failure encountered during a scheduled backup operation might require a re-run of an entire backup tape set. This can be costly when the backup window is limited, because a significant portion of the backup data might already have been transferred to tape, and without checkpoint restart NetWorker cannot resume a save set from the point of interruption.

For example, when performing an 800 GB backup that requires approximately 10 hours to complete and spans 6 tapes, if a failure occurs while writing to the last tape, the previous 5 tapes, representing 9 hours of backup time, may need to be re-run. As data sets continue to increase in size, so does the impact of backup failures.
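The cost in this example can be worked through with a short calculation. The sketch below is illustrative only: the `hours_to_redo` function is hypothetical, and it assumes that a restarted backup resumes close to the point of failure, so the figure it returns for a checkpoint-enabled client is a rough lower bound.

```python
def hours_to_redo(total_hours, hours_done, checkpoint_enabled):
    """Estimate the backup time needed after a failure (simplified model)."""
    if checkpoint_enabled:
        # Only the work not yet completed has to be run.
        return total_hours - hours_done
    # Without checkpoint restart, the whole backup starts over.
    return total_hours


total, done = 10, 9  # figures from the example above

without_checkpoint = hours_to_redo(total, done, checkpoint_enabled=False)
with_checkpoint = hours_to_redo(total, done, checkpoint_enabled=True)
```

Under these assumptions, the failed backup costs a further 10 hours without checkpoint restart, compared with roughly 1 hour when the client is checkpoint-enabled.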

Note: Configuring a client as checkpoint-enabled might impact the backup speed. The impact depends on the data zone environment and configuration.

Checkpoint-enabled clients provide the following enhancements:

  • Failed save sets are marked as partial, not as aborted.
  • Restarted save sets have a new SSID and savetime.
  • Partial save sets are indexed.
  • Partial save sets are not removed from the index, the media database, or media such as AFTD.
  • The Checkpoint option is ignored for index and bootstrap save sets.

Note: The checkpoint restart feature is not enabled by default. If a NetWorker client is not configured as checkpoint-enabled and a backup fails, the software creates a new save set from the beginning the next time the group runs.