Troubleshooting NetWorker Inactivity Timeout errors
NetWorker provides the 'Inactivity Timeout' attribute in the Group resource so that client savesets which stop responding can be timed out and aborted. The 'Inactivity Timeout' value is the number of minutes the NetWorker software waits for a NetWorker client saveset to respond to the NetWorker server before it decides that the saveset is unavailable for backup and aborts it. A saveset abandoned by NetWorker may still run to completion after being aborted, but NetWorker will not receive or report its success. A message reporting the timeout event appears in the final savegroup completion report.
An inactivity timeout can be triggered by many different conditions. One of the most common is that the NetWorker client's save program is traversing a large filesystem during an incremental backup: if only a few files have changed, there may be a long delay before the client save process sends the next savestream of data to the NetWorker server. This condition is easy to resolve by increasing the value of the 'Inactivity Timeout' attribute in the NetWorker Group resource that reports the error. You can also set the value to 0 (zero) to disable the timeout, time how long the saveset takes to complete, and then set the 'Inactivity Timeout' attribute to that number of minutes or more.
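The "that number of minutes or more" step can be sketched as a small calculation. This is only an illustration: the function name and the 25% safety margin are assumptions made for this example, not NetWorker defaults or recommendations.

```python
import math


def suggested_inactivity_timeout(measured_seconds, margin=0.25):
    """Convert a measured saveset run time into a timeout in whole minutes.

    measured_seconds: how long the saveset took with the timeout set to 0.
    margin: extra headroom as a fraction (0.25 = 25%); an assumed value,
            chosen here only for illustration.
    """
    if measured_seconds < 0 or margin < 0:
        raise ValueError("measured_seconds and margin must not be negative")
    minutes = measured_seconds / 60.0
    # Round up so the result is never shorter than the measured run,
    # and never return 0, because 0 means "no timeout".
    return max(1, math.ceil(minutes * (1.0 + margin)))


if __name__ == "__main__":
    # A saveset that took one hour would get a 75-minute timeout.
    print(suggested_inactivity_timeout(3600))
```

The result is rounded up and never drops below 1, so the suggested value cannot accidentally turn the timeout off.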
If the timeout is not due to the reason listed above, then confirm the following items:
- The NetWorker client has not been turned off and the network cable is attached.
- Retries of backups always fail.
- All name resolution between the NetWorker client and the backup server is successful.
- All known aliases for the NetWorker client are entered in the alias attribute of the NetWorker client resource.
- If the NetWorker server's hostname or domain name changed, the NetWorker client's daemon (nsrexecd) was restarted after the new NetWorker server hostname (and FQDN) was added to the Default_location\nsr\res\servers file or, on Unix when the servers file is not used, after the 'nsrexecd -s backup_server' statement in the NetWorker startup script was updated.
- All network cards and switches have the same Duplex, Speed, MTU and other settings.
- Adjustable TCP/IP parameters have been verified.
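The name resolution and alias items in the list above can be checked with a short script. The following is a minimal sketch using only Python's standard socket module; the function name and the report layout are assumptions made for this example, and the hostname in the usage line is a placeholder. It resolves a hostname to its IPv4 addresses, then reverse-resolves each address and records every name the resolver returns.

```python
import socket


def check_name_resolution(hostname,
                          forward=socket.gethostbyname_ex,
                          reverse=socket.gethostbyaddr):
    """Forward-resolve hostname, then reverse-resolve each address found.

    The resolver functions are parameters so the check can be exercised
    without a live DNS; by default the system resolver is used (IPv4 only).
    """
    report = {"hostname": hostname, "addresses": [],
              "reverse_names": {}, "errors": []}
    try:
        _, _, addresses = forward(hostname)
    except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
        report["errors"].append(f"forward lookup of {hostname} failed: {exc}")
        return report
    report["addresses"] = list(addresses)
    for address in addresses:
        try:
            primary, aliases, _ = reverse(address)
        except OSError as exc:
            report["errors"].append(f"reverse lookup of {address} failed: {exc}")
            continue
        # Every name seen here is a candidate for the client's alias attribute.
        report["reverse_names"][address] = [primary] + list(aliases)
    return report


if __name__ == "__main__":
    # "backup_server" is a placeholder; substitute a real host name.
    print(check_name_resolution("backup_server"))
```

Run it on the NetWorker server against the client's name and on the client against the server's name, since resolution has to succeed in both directions. Any name in the reverse lookup results that is missing from the client resource's alias attribute is worth a closer look.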
If all the above statements are true, then check the following additional factors that are known to contribute to this problem:
- When a firewall sits between the NetWorker server and the NetWorker client, ensure that enough ports within the NetWorker connection port range are open in the firewall. The NetWorker server may be able to contact the NetWorker client while the NetWorker client is unable to open any new ports to respond to the NetWorker server. We have also seen this problem caused by failing (broken) firewalls dropping packets.
- NOKIA: A Nokia firewall automatically denies service when a large number of ports are opened at once. NetWorker opens many ports at one time, depending on the number of savesets being backed up, so this is considered an incompatibility between the Nokia firewall and NetWorker.
- An improperly configured 'Execution Path' attribute in the NetWorker client's resource has been seen to cause these Inactivity Timeout errors. Configure the 'Execution Path' attribute correctly or leave it blank.
- This issue has been seen due to corruption of files on disk. Verify the checksum and/or date of the files to ensure their integrity. Remove, move or fix the corrupted file, or use a NetWorker directive to skip it.
- Corruption of the NetWorker resource file, especially the group resource, causes inactivity timeout errors on occasion. Try deleting the failing group's resource and recreating it. You might also try recovering the NetWorker resource file from a time before the errors started.
- Corruption of the NetWorker Client File Indexes or Media database is known to result in inactivity timeout errors. Run nsrck -L6 to attempt to repair indexes or run nsrck -L7 to recover indexes. For the media database, corruption might be fixed by recovering an older media database.
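For the file corruption item in the list above, checksums and modification dates can be compared against known-good values with a short script. This is a minimal sketch, assuming you already hold trusted SHA-256 checksums for the files in question (for example from an earlier backup or a vendor manifest); the function names and the shape of the expected mapping are assumptions made for this example.

```python
import hashlib
import os
from datetime import datetime, timezone


def file_fingerprint(path, chunk_size=1024 * 1024):
    """Return (sha256 hex digest, modification time in UTC ISO format)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return digest.hexdigest(), mtime.isoformat()


def find_mismatches(expected):
    """Compare files against known-good checksums.

    expected: dict mapping file path -> trusted SHA-256 hex digest.
    Returns a list of (path, reason) for files that are unreadable or differ.
    """
    problems = []
    for path, good_digest in expected.items():
        try:
            actual_digest, _ = file_fingerprint(path)
        except OSError as exc:
            # An unreadable file is itself a sign of possible corruption.
            problems.append((path, f"unreadable: {exc}"))
            continue
        if actual_digest != good_digest.lower():
            problems.append((path, "checksum mismatch"))
    return problems
```

Files reported by find_mismatches are the ones to remove, move, fix, or skip. A file that cannot be read at all is reported as well, because a read error during traversal is one way a save process can stall.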
When all of the above recommendations have been exhausted, and before continuing with other troubleshooting, it is highly recommended to:
- Recycle the NetWorker daemons, which clears caches that may be out of date.
- Reboot the server and the client, which clears memory of any fragments of dead code and re-initializes all applications and the OS.