Troubleshooting ZFS Storage Appliance Network Configuration Problems
Networking problems take many forms, ranging from loss of connectivity to network congestion. Before changing any configuration settings, it is important to understand the nature of the problem and to have an indication of where it originates. The ZFSSA offers status and diagnostic information on various levels, and its logs are one of the first resources to consult.
Other sources of information are event logs and system messages on the client side. Check for any network-related errors or warning messages that might give a clue about the issue you are investigating.
From the server or client side, check for simple connectivity problems using the well-known ping and traceroute utilities. They can reveal connectivity, DNS, and routing problems. The ping utility can also help in investigating latency-related performance problems.
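As a minimal sketch of the latency check described above, the following Python helper runs ping from a client and extracts the packet loss and round-trip times from its summary. It assumes a Linux or macOS style ping (the `-c` count option and a `min/avg/max` summary line); other platforms, Solaris included, print a different format and would need a different pattern. The function names `parse_ping` and `probe` are illustrative only.

```python
import re
import subprocess

# Summary line printed by Linux (rtt ... mdev) and macOS (round-trip ... stddev) ping.
# Example: rtt min/avg/max/mdev = 0.321/0.456/0.789/0.100 ms
RTT_SUMMARY = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)
LOSS = re.compile(r"([\d.]+)% packet loss")


def parse_ping(output):
    """Extract packet loss (percent) and round-trip times (ms) from ping output."""
    result = {"loss_pct": None, "min": None, "avg": None, "max": None}
    loss = LOSS.search(output)
    if loss:
        result["loss_pct"] = float(loss.group(1))
    rtt = RTT_SUMMARY.search(output)
    if rtt:
        result["min"] = float(rtt.group(1))
        result["avg"] = float(rtt.group(2))
        result["max"] = float(rtt.group(3))
    return result


def probe(host, count=5):
    """Ping host `count` times and return the parsed summary.

    Assumes a ping binary on PATH that accepts -c (Linux/macOS style).
    """
    proc = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit (host unreachable) is still useful output
    )
    return parse_ping(proc.stdout)
```

Calling `probe()` with the appliance's data interface address from each client, and comparing the average round-trip times, gives a quick first indication of whether latency differs between clients or network paths. Fields that cannot be found in the output are left as `None`, which typically indicates that the host did not answer at all.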
Performance problems are often more difficult to deal with. Walk through the whole connectivity chain to see whether any element could act as a bottleneck in the I/O traffic between the data repository and the client's application. Check for any element that could be “oversubscribed” in terms of network bandwidth capacity. ZFSSA Analytics is a good tool for this. Start by investigating the main components in the system, such as load patterns per share, per client, or per network connection. Is a client or a client's application behaving abnormally? Are all network ports fully utilized? Are storage pools reaching 80 percent to 90 percent usage levels?
The information needed to answer these questions can be found quickly using Analytics.
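The walk through the connectivity chain can be sketched as a simple utilization check. The helper below takes, for each element, the observed load and the capacity (for example, throughput figures read from Analytics and the nominal link speed, or used and total pool space) and reports the elements at or above a threshold. The function name, the element names, and the figures in the usage note are hypothetical; the 0.8 default mirrors the 80 percent pool usage level mentioned above and is only an assumed starting point for network links.

```python
def find_bottlenecks(elements, threshold=0.8):
    """Return (name, utilization) pairs for elements at or above threshold.

    `elements` maps an element name to (observed, capacity), both expressed
    in the same unit (for example Gbit/s for links, TB for pools).
    The result is sorted with the most heavily used element first.
    """
    hot = []
    for name, (observed, capacity) in elements.items():
        if capacity <= 0:
            raise ValueError(f"capacity for {name!r} must be positive")
        utilization = observed / capacity
        if utilization >= threshold:
            hot.append((name, round(utilization, 2)))
    return sorted(hot, key=lambda item: item[1], reverse=True)
```

For example, with hypothetical readings of `{"client NIC": (4.1, 10.0), "switch uplink": (9.4, 10.0), "storage pool": (8.5, 10.0)}`, the function returns the switch uplink (0.94) followed by the storage pool (0.85), which are the elements to investigate first.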
For detailed performance analysis, Oracle's performance tools such as Vdbench and SWAT can be used as load generation and analysis tools in combination with Analytics, a feature of the ZFS Storage Appliance.
Vdbench can be used to define specific workloads and run them against specific data shares on the ZFSSA to analyze system behavior.
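To illustrate what such a workload definition might look like, the sketch below generates a small Vdbench parameter file for a file system workload against a share mounted on the client. The `fsd`, `fwd`, and `rd` entries follow the file system workload syntax of the Vdbench user guide as best understood here; the mount point, file counts, sizes, and run length are assumed values, and the exact parameters should be verified against the Vdbench version in use. The function name `vdbench_fs_params` is illustrative only.

```python
def vdbench_fs_params(anchor, files=100, file_size="1m", xfersize="8k",
                      operation="read", threads=4, elapsed=60):
    """Build a minimal Vdbench file system workload definition as text.

    `anchor` is the directory on the client where the ZFSSA share is mounted
    (an assumed path supplied by the caller).
    """
    lines = [
        # File system definition: where the test files live and how many to create.
        f"fsd=fsd1,anchor={anchor},depth=1,width=1,files={files},size={file_size}",
        # Workload definition: the I/O pattern applied to those files.
        f"fwd=fwd1,fsd=fsd1,operation={operation},xfersize={xfersize},"
        f"fileio=random,fileselect=random,threads={threads}",
        # Run definition: uncontrolled rate, create the files first, report every second.
        f"rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed={elapsed},interval=1",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and passed to Vdbench with its `-f` option. Running the workload while watching the per-share and per-interface statistics in Analytics ties the generated load to the behavior observed on the appliance.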
The following flowchart lists some common issues and the resources to consult for each. Oracle's support website has numerous documents that cover specific types of network problems and describe how to diagnose and resolve them.