Introduction to Troubleshooting SAM-FS/QFS

This article provides a basic overview of SAM-FS/QFS components, basic SAM-FS/QFS troubleshooting, and a list of SAM-FS/QFS resources.

This article is not an end-to-end tutorial on the SAM-FS/QFS product suite.

Areas covered are:

  • Introduction
  • Product Overview
  • SAM-FS/QFS Components
  • Basic Troubleshooting
  • Resources

Introduction

What is SAM-FS/QFS?

The Sun Storage Archive Manager (SAM-FS/QFS) product suite enables you to archive file system data. The SAM-FS/QFS environment includes a storage and archive manager along with the Sun QFS file system software. The SAM-QFS software enables data to be archived to automated libraries at device-rated speeds. In addition, data can be archived to files in another file system through a process known as disk archiving. You can archive data on an as-needed basis, define policies that determine when data should be archived, or set specific schedules for archiving. You are presented with a standard file system interface and can read and write files as though they were all on primary disk storage.

Brief history of SAM-FS/QFS

  • SAM-FS/QFS was originally developed by LSC Inc.
  • LSC was acquired by Sun Microsystems in May 2001.
  • The latest version, SAM-FS/QFS 4.0, was released in August 2002.

Product Overview

SAM-FS/QFS has three base product configurations:

  1. SAM-FS: Storage Archive Manager integrated with a base file system.
  2. QFS: High-performance, full-featured file system.
  3. SAM-QFS: Storage Archive Manager software integrated with the QFS file system.

QFS

QFS is an enhancement over the basic filesystem. It provides:

  • Separate partition for metadata
  • Shared filesystem (distributed) in a SAN environment.
  • Multi-reader capability
  • Advanced tunables
  • Available standalone or with integrated Storage Archive Manager (SAM)

The Storage Archive Manager (SAM)

The three primary components of SAM:

  1. Archiving: creating tape or disk tar copies. Not data migration!
  2. Releasing: freeing “disk cache” space. Not file deletion!
  3. Staging: retrieving file data from an archive copy. Uses the existing inode.

Components

Where are the SAM-FS/QFS components?

  • Executables:
    • /opt/SUNWsamfs/sbin
    • /opt/SUNWsamfs/bin
  • Configuration — /etc/opt/SUNWsamfs
  • Log — /var/opt/SUNWsamfs & /var/adm
  • License — /etc/opt/SUNWsamfs
  • Example Configuration — /opt/SUNWsamfs/examples
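
A quick first check is to confirm that the directories listed above actually exist on the host. The following Python sketch assumes the default install locations named in this article; the function name missing_paths is illustrative and not part of the product.

```python
import os

# Default SAM-FS/QFS locations as listed in this article.
SAMFS_PATHS = [
    "/opt/SUNWsamfs/sbin",
    "/opt/SUNWsamfs/bin",
    "/etc/opt/SUNWsamfs",
    "/var/opt/SUNWsamfs",
    "/var/adm",
    "/opt/SUNWsamfs/examples",
]

def missing_paths(paths):
    """Return the entries of paths that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]

if __name__ == "__main__":
    for path in missing_paths(SAMFS_PATHS):
        print("missing:", path)
```

A missing directory usually points to an incomplete or non-default installation, so it is worth checking before looking at anything else.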

Configuration & Installation Troubleshooting

Are configuration files properly modified?

  • Configuration files:
    • /kernel/drv/st.conf
    • /kernel/drv/samst.conf
    • /etc/opt/SUNWsamfs/inquiry.conf
  • master configuration file (mcf) errors

Use the sam-fsd command to find configuration problems.

Are tape drives “matched” properly?

  • Tape drives must be defined in the “mcf” file in the same order in which the robot controller defines them.
  • The tape drive order seen by the robot's controller may not be the same as the order of the SCSI targets.
  • Tape drives will not “come ready” if they are defined incorrectly in the “mcf” file.
  • Tape drives may “fail to unload” if they are defined incorrectly in the “mcf” file.
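
The drive-order check described above can be sketched as a slot-by-slot comparison. This is a minimal Python illustration, assuming you have already written down the drive order from the mcf file and the order reported by the robot controller; the function name and the device labels are hypothetical.

```python
def drive_order_mismatches(mcf_order, robot_order):
    """Compare the drive order in the mcf with the robot controller's order.

    Both arguments are lists of drive identifiers (hypothetical labels here).
    Returns (slot, mcf_drive, robot_drive) tuples for every slot that differs.
    """
    mismatches = []
    pairs = zip(mcf_order, robot_order)
    for slot, (mcf_drive, robot_drive) in enumerate(pairs, start=1):
        if mcf_drive != robot_drive:
            mismatches.append((slot, mcf_drive, robot_drive))
    # A different number of drives is itself a configuration problem.
    if len(mcf_order) != len(robot_order):
        mismatches.append(("count", len(mcf_order), len(robot_order)))
    return mismatches

# Hypothetical example: the SCSI target order differs from the robot's order.
mcf = ["/dev/rmt/0cbn", "/dev/rmt/1cbn"]
robot = ["/dev/rmt/1cbn", "/dev/rmt/0cbn"]
print(drive_order_mismatches(mcf, robot))
```

An empty result means the two orders agree; any tuple returned identifies a slot to correct in the mcf file.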

License related problems?

SAM-FS/QFS 4.0 requires a new license key:

  • A license for SAM-FS/QFS 3.3 or 3.5 will not work on 4.0.
  • License key information must be stored in the /etc/opt/SUNWsamfs/LICENSE.4.0 file.
  • The license key information must match the robot and media type in the configuration.
  • The license file must NOT contain any extra lines or control characters.
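
The rule about extra lines and control characters is easy to check mechanically. The sketch below is an illustration only: it assumes the file is plain text, treats any whitespace-only line as an "extra" line, and treats every character below ASCII 32 (plus DEL) as a control character. The function name license_problems is hypothetical.

```python
def license_problems(text):
    """Return a list of problems found in license file contents (a string)."""
    problems = []
    lines = text.split("\n")
    # A single trailing newline is normal; drop the empty piece it creates.
    if lines and lines[-1] == "":
        lines.pop()
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: blank or whitespace-only line")
        # Flag control characters such as carriage returns or tabs.
        bad = sorted({repr(ch) for ch in line if ord(ch) < 32 or ord(ch) == 127})
        if bad:
            problems.append(f"line {number}: control characters {', '.join(bad)}")
    return problems

# Typical use (path taken from this article):
#   with open("/etc/opt/SUNWsamfs/LICENSE.4.0", newline="") as f:
#       print(license_problems(f.read()))
```

Opening the file with newline="" keeps carriage returns visible, which matters when a license file has been edited on another platform and transferred in binary mode.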

Troubleshooting SAM-FS Operations

  • Identifying the problem(s)
    • Log Files
      • /var/opt/SUNWsamfs
      • /var/adm/messages
      • samu display (command-line interface: samcmd)
    • Are there any debug options in SAM-FS that could assist in problem determination?
    • Is the license key expired?
    • Are there any hardware errors?
  • Troubleshooting Daemon Operations
    • What daemons are running?
      • sam-initd, sam-archiverd, sam-arfind, sam-stagerd, sam-genericd, sam-robotsd, sam-ftpd, sam-fsd, sam-scannerd, and sam-stagealld
    • Are there any duplicate or defunct processes?
    • Enable daemon tracing
      • /etc/opt/SUNWsamfs/defaults.conf
      • /var/opt/SUNWsamfs/trace
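
The questions about running, duplicate, and defunct daemons can be answered from a process listing. The Python sketch below assumes text with three columns (PID, state, command), such as the output of ps -e -o pid,s,comm; that column layout and the function name are assumptions for illustration. More than one copy of a daemon is a candidate for a closer look, not proof of a fault.

```python
from collections import Counter

def summarize_sam_processes(ps_output):
    """Summarize SAM daemons from text with columns: PID, state, command.

    Returns (counts, defunct): counts maps each sam-* daemon name to the
    number of copies seen, and defunct lists PIDs whose state is Z (zombie).
    """
    counts = Counter()
    defunct = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, state, command = fields
        # Keep only the program name, without any leading directory.
        name = command.split()[0].rsplit("/", 1)[-1]
        if state == "Z":
            defunct.append(int(pid))
        elif name.startswith("sam-"):
            counts[name] += 1
    return counts, defunct

# Hypothetical listing used only to demonstrate the function.
sample = """  PID S COMMAND
  101 S sam-fsd
  102 S sam-archiverd
  103 S sam-archiverd
  104 Z <defunct>
"""
counts, defunct = summarize_sam_processes(sample)
duplicates = {name: n for name, n in counts.items() if n > 1}
print(duplicates, defunct)
```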

Troubleshooting the Archiver

For archiving issues, ask the following:

  • Are there errors in the archiver.cmd file?
  • Is media available?
  • Is the archiver running?
  • Does the site have a valid license?
  • Are there files in the file system that need to be archived?
  • Do files have to be staged before archive copies can be made?
  • Is the file system full?
  • Are any of the drives/robot/media reporting problems?
  1. Run the command archiver -lv to check the /etc/opt/SUNWsamfs/archiver.cmd file for possible syntax errors.
  2. Check the sam-log for a report of errors. The default location of the file is /var/adm/sam-log.
  3. Monitor via the samu 'a' display.
  4. Run truss on the sam-archiverd process.
  5. Monitor via the samu 'l' display for possible license errors.

Troubleshooting the Releaser

  • Are files being archived?
  • Is the file system large enough to accommodate active files?
  • Is associative staging set?
  • Are the high/low watermarks set correctly?
  • Are large files busy?
  1. Use the sfind command to find files.
  2. Enable releaser logging in /etc/opt/SUNWsamfs/releaser.cmd.
  3. Check samu displays:
    • samcmd a
    • samcmd m
  4. Check the archiver trace file /var/opt/SUNWsamfs/sam-archiverd
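
The watermark question above comes down to simple arithmetic: the releaser begins freeing disk cache once usage reaches the high watermark and continues until usage drops to the low watermark. The following Python sketch works through that calculation; the function name and the sample numbers are hypothetical, and the real releaser also weighs file age and size when choosing what to release.

```python
def releaser_plan(capacity_kb, used_kb, high_pct, low_pct):
    """Return how many kilobytes must be released, or 0 below the high mark."""
    if not 0 <= low_pct < high_pct <= 100:
        raise ValueError("expected 0 <= low < high <= 100")
    used_pct = 100.0 * used_kb / capacity_kb
    if used_pct < high_pct:
        return 0  # the releaser has no reason to run yet
    # Space that may stay in use once the low watermark is reached.
    target_kb = capacity_kb * low_pct // 100
    return used_kb - target_kb

# Hypothetical numbers: 1,000,000 KB of disk cache, 85% full, high=80, low=60.
print(releaser_plan(1_000_000, 850_000, 80, 60))
```

With these numbers the sketch reports 250000 KB to release. If the watermarks are set too close together, the releaser runs often and frees little each time; if the high watermark is too high, the file system can fill before releasing catches up.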

Troubleshooting the Stager

  • Are the stager daemons running?
  • Have archived copies been marked damaged?
  • Is the VSN present in the robot?
  • Can the file be accessed using disaster recovery techniques?
  • Are any of the drives/robot/media reporting problems?
  • Has the VSN been drained of all archived copies?
  • Is the archived copy associated with an orphan file?
  • Is an archive copy marked damaged?
  • Check the recycler log file for possible errors.
  • Use the unrearch command.
  • Is the stage queue large enough (maxactive in /etc/opt/SUNWsamfs/stager.cmd)?

Troubleshooting the File System

  • What if the filesystem(s) become corrupt?
    • What are the steps to recover your data?
  • How do you restore files and/or directories?
  • How do you correct damaged files?

File system tools:

  • samfsck — file system checking/repair
  • samncheck — find a filename for inode #
  • samfsinfo — basic file system info
  • samfsconfig — search for superblocks
  • samtrace — file system tracing

Disaster Recovery

Recommended Disaster Recovery techniques:

  • Obtain the position from the sls -D display.
  • Alternatively, use the archiver log to obtain the position.
  • Retrieve the file with the appropriate commands (for example: dd, star, recover.sh, restore.sh, tarback.sh, samfsdump, samfsrestore, qfsdump, qfsrestore).
  • Retrieve the file(s) into a SAM-FS file system.
  • Place the file in a temporary directory.
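
When the position is copied from the sls -D display or the archiver log, it typically appears as a single position.offset field. The Python sketch below splits such a field into two numbers. Treating both halves as hexadecimal is an assumption about the display format, and the sample value and function name are hypothetical; verify against your own output before relying on it.

```python
def parse_position(field):
    """Split a 'position.offset' field into two integers.

    Both halves are read as hexadecimal, which is an assumption about the
    archiver log and sls -D display formats.
    """
    position, _, offset = field.partition(".")
    if not position or not offset:
        raise ValueError(f"expected position.offset, got {field!r}")
    return int(position, 16), int(offset, 16)

# Hypothetical field value used only to demonstrate the function.
print(parse_position("3ef9.1"))
```

Recording the parsed position alongside the media type and VSN makes it easier to locate the archive copy on the volume during recovery.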

Additional Resources

Summary

In this article, we gave a brief history of SAM-FS/QFS, introduced the basic product components, and covered basic troubleshooting and available resources.