Sun SAM-FS / QFS - Archiver Configuration overview

Default archiver behavior

  • Archive age of four minutes
  • One archive copy
  • Archive copies written to any available media
  • Interval is 10 minutes

The sam-archiverd program is responsible for automatically archiving files in a SAM-FS file system. The archiver is started when a SAM-FS file system is mounted. Without the intervention of an administrator, the archiver will archive all files resident under SAM-FS mount points.

By default, the archiver will make one archive copy of any file that reaches an archive age of four minutes. The archive age of a file is based on the last modification time.
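The default age rule can be pictured with a few lines of Python. This is only a sketch of the rule as described above, not the archiver's own code; the function name and the use of seconds are assumptions made for illustration.

```python
# Illustrative model of the default rule: a file becomes a candidate once
# four minutes have passed since its last modification.
# The names below are hypothetical and are not part of SAM-FS.
DEFAULT_ARCHIVE_AGE = 4 * 60  # seconds


def is_archive_candidate(mtime, now, archive_age=DEFAULT_ARCHIVE_AGE):
    """Return True when the time since last modification has reached archive_age."""
    return (now - mtime) >= archive_age


if __name__ == "__main__":
    now = 1_000_000
    print(is_archive_candidate(now - 300, now))  # modified 5 minutes ago
    print(is_archive_candidate(now - 60, now))   # modified 1 minute ago
```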

In addition to archiving data files, the archiver also copies the data necessary for SAM-FS file system operation. This data consists of directories and symbolic link information and is referred to as “file system data”. File system data is part of the information necessary to recover a SAM-FS file system after a disaster.

Archiver Processes

The archiver consists of three programs: sam-archiverd, sam-arfind, and sam-arcopy. The sam-archiverd process is responsible for scheduling archiving activities. sam-arfind identifies files to be archived, groups the files according to archiving policies, and associates the groups of files with removable media. sam-arcopy copies the files from disk cache to selected media.

The sam-archiverd is started by sam-init. It reads the archiver.cmd file (if one is available) and builds the tables necessary to control archiving. The sam-archiverd starts a sam-arfind process for each mounted file system and then monitors sam-arfind and processes signals from an operator or other processes.

sam-arfind examines the file system, identifying appropriate groups for each file. The archive status and file’s archive age are used to determine if the file should be archived. sam-arfind creates a list of files, associated with their groups, to be archived.

sam-arfind sorts the list of files by size, largest to smallest, and creates the archive images, the batches of files that will be written to removable media. sam-arfind also associates a VSN with each archive image and determines whether a drive is available for that VSN. If so, a sam-arcopy process is started for that archive image, using the assigned VSN. sam-arfind transfers the list of files to be archived to sam-arcopy.

archiver.cmd

The archiver.cmd file

  • Controls action of the archiver
  • Stored in /etc/opt/SUNWsamfs/archiver.cmd
  • Checked by archiver(1M) once per minute

The archiver.cmd defines:

  • General Commands
  • Archive Sets and Archive Copies
  • Archive Set Parameters
  • Volume Serial Name Associations
  • Volume Serial Name Pools

Without action by the administrator, sam-archiverd will archive all files under all SAM-FS mount points. By default, the archiver will make one archive copy of any file that reaches an archive age of four minutes. However, sam-archiverd can be controlled by configuring the file /etc/opt/SUNWsamfs/archiver.cmd. This file also defines which files in the SAM-FS file systems should be grouped together for archiving (called archive sets), how many copies to make of each archive set, and to what media the archive sets should be copied.

archiver.cmd - Types of lines

Table 1 — Types of lines in the archiver.cmd file

General commands
    Defines overall rules for archiver to follow such as file systems to archive, time between archive operations, and name of file to log archiving activity. General commands are identified by the “=” character in the second field or by no additional fields.

Archive Set assignments
    Defines characteristics of the files that should be grouped together for archive purposes. The statements that describe file characteristics are patterned after the UNIX ‘find’ command. There may be several Archive Set assignments per file system.

Archive Copy definitions
    Determines when the archive copies are made for the files matching the Archive Set file characteristics. Lines begin with a digit (1, 2, 3, or 4) which is the copy number. Archive age is based on a file’s modification date and time.

Archive Set parameters
    Additional parameters that control the processing of specific Archive Sets. The Archive Set parameters will be discussed as part of the advanced archiver commands.

Volume Serial Name associations
    Defines the type of media to which a specific archive set copy will be copied. VSN associations are defined after all archive sets are defined.

VSN pool definitions
    Defines a named collection of VSNs. Pools are useful for defining a group of media that may be available to an Archive Set. Pools can act as a buffer for assigning VSNs to Archive Sets.

archiver.cmd - General commands

Example #1

wait
fs = samfs1
interval = 30m
logfile = /var/adm/archive/sam1.log
< information deleted >
fs = samfs2
interval = 60m
logfile = /var/adm/archive/sam2.log

Delaying Archiver Startup

By default, the sam-archiverd begins archiving when started by sam-init(1M). The wait command causes the sam-archiverd to wait for a SIGUSR1 signal to begin normal archiver operations.

Specifying an Archive Interval

The sam-archiverd executes periodically to examine the status of all mounted SAM-FS file systems. The timing is controlled by the archive interval. The archive interval is the time between complete archive operations (scanning and copying to removable media) on each file system. In the example above, the interval for samfs1 has been changed to 30 minutes. The default interval is 10 minutes.

Specifying an Archiver Log File

The sam-archiverd can produce a log file that contains information about each file archived. The log file is a continuous record of archival action and may be used to locate earlier copies of files for traditional backup purposes. In Example #1, the log file for samfs1 has been set to /var/adm/archive/sam1.log. By default, this file is not produced.

Controlling Archiving for a Specific File System

By default, archiving controls are global, applying to all file systems. However, the system administrator can confine some controls to individual file systems. In the example, the file system samfs1 has been selected. The archive interval, archive log, and archive set association commands that occur after this command apply only to the specified file system until another “fs =” command is encountered. A command for a specific file system always overrides a general command that applies to all file systems listed in the archiver.cmd file.
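The scoping rule for “fs =” can be sketched as a small parser. This is a simplified illustration that handles only “keyword = value” lines and treats “#” as starting a comment, as in the examples in this document; it is not the archiver's parser, and the function names are hypothetical.

```python
def parse_general_commands(text):
    """Collect 'keyword = value' lines into global and per-file-system settings."""
    global_settings = {}
    per_fs = {}
    current = global_settings  # commands before any "fs =" line are global
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line or "=" not in line:
            continue  # this sketch ignores everything except general commands
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "fs":
            # Following commands apply to this file system until the next "fs ="
            current = per_fs.setdefault(value, {})
        else:
            current[key] = value
    return global_settings, per_fs


def effective_setting(key, fs_name, global_settings, per_fs):
    """A file-system-specific value overrides the global value."""
    return per_fs.get(fs_name, {}).get(key, global_settings.get(key))


if __name__ == "__main__":
    sample = """
    interval = 10m
    fs = samfs1
    interval = 30m
    logfile = /var/adm/archive/sam1.log
    fs = samfs2
    logfile = /var/adm/archive/sam2.log
    """
    g, f = parse_general_commands(sample)
    print(effective_setting("interval", "samfs1", g, f))
    print(effective_setting("interval", "samfs2", g, f))
```

In the sample, samfs1 uses its own interval, while samfs2 falls back to the global one because no interval follows its “fs =” line.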

archiver.cmd - Archive set assignments

Syntax

archive_set pathname [search_criteria]

Search criteria

  • File size (-minsize or -maxsize)
  • Owner and group (-owner, -group)
  • File names via regular expressions (-name)

The Archive Set assignments group files with similar characteristics into archive sets. The syntax is patterned after the find command. If a file in or below the stated pathname matches the search criteria, it becomes a member of that archive set. The archive set name is the first field, followed by the pathname of a directory.

An archive set name is required. Archive set names are site defined but generally indicate the characteristics of the files that belong to the archive set. Archive set names are restricted to letters in the alphabet, numbers, and the underscore character, “_”. The first character in an archive set name must be an alpha character.
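The naming rule can be expressed as a regular expression. The check below is a sketch that assumes “letters” means the ASCII letters A-Z and a-z; the function name is hypothetical.

```python
import re

# Letters, digits and underscore only; the first character must be a letter.
# Restricting "letters" to ASCII is an assumption of this sketch.
_ARCHIVE_SET_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_archive_set_name(name):
    """Return True if name satisfies the archive set naming rule."""
    return _ARCHIVE_SET_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("big_files", "adm_set", "1st_set", "adm-set"):
        print(candidate, is_valid_archive_set_name(candidate))
```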

There is a special reserved archive set name called “no_archive”. Files that belong to a no_archive archive set will not be archived. You can specify multiple no_archive sets, each with different pathnames and search criteria. For example:

no_archive tmp

pathname

The pathname is required. It is relative to the mount point of the file system and indicates to the archiver where to start its search for files that match the search criteria for this archive set. Files in the directory specified by pathname and all subdirectories are considered for inclusion in the archive set. If the path is to include all of the files in a file system, use a period (.) for the pathname. Absolute pathnames are not allowed.

search_criteria

If a file within the specified pathname matches the search criteria, it becomes a member of that archive set.

archiver.cmd - Available search criteria

Table 2 — archiver.cmd Available search criteria

-minsize
    The minimum size of a file may be used to determine archive set membership using the -minsize characteristic. The file size may be specified using the suffix letters b (bytes), k (Kbytes), m (Mbytes), g (Gbytes), or t (Tbytes). For example:
        big_files . -minsize 500m
    In this example, all files in a file system that are at least 500 Mbytes belong to the archive set “big_files”.

-maxsize
    The maximum size of a file may be used to determine archive set membership using the -maxsize characteristic. For example:
        small_files . -maxsize 10m
    In this example, all files in a file system that are 10 Mbytes or less belong to the archive set “small_files”.

-owner
    The owner of a file may be used to determine archive set membership. The -owner characteristic refers to a user’s login id. For example:
        adm_set . -owner sysadmin
    In this example, all files in a file system that are owned by user “sysadmin” belong to the archive set “adm_set”.

-group
    The group membership of a file may be used to determine archive set membership using the -group characteristic. The -group characteristic refers to a group name. For example:
        marketing_set . -group marketing
    In this example, all files in a file system that belong to the group marketing belong to the archive set “marketing_set”.

-name
    Names of files to be included in an archive set may be specified using regular expressions. The -name characteristic specifies that any file matching a regular expression is a member of the archive set. For example:
        images . -name \.gif$
    In this example, all files in a file system that end with “.gif” belong to the archive set “images”.
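
The size suffixes used by -minsize and -maxsize can be converted to byte counts as sketched below. The multipliers assume binary units (1 Kbyte = 1024 bytes), and treating a bare number as bytes is also an assumption; the function name is hypothetical.

```python
# Suffix letters from the table above. Binary multiples are an assumption.
_SIZE_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert a size such as '500m' or '2g' to a number of bytes."""
    text = text.strip().lower()  # accept upper-case suffixes such as '10G'
    if text and text[-1] in _SIZE_UNITS:
        return int(text[:-1]) * _SIZE_UNITS[text[-1]]
    return int(text)  # no suffix: treated as bytes (assumption)


if __name__ == "__main__":
    print(parse_size("500m"))
    print(parse_size("10m"))
```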

archiver.cmd - Example #2

# wait
fs = samfs1
interval = 30m
logfile = /var/tmp/archive.log
# archive set assignments
no_archive tmp
big_files . -minsize 2g
marketing . -group marketing
gif . -name \.gif$
all .

archiver.cmd - Member set conflicts

It is possible for a file to match more than one of the archive set search criteria. Here are the rules used to determine to which archive set the file belongs:

  1. The file can only belong to one archive set. Once it belongs to this archive set it cannot belong to another archive set.
  2. The file belongs to the first archive set in which the file meets the search criteria.
  3. Local archive set search criteria (defined after an fs = statement) will be evaluated before global archive set search criteria.
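
The three rules can be sketched as a first-match search over an ordered rule list. The rule data mirrors Example #2; the dictionary layout, the function names, and the choice to apply the -name expression to the path relative to the mount point are assumptions made for illustration.

```python
import re


def matches(rule, f):
    """Return True if file f satisfies every criterion present in rule."""
    if rule["path"] != "." and not f["path"].startswith(rule["path"] + "/"):
        return False
    if "minsize" in rule and f["size"] < rule["minsize"]:
        return False
    if "maxsize" in rule and f["size"] > rule["maxsize"]:
        return False
    if "owner" in rule and f.get("owner") != rule["owner"]:
        return False
    if "group" in rule and f.get("group") != rule["group"]:
        return False
    if "name" in rule and re.search(rule["name"], f["path"]) is None:
        return False
    return True


def first_matching_set(f, local_rules, global_rules):
    """Local rules are evaluated before global rules; the first match wins."""
    for rule in list(local_rules) + list(global_rules):
        if matches(rule, f):
            return rule["set"]
    return None


# Archive set assignments from Example #2 (defined after "fs = samfs1").
SAMFS1_RULES = [
    {"set": "no_archive", "path": "tmp"},
    {"set": "big_files", "path": ".", "minsize": 2 * 1024**3},
    {"set": "marketing", "path": ".", "group": "marketing"},
    {"set": "gif", "path": ".", "name": r"\.gif$"},
    {"set": "all", "path": "."},
]

if __name__ == "__main__":
    logo = {"path": "ads/logo.gif", "size": 4096, "owner": "amy", "group": "marketing"}
    print(first_matching_set(logo, SAMFS1_RULES, []))
```

The file ads/logo.gif satisfies both the marketing and gif criteria, but it is assigned to marketing because that assignment appears first.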

archiver.cmd - Archive copy definitions

Syntax

copynum archive_age

Example:

big_files . -minsize 2g
1 10m
2 10m
marketing . -group marketing
1 10m
2 1h

The Archive Copy definition tells the archiver how many copies it should make of files in an archive set. It also specifies the archive age that files should reach before becoming candidates for archiving. The archive age is the time since last modification. Up to four archive copies can be made. If you do not specify any archive copies, a single archive copy is made when files in the archive set reach an archive age of four minutes.

If you need more than one archive copy, all copies including the first copy must be specified using archive copy definitions. These definitions are placed on a separate line immediately after the archive set to which they will be applied.

Specifying the Copy Number

Archive copy definitions begin with a numeral (1 to 4) that indicates the copy number.

Setting the Archive Age

Setting an archive age allows a file to be completely closed prior to attempting to archive it. The archive age tells sam-archiverd(1M) how soon to archive the files. The archive age follows the archive copy number. It is a numeral followed by a suffix character that indicates the unit of time. The archive age may be specified with the unit s (seconds), m (minutes), h (hours), d (days), w (weeks), or y (years). In the following example, the files in the directory data will be archived when they reach an archive age of one hour.

ex_set data
1 1h
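
The age suffixes can be converted to seconds as in the sketch below. Counting a year as 365 days is an assumption, and the function name is hypothetical.

```python
# Unit suffixes for archive ages. A year is taken as 365 days (assumption).
_AGE_UNITS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}


def parse_age(text):
    """Convert an archive age such as '4m' or '1h' to seconds."""
    text = text.strip()
    return int(text[:-1]) * _AGE_UNITS[text[-1]]


if __name__ == "__main__":
    print(parse_age("1h"))
    print(parse_age("4m"))
```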

More than one Copy of File System Data

If more than one copy of file system data is required, copy definitions may be placed in the command file immediately after an “fs =” command.

fs = samfs1
1 4m
2 1h

Copy 1 of the file system data for the samfs1 file system will be made when the files reach an archive age of four minutes. Copy 2 will be made when files reach an archive age of one hour.

archiver.cmd - VSN associations

  • Introduced by vsns and ends with endvsns
  • Associates Archive Sets and Archive Copies with a Media Device
  • Uses Regular Expressions

Syntax:

archive_set.copy_no media_type vsn_exp

The Volume Serial Name (VSN) is written on the media with the commands tplabel or odlabel. The VSN on tape must be one to six characters, using uppercase letters, the digits 0-9, or any of the following special characters: !"%&'()*+,-./:;<=>?_. The VSN on optical media can be anything from one to 31 characters.

The archive set to VSN associations are defined after all other commands. The commands are introduced by the vsns keyword and ended with the endvsns keyword. An association requires at least three fields: the archive set name and copy, the media type, and at least one VSN.

Example:

vsns
samfs1.1 mo OD032[0-9]
big_files.1 d3 0000[0-5][0-9]
big_files.2 lt 0007[0-9][0-9]
marketing.1 lt 1000[0-5][0-9]
marketing.2 lt 2000[0-5][0-9]
endvsns

VSNs are noted by the vsn_exp which is a regular expression as described in regexp(3). (Regular expressions do not follow the same conventions as wildcards.) When removable media are needed by the archiver for the archive set, each VSN of the selected media in all robots (and manually mounted drives) is examined to determine if it would satisfy any VSN expression. The archiver selects the first VSN that matches the expression and contains enough space for the archive copy operation.
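
The selection rule (first VSN that matches an expression and has enough space) can be sketched as follows. Python's re module stands in for regexp(3) here, and requiring the expression to match the whole VSN is an assumption; the catalog layout and function name are hypothetical.

```python
import re


def select_vsn(vsn_exprs, catalog, needed_bytes):
    """Return the first VSN that matches any expression and has enough free space.

    catalog is an ordered list of (vsn, free_bytes) pairs for the media type.
    """
    patterns = [re.compile(expr) for expr in vsn_exprs]
    for vsn, free_bytes in catalog:
        if free_bytes >= needed_bytes and any(p.fullmatch(vsn) for p in patterns):
            return vsn
    return None  # no suitable volume is available


if __name__ == "__main__":
    catalog = [("000712", 10), ("000734", 500), ("000801", 900)]
    # Expression for big_files.2 from the example above.
    print(select_vsn(["0007[0-9][0-9]"], catalog, 100))
```

In the sample catalog, 000712 matches the expression but is too full, so the next matching volume with enough space is chosen.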

Usually the first association listed is the VSN association for the file system. This will hold all file system structure information not included in any of the archive sets/copy sets.

archiver.cmd - VSN pools

  • Introduced by vsnpools and ends with endvsnpools
  • Defines a named collection of VSNs available to one or more archive sets
  • Uses Regular Expressions

Syntax:

vsn_pool_name media_type vsn_exp

A VSN pool is a named collection of VSNs. Pools are useful for defining a group of media that may be available to an archive set. As VSNs are required for archiving, they are removed from the VSN pool. As such, VSN pools provide a useful buffer for assigning VSNs to archive sets.

VSN pools can be used to define separate groups of VSNs to be used by departments within an organization, by users within a group, by data types, or any other site-defined grouping. Pools are assigned a name, media type, and set of VSNs. A scratch pool is a special kind of pool that is a catch-all set of VSNs that can be used when specific VSNs in a VSN association are exhausted or when another VSN pool is exhausted.

Example:

vsnpools
users_pool lt C0151[3-9] C0152[0-9] C0153[0-6]
data_pool lt C0037[1-9] C003[8-9][0-9] C00[4-5][0-9][0-9]
proj_pool lt A0066[7-9] A006[7-9][0-9] A007[0-5][0-9]
scratch_pool lt A0013[1-9] A001[4-9][0-9] A002[0-9][0-9]
endvsnpools

Associating VSN pools with archive sets

As discussed in the topic on VSN associations, collections of VSNs are associated with archive sets via the archive set name and copy number. VSN pools are associated with archive sets in a similar manner, using the -pool parameter.

Following is an example vsns section from an archiver.cmd file. This example uses the four VSN pools defined in the example above. Note that if one of the three specific pools runs out of VSNs, the scratch_pool VSNs will be selected:

vsns
users.1 lt -pool users_pool -pool scratch_pool
data.1 lt -pool data_pool -pool scratch_pool
proj.1 lt -pool proj_pool -pool scratch_pool
endvsns
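
The fallback to scratch_pool can be sketched as trying each named pool in turn. This illustration assumes that pools are tried in the order they appear on the line and that an expression must match the whole VSN; both are assumptions, and the names are hypothetical.

```python
import re

# Two of the pools from the vsnpools example above.
POOLS = {
    "users_pool": ["C0151[3-9]", "C0152[0-9]", "C0153[0-6]"],
    "scratch_pool": ["A0013[1-9]", "A001[4-9][0-9]", "A002[0-9][0-9]"],
}


def select_from_pools(pool_names, pools, catalog, needed_bytes):
    """Try each pool in order; return (pool_name, vsn) for the first usable VSN."""
    for name in pool_names:
        patterns = [re.compile(expr) for expr in pools[name]]
        for vsn, free_bytes in catalog:
            if free_bytes >= needed_bytes and any(p.fullmatch(vsn) for p in patterns):
                return name, vsn
    return None  # every listed pool is exhausted


if __name__ == "__main__":
    # The only users_pool volume is full, so the scratch pool is used.
    catalog = [("C01515", 0), ("A00150", 1000)]
    print(select_from_pools(["users_pool", "scratch_pool"], POOLS, catalog, 100))
```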

archiver.cmd - More examples

# wait
logfile = /var/adm/archiver.log
interval = 30m
fs = samfs1
1 1m
2 1m
arset1 testdir0
1 1m
2 3m
arset2 testdir1
1 1m
2 3m
vsnpools
scratch_pool ib 0000[1-2][0-9]
arset1_pool ib 0000[3-4][0-9]
endvsnpools
vsns
samfs1.1 ib A125.*
samfs1.2 ib 96[0-9]
arset1.1 ib -pool arset1_pool
arset1.2 ib 12345
arset2.1 ib .*47 -pool scratch_pool
arset2.2 ib 00123[0-9] -pool scratch_pool
endvsns

archiver.cmd - Advanced media commands

  • archmax = media target_size
  • drives = robot count
  • ovflmin = media minimum_size
  • trace = filename [event …]

Setting the maximum size of an archive image

The size of an archive image (the batch of files written to removable media) can affect the performance of a device (tape or optical drive) and how efficiently media is used (minimizing wasted space at the end of tape). A site can balance these two factors (speed of writing archives and full utilization of media space) with the archmax parameter.

The syntax of archmax is:

archmax = media target_size

media is the two-character media type mnemonic from the mcf file. target_size is the maximum size, in bytes, of the archive image for that media. Files to be archived will be placed on media in a single archive image of a length that is less than or equal to target_size. If a single file is larger than the target_size, then this restriction does not apply.
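
The effect of target_size on archive images can be sketched with a simple greedy grouping. The largest-first ordering follows the description of sam-arfind earlier in this document; closing an image as soon as the next file would exceed target_size is an assumption of this sketch, not the archiver's documented algorithm.

```python
def build_archive_images(file_sizes, target_size):
    """Group file sizes (bytes) into archive images no larger than target_size.

    A single file larger than target_size is placed in an image of its own.
    """
    images = []
    current, current_total = [], 0
    for size in sorted(file_sizes, reverse=True):  # largest to smallest
        if current and current_total + size > target_size:
            images.append(current)  # close the image before it would overflow
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        images.append(current)
    return images


if __name__ == "__main__":
    print(build_archive_images([50, 700, 300, 200], 500))
```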

Limiting drive resources used for archiving

This parameter allows a site to limit the number of drives (on a per-robot basis) that will be used for archiving. The archiver will use only the number of drives specified to create archive copies. This parameter prevents the archiver from seizing all of a robot’s drive resources and thereby interfering with staging. The syntax of the drives command is:

drives = robot count

robot is the robot family set name from the mcf file. count is the maximum number of drives (in this robot) that will be used for archiving.

Enabling volume spanning

The ovflmin parameter sets the minimum size of a file that will be allowed to span multiple VSNs. Files to be archived that are smaller than the ovflmin will be placed on a single VSN. Files larger than ovflmin will be allowed to span up to 16 VSNs. The syntax for ovflmin is:

ovflmin = media minimum_size

Tracing the activities of the archiver

The trace command allows you to specify the location of the archiver trace file. The archiver trace file contains a line for each event being traced. Each optional event argument names an event to trace. The syntax for trace is:

trace = filename [ event …]

archiver.cmd - Advanced release and stage commands

Setting release attributes

The -release command sets the release attributes for all files that match the archive set characteristics. The syntax is:

archive_set pathname [search_criteria] -release attributes

The attributes may be any of “a” (release upon archiving), “n” (never release), or “p” (partial release).

Setting stage attributes

The -stage command sets the stage attributes for all files that match the archive set characteristics. The syntax is:

archive_set pathname [search_criteria] -stage attributes

The attributes may be “a” (associative staging) or “n” (never stage).

Cause releasing

The -release command causes the disk space to be released for all files that match the archive set characteristics immediately after the specified archive copy is made.

big_files . -minsize 10G
1 -release 1h

In this example, all files in the file system that are 10 Gbytes or larger will belong to an archive set called “big_files”. Files in this archive set will be archived when they reach an archive age of one hour. They will be released immediately after archiving.

Prevent releasing

The -norelease command prevents disk space from being released for all files that match the archive set characteristics until after the specified archive copy is made.

big_files . -minsize 10G
1 -norelease 1h
2 -norelease 1d

In this example, two archive copies will be made of files in the archive set “big_files”. The first archive copy will be made when the files reach an archive age of one hour. The second archive copy will be made when the files reach an archive age of one day. Release of files in this archive set is prohibited until after both archive copies have been made.

Specify an unarchive age

Unarchive deletes archive entries for all files that match the archive set characteristics for the specified archive copy when the unarchive age is reached.

big_files . -minsize 10G
1 1h 1w
2 1d
3 1w

In this example, three archive copies will be made of files in the archive set “big_files”: the first copy when files reach an archive age of one hour, the second when they reach an archive age of one day, and the third when they reach an archive age of one week. The first archive copy will be unarchived when its unarchive age of one week is reached.

archiver.cmd - Archive set parameters

  • Introduced by params and ends with endparams
  • Assign multiple drives to an archive set
  • Specify associative archiving

Syntax:

archive_set.copy -param value

Archive Set Parameters control processing of archive set copies. The archive set parameters are a separate section in the archiver.cmd file, introduced by the keyword params. The section is ended by the endparams keyword.

Table 3 — archiver.cmd archive set parameters

Parameter        Value
-drives          number
-drivemin        min_size
-fillvsns        No value; toggles on or off
-tapenonstop     No value; toggles on or off
-join            none | path
-sort            none | age | path | priority | size
-offline_copy    none | stageahead | stageall | direct

-drives number

The archiver will usually use only one removable media drive to archive files in an archive set. When an archive set has a large number of files or large files, it may be advantageous to use more than one drive to archive the files. A site can control the number of drives used with the -drives parameter.

-drivemin min_size

The -drivemin parameter controls when the archiver will utilize multiple drives (as specified with -drives) to copy data to removable media. The min_size value indicates that multiple removable media drives will be used only if the amount of data to be archived is greater than the min_size value. The number of drives to be used in parallel will be the lesser of -drives or total_size/min_size.
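
The drive count formula can be worked through in a short sketch. Rounding the quotient down and never going below one drive are assumptions added for illustration; the function name is hypothetical.

```python
def drives_to_use(drives, drivemin, total_size):
    """Lesser of -drives and total_size / min_size, but at least one drive.

    Integer division and the floor of one drive are assumptions of this sketch.
    """
    return max(1, min(drives, total_size // drivemin))


if __name__ == "__main__":
    gbyte = 1024**3
    # -drives 4, -drivemin 10 Gbytes, 25 Gbytes of data to archive
    print(drives_to_use(4, 10 * gbyte, 25 * gbyte))
```

With -drives 4 and a min_size of 10 Gbytes, 25 Gbytes of data would be spread over two drives under these assumptions, and the full four drives would be used only from 40 Gbytes upward.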

-fillvsns

By default, when a group of files is to be archived at the same time, the archiver selects a VSN with enough space for all the files. This default behavior may cause VSNs not to be filled to capacity. Selecting the -fillvsns parameter causes the archiver to attempt to fill VSNs by separating the group of files into smaller groups (smaller archive files).

-tapenonstop

By default, the archiver closes the removable media file between each archive file. This action causes the tape subsystem to write a tape mark followed by an EOF1 label and two tape marks. Before another archive file can be written, the tape must be positioned backward over the EOF1 label.

Using -tapenonstop eliminates repositioning and speeds writing archive files to tape.

-join method

When an archive file is written to a VSN, files are written to an archive file in a manner that most efficiently packs the VSN with user files. The user files are sorted by size, largest first. Subsequently, when accessing files that are associated together in the same directory, you may see delays as the stage process repositions through a VSN to read the next file. To alleviate these delays, you may wish to archive files from the same directory paths contiguously within an archive file. This method of overriding the space efficiency algorithm to archive files from the same directory together is called associative archiving.

Associative archiving is useful in situations when you know that the contents of files will not be changing and you wish to access the group of files together at the same time. For example, you might use associative archiving at a hospital for accessing medical images. Images associated with the same patient may be kept in a directory and the doctor may wish to access all of these images together at one time. These static images can be more efficiently accessed if you archive them contiguously based upon their directory location rather than the size of the files. The -join path parameter allows these files to be archived contiguously within an archive set copy. For example:

patient_images.1 -join path

The -join path parameter guarantees that files in a directory will be archived contiguously within an archive set copy. This parameter overrides the archmax parameter and prohibits scanning subdirectories.

-sort method

It is possible to sort the files within an archive set copy by directory path, age, or size (the age and size options are mutually exclusive). To sort an archive set, use the -sort parameter with the argument path, age, or size, as follows:

radiology.1 -sort path
cardiac.2 -sort age
catscans.3 -sort size

The first example forces the archiver to sort an archive set copy called radiology.1 by the directory path of the files. This method has a benefit over the -join path parameter in that the archiver will traverse subdirectories and honor the archmax setting. The second example forces the archiver to sort an archive set copy called cardiac.2 by the age of the file, youngest to oldest. The third example forces the archive set copy called catscans.3 to be sorted by the size of the file, largest to smallest.
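
The three sort orders described above can be pictured with ordinary sort keys. This sketch covers only path, age, and size (the priority method listed in Table 3 is not described here and is left out); the file records and function name are hypothetical.

```python
# Sort keys for the methods described above. "age" puts the most recently
# modified (youngest) file first; "size" puts the largest file first.
_SORT_KEYS = {
    "path": lambda f: f["path"],
    "age": lambda f: -f["mtime"],
    "size": lambda f: -f["size"],
}


def sort_files(files, method):
    """Return files ordered according to a -sort method name."""
    if method == "none":
        return list(files)
    return sorted(files, key=_SORT_KEYS[method])


if __name__ == "__main__":
    files = [
        {"path": "b/x", "mtime": 100, "size": 10},
        {"path": "a/y", "mtime": 300, "size": 5},
        {"path": "c/z", "mtime": 200, "size": 50},
    ]
    for method in ("path", "age", "size"):
        print(method, [f["path"] for f in sort_files(files, method)])
```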

-offline_copy method

This parameter specifies the method to be used when making archive copies of files that have been released from the disk cache. The default method is none.

  • none: The archiver requests files to be staged as needed for each archive file.
  • direct: Copy files directly from one VSN to another VSN without staging the file data back into the disk cache. The source VSN and destination VSN are different. This method requires two removable media drives.
  • stageahead: The archiver attempts to overlap archiving and staging operations. While one archive file is being written, the archiver is requesting that files be staged for the next archive file. Requires that enough space is available in the disk cache to support the staging activity and that two removable media drives are available.
  • stageall: The archiver stages all files before archiving. While only one removable media drive is required, the system must have enough space to stage all files to be archived.

Table 4 — archiver.cmd example criteria for reserving VSNs

Reserve VSNs for every archive set
    reserve = set

Reserve for specific archive sets
    archive_set.copy -reserve set

Reserve by directory
    archive_set.copy -reserve dir

Reserve by owner (user id)
    archive_set.copy -reserve user

Reserve by group membership
    archive_set.copy -reserve group

Reserve by file system
    reserve = fs

Reserve by file system for specific archive sets (useful if an archive set name appears in more than one file system)
    archive_set.copy -reserve fs

Reserve by file system and archive set for a specific group
    archive_set.copy -reserve set -reserve group -reserve fs