Scheduling cleaning on a Data Domain system - Best Practices
EMC recommends running a clean operation after the first full backup to a Data Domain system. The initial local compression on a full backup is generally a factor of 1.5 to 2.5. An immediate clean operation provides additional compression by a further factor of 1.15 to 1.2 and reclaims a corresponding amount of disk space.
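As a rough illustration of these factors, the sketch below works through the arithmetic for a hypothetical 10 TB first full backup. The backup size and the function name stored_size are assumptions made for this example, and the two factors are treated as multiplicative, which is one reading of "additional compression by a further factor".

```python
# Worked arithmetic for the compression factors quoted above.
# The 10 TB backup size is a made-up figure for illustration only.

def stored_size(logical_tb, local_factor, clean_factor=1.0):
    """Return the on-disk size after dividing by the compression factors."""
    return logical_tb / (local_factor * clean_factor)

logical_tb = 10.0  # hypothetical size of the first full backup

# Low end and high end of the ranges given in the text.
for local, clean in [(1.5, 1.15), (2.5, 1.2)]:
    before = stored_size(logical_tb, local)
    after = stored_size(logical_tb, local, clean)
    print(f"local {local}x, clean {clean}x: "
          f"{before:.2f} TB -> {after:.2f} TB "
          f"(reclaimed {before - after:.2f} TB)")
```

Under these assumptions the immediate clean reclaims somewhat less than 1 TB in both cases; actual results depend on the data.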
The default schedule runs the clean operation every Tuesday at 6 a.m. (tue 0600) with a 50% throttle.
To increase file system availability, and if the Data Domain system is not short on disk space, consider changing the schedule to clean less often.
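To see when a weekly schedule such as the default (tue 0600) next fires, the calculation can be sketched as follows. This is a minimal illustration in Python, not part of DD OS; the function name next_weekly_run and the sample date are assumptions.

```python
from datetime import datetime, timedelta

def next_weekly_run(now, weekday=1, hour=6, minute=0):
    """Return the next run strictly after `now`.

    weekday follows Python's convention (0 is Monday), so the
    default of 1 at 06:00 corresponds to "tue 0600".
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed; move to next week.
        candidate += timedelta(days=7)
    return candidate

# Hypothetical "current time": Wednesday 3 January 2024, 09:30.
print(next_weekly_run(datetime(2024, 1, 3, 9, 30)))
```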
Issues that can affect the cleaning process:
- If the system is filling up, do not compensate by changing the default values to more frequent or more aggressive cleaning cycles. Running cleaning every day fragments the data, which can severely impair read speeds. The global compression algorithm depends on good locality during writes, so cleaning too frequently also lowers the de-duplication ratio.
- Cleaning is a filesystem operation that affects overall filesystem performance while it is running. Raising the cleaning throttle above the default of 50 increases that impact during an active cleaning cycle, because the cleaning process consumes more resources.
- Changing the local compression algorithm causes the next cleaning cycle to run significantly longer, because all existing data must be read, uncompressed, and compressed again.
- Any operation that shuts down the Data Domain system filesystem or powers off the device (a system power-off, a reboot, or a filesys disable command) stops the clean operation. The clean does not resume automatically when the system and file system start again.
- Replication between Data Domain systems can affect filesys clean operations. If a source Data Domain system receives large amounts of new or changed data while replication is disabled or disconnected, resuming replication may significantly slow down filesys clean operations.
- If directory replication is running behind, for example because of insufficient network bandwidth between the replication pair (resulting in replication lag), cleaning may not be able to run fully. This condition requires either breaking replication (and resyncing once cleaning has run) or letting the replication lag catch up (for example, by increasing network bandwidth or writing less new data to the source directory).
A Data Domain system that is full may need multiple clean operations to clean 100% of the file system, especially when one or more external shelves are attached. Depending on the type of data stored, such as when using markers for specific backup software (filesys option set marker-type ... ), the file system may never report 100% cleaned. The total space cleaned may always be a few percentage points less than 100.
With collection replication, the clean operation does not run on the destination. With directory replication, the clean operation needs to be run on both the source and destination Data Domain systems.
Viewing current scheduled cleaning
To display the days and time at which the clean operation is currently scheduled to run, use the filesys clean show schedule operation:
# filesys clean show schedule
Welcome to Data domain OS 22.214.171.124-71058
sysadmin@ddtst01# filesys clean show schedule
Filesystem cleaning is scheduled to run "Sun, Tue" at "0600".
To display the throttle setting for cleaning operations, use the filesys clean show throttle operation. Changes to the throttle setting will take effect without restarting cleaning:
sysadmin@ddtst01# filesys clean show throttle
50 Percent Throttle
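When these settings need to be checked from a script, the two output lines shown above can be parsed with a small helper. The following is a minimal sketch in Python that assumes the output wording matches the samples in this article; the function names parse_schedule and parse_throttle are hypothetical, and how the output text is captured (for example, over SSH) is left out.

```python
import re

# Patterns modeled on the two sample output lines above (assumption:
# other DD OS releases may word these lines differently).
SCHEDULE_RE = re.compile(r'scheduled to run "([^"]*)" at "([0-9]{4})"')
THROTTLE_RE = re.compile(r"([0-9]{1,3}) Percent Throttle")

def parse_schedule(line):
    """Return (days, time) such as (['Sun', 'Tue'], '0600'), or None."""
    match = SCHEDULE_RE.search(line)
    if match is None:
        return None
    days = [day.strip() for day in match.group(1).split(",")]
    return days, match.group(2)

def parse_throttle(line):
    """Return the throttle percentage as an int, or None."""
    match = THROTTLE_RE.search(line)
    return int(match.group(1)) if match else None

schedule = parse_schedule(
    'Filesystem cleaning is scheduled to run "Sun, Tue" at "0600".')
throttle = parse_throttle("50 Percent Throttle")
print(schedule, throttle)
```

Both helpers return None for a line they do not recognize, so a caller can tell unexpected output apart from a valid setting.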
Changing the scheduled cleaning
To change the days and time when clean runs automatically, use the filesys clean set schedule operation. The default time is Tuesday at 6 a.m. (tue 0600).
NOTE: The operation is available only to administrative users.
- Daily runs the operation every day at the given time (Not recommended).
- Monthly starts on a given day or days (from 1 to 31) at the given time.
- Never turns off the clean process and does not take a qualifier.
- With the day-name qualifier, the operation runs on the given day(s) at the given time. A day-name is three letters (such as mon for Monday). Use a dash (-) between days for a range of days, for example: tue-fri.
- Time is 24-hour military time. 2400 is not a valid time. mon 0000 is midnight between Sunday night and Monday morning.
- The most recent invocation of the scheduling operation cancels the previous setting.
The command syntax is:
filesys clean set schedule daily time
filesys clean set schedule monthly day-numeric-1[,day-numeric-2,...] time
filesys clean set schedule never
filesys clean set schedule day-name-1[,day-name-2,...] time
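The qualifier rules listed above can be checked before a command is sent to the system. The sketch below is an illustrative Python validator, not part of DD OS. It assumes lowercase three-letter day names and accepts any pair of day names as a range; all function names are hypothetical.

```python
import re

DAY_NAMES = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")

def valid_time(text):
    """True for 24-hour HHMM from 0000 to 2359; 2400 is rejected."""
    if not re.fullmatch(r"[0-9]{4}", text):
        return False
    return int(text[:2]) <= 23 and int(text[2:]) <= 59

def valid_day_names(text):
    """True for comma-separated day names or dash ranges such as tue-fri."""
    for part in text.split(","):
        ends = part.split("-")
        if len(ends) > 2 or any(end not in DAY_NAMES for end in ends):
            return False
    return True

def valid_month_days(text):
    """True for comma-separated day numbers from 1 to 31."""
    parts = text.split(",")
    return all(re.fullmatch(r"[0-9]{1,2}", p) and 1 <= int(p) <= 31
               for p in parts)

def valid_schedule(args):
    """Check the words that follow 'filesys clean set schedule'."""
    if args == ["never"]:
        return True  # never takes no qualifier
    if len(args) == 2 and args[0] == "daily":
        return valid_time(args[1])
    if len(args) == 3 and args[0] == "monthly":
        return valid_month_days(args[1]) and valid_time(args[2])
    if len(args) == 2:
        return valid_day_names(args[0]) and valid_time(args[1])
    return False

for words in (["tue-fri", "0600"], ["mon", "2400"], ["monthly", "1,15", "1600"]):
    print(" ".join(words), "->", valid_schedule(words))
```

The validator only mirrors the rules stated in this article; the system itself remains the authority on what it accepts.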
For example, the following command runs the operation automatically every Tuesday at 4 p.m.:
# filesys clean set schedule tue 1600
sysadmin@ddtst01# filesys clean set schedule tue 1600
Filesystem cleaning is scheduled to run "Tue" at "1600".
To run the operation more than once in a month, set multiple days in one command. For example, to run the operation on the first and fifteenth of the month at 4 p.m.:
# filesys clean set schedule monthly 1,15 1600
Filesystem cleaning is scheduled to run "1, 15" at "1600".
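If the monthly command is generated by a script, the day list and time can be assembled as in the following Python sketch. The helper name monthly_schedule_command is hypothetical; it only builds the command text using the syntax shown above and does not contact a Data Domain system.

```python
import re

def monthly_schedule_command(days, time):
    """Build a 'filesys clean set schedule monthly' command string."""
    if not days or any(not 1 <= day <= 31 for day in days):
        raise ValueError("days must be numbers from 1 to 31")
    if (not re.fullmatch(r"[0-9]{4}", time)
            or int(time[:2]) > 23 or int(time[2:]) > 59):
        raise ValueError("time must be 24-hour HHMM; 2400 is not valid")
    # Sorting and de-duplicating is a choice made for this sketch only.
    day_list = ",".join(str(day) for day in sorted(set(days)))
    return f"filesys clean set schedule monthly {day_list} {time}"

print(monthly_schedule_command([15, 1], "1600"))
# prints: filesys clean set schedule monthly 1,15 1600
```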
To set the clean schedule to the default of Tuesday at 6 a.m. (tue 0600), the default throttle of 50%, or both, use the filesys clean reset operation.
# filesys clean reset all
Filesystem cleaning throttle reset to default.
Filesystem cleaning schedule reset to default.