Source:Configuring and Administering Si3 Deduplication Store by using CLI


Revision as of 11:05, 17 May 2022


Template:Copyright SEP AG en

Welcome to the latest SEP sesam documentation version 4.4.3 Beefalo/5.0.0 Jaglion. For previous documentation versions, check the documentation archive.


Overview

SEP sesam provides a target-based (Si3T) and source-based deduplication (Si3S). For details on the deduplication concept and recommendations, see Deduplication.

Both Si3T and Si3S require a configured Si3 deduplication store. As of SEP sesam v. 5.0.0 Jaglion, a new generation Si3 NG deduplication store can be used. Compared to the "old" Si3 store type, Si3 NG offers significantly higher backup, restore and migration performance, as well as direct backup to S3 cloud storage, resulting in better scaling and resource savings.

Typically, only one Si3 deduplication store can be configured per server. Because a direct upgrade from the old Si3 to Si3 NG is not supported, you can instead replicate from Si3 to Si3 NG. For this purpose, you can configure a new Si3 NG and an old Si3 store in parallel on the same host by enabling the key enable_gui_allow_multi_dedup. For details, see Enabling Si3 NG setup on the same host.

  • A valid licence is required for each Si3 deduplication store.

Prerequisites

For the minimum hardware requirements that apply to the SEP sesam Si3 deduplication server, see Hardware requirements. Keep in mind that these requirements cover deduplication only; you must also account for the memory needed by the operating system and other services.

In addition, the prerequisites described in Si3 Deduplication Hardware Requirements must be met to configure a Si3 deduplication store.

Using CLI for Si3 data store configuration

SEP sesam provides command-line utilities for configuring and managing Si3 data stores. The following sections show examples of commands and their syntax.

Note
You must have SEP sesam administrator privileges to run SEP sesam CLI commands, and you must run the command prompt as an administrator. All commands are run from the <SESAM_ROOT>/bin/sesam/ directory. If you want to run SEP sesam commands from any directory, set the SEP sesam profile as described in What happens when I set a profile?.

The index size (max_pages) and Java's RAM requirements are important parameters for the operation of a Si3 data store, as both are used when the store is created.

The sm_dedup_interface command is used to configure the hardware for a Si3 data store server. More details are available below in the section sm_dedup_interface.

stpd_conf

The Si3 and stpd configuration is stored in an .ini file in the directory gv_rw_ini:stpd_conf.

The file name is derived from hw_drives.device (DS@ds1_2), as with any other DS device. Some information is duplicated because it is used by both the Si3 server and stpd.

bigsrv1:/var/opt/sesam/var/ini/stpd_conf # cat ds1_2.ini
 [DEDUP]
 Backend=dedup
 Hostname=localhost
 defaultRepoPath="/datastore/ds1/ds1"
 maxPages=481900
 port=11703
 sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"

 [DISK_STORE]
 Storage_Location=/datastore/ds1/ds1
 Size=1000GB
 backend=dedup
 hostname=localhost
 port=11703
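The values the Si3 server relies on (maxPages and the JVM options) can be read back from such a file. The following is a minimal sketch, assuming the file is standard INI syntax that Python's configparser accepts and that the file name is the part of hw_drives.device after "@" (inferred from the single DS@ds1_2 to ds1_2.ini example). The helper names drive_ini_path and read_dedup_settings are hypothetical, not part of SEP sesam.

```python
import configparser
import re

# Sample content copied from the ds1_2.ini listing above (leading spaces removed).
SAMPLE_INI = """\
[DEDUP]
Backend=dedup
Hostname=localhost
defaultRepoPath="/datastore/ds1/ds1"
maxPages=481900
port=11703
sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"

[DISK_STORE]
Storage_Location=/datastore/ds1/ds1
Size=1000GB
backend=dedup
hostname=localhost
port=11703
"""


def drive_ini_path(device: str, ini_root: str = "/var/opt/sesam/var/ini") -> str:
    """Assumption: the .ini name is the part of hw_drives.device after '@'."""
    name = device.split("@", 1)[1]
    return f"{ini_root}/stpd_conf/{name}.ini"


def read_dedup_settings(text: str) -> dict:
    """Extract the Si3-related values from the [DEDUP] section of a drive .ini file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    dedup = parser["DEDUP"]  # configparser matches option names case-insensitively
    jvm = dedup.get("sds_jvm_options", "").strip('"')
    # Heap (-Xmx) and direct memory sizes; the sample gives both in MB ("M" suffix).
    xmx = re.search(r"-Xmx(\d+)M", jvm)
    direct = re.search(r"-XX:MaxDirectMemorySize=(\d+)M", jvm)
    return {
        "max_pages": dedup.getint("maxPages"),
        "port": dedup.getint("port"),
        "repo_path": dedup.get("defaultRepoPath", "").strip('"'),
        "xmx_mb": int(xmx.group(1)) if xmx else None,
        "max_direct_mb": int(direct.group(1)) if direct else None,
    }


if __name__ == "__main__":
    print(drive_ini_path("DS@ds1_2"))
    print(read_dedup_settings(SAMPLE_INI))
```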

sm.ini

The RAM parameters for Java can be manually set in the sm.ini file. They override the automatically generated parameters from the drive .ini file. The recommended -Xmx value is ¼ (one quarter) of the available RAM. For example, if 16 GB is available, at least 4 GB (4096 MB) should be configured for the Si3 data store.
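The quarter-of-RAM rule can be written out as a small calculation. This sketch only computes the recommended minimum heap and formats it as a JVM option in the same style as the sds_jvm_options sample (-Xmx…M); the exact key to use in sm.ini is not shown here, so the function names are hypothetical.

```python
def recommended_xmx_mb(available_ram_gb: int) -> int:
    """Recommended minimum Java heap for the Si3 store: one quarter of the RAM, in MB."""
    return available_ram_gb * 1024 // 4


def xmx_option(heap_mb: int) -> str:
    """Format a heap size as a JVM option, matching the style of the drive .ini sample."""
    return f"-Xmx{heap_mb}M"


if __name__ == "__main__":
    heap = recommended_xmx_mb(16)
    print(heap, xmx_option(heap))  # 4096 -Xmx4096M
```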

To obtain the default parameters used by Java on the target system, run the command java -XX:+PrintFlagsFinal -version and search the output for MaxHeapSize (-> Xmx) and InitialHeapSize (-> Xms).
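Instead of searching the long flag list by eye, the two values can be extracted programmatically. A minimal sketch, assuming java is on the PATH; it parses the usual HotSpot output format ("type name = value {origin}", with ":=" on older JVMs for non-default values) and converts the byte values to MB. The helper names are hypothetical.

```python
import re
import subprocess

# Matches lines such as "   size_t MaxHeapSize   = 4294967296   {product} {ergonomic}".
# Older JVMs print "uintx" as the type and ":=" for non-default values.
HEAP_FLAG_RE = re.compile(
    r"^\s*\S+\s+(InitialHeapSize|MaxHeapSize)\s+:?=\s+(\d+)", re.MULTILINE
)


def heap_defaults_mb(flags_output: str) -> dict:
    """Return MaxHeapSize (-> Xmx) and InitialHeapSize (-> Xms) in MB."""
    return {
        name: int(value) // (1024 * 1024)
        for name, value in HEAP_FLAG_RE.findall(flags_output)
    }


def read_java_flags() -> str:
    """Run the local JVM and capture its final flag values (printed to stdout)."""
    result = subprocess.run(
        ["java", "-XX:+PrintFlagsFinal", "-version"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(heap_defaults_mb(read_java_flags()))
```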

max_pages

The second parameter, max_pages, is directly related to the Java memory parameters. The entire index (described by max_pages) must be held in RAM, so MaxDirectMemorySize depends directly on max_pages.

The max_pages value is kept in the SEP sesam database in the drive's hw_drives.block_size field, which is increased dynamically whenever necessary. max_pages is calculated as hw_drives.block_size × 100 and then copied to the drive's .ini configuration file. If you have problems with the index, contact SEP sesam support.
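The conversion between the database field and the maxPages value in the drive .ini file is a factor of 100 in either direction. In this sketch, relating the sample maxPages=481900 to a block_size of 4819 assumes that the sample file was generated with this formula; the function names are hypothetical.

```python
def max_pages_from_block_size(block_size: int) -> int:
    """max_pages as described above: hw_drives.block_size multiplied by 100."""
    return block_size * 100


def block_size_from_max_pages(max_pages: int) -> int:
    """Inverse direction, e.g. to relate a maxPages value in a drive .ini to the database."""
    return max_pages // 100


if __name__ == "__main__":
    print(block_size_from_max_pages(481900))  # 4819
    print(max_pages_from_block_size(4819))    # 481900
```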

Advanced CLI Administration

Si3's main maintenance tasks, garbage collection (gc) and file system check (fsck), run automatically. Garbage collection is started by sm_start during SEP sesam newday. The file system check runs automatically at regular intervals and with every backup.

The new generation of Si3 deduplication store, Si3 NG, has two types of file system check (fsck): object check (occk), which checks if the Si3 data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.

You can check their status or start/stop the tasks manually.

sm_dedup_interface

This is the main utility for configuring and managing data stores. Below is a list of some commands and their usage.

Note
Depending on the deduplication store used, Si3 or Si3-NG, some of the commands may be slightly different. When relevant, both command versions are described.
sm_dedup_interface -d <datastore> <command>
  - purge
  - objectinfo <remote filename>
  - put <input filename> <dest filename>
  - get <remote filename> <dest filename> [<bytes skipped then> [<bytes read at beginning>]]
  - delete <remote filename> [<filename 2>]*
  - getlabel
  - getuuid
  - list
  - fsck [start|stop|autopurge|status|incremental|purge now|dump status into <file>|fsck incr start from <file>]
  - gc <start|stop|status|result>
  - key <set <key> <value>|get <key>|list>
  - log@server <msg>
  - propose serverconfig <repository netto GiB>
  - propose jvmconfig <repository netto GiB> (for Si3; the usage differs slightly for Si3 NG, see Note a)
  - snapshot
  - replicate from [-f] <remote hostname> <remote port> <remote filename>
  - replicate show
  - replicate abort <task id>
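When these commands are called from scripts, it helps to build the argument list in one place so that the -d <datastore> prefix is always present. A minimal sketch, assuming sm_dedup_interface is reachable (run from <SESAM_ROOT>/bin/sesam or with the SEP sesam profile set); treating a non-zero exit code as failure is an assumption about the tool's behavior. The helpers dedup_argv and run_dedup are hypothetical.

```python
import shlex
import subprocess

TOOL = "sm_dedup_interface"  # assumes the working dir or PATH makes the tool reachable


def dedup_argv(datastore, *command: str) -> list:
    """Build: sm_dedup_interface -d <datastore> <command> [args...]."""
    return [TOOL, "-d", str(datastore), *command]


def run_dedup(datastore, *command: str) -> str:
    """Run one sm_dedup_interface command and return its standard output.

    check=True raises CalledProcessError on a non-zero exit code; whether the
    tool reports errors that way is an assumption.
    """
    result = subprocess.run(
        dedup_argv(datastore, *command), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command lines only; nothing is executed here.
    for cmd in (("gc", "status"), ("fsck", "status"), ("key", "list")):
        print(shlex.join(dedup_argv(3, *cmd)))
```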
Note a

Depending on the deduplication store used, Si3 or Si3-NG, the command that shows how much RAM a given Si3/Si3-NG capacity requires differs slightly. Example:

Si3-NG
Use the command sm_dedup_interface -T dedup2 propose jvmconfig <Si3_capacity>.
Si3
Use the command sm_dedup_interface propose jvmconfig <Si3_capacity>.
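The only difference between the two variants is the extra -T dedup2 option for Si3 NG. The following sketch builds either command line; it takes the capacity in GiB, as suggested by the <repository netto GiB> placeholder in the synopsis, and the function name is hypothetical.

```python
def propose_jvmconfig_argv(capacity_gib: int, si3_ng: bool) -> list:
    """Build the command that proposes JVM memory settings for a store capacity.

    Si3 NG needs the extra "-T dedup2" option; the old Si3 store does not.
    """
    argv = ["sm_dedup_interface"]
    if si3_ng:
        argv += ["-T", "dedup2"]
    return argv + ["propose", "jvmconfig", str(capacity_gib)]


if __name__ == "__main__":
    print(" ".join(propose_jvmconfig_argv(1000, si3_ng=True)))
    print(" ".join(propose_jvmconfig_argv(1000, si3_ng=False)))
```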

The output of MaxDirectMemorySize is the required RAM value.
Note, however, that SEP sesam calculates the RAM consumption and runs these commands in the background, so you usually do not need to set the values manually. Any manual changes are overwritten the next time the drive is configured.
The command is also used for the index calculation. If the index grows until it is 95% full, backups can no longer be performed. The RAM must hold the entire index (described by max_pages), so MaxDirectMemorySize depends directly on max_pages. To solve problems with a growing index, refer to Si3 Deduplication Troubleshooting.
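The 95% limit can be expressed as a simple check. This is a sketch only: where the number of used index pages can be read from is not described here, so used_pages is a hypothetical input, and treating exactly 95% as "full" is an assumption.

```python
INDEX_LIMIT_PERCENT = 95  # backups stop once the index is this full


def index_fill_percent(used_pages: int, max_pages: int) -> float:
    """How full the index is, as a percentage of max_pages."""
    return 100.0 * used_pages / max_pages


def backups_blocked(used_pages: int, max_pages: int) -> bool:
    # Integer comparison avoids floating-point rounding right at the boundary.
    return used_pages * 100 >= max_pages * INDEX_LIMIT_PERCENT


if __name__ == "__main__":
    max_pages = 481900  # value from the ds1_2.ini sample
    for used in (240950, 457804, 457805):
        print(used, index_fill_percent(used, max_pages), backups_blocked(used, max_pages))
```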

Specific options

Most of the parameters are for internal use only.

status

Provides information about used space, stored data, the label and UUID, running processes (gc or fsck), and more.

The Overall dedup ratio value shows the percentage by which the stored data has been reduced.
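As a worked example of what such a percentage means: if 1000 GB of saveset data occupy 250 GB in the store, the data has been reduced by 75%. The sketch below assumes the usual definition (one minus stored size divided by original size); it is not taken from SEP sesam, and it does not parse the status output, whose exact format is not shown here.

```python
def reduction_percent(original_bytes: int, stored_bytes: int) -> float:
    """Percentage by which deduplication reduced the data (0 means no saving)."""
    return (1 - stored_bytes / original_bytes) * 100


if __name__ == "__main__":
    print(reduction_percent(1000, 250))  # 75.0
```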

gc start
  • Starts the garbage collection.
  • Identifies unreferenced chunks and moves them to the trash.
  • Is started by SEP sesam with sm_start.
gc stop
  • Stops the garbage collection.
  • Can be restarted later.
gc status
Si3 NG gc status output example
sm_dedup_interface -d 3 gc status
Current gc status:
 State:                       Finished
 Started:                     2022-03-07 08:10:56
 Ended:                       2022-03-07 10:00:15
 Message:                     Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
STATUS=SUCCESS MSG=Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
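For monitoring scripts, the interesting parts of this output are the State line, the final STATUS= value and the sweep counters. A minimal sketch that parses exactly the format shown above; other states or message texts (for example while the mark phase is running) may look different, so missing fields are simply left out or set to None. The function name is hypothetical.

```python
import re

SAMPLE_GC_STATUS = """\
Current gc status:
 State:                       Finished
 Started:                     2022-03-07 08:10:56
 Ended:                       2022-03-07 10:00:15
 Message:                     Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
STATUS=SUCCESS MSG=Sweep Phase: swept 97124/97124 pages [deleted=2194,rewritten=13611,skipped=79550,locked=1769,missing=0]
"""


def parse_gc_status(output: str) -> dict:
    """Extract state, overall status and sweep page counters from 'gc status' output."""
    info = {}
    state = re.search(r"^\s*State:\s*(.+?)\s*$", output, re.MULTILINE)
    status = re.search(r"^STATUS=(\S+)", output, re.MULTILINE)
    swept = re.search(r"swept (\d+)/(\d+) pages", output)
    counters = re.search(r"\[([^\]]*)\]", output)
    info["state"] = state.group(1) if state else None
    info["status"] = status.group(1) if status else None
    if swept:
        info["swept"], info["total"] = int(swept.group(1)), int(swept.group(2))
    if counters:
        # "deleted=2194,rewritten=13611,..." -> {"deleted": 2194, "rewritten": 13611, ...}
        for pair in counters.group(1).split(","):
            key, value = pair.split("=")
            info[key.strip()] = int(value)
    return info


if __name__ == "__main__":
    print(parse_gc_status(SAMPLE_GC_STATUS))
```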
get
  • Reads an object (file, saveset) from the deduplication store.
  • '-' can be used as the destination file name to write to STDOUT.
put
  • Writes an object (file, saveset) to the deduplication store.
  • '-' can be used as the input file name to read from STDIN.
fsck
  • Starts a data store check.
  • Must be started manually.
  • If the parameter autopurge is set, all corrupted objects are deleted.
fsck status
Displays the current state or the state of the most recent data store check.

The Si3 NG deduplication store has two types of fsck: object check (occk), which checks if the Si3 data part is still readable, and page check (pcck), which checks the physical data on the disk. All processes (gc, occk and pcck) can run simultaneously.

purge
  • Deletes all pages marked as obsolete by the last garbage collection (gc) run (empties the trash).
  • Is started by sm_start after a SEP sesam day change.

The getlabel and getuuid commands can be replaced with status.

Logging

Logging is based on the logback library. For more information, see Logback Project. Note that this information is intended for advanced users only.

Logging info
  • gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
  • /var/opt/sesam/var/log/sms contains the following log files:
    • sm_dedup_server_info-<drive>.log: INFO level and higher.
    • sm_dedup_server-<drive>.log: DEBUG level and higher. This file can become very large.
    • sm_dedup_gc-<drive>.log: garbage collection log.
    • sm_dedup_fsck-<drive>.log: file system check log.
  • Log files are rotated automatically when they reach 100 MB.
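When collecting logs for a single Si3 drive, the four file names follow the same pattern. A small sketch that builds the paths for the Linux log directory listed above; the drive identifier passed in ("ds1_2" in the example) is a hypothetical value, because the exact form of <drive> is not specified here.

```python
from pathlib import PurePosixPath

LOG_DIR = PurePosixPath("/var/opt/sesam/var/log/sms")


def dedup_log_files(drive: str) -> dict:
    """Paths of the Si3 log files for one drive (the <drive> placeholder above)."""
    return {
        "info": LOG_DIR / f"sm_dedup_server_info-{drive}.log",  # INFO and higher
        "debug": LOG_DIR / f"sm_dedup_server-{drive}.log",      # DEBUG and higher
        "gc": LOG_DIR / f"sm_dedup_gc-{drive}.log",             # garbage collection
        "fsck": LOG_DIR / f"sm_dedup_fsck-{drive}.log",         # file system check
    }


if __name__ == "__main__":
    for kind, path in dedup_log_files("ds1_2").items():
        print(kind, path)
```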

Files and directories

Objects

For every SEP sesam saveset, three objects (files) are stored in the Si3 store:

  • <ssid>.data
  • <ssid>.info
  • <ssid>.info2

The .data and .info files are identical to those of a normal data store. The .info2 file is required for the data to be appended to a Si3 object. All database information that is not available before a backup is completed is written to this file.
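To inspect the stored objects of one saveset, for example with the objectinfo command from the list above, the three object names can be derived from the saveset ID. This sketch assumes that the <ssid>.<suffix> names are exactly the remote file names expected by objectinfo; "SSID" in the example is a placeholder, not a real saveset ID, and the function names are hypothetical.

```python
OBJECT_SUFFIXES = (".data", ".info", ".info2")


def saveset_objects(ssid: str) -> list:
    """Names of the three Si3 objects stored for one saveset."""
    return [ssid + suffix for suffix in OBJECT_SUFFIXES]


def objectinfo_argv(datastore, ssid: str) -> list:
    """One 'sm_dedup_interface -d <datastore> objectinfo <object>' call per object."""
    return [
        ["sm_dedup_interface", "-d", str(datastore), "objectinfo", name]
        for name in saveset_objects(ssid)
    ]


if __name__ == "__main__":
    for argv in objectinfo_argv(3, "SSID"):
        print(" ".join(argv))
```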

Directories

The path <repo root path>/Si3-POOL/Si3-POOL00001/ is a legacy SEP sesam data store path and has nothing to do with the Si3 store. It will be removed in the future.

What is next?

After configuring the Si3/Si3 NG deduplication store, first configure the media pool(s) and then set up your backup strategy.