Configuring and Administering Si3 Deduplication Store by using CLI


Copyright © SEP AG 1999-2019. All rights reserved.

Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.

Welcome to the latest SEP sesam documentation version 4.4.3/4.4.3 Beefalo. For previous documentation versions, check the documentation archive.


Overview

SEP sesam provides target-based (Si3T) and source-based (Si3S) deduplication. For details on the deduplication concept and recommendations, see Deduplication.

Both Si3T and Si3S require a configured Si3 deduplication store. Only one Si3 deduplication store can be configured on a server. A valid licence is required for each Si3 deduplication store. You can also configure an Si3 deduplication store by using the GUI. For details, see Configuring Si3 Deduplication Store.

You can download SEP Tachometer to analyse the structure of your data and calculate potential savings with SEP sesam Si3 deduplication. Check SEP Tachometer.

Prerequisites

For the minimum Si3 hardware requirements that apply to SEP sesam Si3 deduplication server, see Hardware requirements. Keep in mind that these requirements represent the demand for deduplication only. In addition, the amount of memory for the operating system and other services should be taken into account.

In addition, the following prerequisites must be met to configure an Si3 deduplication store.

Additional RAM / CPU requirements

  • For details on the required Java version, see Java Compatibility Matrix. Si3 is not mandatory, therefore there is no dependency rule in the RPM/DEB packages for it.
  • When estimating the maximum size for a deduplication store, make sure that there is enough space available for the dedup trash; otherwise the deduplication store will run out of space. Calculate the required disk space from a representative sample of your full backup and add extra space equal to approx. 50% of that representative full backup.
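The sizing rule above can be sketched as a small calculation. This is only an illustration: the function name and the sample backup size are hypothetical, and the 50% factor is the approximate value stated in the prerequisite, not an exact limit.

```python
def estimate_store_size_gb(full_backup_gb: float, trash_overhead: float = 0.5) -> float:
    """Return the representative full backup size plus extra space for dedup trash.

    trash_overhead is the fraction of the full backup reserved in addition
    (approx. 50% according to the prerequisite above).
    """
    if full_backup_gb < 0 or trash_overhead < 0:
        raise ValueError("sizes and overhead must not be negative")
    return full_backup_gb * (1.0 + trash_overhead)


# Hypothetical example: a representative full backup of 8000 GB.
print(estimate_store_size_gb(8000))  # 12000.0
```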

Restriction

To avoid problems caused by combining very large Si3 deduplication stores with underpowered hardware, the maximum initial Si3 deduplication store size has been restricted to 40 TB since Tigon V2 (4.4.3.46). This restriction applies when creating a new Si3 deduplication store in the GUI. Customers with special requirements for a larger Si3 deduplication store should contact SEP support to have the value increased to an optimum size for their specific environment.

Required additional amount of RAM and CPU cores

The following tables show the required additional amount of RAM and CPU cores for one Si3 data store. The TB value is the capacity of the Si3 data store.

Note
It is not recommended to run Si3 deduplication (SEP sesam Server or RDS) on a virtual machine. If you do so anyway, for example for evaluation or testing, consider limiting the capacity of the Si3 data store to 100 GB to ensure normal VM operation. Keep in mind that deduplication consumes a lot of server resources for reading, processing, and writing deduplicated data, so running Si3 on a VM is subject to this limitation.

Si3 data store capacity (check initial size restriction)   RAM
<20 TB                                                     16 GiB
20-40 TB                                                   32 GiB

To find out how much RAM Si3 requires for a given capacity, enter the command sm_dedup_interface propose jvmconfig <Si3-CAPACITY> at an admin command line (you must first set the SEP sesam profile to run the command). The MaxDirectMemorySize value in the output is the required RAM.

The following table shows the number of CPU cores required for one Si3 data store. The TB value is the amount of backed up data (before deduplication).

Backed up data (before dedup)   CPU cores
10 TB                           4
20 TB                           4
40 TB                           8

Note
Keep in mind that the stated requirements represent the demand for deduplication only. In addition to these requirements, the amount of memory for the operating system and other services should be taken into account.
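The two requirement tables can be expressed as a simple lookup. The sketch below is only an illustration with hypothetical function names. It assumes that a CPU value falling between two table rows is rounded up to the next row, and it refuses values above 40 TB because the tables contain no entry for them.

```python
# (upper limit in TB of backed up data before dedup, CPU cores) from the table above
CPU_TABLE = [(10, 4), (20, 4), (40, 8)]


def required_ram_gib(capacity_tb: float) -> int:
    """Additional RAM in GiB for one Si3 data store of the given capacity."""
    if capacity_tb <= 0:
        raise ValueError("capacity must be positive")
    if capacity_tb < 20:
        return 16
    if capacity_tb <= 40:
        return 32
    raise ValueError("no table entry above 40 TB")


def required_cpu_cores(backed_up_tb: float) -> int:
    """CPU cores for the given amount of backed up data (before deduplication).

    Assumption: values between two rows use the next larger row.
    """
    if backed_up_tb <= 0:
        raise ValueError("amount of data must be positive")
    for limit_tb, cores in CPU_TABLE:
        if backed_up_tb <= limit_tb:
            return cores
    raise ValueError("no table entry above 40 TB")


print(required_ram_gib(25), required_cpu_cores(25))  # 32 8
```

These values cover deduplication only; memory for the operating system and other services has to be added on top.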

Configuring using CLI

The index size (max_pages) and Java's RAM requirements are important parameters for the operation of an Si3 data store as both parameters are used during its creation.

The sm_dedup_interface command is used to configure an Si3 data store server to match its hardware. You have to provide the data store capacity (the size of the data store partition).

sm_dedup

Command: sm_dedup_interface propose jvmconfig <value in GB>

For example, for a data store partition of 50 TB (50000 GB), run the command:

sm_dedup_interface propose jvmconfig 50000

bigsrv1:~ # sm_dedup_interface propose jvmconfig 50000

JAVA_OPTS=-Xmx1333M -XX:MaxDirectMemorySize=11724M

These Java RAM parameters are automatically copied to the Si3 data store drive configuration file, which resides in the folder <SESAM_VAR>/ini/stpd_conf.
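If the proposed values are needed in a script, the JAVA_OPTS line shown above can be split into its two memory settings. The following is a minimal sketch; the function name is hypothetical, and it assumes that both values are always printed with the M suffix as in the sample output.

```python
import re


def parse_java_opts(line: str) -> dict:
    """Extract the heap size and direct memory size (in MB) from a JAVA_OPTS line."""
    xmx = re.search(r"-Xmx(\d+)M", line)
    direct = re.search(r"-XX:MaxDirectMemorySize=(\d+)M", line)
    if xmx is None or direct is None:
        raise ValueError("line does not contain both memory options")
    return {"xmx_mb": int(xmx.group(1)), "max_direct_mb": int(direct.group(1))}


# Output line copied from the example above.
opts = parse_java_opts("JAVA_OPTS=-Xmx1333M -XX:MaxDirectMemorySize=11724M")
print(opts)  # {'xmx_mb': 1333, 'max_direct_mb': 11724}
```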

stpd_conf

The Si3 and stpd configuration is saved in an .ini file in the directory gv_rw_ini:stpd_conf.

The file name is derived from the hw_drives.device (DS@ds1_2), as is the case for every other DS device. Some information is duplicated because it is used by both the Si3 server and stpd.

bigsrv1:/var/opt/sesam/var/ini/stpd_conf # cat ds1_2.ini

 [DEDUP]
 Backend=dedup
 Hostname=localhost
 defaultRepoPath="/datastore/ds1/ds1"
 maxPages=481900
 port=11703
 sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"

 [DISK_STORE]
 Storage_Location=/datastore/ds1/ds1
 Size=1000GB
 backend=dedup
 hostname=localhost
 port=11703
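Because the drive configuration is a plain .ini file, it can be inspected with a standard INI parser. The sketch below reads the sample content shown above with Python's configparser. The function name and the returned keys are hypothetical, and the file is only read here, never written.

```python
import configparser
import textwrap

# Sample content copied from the ds1_2.ini listing above (indentation removed).
SAMPLE_INI = textwrap.dedent("""\
    [DEDUP]
    Backend=dedup
    Hostname=localhost
    defaultRepoPath="/datastore/ds1/ds1"
    maxPages=481900
    port=11703
    sds_jvm_options="-Xmx1032M -XX:MaxDirectMemorySize=1355M"

    [DISK_STORE]
    Storage_Location=/datastore/ds1/ds1
    Size=1000GB
    backend=dedup
    hostname=localhost
    port=11703
    """)


def read_drive_config(text: str) -> dict:
    """Return the values of interest from a drive configuration file."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    dedup = parser["DEDUP"]
    store = parser["DISK_STORE"]
    return {
        # Quoted values keep their quotes in configparser, so strip them.
        "repo_path": dedup["defaultRepoPath"].strip('"'),
        "max_pages": dedup.getint("maxPages"),
        "dedup_port": dedup.getint("port"),
        "store_port": store.getint("port"),
        "jvm_options": dedup["sds_jvm_options"].strip('"').split(),
    }


config = read_drive_config(SAMPLE_INI)
# Both sections carry the port because the Si3 server and stpd each use it.
print(config["max_pages"], config["dedup_port"] == config["store_port"])
```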
sm.ini

The RAM parameters for Java can be manually set in the sm.ini file. They will override the automatically generated parameters from the drive .ini file. The recommended -Xmx value is ¼ (one quarter) of the available RAM. If 16 GB are available, for example, then at least 4 GB (4096 MB) should be configured for the Si3 data store.
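The one-quarter rule can be written down as a short calculation. This is only a sketch with a hypothetical function name; it assumes 1 GB = 1024 MB, as in the example above.

```python
def recommended_xmx_mb(available_ram_gb: int) -> int:
    """Return one quarter of the available RAM in MB (1 GB = 1024 MB)."""
    if available_ram_gb <= 0:
        raise ValueError("available RAM must be positive")
    return available_ram_gb * 1024 // 4


# 16 GB of available RAM, as in the example above.
print(f"-Xmx{recommended_xmx_mb(16)}M")  # -Xmx4096M
```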

To obtain the default parameters used by Java on the target system, run the command java -XX:+PrintFlagsFinal and search for MaxHeapSize (-> Xmx) and InitialHeapSize (-> Xms).
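The relevant lines can also be picked out of the command output with a script. The sketch below works on captured text. The two sample lines are illustrative only (the exact column layout and values differ between Java versions and systems), and the function name is hypothetical.

```python
import re

# Illustrative sample lines; real output contains several hundred flags.
SAMPLE_OUTPUT = """\
   size_t InitialHeapSize                          = 268435456                                 {product} {ergonomic}
   size_t MaxHeapSize                              = 4294967296                                {product} {ergonomic}
"""

# Matches "<type> <flag name> = <value>" as well as the ":=" form.
FLAG_LINE = re.compile(
    r"^\s*\S+\s+(InitialHeapSize|MaxHeapSize)\s+:?=\s+(\d+)", re.MULTILINE
)


def heap_defaults_mib(text: str) -> dict:
    """Return InitialHeapSize (Xms) and MaxHeapSize (Xmx) in MiB."""
    return {name: int(value) // (1024 * 1024) for name, value in FLAG_LINE.findall(text)}


print(heap_defaults_mib(SAMPLE_OUTPUT))  # {'InitialHeapSize': 256, 'MaxHeapSize': 4096}
```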

maxPages

The second parameter (max_pages) is directly related to the Java memory parameter. The RAM is required to hold the entire index (described by max_pages) in memory. The MaxDirectMemorySize depends directly on max_pages.

The max_pages value is stored in the SEP sesam database in the drive's hw_drives.block_size field and is dynamically increased whenever necessary. The parameter is calculated as hw_drives.block_size * 100 and then copied to the drive configuration .ini file.
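The relation between the database field and the .ini value can be sketched as follows. The block_size value of 4819 is a hypothetical example, chosen so that the result matches the maxPages=481900 entry in the sample drive configuration file.

```python
def max_pages_from_block_size(block_size: int) -> int:
    """Return max_pages as derived from the hw_drives.block_size field."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return block_size * 100


# Hypothetical database value that yields the maxPages of the sample .ini file.
print(max_pages_from_block_size(4819))  # 481900
```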

Advanced CLI Administration

The two main maintenance tasks – garbage collection (gc) and file system check (fsck) – run automatically. They are started by sm_start during SEP sesam newday. You can check their status or start/stop the tasks manually.

sm_dedup_interface

Valid commands and usage

sm_dedup_interface -d <datastore> <command>

  - status
  - purge
  - objectinfo <remote filename>
  - put <input filename> <dest filename>
  - get <remote filename> <dest filename> [<bytes skipped then> [<bytes read at beginning>]]
  - delete <remote filename> [<filename 2>]*
  - getlabel
  - getuuid
  - list
  - fsck [start|stop|autopurge|status|incremental|purge now|dump status into <file>|fsck incr start from <file>]
  - gc <start|stop|status|result>
  - key <set <key> <value>|get <key>|list>
  - log@server <msg>
  - propose serverconfig <repository netto GiB>
  - propose jvmconfig <repository netto Gib>
  - snapshot
  - replicate from [-f] <remote hostname> <remote port> <remote filename>
  - replicate show
  - replicate abort <task id>

Most of these commands are for internal use or reserved for future use.
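When the command is called from a script, the argument list can be assembled according to the usage line sm_dedup_interface -d <datastore> <command>. The following sketch uses hypothetical helper names. It assumes that sm_dedup_interface is on the PATH (the SEP sesam profile has been set) and only runs the command when run_command is called explicitly.

```python
import subprocess


def build_command(datastore: str, *command: str) -> list:
    """Return the argument list for: sm_dedup_interface -d <datastore> <command>."""
    if not datastore or not command:
        raise ValueError("a data store name and a command are required")
    return ["sm_dedup_interface", "-d", datastore, *command]


def run_command(datastore: str, *command: str) -> str:
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        build_command(datastore, *command),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(build_command("ds1_2", "gc", "status"))
```

Passing the arguments as a list avoids shell quoting problems with data store or file names that contain spaces.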

gc start
  • Starts garbage collection.
  • Identifies unreferenced chunks and moves them to the trash.
  • Is started by SEP sesam with sm_start.
gc stop
  • Stops garbage collection.
  • Can be restarted later.
get
  • Reads an object (file, saveset) from the deduplication store.
  • '-' can be used as the destination file name to write to STDOUT.
put
  • Writes an object (file, saveset) to the deduplication store.
  • '-' can be used as the input file name to read from STDIN.
status

Provides information about used space, saved data, label, UUID, and whether gc or fsck is currently running.

 bigsrv1:/var/opt/sesam/var/ini/stpd_conf # sm_dedup_interface -d ds1_2 status

INFO Successfully initialized i2dedup library version v2.1.0-SNAPSHOT5

 Server Status:
 Repository information:
 Version:                   2.1.1
 UUID:                      3b9ec2ae-34e1-11e3-b88b-001b2146
 Label:                     ds1
 Max Pages:                 481900
 Max Pages recommended:     154100 (-Xmx1010M -XX:MaxDirectMemorySize=603M)
 GC process status:         not running: GC finished.
 Fsck process status:       not running: Fsck finished. Interrupted: false. Total Runtime: 1296.68s
 Bytes in repository:           259.02 GB
 Bytes delete pending:            9.18 GB
Object information:
 Objects stored:                   258
 Data before deduplication:    1541.56 GB
 Data after  deduplication:      58.94 GB
 Overall DeDup ratio:            96.18 %
Key-Values:
 No keys stored.

The value Overall DeDup ratio shows by what percentage the stored data has been reduced.
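The ratio can be reproduced from the two data values in the status output. The formula below is inferred from the sample output and is not taken from the product documentation. The function name is hypothetical.

```python
def dedup_ratio_percent(before_gb: float, after_gb: float) -> float:
    """Return the percentage by which the stored data has been reduced."""
    if before_gb <= 0:
        raise ValueError("data before deduplication must be positive")
    return (1.0 - after_gb / before_gb) * 100.0


# Values from the status output above:
# Data before deduplication: 1541.56 GB, Data after deduplication: 58.94 GB
print(f"{dedup_ratio_percent(1541.56, 58.94):.2f} %")  # 96.18 %
```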

fsck
  • Starts a data store check.
  • Must be started manually.
  • If the parameter autopurge is set, all corrupted objects will be deleted.
  • Note: SEP sesam does not currently receive this information.
fsck status

Displays the current state or the state of the most recent data store check.

si3fix:/var/opt/sesam/var/log/sms # sm_dedup_interface -d Si3_5 fsck status

 INFO  Successfully initialized i2dedup library version v2.0.0-beta2
 Current fsck status:
 Message:       Logfile check progress: Bytes: 1270925865083/1512422546049 Throughput: 91.25 MiB/s
 Running:       yes
 Started:       2013-05-29 20:57:17
 Ended:         -
 Bytes Checked: 0
 Bytes Lost:    0
 Objects checked:
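The Message line of a running check contains the number of bytes processed and the total number of bytes, so the progress can be derived from it. The following is a minimal sketch with a hypothetical function name. It assumes that the message keeps the Bytes: <done>/<total> form shown in the sample output.

```python
import re


def fsck_progress_percent(message: str) -> float:
    """Return the progress of a running fsck in percent, based on its Message line."""
    match = re.search(r"Bytes:\s*(\d+)/(\d+)", message)
    if match is None:
        raise ValueError("no byte counters found in message")
    done, total = int(match.group(1)), int(match.group(2))
    if total == 0:
        raise ValueError("total byte count is zero")
    return 100.0 * done / total


# Message copied from the fsck status output above.
MESSAGE = ("Logfile check progress: Bytes: 1270925865083/1512422546049 "
           "Throughput: 91.25 MiB/s")
print(f"{fsck_progress_percent(MESSAGE):.1f} % checked")
```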
purge
  • Deletes all pages marked as obsolete by the last garbage collection (gc) run (empty trash).
  • Is started by sm_start after a SEP sesam day change.
getlabel, getuuid
  • Both can be replaced with status.

meteorologix:/var/opt/sesam/var/ini # sm_main reload sds

 2013-05-28 16:42:06: sm_main[5697] started
 2013-05-28 16:42:06: Arguments: sm_main reload sds
 2013-05-28 16:42:06: SDS Server: "java" -Xmx1700M -XX:MaxDirectMemorySize=900M -classpath "/opt/sesam/bin/sds/i2dedup-server.jar" \
 -Dlogback.configurationFile=/va   /opt/sesam/var/ini/sm_sdslog.xml -Dgv_rw_stpd=/var/opt/sesam/var/log/sms/ -Ddrive_num=31 \
 -Dconfig.inifile="/var/opt/sesam/var/ini/stpd_conf/SI3_31.ini"    i2.dedup.streaming.BinaryProtocolServer
  Requesting server shut down...
 2013-05-28 16:42:07.587 [main] INFO  i.d.streaming.BinaryProtocolServer$ - Welcome to SEP DeDup Service. Loading configuration...
 2013-05-28 16:42:07.637 [main] DEBUG i2.dedup.streaming.ServerOptions$ - Loaded configuration from ini file: dedup {
 backend=dedup
 hostname=localhost
 defaultRepoPath=/srv/5tb/data/defaultrepo/SI3
 maxPages=262143
 port=11732
 }

Logging

Logging is based on the logback library. For more information, see Logback Project. Note that this information is intended for advanced users only.

Logging info
  • gv_rw_ini:sm_sds.xml (/var/opt/sesam/var/ini/sm_sds.xml)
  • /var/opt/sesam/var/log/sms contains the following log files:
    • sm_dedup_server_info-<drive>.log: INFO level and higher.
    • sm_dedup_server-<drive>.log: DEBUG and higher. This file will become quite large.
    • sm_dedup_gc-<drive>.log: garbage collection log.
    • sm_dedup_fsck-<drive>.log: file system check log.
  • Log files are rotated automatically when they reach a size of 100 MB.

Files and directories

Objects

For every SEP sesam saveset, three objects (files) are stored in the Si3 store:

  • <ssid>.data
  • <ssid>.info
  • <ssid>.info2

The .data and .info files are identical to those of a normal data store. The .info2 file is required for the data to be appended to an Si3 object. All database information that is not available before the completion of a backup is written to this file.

Directories

The path <repo root path>/Si3-POOL/Si3-POOL00001/ is a legacy SEP sesam data store path and has nothing to do with the Si3 store. It will be removed in a future release.

What is next?

After configuring the Si3 deduplication store, first configure the media pools, then set up your backup strategy.