Source:Deduplication

Revision as of 08:26, 20 January 2021


Copyright © SEP AG 1999-2024. All rights reserved.

Any form of reproduction of the contents or parts of this manual is allowed only with the express written permission from SEP AG. When compiling and designing user documentation SEP AG uses great diligence and attempts to deliver accurate and correct information. However, SEP AG cannot issue a guarantee for the contents of this manual.

Welcome to the latest SEP sesam documentation version 4.4.3/4.4.3 Beefalo V2. For previous documentation versions, check the documentation archive.


Overview

When similar systems are backed up to the same data storage device, the backed-up data is likely to contain a lot of redundancy. However, a data repository only needs to store one copy of identical data to be able to restore it.

Deduplication is a data compression technique that eliminates redundant blocks: duplicate data is stored only once, which reduces the size of the backed-up data and the required storage space. SEP sesam applies deduplication at block level and offers a hybrid of target-based (Si3T) and source-based (Si3S) deduplication to provide the most efficient backup scenario for a variety of environments. Both methods require a configured Si3 deduplication store, for which a special license is needed. See Licensing for details.

As of v. 4.4.3 Tigon, the Si3 deduplication store supports encryption. For details, see Encrypting Si3 Deduplication Store.

Si3 target deduplication (Si3T)

Si3T is an inline, block-level data deduplication solution that writes data directly from the SEP sesam Server or Remote Device Server to the backup media. Backups are deduplicated on the fly as the data is written to the storage target. Because redundant data is transferred across the network unreduced and is deduplicated only at the target, the network load is higher, but the storage savings are substantial.
SEP sesam analyses blocks of data and determines whether each block is unique or has already been stored in the Si3 repository. Only a single instance of each unique block is written to the repository, while each deduplicated file is replaced with a stub object that points to the repository and is used to retrieve the stored data. See Configuring Si3 Deduplication Store for details.
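
The mechanism described above (hash each block, keep one copy of each unique block, replace the file with a stub of references) can be illustrated with a minimal Python sketch. It is not SEP sesam's implementation: the fixed 4 KiB block size, the use of SHA-256, and the `DedupStore` class are assumptions chosen only for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size 4 KiB blocks; Si3's real block handling may differ


class DedupStore:
    """Toy deduplication store: every unique block is kept once, keyed by its SHA-256 hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def write(self, data: bytes) -> list[str]:
        """Deduplicate data inline and return a stub: the ordered list of block hashes."""
        stub = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Keep the block only if no identical block is stored yet.
            self.blocks.setdefault(digest, block)
            stub.append(digest)
        return stub

    def read(self, stub: list[str]) -> bytes:
        """Restore the original data by following the stub's references into the store."""
        return b"".join(self.blocks[digest] for digest in stub)


store = DedupStore()
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # 4 blocks, only 2 distinct
stub1 = store.write(data)
stub2 = store.write(data)  # a second, identical backup adds no new blocks
print(len(store.blocks))  # 2
```

Even though the same 16 KiB were "backed up" twice, the store holds only two blocks; the stubs are enough to restore either backup.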

Source-side deduplication (Si3S)

Si3S was introduced with SEP sesam version 4.4.3. With source-side deduplication, data is deduplicated before it is sent across the network, which makes the backup highly bandwidth-efficient. During backup, SEP sesam calculates hashes of the data to be backed up on the client and then transfers to the backup server only the changed or unknown blocks, i.e., blocks that are not yet stored in the target Si3 deduplication store. See Configuring Source-side Deduplication for details.
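
The exchange described above can be sketched as follows: the client hashes its blocks, asks the target store which hashes it lacks, and transmits only those blocks. This is a conceptual model, not SEP sesam's actual client/server protocol; the `Server` class with its `unknown()` and `receive()` methods, the block size, and the hash function are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for illustration


def split_blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def block_hash(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class Server:
    """Hypothetical target store that can report which block hashes it does not hold yet."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def unknown(self, hashes: list[str]) -> set[str]:
        return {h for h in hashes if h not in self.blocks}

    def receive(self, blocks: dict[str, bytes]) -> None:
        self.blocks.update(blocks)


def client_backup(data: bytes, server: Server) -> tuple[list[str], int]:
    """Hash on the client, send only unknown blocks; return the stub and the bytes sent."""
    blocks = split_blocks(data)
    hashes = [block_hash(b) for b in blocks]
    missing = server.unknown(hashes)  # only hashes cross the network at this step
    payload = {h: b for h, b in zip(hashes, blocks) if h in missing}
    server.receive(payload)
    return hashes, sum(len(b) for b in payload.values())


server = Server()
first = b"x" * BLOCK_SIZE * 4 + b"y" * BLOCK_SIZE
_, sent_first = client_backup(first, server)    # 2 distinct blocks: 8192 bytes sent
second = first + b"z" * BLOCK_SIZE              # next backup appends one new block
_, sent_second = client_backup(second, server)  # only the new block: 4096 bytes sent
```

The trade-off named in the text is visible here: the client spends CPU on hashing every block, and in return the network carries only blocks the store has never seen.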

What works best?

Note
When choosing your deduplication method to eliminate redundant backup data, carefully analyze your existing infrastructure, network constraints and the type of data you want to protect.
  • Typically, source-side deduplication is well suited for environments with low LAN/WAN bandwidth and smaller amounts of data. Another typical use case is remote office/branch office (ROBO) backup, i.e., protecting and storing the data created by remote and branch offices.
  • On the other hand, target deduplication might be more suitable for large data sets on a fast network, such as structured databases whose data volumes need to be significantly reduced, or for data on clients whose CPU overhead you do not want to increase.
  • Be aware of the limitations of deduplication before configuring it. For example, it makes little sense to deduplicate media files and other data stored in compressed formats, such as MP3, MP4, JPEG, PNG, or ZIP files: compression makes their content effectively unique, so they cannot be deduplicated efficiently.
  • Set up different deduplication policies and methods for different data types, e.g., databases in one Si3 store and path backups in another Si3 store.

Si3 deduplication can be used together with Si3 Replication to provide backup redundancy for disaster recovery and to reduce the data transferred over the network. As of v. 4.4.3 Tigon, you can use an initial seed to set up a new Si3 deduplication store for replication. For details, see Seeding Si3 Deduplication Store.
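
Why deduplicated replication and seeding reduce network traffic can be shown with a small Python model: replication copies only the blocks the target store lacks, and a seeded target already holds most of them. This only models the data-transfer effect under assumed fixed-size blocks and SHA-256 keys; the `block_hashes()` and `replicate()` helpers are hypothetical, and the actual seeding procedure is described in Seeding Si3 Deduplication Store.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096) -> dict[str, bytes]:
    """Map each unique block's SHA-256 hash to the block (hypothetical chunking)."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest(): data[i:i + block_size]
        for i in range(0, len(data), block_size)
    }


def replicate(source: dict[str, bytes], target: dict[str, bytes]) -> int:
    """Copy only blocks the target store lacks; return the number of bytes transferred."""
    missing = {h: b for h, b in source.items() if h not in target}
    target.update(missing)
    return sum(len(b) for b in missing.values())


# Source store holding 8 distinct 4 KiB blocks (32768 bytes).
source = block_hashes(b"".join(bytes([i]) * 4096 for i in range(8)))

# Without a seed, the first replication sends every unique block over the network.
first_transfer = replicate(source, {})  # 32768 bytes

# With a seed (target pre-filled with the existing blocks), only new blocks travel.
seeded_target = dict(source)
source.update(block_hashes(bytes([99]) * 4096))  # a later backup adds one new block
second_transfer = replicate(source, seeded_target)  # 4096 bytes
```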

Tip
You can download SEP Tachometer to analyse the structure of your data and calculate potential savings with SEP sesam Si3 deduplication. Check SEP Tachometer.
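
The idea behind estimating potential savings can be sketched in a few lines of Python: split files into blocks, count the unique blocks, and compare the total size with the size left after deduplication. This is not how SEP Tachometer works internally (its method is not described here); the `estimate_dedup()` helper, the fixed 4 KiB blocks, and SHA-256 are assumptions, and real savings also depend on compression and on how Si3 splits data.

```python
import hashlib
import tempfile
from pathlib import Path

BLOCK_SIZE = 4096  # hypothetical fixed block size


def estimate_dedup(paths: list[Path]) -> tuple[int, int]:
    """Return (total bytes, bytes left after block-level deduplication) for the given files."""
    total = 0
    unique: dict[str, int] = {}
    for path in paths:
        with path.open("rb") as f:
            while block := f.read(BLOCK_SIZE):
                total += len(block)
                unique.setdefault(hashlib.sha256(block).hexdigest(), len(block))
    return total, sum(unique.values())


# Two identical files of 4 distinct blocks each: half of the data is redundant.
with tempfile.TemporaryDirectory() as tmp:
    content = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
    a, b = Path(tmp, "a.bin"), Path(tmp, "b.bin")
    a.write_bytes(content)
    b.write_bytes(content)
    total, stored = estimate_dedup([a, b])

print(f"{total / stored:.1f}x")  # 2.0x
```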

What is next?

Si3 Deduplication Hardware Requirements – Configuring Si3 Deduplication Store – Configuring Source-side Deduplication – About Replication

See also

SEP Tachometer – Licensing – Seeding Si3 Deduplication Store – Encrypting Si3 Deduplication Store