7 Matching Annotations
- Feb 2018
-
github.com
-
I wonder if it's possible to still process these small files in order, but skip storing them and instead set them aside in memory, appending each subsequent small file until the accumulated chunk reaches the Min or Avg chunk size.
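A minimal sketch of that packing idea in Go, assuming hypothetical names (`packSmallFiles`, `minChunkSize`) and a 1 MiB threshold rather than Duplicacy's actual configuration:

```go
package main

import "fmt"

// minChunkSize is an assumed 1 MiB threshold for this example; the real
// minimum/average chunk sizes depend on the Duplicacy configuration.
const minChunkSize = 1 << 20

// packSmallFiles sets small files aside in a pending buffer and only emits a
// chunk once the buffer has grown to minChunkSize, then flushes the remainder.
func packSmallFiles(files [][]byte, emit func(chunk []byte)) {
	var pending []byte
	for _, f := range files {
		pending = append(pending, f...)
		if len(pending) >= minChunkSize {
			emit(pending)
			pending = nil
		}
	}
	if len(pending) > 0 {
		emit(pending)
	}
}

func main() {
	files := [][]byte{make([]byte, 300_000), make([]byte, 400_000), make([]byte, 500_000)}
	packSmallFiles(files, func(chunk []byte) { fmt.Println("emitting chunk of", len(chunk), "bytes") })
}
```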
-
The theory behind splitting chunks using a hash function is to consistently find boundaries where the preceding data looks a certain way. If you have similar or identical files being backed up from different sources, the chunk boundaries should fall at the same positions, resulting in identical chunks that can be deduplicated.
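For illustration, a minimal content-defined chunking sketch in Go using a Gear-style rolling hash; the table name `gear`, the 20-bit mask, and the ~1 MiB target are assumptions for the example, and Duplicacy's own rolling hash differs in its details:

```go
package main

import (
	"fmt"
	"math/rand"
)

// gear is a table of random 64-bit values indexed by byte; it drives a
// Gear-style rolling hash (effectively a 64-byte sliding window).
var gear [256]uint64

func init() {
	r := rand.New(rand.NewSource(1)) // fixed seed so boundaries are reproducible
	for i := range gear {
		gear[i] = r.Uint64()
	}
}

// chunkBoundaries returns cut points wherever the low 20 bits of the rolling
// hash are zero, giving ~1 MiB average chunks.  Because the decision depends
// only on the preceding bytes, identical data produces identical boundaries
// no matter where it sits in the stream.
func chunkBoundaries(data []byte) []int {
	const mask = (1 << 20) - 1
	var cuts []int
	var h uint64
	for i, b := range data {
		h = (h << 1) + gear[b]
		if h&mask == 0 {
			cuts = append(cuts, i+1)
			h = 0
		}
	}
	return cuts
}

func main() {
	data := make([]byte, 8<<20)
	rng := rand.New(rand.NewSource(2))
	rng.Read(data)
	fmt.Println("boundaries:", chunkBoundaries(data))
}
```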
-
I think inserting chunk boundaries at file boundaries would be very beneficial for deduplication. Consider a folder where some randomly chosen files are edited or added every day. File boundaries are very natural break points for changed data and should therefore be used.
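A toy demonstration of why file boundaries make good cut points, assuming the simplest possible scheme (every file becomes its own chunk); `fileChunkHashes` is a hypothetical helper, not Duplicacy code:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// fileChunkHashes treats every file as its own chunk (i.e. a forced cut at
// each file's end) and returns the chunk hashes a backup would store.
func fileChunkHashes(files [][]byte) [][32]byte {
	hashes := make([][32]byte, len(files))
	for i, f := range files {
		hashes[i] = sha256.Sum256(f)
	}
	return hashes
}

func main() {
	a := []byte("contents of file A")
	b := []byte("contents of file B")
	added := []byte("a file added between A and B the next day")

	day1 := fileChunkHashes([][]byte{a, b})
	day2 := fileChunkHashes([][]byte{a, added, b})

	// Boundaries follow file ends, so A and B produce the same chunks on both
	// days and deduplicate even though a new file was inserted between them.
	fmt.Println("A deduplicates:", day1[0] == day2[0])
	fmt.Println("B deduplicates:", day1[1] == day2[2])
}
```

Because the cuts follow file ends, editing or inserting one file never shifts the chunks of its neighbours, so their chunks still deduplicate.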
-
So your primary comparison should be official Duplicacy with 1M fixed chunks vs my branch with 1M variable chunks.
-
Duplicacy does not use file hashes at all to identify previously seen files that may have changed names or locations, but rather concatenates the contents of all files into a long data stream that is cut into chunks according to artificial boundaries based on a hash function.
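A rough model of that concatenated-stream approach, for illustration only; `buildStream`, `fileSpan`, and the fixed-size cutter are stand-ins (the real cutter places boundaries with a hash function, as described above):

```go
package main

import "fmt"

// fileSpan records where one file's bytes sit inside the concatenated stream.
type fileSpan struct {
	name       string
	start, end int
}

// buildStream concatenates file contents in a fixed order into one stream,
// which is what the chunker actually cuts; files are only spans inside it.
func buildStream(files map[string][]byte, order []string) ([]byte, []fileSpan) {
	var stream []byte
	var spans []fileSpan
	for _, name := range order {
		start := len(stream)
		stream = append(stream, files[name]...)
		spans = append(spans, fileSpan{name, start, len(stream)})
	}
	return stream, spans
}

// fixedChunks stands in for the hash-based cutter: it just slices the stream
// every `size` bytes to keep the example short.
func fixedChunks(stream []byte, size int) [][]byte {
	var chunks [][]byte
	for len(stream) > size {
		chunks = append(chunks, stream[:size])
		stream = stream[size:]
	}
	return append(chunks, stream)
}

func main() {
	files := map[string][]byte{"a.txt": make([]byte, 5000), "b.txt": make([]byte, 2500)}
	stream, spans := buildStream(files, []string{"a.txt", "b.txt"})
	fmt.Println("chunks:", len(fixedChunks(stream, 1024)))
	fmt.Println("file spans in the stream:", spans)
}
```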
-
You'd expect that if a new/moved file is discovered on a subsequent backup run, and it has the exact same file hash, you could effectively just relink it to the existing chunks and boundaries.
Yes, that's what I've been thinking all along!
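A sketch of that relinking idea as a lookup table keyed by the whole-file hash; `relinkOrChunk` and `chunkRef` are hypothetical names describing the proposal, not existing Duplicacy behaviour:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkRef points at chunks already in storage: their IDs plus the file's
// start offset in the first chunk and end offset in the last one.
type chunkRef struct {
	chunkIDs         []string
	startOff, endOff int
}

// index maps a whole-file hash to the chunk reference recorded by an earlier
// backup run.
var index = map[[32]byte]chunkRef{}

// relinkOrChunk reuses existing chunks whenever the file's hash has been seen
// before (covering renames and moves), and only chunks genuinely new content.
func relinkOrChunk(content []byte, chunkAndStore func([]byte) chunkRef) chunkRef {
	h := sha256.Sum256(content)
	if ref, ok := index[h]; ok {
		return ref
	}
	ref := chunkAndStore(content)
	index[h] = ref
	return ref
}

func main() {
	store := func(b []byte) chunkRef {
		sum := sha256.Sum256(b)
		return chunkRef{chunkIDs: []string{fmt.Sprintf("chunk-%x", sum[:4])}, endOff: len(b)}
	}
	original := relinkOrChunk([]byte("same bytes"), store)
	renamed := relinkOrChunk([]byte("same bytes"), store) // same content under a new name
	fmt.Println("relinked to existing chunks:", original.chunkIDs[0] == renamed.chunkIDs[0])
}
```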
-
-
github.com
-
file chunks
so what are file chunks?
-