What's the simplest way to explain the difference between compression and dedupe?
I would say dedupe is compression over time. Compression looks at a single instance of data to find parts that are the same as other parts of that data and can be replaced with pointers. Deduplication does that, but it also compares the data to similar data that we’ve seen before. So it’s not only looking in the file for things it can get rid of, but it’s also looking in that file for parts of the file or backup stream that have been seen by the dedupe device before. There are some bloggers out there who have simply said that dedupe is compression. In the beginning, for a definition of compression, I could have said anything that makes data smaller. I don’t like that definition. Compression is a very specific technology that works on a specific instance of data. Dedupe works on data over time. The more data, and especially the more repetitive data, that you send to a dedupe device, the better your dedupe ratio will get over time. Your dedupe ratio when you first start using a dedupe device is actually very d