What are hash collisions, and should my customer and I be worried about corruption due to hash collisions?
Some deduplication software and hardware use what’s called hashing to identify duplicate data within the system. Each incoming chunk of data is run through a hashing algorithm (typically SHA-1-based), and if the resulting hash matches that of a chunk already stored, the new chunk is discarded and a small pointer is put in its place. A hash collision occurs when a new chunk produces the same hash as a different chunk already in the system, so the algorithm reports a match and discards the data, even though there really was no match. Work through the math, and the probability turns out to be so infinitesimally small that you have a better chance that a cyclic redundancy check (CRC) sum will cause data to be stored incorrectly on disk than you do of having a hash collision. But I guess someone eventually wins the lottery. That being said, I’m not worried.
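To make the mechanism and the odds a little more concrete, here is a minimal Python sketch. It is not how any particular product works: the `DedupStore` class and the `collision_probability` function are hypothetical names made up for illustration, and the estimate uses the standard birthday-bound approximation, which assumes the hash behaves like a random function (that is, it models accidental collisions only, not deliberately crafted inputs).

```python
import hashlib
import math


class DedupStore:
    """Toy chunk store (hypothetical): keeps one copy per SHA-1 digest."""

    def __init__(self):
        self.chunks = {}      # digest -> chunk bytes
        self.duplicates = 0   # how many incoming chunks were discarded

    def put(self, chunk: bytes) -> str:
        digest = hashlib.sha1(chunk).hexdigest()
        if digest in self.chunks:
            # Digest already known: the chunk is assumed identical and dropped.
            # A hash collision would cause silent corruption at this point,
            # because a *different* chunk would be discarded here.
            self.duplicates += 1
        else:
            self.chunks[digest] = chunk
        return digest  # plays the role of the small pointer

    def get(self, digest: str) -> bytes:
        return self.chunks[digest]


def collision_probability(num_chunks: int, hash_bits: int = 160) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = -num_chunks * (num_chunks - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(exponent)


store = DedupStore()
pointer_a = store.put(b"customer data block")
pointer_b = store.put(b"customer data block")  # duplicate, discarded

# Assumed workload: 1 PiB of unique data in 8 KiB chunks = 2**37 chunks.
p = collision_probability(2 ** 37)
```

Under those assumed numbers (2^37 chunks, a 160-bit hash), the approximation works out to roughly 2^-87, which is the kind of figure behind "infinitesimally small." Different chunk sizes or capacities shift the result, but not by enough to change the conclusion.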