Is deduplication performance determined by the number of disk drives used?
In any storage system, the disk drives are the slowest component. To increase performance, it is common practice to stripe data across a large number of drives so that they handle I/O in parallel. If a system relies on this method to meet its performance requirements, you need to ask where the right balance between performance and capacity lies, because the whole point of data deduplication is to reduce the number of disk drives. Data Domain's SISL (Stream-Informed Segment Layout) implementation takes an inline, CPU-centric approach that needs very few disk drives, so its deduplication delivers on the expectation of a smaller storage system.
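The trade-off can be made concrete with a rough back-of-envelope sketch. The figures below (target ingest rate, per-drive throughput, deduplication ratio) are illustrative assumptions, not specifications of SISL or of any particular system; the point is only that when duplicates are eliminated inline, before data is written, far fewer spindles are needed to sustain the same logical ingest rate.

```python
import math

# Hypothetical back-of-envelope comparison: every figure below is an
# illustrative assumption, not a measurement of any specific product.
TARGET_THROUGHPUT_MBPS = 800   # assumed backup ingest target
DRIVE_THROUGHPUT_MBPS = 40     # assumed sustained write rate per disk drive
DEDUP_RATIO = 10               # assumed ratio of logical data to unique data

# Spindle-bound design: every logical byte is written to disk, so the
# drive count must scale with the full ingest rate.
drives_striped = math.ceil(TARGET_THROUGHPUT_MBPS / DRIVE_THROUGHPUT_MBPS)

# CPU-centric inline design: duplicates are identified in CPU/memory
# before anything is written, so only unique data reaches the disks.
unique_mbps = TARGET_THROUGHPUT_MBPS / DEDUP_RATIO
drives_inline = math.ceil(unique_mbps / DRIVE_THROUGHPUT_MBPS)

print(f"Striping-based design: ~{drives_striped} drives to sustain ingest")
print(f"Inline, CPU-centric design: ~{drives_inline} drives to sustain ingest")
```

Under these assumed numbers, the striping-based approach would need roughly ten times as many drives as the inline approach, which is why an architecture that deduplicates before writing can meet performance targets with a much smaller disk footprint.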