What algorithm is used for transposing the matrix in PTRANS?
A detailed description of the matrix transposition algorithm used by PTRANS is available as LAPACK Working Note No. 65. To summarize that note: the dimensions Px and Py of the virtual process grid for PTRANS should have a small GCD (Greatest Common Divisor) and a small LCM (Least Common Multiple) to achieve good performance. The number of steps needed to do the transpose is LCM(Px,Py)/GCD(Px,Py), and the number of communicating pairs is GCD(Px,Py).
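To compare candidate process grids, the two formulas above can be evaluated directly. The following is a minimal Python sketch, not part of PTRANS itself; the function name ptrans_grid_stats is hypothetical, and the code only computes the quantities quoted in this answer.

```python
from math import gcd


def ptrans_grid_stats(px, py):
    """Evaluate the formulas quoted above for a Px-by-Py process grid.

    Hypothetical helper for illustration only; it does not run PTRANS.
    """
    g = gcd(px, py)        # GCD(Px, Py)
    l = px * py // g       # LCM(Px, Py), using LCM * GCD = Px * Py
    return {
        "gcd": g,
        "lcm": l,
        "steps": l // g,   # LCM(Px, Py) / GCD(Px, Py)
        "pairs": g,        # communicating pairs as stated above: GCD(Px, Py)
    }


if __name__ == "__main__":
    # Compare a few ways of arranging processes in a grid.
    for px, py in [(4, 4), (2, 8), (3, 5)]:
        print(px, py, ptrans_grid_stats(px, py))
```

For example, with 16 processes a 4 x 4 grid gives LCM/GCD = 4/4 = 1 step, while a 2 x 8 grid gives 8/2 = 4 steps.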