How does the Mlucas FFT compare to other high-performance FFT implementations, such as the FFTW package?
I have not had the time or the desire to package the FFT core of Mlucas into a form suitable for inclusion in the FFTW benchmarks, but my own comparisons indicate that the Mlucas FFT is typically about twice as fast as FFTW for the vector lengths of interest to Mersenne prime searchers (real vectors of length 128K and larger, where K = 2^10 = 1024) running on comparable hardware.