Why do CULA functions take pointers to host memory instead of GPU memory?
For the vast majority of CULA functions, memory management accounts for an insignificant portion of total processing time. Our CULA framework has an optimized memory management system that attempts to maximize memory bandwidth by properly aligning memory boundaries. More information can be found in the “Compiling with CULA” section of the CULA Programmer’s Guide.