How can topology-aware scheduling reduce execution time by 50%?
In modern multi-core systems, each CPU socket and each core has its own execution units, caches, memory channels, and I/O channels. Under NUMA (Non-Uniform Memory Access), a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors. Topology-aware scheduling lets you place jobs at the core level or the CPU (socket) level, according to each job's needs.
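
To make the placement idea concrete, here is a minimal Python sketch of one possible policy: keep a job's cores on a single NUMA node when it fits, and span nodes only when it does not. It is an illustration under stated assumptions, not the scheduler's actual algorithm. The topology map, `pick_cpus`, and `pin_current_process` are hypothetical names introduced for this example; only `os.sched_setaffinity` is a real call, and it is available on Linux only.

```python
import os

# Hypothetical topology: NUMA node id -> logical CPU ids on that node.
# On a real Linux machine this could be read from /sys/devices/system/node/.
EXAMPLE_TOPOLOGY = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}


def pick_cpus(topology, free_cpus, ncpus):
    """Pick `ncpus` free CPUs, preferring a single NUMA node.

    Returns a sorted list of CPU ids, or None if too few CPUs are free.
    """
    free = set(free_cpus)
    # Free CPUs available on each node.
    per_node = {
        node: [c for c in cpus if c in free] for node, cpus in topology.items()
    }

    # First choice: a single node that can hold the whole job, so all of its
    # threads share local memory. Among such nodes, take the one with the
    # fewest free CPUs (best fit) to leave larger gaps for later jobs.
    fitting = [n for n, cpus in per_node.items() if len(cpus) >= ncpus]
    if fitting:
        node = min(fitting, key=lambda n: (len(per_node[n]), n))
        return sorted(per_node[node][:ncpus])

    # Fallback: span nodes, drawing from the nodes with the most free CPUs.
    chosen = []
    for node in sorted(per_node, key=lambda n: (-len(per_node[n]), n)):
        for cpu in per_node[node]:
            if len(chosen) == ncpus:
                break
            chosen.append(cpu)
    return sorted(chosen) if len(chosen) == ncpus else None


def pin_current_process(cpus):
    """Pin the calling process to `cpus` (Linux only; no-op elsewhere)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)
```

For example, `pick_cpus(EXAMPLE_TOPOLOGY, range(8), 2)` keeps both CPUs on node 0, while a three-CPU request with only CPUs 0, 1, 4, and 5 free has to span both nodes. A launcher could then pass the result to `pin_current_process` before starting the job's work.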