Does Cascade accept hand-optimized assembly code? Does it achieve the same performance improvements as with compiled code?
Cascade can accept well-behaved hand-optimized assembly code, i.e., code that conforms to the standard ABI rules, such as those for function parameter passing. Such code would achieve the same performance as compiled code with the same instruction-level parallelism. Hand-optimized assembly may actually perform better, because its optimized register allocation may reduce the number of memory accesses.
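
To illustrate the register allocation point, here is a minimal C sketch of the same effect at source level. It is not Cascade-specific, and the function names `sum_via_memory` and `sum_via_register` are hypothetical, chosen only for this example. Both functions compute the same sum, but the first accumulates through a pointer, so the compiler may have to touch memory on every iteration, while the second keeps the running total in a local variable that can live in a register.

```c
#include <stddef.h>

/* Accumulates directly through the output pointer. Unless the compiler
   can prove that *total does not alias data[], it may load and store
   *total on every iteration. */
void sum_via_memory(const int *data, size_t n, int *total) {
    *total = 0;
    for (size_t i = 0; i < n; i++) {
        *total += data[i];
    }
}

/* Accumulates in a local variable, which the compiler (or a hand coder)
   can keep in a register; the result is written to memory once. */
void sum_via_register(const int *data, size_t n, int *total) {
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    *total = acc;
}
```

Both versions take their arguments through the ordinary C calling convention, which is the kind of ABI-conforming interface the answer above refers to. Whether the two actually differ in generated code depends on the compiler and optimization level.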