Kappa can easily show a tenfold speed increase on GF100 GPUs (compared to programs written against the CUDA 3.1 driver or runtime API). Here are two Kappa scheduling scripts that differ only in the assignment of a stream to a kernel: one has the same stream assigned to all kernel executions so that the kernels execute sequentially and [...]
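To make the stream distinction concrete, here is a minimal sketch written against the plain CUDA runtime API, not Kappa's scheduling language. The kernel `spin` and the helper `time_launches` are hypothetical names made up for this illustration, and the kernel is deliberately tiny so that a single launch leaves most of the GPU idle. Launches placed in one stream are serialized. Launches placed in separate streams are allowed to overlap on hardware that supports concurrent kernel execution, but overlap is not guaranteed.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <vector>

// Hypothetical busy-work kernel: a single small block spins on arithmetic,
// so one launch occupies only a fraction of the GPU.
__global__ void spin(float *out, int iters)
{
    float x = static_cast<float>(threadIdx.x);
    for (int i = 0; i < iters; ++i) {
        x = x * 1.0000001f + 0.5f;
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}

// Launches n_kernels copies of spin and returns the wall-clock time in
// milliseconds, or -1 if any CUDA call fails (for example, no device present).
// separate_streams == false: every launch uses the same stream (sequential).
// separate_streams == true:  each launch gets its own stream (may overlap).
float time_launches(int n_kernels, bool separate_streams)
{
    const int threads = 32;
    const int iters = 1 << 18;

    float *buf = nullptr;
    if (cudaMalloc(&buf, sizeof(float) * threads * n_kernels) != cudaSuccess) {
        return -1.0f;
    }

    const int n_streams = separate_streams ? n_kernels : 1;
    std::vector<cudaStream_t> streams(n_streams);
    for (int i = 0; i < n_streams; ++i) {
        cudaStreamCreate(&streams[i]);
    }

    cudaDeviceSynchronize();  // start timing from an idle device
    const auto t0 = std::chrono::steady_clock::now();

    for (int k = 0; k < n_kernels; ++k) {
        cudaStream_t s = streams[separate_streams ? k : 0];
        // Each launch writes to its own slice of buf.
        spin<<<1, threads, 0, s>>>(buf + k * threads, iters);
    }

    const cudaError_t err = cudaDeviceSynchronize();  // wait for all streams
    const auto t1 = std::chrono::steady_clock::now();

    for (int i = 0; i < n_streams; ++i) {
        cudaStreamDestroy(streams[i]);
    }
    cudaFree(buf);

    if (err != cudaSuccess) {
        return -1.0f;
    }
    return std::chrono::duration<float, std::milli>(t1 - t0).count();
}
```

Calling `time_launches(8, false)` and then `time_launches(8, true)` from a `main` function and printing both values gives a rough comparison of the two assignments. How large the gap is depends on the device, the driver, and how much of the GPU each kernel occupies, so this sketch makes no claim about reproducing the figure quoted above.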

