Psi Lambda LLC ψ(λκ) Kappa Library User Guide

Kappa Core API

At its core level, Kappa provides object-oriented encapsulation of CUDA functionality. The purpose of this part of the Kappa library is to provide integrated encapsulation of that functionality: it sets up reasonable defaults and fail-safe behavior so that fewer attributes need to be specified, and most details of CUDA functionality, such as properly allocating and freeing memory, efficiently copying data, preparing kernels for launch, and synchronizing those activities, occur by default with minimal specification by the developer.

The core level of Kappa should never prevent access to CUDA functionality. It is not meant to hide the functionality of CUDA behind encapsulation but rather to make the CUDA functionality integrate together in the manner in which it was designed and documented to do.

In fact, no part of Kappa should prevent access to CUDA functionality. Kappa is explicitly designed to be extensible, with access to core Kappa and CUDA functionality, so that if, by oversight, access is not available from some part of Kappa, an extension can be developed by any competent developer to rectify the shortcoming.

The following table provides a summary overview of the Kappa core functionality. Please refer to the Kappa Reference Manual for further details.

Kappa Core Classes

Kappa
The main class for Kappa.

GPU
Provides access to CUDA GPU properties.

ProcessControlBlock
Main encapsulation for background execution, scheduling, and commands. This class creates and tracks Context objects.

Context
Main interface for creating and retrieving core objects such as Variable, Array, Stream, Module (C and CUDA), and Kernel (C and CUDA) objects, and for synchronization.

Variable
Provides encapsulation for host and device memory (inheriting functionality from LocalMemory, DeviceMemory, and DeviceTexture).

Array
Provides access to the CUDA array functionality.

Stream
Encapsulates a CUDA stream.

Timer
Provides timer functionality based on CUDA events.

Event
Provides the CUDA basis for timers and synchronization.

Module
CUDA module compilation, loading, and access to module variables, textures, and kernels.

Kernel
CUDA kernel call setup and launch.

CModule
C module loading and access to kernels.

CKernel
C kernel call setup and launch.

Kappa Command Queue and ProcessControlBlock

The Kappa and ProcessControlBlock classes provide a background command queue with a separate execution thread for each GPU. The execution thread for each GPU is associated with a CUDA context, so each one can be considered a GPU/CPU process that runs CUDA and C kernels. Commands to be scheduled on these processes must inherit from either the kappa::Command or the kappa::command::Keyword class. The scheduler for each process maintains a command input queue, a command paused queue, and a command running queue.

The kappa::Process class receives exception and status notifications from these schedulers and the command execution. The kappa::Process class can pass along these exception and status notifications. See the section on Errors and Testing or the Kappa Reference Manual for more information.

Kappa currently supports a fixed set of possible command statuses; refer to the Kappa Reference Manual for the list.

This core level of Kappa functionality does not provide any scheduling relationships between commands. The only scheduling provided at this level is the core CUDA behavior that constrains CUDA functions on the same stream to execute in order. To have commands work together to perform tasks in the right order, please use the Kappa Process class. Mixing direct usage of the Kappa command queue with use of the Kappa Process class is unsupported by Psi Lambda.

An example program using the core Kappa functionality is documented in the Kappa Reference Manual. This example program also shows the usage of timers and tracking of Context memory usage.