Psi Lambda LLC ψ(λκ) Kappa Library User Guide

Kappa Process Language

Kappa Process processes may be defined using either text statements or C++ Instruction objects. Since a simple and sensible way to get started with, or to generate, the C++ Instruction objects is to create Kappa Process processes using text statements and then convert them to projects of C++ source code using Instruction objects, this section mainly covers the text statement method of defining processes.

Kappa Process Language tag structure

Kappa provides a language for defining processes that can combine CPU and GPU operations in dynamic, coherent ways to implement tasks. Kappa Process language statements must be enclosed in paired tags:

<kappa>

...Kappa language statements

</kappa>

Kappa Process will ignore anything that is not enclosed in tags and will ignore any tags that it does not recognize. Tags can not currently be nested. The only tag that Kappa Process currently utilizes is the "<kappa>" tag. Other tags will be added by Kappa in the future.

Attributes may be included in the opening kappa tag. If Kappa does not recognize an attribute, it is ignored. The attributes that Kappa Process currently recognizes are used to define the block of statements within the kappa tags as either a Kappa Process subroutine or function. If no subroutine or function attribute is present, then the block of statements within the kappa tags is considered to be statements to be prepared for execution:

<kappa>

...Kappa language statements to immediately prepare and queue for execution

</kappa>

RememberAnonymous and DoNotExecute

The kappa::Process.DoNotExecute and kappa::Process.RememberAnonymous methods may be used to tell kappa::Process not to execute these statements and/or to remember them as a subroutine named “main”.

Subroutine and Function Definition

If the subroutine attribute (and subroutine name) are given, then the statements are not prepared for execution but are instead stored for later invocation as a subroutine:

<kappa subroutine=subroutine_name>

...Kappa language statements to store for later invocation as a subroutine

</kappa>

If the function attribute (and function name) are given, then the statements are not prepared for execution but are instead stored for later invocation as a function:

<kappa function=function_name arguments="arguments" map="map">

...Kappa language statements to store for later invocation as a function

</kappa>

OutputRoutines, OutputRoutine, and LoadRoutine

The kappa::Process.OutputRoutines, kappa::Process.OutputRoutine, and kappa::Process.LoadRoutine methods may be used to output these subroutines and functions to C++ files and to load them from a compiled shared library. The kappa::Process.OutputRoutines and kappa::Process.OutputRoutine methods will overwrite any existing files of the same names in the specified output directory. Please note that Psi Lambda LLC considers the source code produced by the kappa::Process.OutputRoutines and kappa::Process.OutputRoutine methods to be copies of the Kappa Process statement files, which retain the copyright of the original statement files. See the Kappa Reference Manual for more details.

Kappa Process function attribute arguments and input/output maps

The arguments and map attributes are optional but are usually necessary. The arguments are a comma-separated list of local argument identifiers. Each provides a local name for a Variable or Value. A dynamic local Value identifier must contain (or start with) a pound sign: “#local_value”. A local Variable name should consist only of alphanumeric characters and the underscore character.

The input/output “map” attribute value is a comma-delimited list. Each Variable or Value name given in a map should match a name given as an argument. For each item in the list, names on the left of an equal sign are for output and names on the right of the equal sign are for input. If no equal sign is present, then the names are for both output and input. In the following example:

A = B C, #area = #row #col, D

A, #area, and D are output items (Variable objects and Value objects) and B, C, #row, #col, and D are input items (Variable objects and Value objects). Yes, D was listed twice—it is both an input and output item (Variable in this case). A, B, C, D are Variable objects and #area, #row, and #col are Value objects.

Inputs for a function are copied from the corresponding external (calling) Variable or Value prior to the function statements executing. Outputs for a function are copied to the corresponding external (calling) Variable or Value after the function statements finish executing. So, to illustrate, the Variable B is copied from some (unknown at this point) Variable when the function is called and the Variable A is copied to some (unknown at this point) Variable prior to the function returning. The same holds for the remaining Variable objects and Value objects. For an input Value, the corresponding calling value could be a static string or numeric value; it does not have to be a dynamic Value.
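Putting the arguments and map together, a function using the example map above might be defined as follows; this is a hedged sketch, and the function name and body placeholder are hypothetical:

```
<kappa function=compute arguments="A,B,C,#area,#row,#col,D" map="A = B C, #area = #row #col, D">

...Kappa language statements that read B, C, D, #row, and #col
   and write A, D, and #area...

</kappa>
```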

(A local Variable or Value name could consist of a fully qualified namespace Variable or Value name, with slash characters delineating the namespace path, but that can not truly be considered a local Variable or Value. Consider carefully the effects if this is done: the calling Variable or Value may be copied to this “local” Variable or Value prior to the function statements executing, and the “local” Variable or Value may be copied to a calling Variable or Value after the function statements have finished executing, depending on the contents of the input/output “map”.)

Kappa Process Language comments

Kappa supports either 'C' style comments of the forms:

/* Comment

*/

or the newer single line comments:

// Comment
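Both comment styles may appear inside a kappa block; a minimal sketch, using the Context and Value statements described later in this guide:

```
<kappa>
/* Create the default context
   for the statement below. */
!Context -> context;
// Create a dynamic Value.
!Value -> #answer = 42;
</kappa>
```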

Kappa Process Instructions

This section may be skipped if a user or developer only wishes to define Kappa Process processes using statements.

All Kappa Process statements are parsed into Kappa Process Instruction objects. Kappa Process subroutines and functions are stored as std::vector objects of these Instruction objects. Kappa Process Instruction objects have a keyword, a name, Attributes, and Arguments. Kappa Process Instruction objects may also store the unparsed form of the arguments and map and the original, unqualified form of the name.

It is possible for a developer to directly create Kappa Process processes at the Instruction level. The simplest way to get started in this manner is to create a Kappa Process using statements and use the example program to output these to C++ source code. The kappa::Process.LoadRoutine and kappa::Process.ExecuteRoutine or kappa::Process.ExecuteFunction methods may then be used to load and execute these instructions as a subroutine or function.

An example of creating a routine (named “myroutine”) with an instruction is the following:

// Create the Attributes object and add the attributes to it.
Attributes *attribute_Context_context_1 = new Attributes();
(*attribute_Context_context_1)
    .Add(string("CONTEXT_FLAGS"), (unsigned int)CU_CTX_SCHED_YIELD)
    .Add(string("CUDA_CONTEXT_TYPE"), (unsigned int)CUDA_CONTEXT)
    .Add(string("REPORT_MEMORY"), true);

// Unparsed arguments and unparsed map.
// These may, for certain instructions, be used to create
// a unique name for the resulting Command.
string upa_Context_context_1 = string("");
string upm_Context_context_1 = string("");

// Create the Arguments object and add the arguments to it.
Arguments *argument_Context_context_1 = new Arguments();

// Create the Instruction with the Attributes and Arguments.
Instruction *instruction_Context_context_1 =
    new Instruction(string("Context"), string("context"),
        attribute_Context_context_1, argument_Context_context_1,
        upa_Context_context_1, upm_Context_context_1);

Subroutine *routine = new Subroutine(string("myroutine"));
(*routine).Add(instruction_Context_context_1);

Attributes *attribute_Value_MyOtherNamespace_value_1 = new Attributes();

string upa_Value_MyOtherNamespace_value_1 = string("42");
string upm_Value_MyOtherNamespace_value_1 = string("");
Arguments *argument_Value_MyOtherNamespace_value_1 = new Arguments();
(*argument_Value_MyOtherNamespace_value_1)
    .In((unsigned int)42);
Instruction *instruction_Value_MyOtherNamespace_value_1 =
    new Instruction(string("Value"), string("/My/Other/Namespace#value"),
        attribute_Value_MyOtherNamespace_value_1,
        argument_Value_MyOtherNamespace_value_1,
        upa_Value_MyOtherNamespace_value_1,
        upm_Value_MyOtherNamespace_value_1);

(*routine).Add(instruction_Value_MyOtherNamespace_value_1);

(*routines)[string("myroutine")] = routine;

This illustrates creating a subroutine with instructions to create a context and a Value. The equivalent statement (indeed, it is the source statement for the above C++) is:
<kappa>
!Context CUDA_CONTEXT_TYPE=%KAPPA{CUDA_CONTEXT}
CONTEXT_FLAGS=%CUDA{CU_CTX_SCHED_YIELD}
REPORT_MEMORY=true
-> context;
!Value -> /My/Other/Namespace#value = 42;
</kappa>

Kappa Process Language syntax

In any Kappa statement, wherever a space occurs, any other white space, such as a new line, may occur instead. Kappa statements currently have only two basic forms. The first form is for declarative statements and the following shows all of the valid declarative statement forms:

!Keyword attr="value" ... -> name@module(arguments) [map];

!Keyword -> name@module(arguments) [map];

!Keyword attr="value" ... -> name@module(arguments);

!Keyword attr="value" ... -> name@module = from;

!Keyword -> name@module = from;

!Keyword -> name@module(arguments);

!Keyword attr="value" ... -> name@module;

!Keyword attr="value" ... -> name(arguments) [map];

!Keyword -> name(arguments) [map];

!Keyword attr="value" ... -> name(arguments) = from;

!Keyword attr="value" ... -> name(arguments);

!Keyword attr="value" ... -> name = from;

!Keyword -> name(arguments) = from;

!Keyword -> name(arguments);

!Keyword -> name = from;

!Keyword attr="value" ... -> name;

!Keyword -> name;

!Keyword name(arguments);

!Keyword;

where 'attr="value" ...' denotes that there are zero or more attribute/value pairs. The quotes around the attribute value are usually optional and sometimes undesired. The arguments and map are of the same format as discussed previously. However, the arguments and maps given in these declarative statements must have the parentheses or brackets, while parentheses and brackets are not used for tag attributes.

In the declarative statements, the keyword is always mandatory. So the minimal declarative statement syntax possible is:

!Keyword;

To the Kappa parser this is all that is necessary but it parses the remaining elements, if they are present in the correct forms as shown above. If elements are present, but not in the correct form(s), then the Kappa parser will complain. Kappa keywords enforce their own syntax and complain if elements they require are missing.

The various forms shown above are for the convenience of the user; the number of different possibilities is actually less than shown. The “module” element shown above can be given as the attribute “MODULE” instead. The “from” element constitutes the last of the (comma-delimited) “arguments” element.
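As a hedged sketch of these equivalences (the module, texture, and file names are hypothetical):

```
// The "module" element may instead be given as the MODULE attribute:
!Texture -> mytexture@mymodule;
!Texture MODULE=mymodule -> mytexture;

// The "from" element is the last of the comma-delimited arguments:
!C/Module -> mymodule = libmycode;
!C/Module -> mymodule(libmycode);
```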

The second basic form is for decision statements:

? logical_expression -> arguments;

The arguments for a decision statement are a list of Variable objects and Value objects to make dependent on the decision. The logical_expression must only contain numbers, configuration values, Value objects, and logical or numeric operators. All configuration values and Value objects must be resolvable to numeric or boolean values. The logical_expression is evaluated and, if it is zero, the decision statement reports a CANCELED completion status. This cancels execution of all statements that are dependent on the Variable objects or Value objects in the arguments list. It reports a FINISHED completion status if the logical_expression evaluates to nonzero. It reports a FAILED completion status (which has the same effect as a CANCELED completion status) if the logical_expression can not be evaluated to an integer or boolean value (for example, if configuration values or Value objects can not resolve, or resolve to text values that are not convertible to numeric values).
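A minimal sketch of a decision statement, assuming a dynamic Value #rows and a Variable A created by earlier statements:

```
// If #rows does not exceed 1024, the decision reports CANCELED and
// all statements dependent on the Variable A are canceled;
// otherwise it reports FINISHED and dependent statements may run.
? ( #rows > 1024 ) -> A;
```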

For boolean values for attributes or Value objects, “true” or “on” and “false” or “off” or numeric values may be given. For Indices Value objects, a comma-delimited list of integer values with optional spaces may be given. For logical and numeric expressions, configuration values and Value objects may be used, as long as they are resolvable. For logical and numeric expressions, parentheses are supported, and outer, enclosing parentheses may be mandatory for proper classification.

An 'if' function is supported in expressions. It is currently the sole mechanism, within a Kappa process, to conditionally set a Value to different values. It may be used in a !Value statement of the following form:

!Value -> value_name = if( condition , value_if_true , value_if_false );
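For example, a hypothetical statement that clamps a block size Value to at most 512:

```
!Value -> #block = if( #rows > 512 , 512 , #rows );
```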



For logical and numeric expressions, the following binary operators are currently supported:

+ - * / ^ or and xor < > == <= >= !=

The following trigonometric functions are available: sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, asinh, acosh, atanh. Also the following functions are available: log2, log10, log, ln, exp, sqrt, sign, rint, abs, min, max, avg, sum, where log is the same as log10, exp is e raised to the power of the argument, sqrt is the square root of the argument, sign returns -1 if the number is negative and +1 if the number is positive, and rint rounds the argument to the nearest integer.
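A hedged sketch using a few of these functions (the Value names are hypothetical):

```
!Value -> #n = 10000;
// sqrt( #n ) is 100 when #n is 10000;
// rint rounds its argument to the nearest integer.
!Value -> #side = rint( sqrt( #n ) );
```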

Built-in keywords

Kappa comes with a set of built-in keyword statements which are documented in the following sections. If a kappa::command::Keyword uses the same keyword as a built-in keyword statement, then it replaces the built-in keyword statement and correspondingly changes the Kappa Process functionality. Any such changes must be documented by the developers of the kappa::command::Keyword classes and can not be within the scope of any Psi Lambda Kappa documentation.

There are four main types of built-in keyword statements. They consist of resource creation, resource destruction, processing, and informational.

Resource Creation, Loading, or Access Keywords

Resource initialization statements create, load, or access a resource.

Context

Possible syntax:

!Context attributes -> context_name;

!Context context_name;

Possible attributes are:

USE_CURRENT

This may be either true or false (the default). This sets whether the Context statement should try to use an existing CUDA context or whether to always create a new one. If this is set to true, and an existing CUDA context is not available, then a CUDA context is created. Example:

USE_CURRENT=true

CUDA_CONTEXT_TYPE

This may be:

%KAPPA{CUDA_CONTEXT} which is an integer 0 (the default),

%KAPPA{OPENGL_CONTEXT},

%KAPPA{DIRECT3D9_CONTEXT},

%KAPPA{DIRECT3D10_CONTEXT},

%KAPPA{DIRECT3D11_CONTEXT}

This sets the type of CUDA context created. For graphics resource mapping and interoperability, the correct CUDA context type must be selected. Example:

CUDA_CONTEXT_TYPE=%KAPPA{OPENGL_CONTEXT}

CONTEXT_FLAGS

This is the CUDA context flags. You may use the cuda_translation.conf values in the form: ( %CUDA{CU_CTX_SCHED} | %CUDA{CU_CTX_MAP_HOST} ) to construct the flag value. The default value is shown in this example:

CONTEXT_FLAGS=( %CUDA{CU_CTX_SCHED} | %CUDA{CU_CTX_MAP_HOST} )

REPORT_MEMORY

This may be either true (the default) or false. When set to true, the context tracks device memory allocation requests and displays them to standard error if the “!Context context_name;” statement is subsequently given. See the Errors and Testing section for further information.

CACHE

Sets the preferred cache type for the CUDA context. Set this to CU_FUNC_CACHE_PREFER_SHARED to prefer more shared memory and set this to CU_FUNC_CACHE_PREFER_L1 to prefer more L1 cache.

CACHE=%CUDA{CU_FUNC_CACHE_PREFER_SHARED}

Please see the description of the cuCtxSetCacheConfig function in the CUDA Reference Manual for a description of setting the cache preference.

STACK_SIZE

Sets the stack size for each GPU thread.

Please see the description of the cuCtxSetLimit function in the CUDA Reference Manual for a description of setting the stack size.

HEAP_SIZE

Sets the size of the heap for the device malloc() and free() calls.

Please see the description of the cuCtxSetLimit function in the CUDA Reference Manual for a description of setting the heap size.

PRINTF_FIFO_SIZE

Sets the size of the FIFO buffer for the printf() device calls.

Please see the description of the cuCtxSetLimit function in the CUDA Reference Manual for a description of setting the FIFO buffer size.
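A hedged example that combines these limit attributes in one Context statement; the numeric values are arbitrary byte counts chosen for illustration only:

```
!Context CACHE=%CUDA{CU_FUNC_CACHE_PREFER_L1}
         STACK_SIZE=1024
         HEAP_SIZE=8388608
         PRINTF_FIFO_SIZE=1048576
         -> context;
```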

Value

Create (or replace) a dynamic Value.

The possible syntax is:

!Value attributes -> value_name;

!Value value_name;

!Value attributes -> value_name = from;

!Value value_name = from;

where the value_name may be a Value name as discussed in the Overview or Namespace section and there are not currently any supported attributes. For extraction of values from a Variable, the possible syntax is:

!Value attributes -> value_name(slice_arguments) = variable_name;

!Value attributes -> value_name(slice_arguments, variable_name);

!Value -> value_name(slice_arguments) = variable_name;

!Value -> value_name(slice_arguments, variable_name);

The value_name may still be a Value name as discussed in the Overview or Namespace section. The variable_name is the name of an existing Variable that contains the values to extract.

For the first syntax form, where the values are not being extracted from a Variable, the from expression may be:

For the form where values are being extracted from a Variable, the possible attribute that is currently supported is:

FROM_FLOAT

This may either be false (the default), which means that the values to be extracted are integer values, or true, so that the values extracted from the Variable are “float” values. If more than one value is extracted, then the resulting data type is Indices, which is a vector of unsigned integer values, and float values are rounded to unsigned integer. If one value is extracted and the FROM_FLOAT attribute is true, then the resulting data type is the float representation of the Variable value. Example:

FROM_FLOAT=true

Retrieving Value objects from a Variable without the correct FROM_FLOAT attribute, or when the elements of the Variable are not “simple” multidimensional arrays of integer or float values, will lead to incorrect Value objects, may lead to program crashes, and is not supported by Psi Lambda. NOTE: if the Variable contains complex data structures, data structures that are padded or offset, or values that are not integer or float in sequential format, or if the FROM_FLOAT attribute is not given when the Variable contains float data, then this will probably result in Value(s) that are incorrect and might result in program memory access violations (segmentation faults or “bus” errors). The feature of retrieving Value objects and Value Indices from Variable objects (along with the decision statement) is meant to allow the calculations of CUDA kernels to have (indirect) control of the sizing and execution of Kappa Process statements. However, as with any powerful feature, it is possible for it to be misused.

The “slice_arguments” for extraction from a Variable depends on how many dimensions the Variable has:

The slice argument may be a Value Indices of the proper length.

Examples for a linear memory (one-dimensional) Variable:

// Extract the full Variable as an integer (Indices) Value object

!Value -> ivalues() = MyVariableIndices;

// Extract three elements from the Variable as an integer (Indices) Value

!Value -> ivalues(3) = MyVariableIndices;

// Assuming at least two float elements, extract the second element from the Variable

// as a float Value

!Value FROM_FLOAT=true -> fvalues(1,1) = MyVariableFloat;

Examples for an array of integer memory (three-dimensional) Variable:

// Extract a full row from the Variable as an integer (Indices) Value

// This specifies the complete x row at the fourth (3) y index and the fifth (4) z index.

!Value -> ivalues(3,4) = MyVariable3DIndices;

// Extract part of a row from the Variable as an integer (Indices) Value

// This specifies 2 elements of the x row at the fourth (3) y index

// and the fifth (4) z index.

!Value -> ivalues(2,3,4) = MyVariable3DIndices;

// Extract two elements of a row from the Variable as an integer (Indices) Value

// This specifies the second and third elements (1 and 2) of the x row

// at the fourth (3) y index and the fifth (4) z index.

!Value -> ivalueIndices = [ 1, 2, 3, 4 ];

!Value -> ivalues(#ivalueIndices) = MyVariable3DIndices;

This last example illustrates how a Value Indices may be given as an argument anytime the Value Indices contains the proper number and values of elements to correspond to the correct argument values. Since these Value Indices could have, in turn, been extracted from other Variable objects, this allows for true dynamic control of the Kappa Process, especially since Variable declaration arguments and CUDA/Kernel GRID and BLOCKSHAPE launch attributes may also be Value Indices.

Variable

Create a Variable. A Variable can be a “blob” of data—with no dimensional sizes given to Kappa (technically a “blob” is a one-dimensional Variable with one element whose size is the amount of memory allocated and copied by the Variable—this means that Kappa is totally unaware of any internal structures) or a Variable can be declared as containing certain dimensions of certain sizes. Kappa supports Variable objects of any dimension greater than or equal to one and any dimension may be of any size supported by an unsigned integer value.

Kappa Variable objects may be assigned streams which are used to allow asynchronous copying.

To the Kappa Process, except for when Value objects are extracted from a Variable, Variable objects are treated as opaque. A Variable object's internal contents are not “visible” to Kappa Process statements—only to Kappa core library methods, C and CUDA kernels, user IOCallbacks, kappa::command::Keyword objects, etc.

Possible syntax:

!Variable attributes -> variable_name(arguments);

!Variable variable_name(arguments);

!Variable attributes -> variable_name;

!Variable variable_name;

Possible attributes are:

VARIABLE_TYPE

This is the type of Variable to create and provide allocation and copying behavior for. The default is Local or zero. Possible values are:

Local=0

Provide for initial allocation using host memory but use CUDA memory allocation functions. This is “pinned” or “non-paged” memory. This type of memory is a scarcer resource than LocalOnly memory.

LocalAndDevice=1

Provide for initial allocation using integrated or mapped memory and use CUDA memory allocation functions. This will fallback to separate host and device allocation if the GPU does not support integrated or mapped memory. This type of memory is a scarcer resource than LocalOnly memory.

LocalOnly=2

Provide for initial allocation using host memory and use standard malloc memory allocation function. This is “non-pinned” or “paged” memory. This type of Variable will not copy to or from a GPU as fast as other VARIABLE_TYPE types.

LocalToDevice=4

Provide for initial allocation using host memory but use CUDA memory allocation functions with the WriteCombined flags set. This is “pinned” or “non-paged” memory. This type of memory is a scarcer resource than LocalOnly memory but is optimal for staging data from the host to the GPU.

Device=5

Provide for initial allocation using device memory and use CUDA memory allocation functions. If needed, host memory will be allocated as Local--see above.

DevicePitch=6

Provide for initial allocation using device memory and use CUDA memory allocation functions with pitch allocation: cuMemAllocPitch. If needed, host memory will be allocated as Local--see above.

DeviceOnly=7

Provide for initial allocation using device memory and use CUDA memory allocation functions. If needed, host memory will be allocated as LocalOnly--see above.

Example:

VARIABLE_TYPE=%KAPPA{Local}

COLUMNS

The number of (virtual) columns in the Variable. For a Variable that will hold a dataset with rows and columns and the columns are of different widths, this allows specifying the number of columns contained in the Variable. Example:

COLUMNS = 4

NULL_BLOCK_WIDTH

Used to change the block width used to store the NULL value mask for the Variable from 8 bit words to 32 bit words. Example:

NULL_BLOCK_WIDTH = 32

STREAM

The name of a stream to associate with the Variable for doing asynchronous copies (and asynchronous or concurrent kernel launches if the same stream name is also given to a CUDA/Kernel). The Kappa Stream object and CUDA stream and any necessary CUDA events are automatically created and managed. Example:

STREAM=mystream

DEVICEMEMSET

A boolean value specifying whether to perform a cuMemsetD* to clear the device memory. Default value is false. This assumes that it is desired to initialize the memory to zero and that either the dimensions and element size have been given or that the memory may be initialized to zero as 32 bit integer values. Example:

DEVICEMEMSET=true

The arguments consist of either the total size of the Variable (assumes a “blob” Variable of one dimension and element size of 1) or of the size of each dimension of the Variable followed by the element size (in bytes) of the Variable. Any number of dimensions greater than or equal to one is supported.
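For example (hypothetical names; in the second statement, the last argument is the element size in bytes):

```
// A "blob" Variable of 4096 bytes (one dimension, element size 1):
!Variable -> myblob(4096);

// A 1024 x 768 two-dimensional Variable of 4-byte elements,
// allocated in device memory and associated with a stream:
!Variable VARIABLE_TYPE=%KAPPA{Device} STREAM=mystream
          -> mydata(1024, 768, 4);
```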

Array

Create an Array for use as a source for the Texture keyword statement, the Surface keyword statement, or with the CUDA cuMemcpy2D, cuMemcpy2DAsync, cuMemcpy2DUnaligned, cuMemcpy3D, cuMemcpy3DAsync, cuMemcpyAtoX, cuMemcpyXtoA, cuMemcpyAtoXAsync, or cuMemcpyXtoAAsync routines (where X is one of H (host), D (device), or A (array)).

Possible syntax:

!Array attributes -> array_name = source_name;

!Array array_name = source_name;

!Array attributes -> array_name;

!Array array_name;

Source name may be the name of a Variable that contains data used to initialize the array.

Possible attributes are:

FLAGS

The default value is zero. This may be set to one of the following:

CUDA_ARRAY3D_2DARRAY or CUDA_ARRAY3D_SURFACE_LDST.

This must be set to CUDA_ARRAY3D_SURFACE_LDST if the array will be used with a surface.

WIDTH

This is an unsigned integer specifying the width of the array and has a default width of 1. Example:

WIDTH=2

HEIGHT

This is an unsigned integer specifying the height of the array and has a default height of 1. If set to zero, then the Array is one-dimensional. Any nonzero value makes the Array to be at least two-dimensional—setting DEPTH to nonzero makes the Array three-dimensional and implies that height is nonzero. Example:

HEIGHT=2

ARRAY_FORMAT

This is the data format of the array. You may use the cuda_translation.conf values in the form: %CUDA{CU_AD_FORMAT_FLOAT} (the default value). Possible values are:

CU_AD_FORMAT_UNSIGNED_INT8=1

CU_AD_FORMAT_UNSIGNED_INT16=2

CU_AD_FORMAT_UNSIGNED_INT32=3

CU_AD_FORMAT_SIGNED_INT8=8

CU_AD_FORMAT_SIGNED_INT16=9

CU_AD_FORMAT_SIGNED_INT32=10

CU_AD_FORMAT_HALF=16

CU_AD_FORMAT_FLOAT=32

Example:

ARRAY_FORMAT=%CUDA{CU_AD_FORMAT_FLOAT}

NUMBER_CHANNELS

This is an unsigned integer specifying the number of packed components (vectors, such as float2 or float4) of the array and has a default value of 1. This may also be set to 2 or 4.

DEPTH

This is an unsigned integer specifying the depth of the array and has a default depth of 0. If set to zero, then the Array is one-dimensional (if height is zero) or two-dimensional (if height is nonzero). Any nonzero value makes the Array to be three-dimensional and implies that height is nonzero. Example:

DEPTH=1

STREAM

The name of a stream to associate with the Array for doing asynchronous copies (and asynchronous or concurrent kernel launches if the same stream name is also given to a CUDA/Kernel). The Kappa Stream object and CUDA stream and any necessary CUDA events are automatically created and managed. Example:

STREAM=mystream
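A hedged sketch of creating a two-dimensional float Array initialized from a Variable (the names are hypothetical):

```
!Array WIDTH=512 HEIGHT=512
       ARRAY_FORMAT=%CUDA{CU_AD_FORMAT_FLOAT}
       NUMBER_CHANNELS=1
       -> myarray = MyVariableFloat;
```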

Texture

Create a Texture for use by a CUDA/Kernel. The name of the texture must match the name declared in the CUDA module.

Possible syntax is:

!Texture MODULE=module_name attributes -> texture_name = source_name;

!Texture attributes -> texture_name@module_name = source_name;

!Texture texture_name@module_name = source_name;

!Texture MODULE=module_name attributes -> texture_name;

!Texture attributes -> texture_name@module_name;

!Texture texture_name@module_name;

Source name may be the name of a Variable or Array that contains data used to initialize the Texture, as appropriate. A Variable may be used to initialize a linear, one-dimensional Texture. An Array is necessary to initialize a two-dimensional or three-dimensional Texture. The module_name may either be given as the MODULE attribute or may be appended to the texture_name with an '@' delimiter.

Possible attributes, besides the MODULE attribute, are:

AddressMode0

Sets the addressing mode for the first (zero) dimension of the texture reference. Possible values are: CU_TR_ADDRESS_MODE_WRAP (the default), CU_TR_ADDRESS_MODE_CLAMP, or CU_TR_ADDRESS_MODE_MIRROR. Example:

AddressMode0=%CUDA{CU_TR_ADDRESS_MODE_WRAP}

AddressMode1

Sets the addressing mode for the second (one) dimension of the texture reference. Possible values are: CU_TR_ADDRESS_MODE_WRAP (the default), CU_TR_ADDRESS_MODE_CLAMP, or CU_TR_ADDRESS_MODE_MIRROR. Example:

AddressMode1=%CUDA{CU_TR_ADDRESS_MODE_WRAP}

FilterMode

Sets the filtering mode used when reading memory via the texture reference. Possible values are: CU_TR_FILTER_MODE_POINT and CU_TR_FILTER_MODE_LINEAR (the default). Example:

FilterMode=%CUDA{CU_TR_FILTER_MODE_LINEAR}

TextureFlags

Sets flags for how the data is returned via the texture reference. Possible flags are: CU_TRSF_READ_AS_INTEGER and CU_TRSF_NORMALIZED_COORDINATES (the default). Example:

TextureFlags=%CUDA{CU_TRSF_NORMALIZED_COORDINATES}

Note: do not forget to add the name of the texture to the map of any dependent CUDA/Kernel statements.
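A hedged example binding an Array to a texture declared in a CUDA module (the names are hypothetical):

```
!Texture MODULE=mymodule
         AddressMode0=%CUDA{CU_TR_ADDRESS_MODE_CLAMP}
         FilterMode=%CUDA{CU_TR_FILTER_MODE_POINT}
         -> mytexture = myarray;
```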

Surface

Create a Surface for use by a CUDA/Kernel. The name of the surface must match the name declared in the CUDA module. The CUDA module must be compiled for compute architecture 2.0 or higher (add -arch=compute_20 to the NVCC_PTX entry in the kappa.conf configuration file).

Possible syntax is:

!Surface MODULE=module_name -> surface_name = source_name;

!Surface -> surface_name@module_name = source_name;

!Surface surface_name@module_name = source_name;

Source name must be the name of an Array to bind to the surface. The module_name may either be given as the MODULE attribute or may be appended to the surface_name with an '@' delimiter.

Note: do not forget to add the name of the surface to the map of any dependent CUDA/Kernel statements.

C/Module

Load a C or C++ module (shared library file or DLL).

Possible syntax is:

!C/Module attributes -> module_name = file_name;

!C/Module attributes -> module_name(file_name);

The file name must be the name of the shared library or DLL. In the kappa.conf file, in the “/Kappa/C/Module” section, the OBJ_PATHS labels may be used to set paths to search for C/Module file names. A %s in the value for the OBJ_PATHS is replaced by the path to the executable file, if it was given as an argument to Kappa. If the OBJ_PATHS mechanism is not used, then an absolute file path name must be given.

The SHARED_OBJECT_EXTENSION label sets the extension for the module file. For Microsoft Windows, the extension should be “.dll”, for Macintosh OS/X, the extension should either be “.dylib” or “.so”, for most other operating systems the extension should be “.so”.

There are no current attributes for the C/Module statement at this time.

CUDA/Module

Load a CUDA module.

Possible syntax is:

!CUDA/Module attributes -> module_name = file_name;

!CUDA/Module attributes -> module_name(file_name);

The file name must be the name of a CUDA module file. In the kappa.conf file, in the “/Kappa/CUDA/Module” section, the OBJ_PATHS labels may be used to set paths to search for CUDA/Module file names. A %s in the value for the OBJ_PATHS is replaced by the path to the executable file, if it was given as an argument to Kappa. If the OBJ_PATHS mechanism is not used, then an absolute file path name must be given.

The file name may be a “.cu” source file, a “.ptx” intermediate file, or a CUDA binary object file. If a “.ptx” file of the same age as, or newer than, a “.cu” file is found, then the “.ptx” file is preferred.

For a “.cu” source file, proper values for the CUDA_PATH and NVCC_PTX labels must be given in the “/Kappa/CUDA/Module” section of the kappa.conf file. The typical value for CUDA_PATH is
“/usr/local/cuda”. Typical values for NVCC_PTX are either:

NVCC_PTX=/usr/local/cuda/bin/nvcc -m32 -I. -I/usr/local/cuda/include -O3 -o %s.ptx -ptx %s.cu

or

NVCC_PTX=/usr/local/cuda/bin/nvcc -m64 -I. -I/usr/local/cuda/include -O3 -o %s.ptx -ptx %s.cu

depending on whether the CUDA modules are compiled for 32 or 64 bit GPU address space. If the host machine is a 64 bit computer running a 64 bit operating system, then either value may be used.

(Kappa only supports the 32 bit GPU address space at this time.)

Also in the “/Kappa/CUDA/Module” section of the kappa.conf file are the default settings for JIT compilation. Setting these values in the kappa.conf file may make it unnecessary to set the attributes in the CUDA/Module statement. These labels and their default values are:

JITMaxRegisters=0

JITThreadsPerBlock=0

JITRecordCompileTime=false

JITOptimizationLevel=4

JITTargetFromContext=true

# The following %CUDA{} values come from [/Kappa/Translation/CUDA]

JITTarget=%CUDA{CU_TARGET_COMPUTE_13}

JITFallback=%CUDA{CU_PREFER_PTX}

Possible attributes are:

MODULE_TYPE

This sets the preferred module file type. Possible values are: CU_MODULE (the default), PTX_MODULE (the fallback if a ".cu" file is not found), ANY_MODULE, and FAT_MODULE. Example:

MODULE_TYPE=%KAPPA{CU_MODULE}

JITMaxRegisters

Please see the description of the cuModuleLoadDataEx function in the CUDA Reference Manual for a description of this attribute.

JITThreadsPerBlock

Please see the description of the cuModuleLoadDataEx function in the CUDA Reference Manual for a description of this attribute.

JITRecordCompileTime

Please see the description of the cuModuleLoadDataEx function in the CUDA Reference Manual for a description of this attribute which is named CU_JIT_WALL_TIME in that manual.

JITOptimizationLevel

Please see the description of the cuModuleLoadDataEx function in the CUDA Reference Manual for a description of this attribute.

JITTargetFromContext

Please see the description of the cuModuleLoadDataEx function in the CUDA Reference Manual for a description of this attribute which is named CU_JIT_TARGET_FROM_CUCONTEXT in that manual.

JITTarget

Please see the description of the cuModuleLoadDataEx function in the CUDA Reference Manual for a description of this attribute.

JITFallback

Please see the description of the cuModuleLoadDataEx function in the CUDA Reference Manual for a description of this attribute.

RECOMPILE

Specify true to force recompiling the ptx file even if it is newer. Example:

RECOMPILE=true

NVCC_OPTIONS

Set the string containing any additional nvcc options. The string will be stripped of anything that is not a dash, slash, underscore, alphanumeric, white space, equal sign, dot, single quote, or double quote. Example:

NVCC_OPTIONS='-arch=compute_20'
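
Putting several attributes together, a CUDA/Module statement that loads a hypothetical module named 'kernels' from a “.cu” source file, forcing recompilation for compute architecture 2.0, might look like (module and file names are illustrative only):

!CUDA/Module RECOMPILE=true NVCC_OPTIONS='-arch=compute_20' -> kernels = 'kernels';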

CUDA/Variable/Module

Load a CUDA module variable.

Possible syntax is:

!CUDA/Variable/Module attributes -> variable_name@module_name = from_variable;

!CUDA/Variable/Module MODULE=module_name attributes -> variable_name = from_variable;

The variable_name must match the variable name declared in the CUDA module. The from_variable must be the name of a previously declared Kappa Variable.

There are no current attributes other than the MODULE attribute for the CUDA/Variable/Module statement at this time.

Note: do not forget to add the name of the variable to the map of any dependent CUDA/Kernel statements.
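
For example, assuming a CUDA module named 'stats' that declares a module variable 'threshold' and a previously declared Kappa Variable named 'host_threshold' (hypothetical names):

!CUDA/Variable/Module -> threshold@stats = host_threshold;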

Keyword

Register a new Kappa statement keyword implemented by a factory function in a shared library.

Possible syntax is:

!Keyword attributes -> new_keyword(shared_library) = factory_function;

!Keyword -> new_keyword(shared_library) = factory_function;

!Keyword attributes -> new_keyword(shared_library, factory_function);

!Keyword -> new_keyword(shared_library, factory_function);

By default, this statement is not enabled. See below for notes on why to not enable it and how to enable it anyway.

The new_keyword is registered as a new Kappa statement keyword. The shared_library may be an empty string, '', to try to locate the factory_function in the main executable or its already loaded libraries. If the shared_library is not an empty string then it must be the complete file path name to the shared library containing the factory_function (and presumably the kappa::command::Keyword class as well) for instantiating a kappa::command::Keyword subclass that implements the new keyword. The factory_function must be declared 'extern “C”' and return an instance of a subclass of the kappa::command::Keyword class cast to type (void *).

Please see the keyword/CSV project source code files for implementation details.

NOTE: using an external shared library that is dynamically loaded at runtime may cause the program to crash if the external shared library has not been compiled against the same ABI (application binary interface) version of the Kappa library. The Kappa library is ABI compatible throughout each major version number. This means, for example, that all versions of the Kappa library 1.0 through 1.4 are ABI compatible. Using a shared library or DLL that is linked at program start up and setting the shared_library to an empty string, or using the kappa::Process.RegisterKeywordCommand method, provides safer functionality. Also, the kappa::Process.RegisterKeywordCommand method provides for passing a handle to user data or a pointer to a user object and is therefore the preferred method for adding Kappa Process keywords. (Psi Lambda LLC will not insist that you do not use this functionality but does highly discourage its use in uncontrolled or possibly insecure environments.)

To enable this functionality, the kappa::Process.AddKeywordKeyword method must be called prior to use. Note that the kappa::Process.ProcessKeywordConfig method does not enable the Keyword statement (even though loading keywords from a configuration file suffers from the same problems). This is a definite example of a situation where Psi Lambda LLC provides functionality but defaults to fail-safe settings and leaves it to the developer to ascertain whether and how the functionality should be used.

Possible attributes are:

REQUIRES_CUDA

Whether the new keyword implementation requires a functioning CUDA context in order to be functional. The default value is false. Example:

REQUIRES_CUDA=true

UNIQUE_NAME

Whether the new keyword implementation needs Kappa Process to construct a unique name for each command queue instance using all of the arguments and map. The default is false, meaning that either the name given in the statement is sufficient or that it is desirable for the same named command to be (re)used. Example:

UNIQUE_NAME=true
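
Assuming the Keyword statement has been enabled via the kappa::Process.AddKeywordKeyword method, a hypothetical registration of a keyword whose factory function is linked into the main executable might look like (keyword and factory names are illustrative only):

!Keyword REQUIRES_CUDA=false -> MyKeyword('', 'CreateMyKeyword');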

Kappa/Routine

Load a Kappa compiled subroutine or function from a shared library.

Possible syntax is:

!Kappa/Routine -> Load('routine','shared_library');

CUDAConfig

This will initialize the CUDA configuration namespace Value objects, if not already done by calling the kappa::Process.InitCUDAConfig method.

Possible syntax is:

!CUDAConfig;

Resource Relinquishing Keywords

Resource relinquishing statements free, unload, or relinquish access to a resource.

ContextReset

Reset a context, freeing resources such as modules, variables, values, etc. This command will execute after all command dependencies, including secondary and further dependencies, on the current context finish. It is a very good idea to still explicitly invoke the specific resource relinquishing statements. This statement sets a CANCELED completion status to cancel any further dependent commands (commands invoked after this statement that use the same resources).

Possible syntax is:

!ContextReset -> command_name;

There are no current attributes for the ContextReset statement at this time.

FreeValue

Free a Value. This command will execute after all command dependencies, including secondary and further dependencies, on the Value have finished. It will explicitly delete a Value from a Kappa Namespace. This statement sets a CANCELED completion status to cancel any further dependent commands (commands invoked after this statement that use the Value).

Possible syntax is:

!FreeValue -> value_name;

There are no current attributes for the FreeValue statement at this time.

Free

Free a Variable. This command will execute after all command dependencies, including secondary and further dependencies, on the Variable have finished. It will explicitly delete a Variable from a Kappa Context. This statement sets a CANCELED completion status to cancel any further dependent commands (commands invoked after this statement that use the Variable).

Possible syntax is:

!Free -> variable_name;

There are no current attributes for the Free statement at this time.

C/ModuleUnload

Unload a C/Module. This command will execute after all command dependencies, including secondary and further dependencies, on the C/Module have finished. It will explicitly delete a C/Module from a Kappa Context. This statement sets a CANCELED completion status to cancel any further dependent commands (commands invoked after this statement that use the C/Module). Note that the shared library may still stay resident in memory depending on the loading mechanism.

Possible syntax is:

!C/ModuleUnload -> module_name;

There are no current attributes for the C/ModuleUnload statement at this time.

CUDA/ModuleUnload

Unload a CUDA/Module. This command will execute after all command dependencies, including secondary and further dependencies, on the CUDA/Module have finished. It will explicitly delete a CUDA/Module from a Kappa Context. This statement sets a CANCELED completion status to cancel any further dependent commands (commands invoked after this statement that use the CUDA/Module).

Possible syntax is:

!CUDA/ModuleUnload -> module_name;

There are no current attributes for the CUDA/ModuleUnload statement at this time.

Stop

Inform the Kappa background process command queue processor to start exiting. This background process will try to finish any running commands but no further commands on the queue will be accepted. This command will execute after all prior command dependencies, including secondary and further dependencies, have finished. This and the Finish statement ensure that commands are processed prior to program exit.

Possible syntax is:

!Stop;

There are no current attributes for the Stop statement at this time.

Finish

Cause the Kappa Process thread execution to pause waiting for the background process to finish. This will shift thread scheduling priority to the background process. This command will execute after all prior command dependencies, including secondary and further dependencies, have finished. This and the Stop statement ensure that commands are processed prior to program exit.

Possible syntax is:

!Finish;

There are no current attributes for the Finish statement at this time.

PopContexts

Try to pop (and free) prior Kappa Context objects until the requested context name is current. This command will execute after all command dependencies, including secondary and further dependencies, on the Context have finished. If the requested context name can not be found, then nothing happens. It will explicitly delete all intervening Kappa Context objects and all of their resources (Variable objects, Module objects, etc.). The named context and its resources become current. This statement sets a CANCELED completion status to cancel any further dependent commands (commands invoked after this statement that use the popped contexts).

Possible syntax is:

!PopContexts -> context_name;

There are no current attributes for the PopContexts statement at this time.

Processing Keywords

The processing statements launch C or CUDA kernels, invoke Kappa Process subroutines or functions, invoke callbacks to user functions, copy Variable objects, and invoke decision statements.

C/Kernel

Launch a C/Kernel.

Possible syntax is:

!C/Kernel attributes -> kernel_name@module_name(arguments) [map];

!C/Kernel MODULE=module_name attributes -> kernel_name(arguments) [map];

!C/Kernel -> kernel_name@module_name(arguments) [map];

!C/Kernel MODULE=module_name -> kernel_name(arguments) [map];

!C/Kernel kernel_name@module_name(arguments) [map];

The kernel_name must match the 'extern “C”' kernel function name declared in the C module or a FUNCTION attribute must be given that specifies the 'extern “C”' kernel function name declared in the C module. The module_name must be the name of a previously loaded C/Module. The arguments and the map may be optional, depending on the kernel being launched.

Possible attributes are:

MODULE

The module_name must be the name of a previously loaded C/Module.

FUNCTION

The 'extern “C”' kernel function name declared in the C module. If the FUNCTION attribute is given, then the kernel_name may be any unique text string identifier desired that does not conflict with other C/Kernel FUNCTION or kernel_name values. C/Kernel objects are assigned unique command names for command queuing that include their arguments and map.

The FUNCTION and kernel_name functionality may be further used to ensure unique command queue names and proper dependency processing. In other words, if you wish to launch multiple times the exact same kernel with the exact same arguments and map, use the FUNCTION attribute to specify the kernel function name and give a different kernel_name to each launch invocation.

Example:

FUNCTION=foobar
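
For example, to launch the same hypothetical 'extern “C”' function 'scale' twice with identical arguments while keeping the two command queue entries distinct (module, function, argument, and map names are illustrative only):

!C/Kernel MODULE=cmodule FUNCTION=scale -> scale_pass1(#N, data) [ data ];

!C/Kernel MODULE=cmodule FUNCTION=scale -> scale_pass2(#N, data) [ data ];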

CUDA/Kernel

Launch a CUDA/Kernel.

Possible syntax is:

!CUDA/Kernel attributes -> kernel_name@module_name(arguments) [map];

!CUDA/Kernel MODULE=module_name attributes -> kernel_name(arguments) [map];

!CUDA/Kernel -> kernel_name@module_name(arguments) [map];

!CUDA/Kernel MODULE=module_name -> kernel_name(arguments) [map];

!CUDA/Kernel kernel_name@module_name(arguments) [map];

The kernel_name must match the kernel name declared in the CUDA module or a FUNCTION attribute must be given that specifies the kernel name declared in the CUDA module. The module_name must be the name of a previously loaded CUDA/Module. The arguments and the map may be optional, depending on the kernel being launched.

Possible attributes are:

MODULE

The module_name must be the name of a previously loaded CUDA/Module.

FUNCTION

The kernel name declared in the CUDA module. If the FUNCTION attribute is given, then the kernel_name may be any unique text string identifier desired that does not conflict with other CUDA/Kernel FUNCTION or kernel_name values. CUDA kernels are assigned unique command names for command queuing that include their arguments and map.

The FUNCTION and kernel_name functionality may be further used to ensure unique command queue names and proper dependency processing. In other words, if you wish to launch multiple times the exact same kernel with the exact same arguments and map, use the FUNCTION attribute to specify the kernel function name and give a different kernel_name to each launch invocation.

Example:

FUNCTION=foobar

GRID

Specifies the grid for launching the CUDA kernel. See the CUDA Reference Manual cuLaunchGrid function for details. Specifying this attribute causes the GRIDX and GRIDY attributes to be ignored. Example:

GRID=[ 5, 8 ]

GRIDX

Specifies the grid width for launching the CUDA kernel. See the CUDA Reference Manual cuLaunchGrid function for details. Specifying the GRID attribute causes the GRIDX and GRIDY attributes to be ignored. Example:

GRIDX=5

GRIDY

Specifies the grid height for launching the CUDA kernel. See the CUDA Reference Manual cuLaunchGrid function for details. Specifying the GRID attribute causes the GRIDX and GRIDY attributes to be ignored. Example:

GRIDY=8

BLOCKSHAPE

Specifies the block shape for launching the CUDA kernel. The default value, if a dimension is not specified, is one. See the CUDA Reference Manual cuFuncSetBlockShape function for details. Specifying the BLOCKSHAPE attribute causes the BLOCKSHAPEX, BLOCKSHAPEY, and BLOCKSHAPEZ attributes to be ignored. Example (letting the z dimension default to one):

BLOCKSHAPE=[ 16, 12 ]

BLOCKSHAPEX

Specifies the block shape x dimension for launching the CUDA kernel. The default value, if not specified, is one. See the CUDA Reference Manual cuFuncSetBlockShape function for details. Specifying the BLOCKSHAPE attribute causes the BLOCKSHAPEX, BLOCKSHAPEY, and BLOCKSHAPEZ attributes to be ignored. Example:

BLOCKSHAPEX=16

BLOCKSHAPEY

Specifies the block shape y dimension for launching the CUDA kernel. The default value, if not specified, is one. See the CUDA Reference Manual cuFuncSetBlockShape function for details. Specifying the BLOCKSHAPE attribute causes the BLOCKSHAPEX, BLOCKSHAPEY, and BLOCKSHAPEZ attributes to be ignored. Example:

BLOCKSHAPEY=12

BLOCKSHAPEZ

Specifies the block shape z dimension for launching the CUDA kernel. The default value, if not specified, is one. See the CUDA Reference Manual cuFuncSetBlockShape function for details. Specifying the BLOCKSHAPE attribute causes the BLOCKSHAPEX, BLOCKSHAPEY, and BLOCKSHAPEZ attributes to be ignored. Example:

BLOCKSHAPEZ=1

SHAREDMEMORY

Specifies the amount of dynamic shared memory available to each thread block when launching the CUDA kernel. The default value, if not specified, is zero. See the CUDA Reference Manual cuFuncSetSharedSize function for details. Example:

SHAREDMEMORY=( 12 * #BLOCK_SIZE * %sizeof{float} )

CACHE

Sets the preferred cache type for the CUDA kernel. Set this to CU_FUNC_CACHE_PREFER_SHARED to prefer more shared memory and set this to CU_FUNC_CACHE_PREFER_L1 to prefer more L1 cache.

CACHE=%CUDA{CU_FUNC_CACHE_PREFER_SHARED}

Please see the description of the cuFuncSetCacheConfig function in the CUDA Reference Manual for a description of setting the cache preference.

STREAM

The name of a stream to associate with the Kernel for doing concurrent kernel launches. The Kappa Stream object and CUDA stream and any necessary CUDA events are automatically created and managed. Example:

STREAM=mystream
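
Combining several of the attributes above, a complete launch might look like the following (the module, kernel, Variable, Value, and stream names are hypothetical):

!CUDA/Kernel GRID=[ 64, 1 ] BLOCKSHAPE=[ 256 ] SHAREDMEMORY=( 256 * %sizeof{float} ) STREAM=stream1 -> saxpy@kernels(#N, x, y) [ x y ];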

Subroutine

Execute a previously loaded or defined Kappa Process subroutine.

Possible syntax is:

!Subroutine attributes -> subroutine_name;

!Subroutine -> subroutine_name;

!Subroutine subroutine_name;

The subroutine_name must match the subroutine name previously loaded or defined in a Kappa Process subroutine. See the section Subroutine and Function Definition for how to define subroutines and functions and see this section for how to output and load subroutines. Since subroutines operate in the namespace context in which they are invoked, they affect Value objects and Variable objects as the subroutine's statements execute. In this sense, subroutines are not “atomic” with respect to the completion and cancellation of their statements and their effects on non subroutine Value objects and Variable objects.

Completion and cancellation of statements occurs throughout subroutine and function invocations. It is possible and normal for some statements within a subroutine or function to execute to completion while others are canceled, possibly prior to invocation.

Possible attributes are:

UNROLL

If set to false (the default), then subroutine loops are dynamic—there is a Namespace loop Value that is dynamically used for looping. If set to true, then the subroutine is unrolled as a macro for the number of times set by the LOOP attribute.

LOOP

If the UNROLL attribute is given and true, then this specifies how many times to expand the subroutine as if it is a macro—the default is to expand it once.

If the UNROLL attribute is not given or is false, then this sets the initial loop count for the subroutine. The default value is one. This is set into a Kappa Namespace Value with the name: "/kappa/routine/subroutine_name/loop". The subroutine's statements may modify this Value to lengthen or shorten the number of execution loops performed. At the end of the execution of the subroutine statements, the current value of this Kappa Namespace Value is decremented by one and then, if the Value is still greater than zero, the subroutine's statements are executed again.
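
For example, to execute a previously defined subroutine (hypothetically named 'iterate') ten times using the dynamic loop Value:

!Subroutine LOOP=10 -> iterate;

The subroutine's statements could then shorten the loop, for example by setting the /kappa/routine/iterate/loop Value to zero once a convergence condition is met.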

Function

Execute a previously loaded or defined Kappa Process function.

Possible syntax is:

!Function attributes -> function_name(arguments) [map];

!Function attributes -> function_name;

!Function -> function_name(arguments) [map];

!Function -> function_name;

!Function function_name(arguments) [map];

!Function function_name;

The function_name must match the function name previously loaded or defined in a Kappa Process function. See the section Subroutine and Function Definition for how to define subroutines and functions and see this section for how to output and load functions.

Function arguments are copied on input and output as appropriate (depending on whether the argument is an input, output, or both). Literal values, such as a text string or a number, may only be inputs. Value objects and Variable objects may be both inputs and outputs. Since the function works on local copies of the Value objects and Variable objects, if the statements in the function cancel midway through the function, the external (to the function) Value objects and Variable objects are not affected. In this sense, functions are “atomic” with respect to the completion and cancellation of their statements and their effects on non function Value objects and Variable objects.

Completion and cancellation of statements occurs throughout subroutine and function invocations. It is possible and normal for some statements within a subroutine or function to execute to completion while others are canceled, possibly prior to invocation.

There are no current attributes for the Function statement at this time.
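
For example, invoking a hypothetical previously defined function 'normalize' with a literal input argument and a Variable that is both input and output (the function, Variable, and map names are illustrative only):

!Function -> normalize(0.5, samples) [ samples ];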

GetNullMask

Copy a Variable's NullMask into a separate Variable.

Possible syntax is:

!GetNullMask attributes -> new_variable_name = previous_variable_name;

!GetNullMask -> new_variable_name = previous_variable_name;

!GetNullMask new_variable_name = previous_variable_name;

There are no current attributes for the GetNullMask statement at this time.

SetNullMask

Use a NullMask (two dimensional) Variable to overwrite a Variable's NullMask.

Possible syntax is:

!SetNullMask attributes -> variable_name = null_mask_variable_name;

!SetNullMask -> variable_name = null_mask_variable_name;

!SetNullMask variable_name = null_mask_variable_name;

There are no current attributes for the SetNullMask statement at this time.

CopyVariable

Copy one Variable to another Variable.

Possible syntax is:

!CopyVariable attributes -> new_variable_name = previous_variable_name;

!CopyVariable -> new_variable_name = previous_variable_name;

!CopyVariable new_variable_name = previous_variable_name;

There are no current attributes for the CopyVariable statement at this time.

Decision

Decide whether to continue or cancel execution for the dependency arguments given.

Possible syntax is:

? logical_expression -> arguments;

The arguments for a decision statement are a list of Variable objects and Value objects to make dependent on the decision. The logical_expression must only contain numbers, configuration values, Value objects, and logical or numeric operators. All configuration values and Value objects must be resolvable to numeric or boolean values. The logical_expression is evaluated and, if it is zero, the decision statement reports a CANCELED completion status. This cancels execution of all statements that are dependent on the Variable objects or Value objects in the arguments list. It reports a FINISHED completion status if the logical_expression evaluates to nonzero. It reports a FAILED completion status (which has the same effect as a CANCELED completion status) if the logical_expression can not be evaluated to an integer or boolean value (for example, if configuration values or Value objects do not resolve, or resolve to text values that are not convertible to numeric values).

For boolean values for attributes or Value objects, "true" or "on" and "false" or "off" or numeric values may be given. For Indices Value objects, a comma delimited list of integer values with optional spaces may be given. For logical and numeric expressions, configuration values and Value objects may be used, as long as they are resolvable. For logical and numeric expressions, parentheses are supported, and outer, enclosing parentheses may be mandatory for proper classification. For logical and numeric expressions, the following binary operators are currently supported:

+ - * / ^ or and xor < > == <= >= !=
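
For example, to cancel any statements that depend on the Variable 'result' unless a hypothetical Value #iterations is still below a limit (the Variable and Value names are illustrative only):

? ( #iterations < 100 ) -> result;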

IO

This executes an IOCallback if one is found that is registered with the same callback_name as given in the IO statement. See the IOCallbacks section for further information on using the IOCallback functionality.

Possible syntax is:

!IO -> callback_name(variable_name) [ map ];

The map should be one of:

[ = variable_name ]

[ variable_name = ]

[ variable_name ]

or

[ variable_name = variable_name ]

There are no current attributes for the IO statement at this time.
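
For example, invoking a hypothetical registered IOCallback named 'saveresults' on the Variable 'result', using one of the accepted map forms (the callback and Variable names are illustrative only):

!IO -> saveresults(result) [ = result ];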

Informational Keywords

Print

Print values or Value objects to standard output.

Possible syntax is:

!Print (arguments);

arguments is a list of values or Value objects to print.

There are no current attributes for the Print statement at this time.
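
For example, printing a literal label together with a hypothetical Namespace Value:

!Print ('Loop count: ', #loop_count);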

Timer

The statement:

!Timer -> <timer_name>;

will, the first time it is given, create a timer. On subsequent invocations, it will output to standard error the elapsed time since the timer was last invoked:

Processing time: 42.984 (ms)

The times may include scheduling delays, system overhead and delays, etc.

Context

The statement:
!Context -> <context_name>;

will, if it is subsequently repeated in the same Kappa Process, report to standard error the device memory usage:

Device: Starting Free Memory: 514719744

Ending: Free Memory: 514719744

Difference Memory: 0

Total: 536674304

Used: 146748

When the “!Context” statement with the same context name is given (repeated), it is given a “final” dependency status. This means that it is deferred for execution until all other dependencies on that named context have finished.

DisplayAll

Displays, to standard output, the GPU properties and attributes.

Possible syntax is:

!DisplayAll;

CUDA/Kernel/Attributes

The attributes of a kernel, as it is actually compiled and loaded on the GPU to execute, can be retrieved by executing the statement:

!CUDA/Kernel/Attributes MODULE=<module_name> -> <kernel_name>;

where <module_name> is the module name given and <kernel_name> is the name of the kernel. This creates the following Value objects:

/kappa/CUDA/<module_name>/<kernel_name>#MaxThreadsPerBlock

/kappa/CUDA/<module_name>/<kernel_name>#StaticSharedMemory

/kappa/CUDA/<module_name>/<kernel_name>#RegistersPerThread

/kappa/CUDA/<module_name>/<kernel_name>#ConstantMemory

/kappa/CUDA/<module_name>/<kernel_name>#ThreadLocalMemory

/kappa/CUDA/<module_name>/<kernel_name>#PTXVersion

/kappa/CUDA/<module_name>/<kernel_name>#BinaryVersion

Please refer to the CUDA Reference Manual or Programming Guide for further details on these attributes and their uses.
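
For example (the module and kernel names are hypothetical), retrieving the attributes and then printing the register usage via the created Value object:

!CUDA/Kernel/Attributes MODULE=kernels -> saxpy;

!Print (/kappa/CUDA/kernels/saxpy#RegistersPerThread);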

SQL Keyword

The SQL keyword currently has the commands described in the following sections.

The SQL keyword can use the KappaAPRDBD database driver to use the Apache Portable Runtime (APR) database drivers. The KappaAPRDBD database driver is a subclass of the kappa::DBD class. Other database drivers may be used by properly subclassing the kappa::DBD class contained in kappa/DBD.h.

SQL keyword operations are kept in dependency order by database handle identifier (dbhandle). This means that commands on the same dbhandle are sequential. To have multiple, simultaneous, asynchronous database operations, use separate database handle identifiers.

SQL statements for the SQL keyword may consist of single or double quoted strings embedded in the scheduling script or may be retrieved from configuration files as configuration values. Currently, multi-line quoted strings are not supported in the scheduling scripts and must instead be placed in configuration files.

It may be helpful to refer to the Example usage of the SQL and Expand keywords section while reading the following sections.

Format strings

Format strings are used to specify the binary data structure and, if necessary, any padding needed for reading from and writing to Variables and the database. They are similar to the format strings used in the Apache Portable Runtime which, in turn, are similar to the format strings for the scanf function. The format specifies the data conversion to or from the database connection. When calculating the size taken by the binary format in the Variable, field sizes are rounded up to the next multiple of 4 bytes for padding.

The following are supported:

Meta specifiers

In addition, there are the following 'meta' specifiers:

If the format is for a STAR format operation, then the first format specifiers specify field conversions for dimension fields. The '+' specifier designates that the specifiers that follow it are for measure fields. The '-' specifier designates that the specifiers that follow it are for neither dimension nor measure fields.

The '=' specifier marks the next field as a field for a primary key id. In operations that support it, these primary key values are placed in separate Variables.



Assuming an example query where:

Conversions

Format specifiers will convert the database field to and from the data type requested, if possible. The format specifiers correspond to the binary data structure desired in the Variable--not necessarily to the data type of the database field.

Aligntest

There is an example program, aligntest, included in the Kappa share/extras for checking the alignment and padding of data structures on the host and GPU. See the INSTALL file for compilation instructions. Modify the aligntest.h, main.cpp, and aligntest.cu files as appropriate to test the data structure you wish to learn about. As distributed, the program puts out the following:

Offset 0: 4

Offset 1: 4

Offset 2: 4

Offset 3: 4

Offset 4: 4

Offset 5: 4

Offset 6: 4

Offset 7: 4

Offset 8: 8

Alignment mismatch 0: 0

Alignment mismatch 1: 0

Alignment mismatch 2: 0

Alignment mismatch 3: 0

Alignment mismatch 4: 0

Alignment mismatch 5: 0

Alignment mismatch 6: 0

Alignment mismatch 7: 0

Alignment mismatch 8: 0

This shows that the first six dimension unsigned structure fields (dima - dimf) have a size of 4 bytes and no padding, the unsigned measurea and float measureb have a size of 4 bytes each and no padding, and the double measurec has a size of 8 bytes and no padding to the end of the structure. This also shows (in this example output) that there is no difference between this structure on a 64 bit host system and the code on a GPU compiled for 32 bits.

This is also a good, simple example of the usage of the IO keyword and callback mechanism.

SQL keyword connect command

!SQL -> connect@dbhandle(driver_name,DBPARAMS);

The dbhandle identifies the database connection. SQL keyword operations are kept in dependency order by dbhandle--commands on the same dbhandle are sequential.

The driver_name is usually one of: pgsql, freetds, mysql, odbc, sqlite, or oracle.

On Linux, the RPM package names and descriptions are:

apr-util-freetds.x86_64 : APR utility library FreeTDS DBD driver

apr-util-mysql.x86_64 : APR utility library MySQL DBD driver

apr-util-odbc.x86_64 : APR utility library ODBC DBD driver

apr-util-pgsql.x86_64 : APR utility library PostgreSQL DBD driver

apr-util-sqlite.x86_64 : APR utility library SQLite DBD driver



apr-util-oracle is not generally available in distributions but may be built using the apr-util source code.

On Windows, the following are installed with Kappa:

apr_dbd_mysql-1.dll

apr_dbd_odbc-1.dll

apr_dbd_oracle-1.dll

apr_dbd_pgsql-1.dll

apr_dbd_sqlite3-1.dll

The DBPARAMS depend on the driver. In each parameter list below, each parameter is followed by an equal sign and a value, and the key/value pairs can be delimited by white space, semicolon, vertical bar, or comma.

For freetds, the parameters are:

username, password, appname, dbname, host, charset, lang, server

For mysql, the parameters are:

host, port, user, pass, dbname, sock, flags, fldsz, group, reconnect

For odbc, the parameters are:

datasource, user, password, connect, ctimeout, stimeout, access, txmode, bufsize

For oracle, the parameters are:

user, pass, dbname, server

For pgsql, the connection string is passed straight through to PQconnectdb, similar to the following:

host=myhost port=5432 dbname=mydb user=mypguser password=mypwd

For sqlite, the connection string is passed straight through to sqlite3_open.
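Putting the pieces together, a pgsql connection (with hypothetical host and credentials) might be made as:

```
!SQL -> connect@mydb('pgsql', 'host=myhost port=5432 dbname=mydb user=mypguser password=mypwd');
```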

SQL keyword disconnect, begin, commit, and rollback commands

!SQL -> disconnect@dbhandle();

!SQL -> begin@dbhandle();

!SQL -> commit@dbhandle();

!SQL IF_FAIL=true IF_CANCEL=true -> rollback@dbhandle();

By default, the database connection uses the transaction control of the underlying database driver, which for most drivers means immediate transaction commits. To have an explicit, multiple command transaction, use the begin, commit, and rollback commands. A transaction includes the commands between a begin command and the next commit, rollback, or disconnect. A commit or rollback may cause the transaction mode to switch back to the default transaction mode until another begin command is given.

An implicit or explicit disconnect command, if preceded by an open begin command, defaults to an implicit rollback. Because commands are kept in dependency order on the dbhandle and dependent commands are canceled on the failure of a command, a begin command followed by other commands and a commit (all on the same dbhandle) causes the usual desired behavior: the transaction commits if all the commands execute successfully and rolls back if any fail.
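A minimal explicit transaction (the SQL statement here is hypothetical) might look like the following; the rollback only runs if an earlier command on the dbhandle fails or is canceled:

```
!SQL -> begin@dbh();
!SQL -> execute@dbh('delete from star_table where dima = 0;', #rows_deleted);
!SQL -> commit@dbh();
!SQL IF_FAIL=true IF_CANCEL=true -> rollback@dbh();
!SQL -> disconnect@dbh();
```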

SQL keyword select command

!SQL -> select@dbhandle(sql_select_statement, format_string, #num_rows, #num_cols, #row_size);

!SQL -> select@dbhandle(sql_select_statement_with_formatting, index, Categories_Variable, format_string, #num_rows, #num_cols, #row_size);

!SQL -> select@dbhandle(sql_select_statement_with_formatting, index, Categories_Variable, Categories_Variable_Key, format_string, #num_rows, #num_cols, #row_size);

The sql_select_statement or sql_select_statement_with_formatting is a SQL select statement or other SQL statement that results in a record set. If there is formatting in the statement, it specifies the parameter binding for data contained in the Categories_Variable or (if a primary key format specification is given) the Categories_Variable_Key. The index is the row offset to use for the Categories_Variable and/or Categories_Variable_Key Variables when binding parameters for the statement.

The format_string specifies the conversion and formatting that will occur when reading data.

The #num_rows, #num_cols, and #row_size are Namespace Value names for the number of rows readable from this statement, the number of columns, and the row size (in bytes) respectively.

Possible attributes are:

ASYNC

If set to false (the default), then the statement is executed synchronously. If set to true, then the statement is executed on another host thread.

STAR

If the STAR attribute is given and true, then STAR database format processing is enabled. STAR database format processing converts dimension character fields that are not a form of integer to a corresponding integer translation value (specify a key Variable to record the association between the GPU-usable integer value and the original value still in the database). Dimensions and measures are tracked separately.

RANDOM

If set to true (the default), then the cursor allows random access scrolling. If set to false, then the cursor is not set for random access scrolling and will work for database drivers that do not support random access cursor scrolling.

SQL keyword read command

!SQL -> read@dbhandle(Variable,#max_rows_to_read,#rows_read);

!SQL -> read@dbhandle(Variable, Keys_Variable, #max_rows_to_read, #rows_read);

The Variable is the name of the Variable to copy the data into. The Keys_Variable is the name of the Variable to copy the primary key id data into. The keys are useful for keeping the association between translated dimension field values and the corresponding record in the database. The keys are also useful for fast SQL update and delete operations using the SQL keyword write command or for selecting specific rows using the SQL keyword select command with bound Variables.

The #max_rows_to_read is the (maximum) number of rows to read in this operation. The #rows_read is the Namespace Value name for the number of rows actually read in this operation.

Numeric integer NULL values are read as zero, float or double NULL values are read as quiet NaN (quiet_NaN), and any NULL values that convert to character arrays convert to empty character arrays.

Possible attributes are:

ASYNC

If set to false (the default), then the statement is executed synchronously. If set to true, then the statement is executed on another host thread.

FAST

If the FAST attribute is given and true, then the Variable is not cleared with zeroes prior to the database read. If there are character arrays or NULL fields, then those locations in the Variable may still contain previous content.

SQL keyword write command

!SQL -> write@dbhandle(sql_statement_with_formatting,Output_Variable, Output_Keys_Variable, #rows_to_write, #rows_affected);

!SQL -> write@dbhandle(sql_statement_with_formatting,Output_Variable, #rows_to_write, #rows_affected);

The sql_statement_with_formatting is a SQL statement with formatting that specifies the parameter binding for data contained in the Output_Variable or (if a primary key format specification is given) the Output_Keys_Variable. Usually the SQL statement would modify the database in some way but this command can be used for any SQL statement that needs to bind parameters but does not return a record set to be read.

The #rows_to_write is the (maximum) number of rows of the Output_Variable or (if a primary key format specification is given) the Output_Keys_Variable to bind to the parameters in the SQL statement and execute in this operation. The #rows_affected is the Namespace Value name for the (total) number of database rows actually affected in this operation.
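As a sketch (the insert statement and names are hypothetical), an asynchronous write that binds three fields from each row of an Output_Variable might look like:

```
// Bind one %u, %u, %f triple per row of OUT; execute for #rows_to_write rows
!SQL ASYNC=true -> write@dbh('insert into star_archive values (%u, %u, %f);', OUT, #rows_to_write, #rows_affected);
```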

Possible attributes are:

ASYNC

If set to false (the default), then the statement is executed synchronously. If set to true, then the statement is executed on another host thread.

SQL keyword execute command

!SQL -> execute@dbhandle(sql_statement,#rows_affected);

The sql_statement is a SQL statement to execute. No parameter binding is done and it is assumed that there is no resulting record set.

The #rows_affected is the Namespace Value name for the (total) number of database rows actually affected in this operation.
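Since no parameters are bound and no record set is returned, the execute command suits DDL and maintenance statements; for example (the index and table names are hypothetical):

```
!SQL -> execute@dbh('create index star_dima_idx on star_table (dima);', #rows_affected);
```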

Possible attributes are:

ASYNC

If set to false (the default), then the statement is executed synchronously. If set to true, then the statement is executed on another host thread.

Synchronize Keyword

!Synchronize name(args)

The Synchronize command halts preparation of instructions at the point it is encountered until the Process Notification Finish for it is received. This makes the foreground preparation of instructions stop until the Synchronize command is executed in the background--which will not happen until its dependencies are done.

Note that CUDA/Kernel GRID, BLOCKSHAPE, SHAREDMEMORY, and others that are not listed below do NOT need a !Synchronize; command--they are resolved on the background thread.

Note that Value can retrieve an Indice or a single number from a Variable. This means that an Indice or number can come from any source, GPU or CPU calculation, SQL query, etc.

The following are resolved at prepare time and need a !Synchronize (); command if they use a calculated Value (instead of a literal string, number, or truth value):

Expand Keyword

Assuming a subroutine with a labels attribute (and maybe a labelset attribute) similar to the following:

<kappa subroutine=mysub labels='$label'>

!Print ('label_$label ', #value_$label );

</kappa>

along with a statement:

!Expand -> mysub(label_limits);

allows replacing, in the subroutine, any occurrence of $label with 0..#number

where #number is the corresponding label limit argument in the Expand statement (use a Synchronize statement if the #number is calculated):

!Synchronize (#number);

The Expand statement allows creating subroutines as (tensor) indexed components that can be expanded dynamically at runtime. This gives a real, practical implementation that allows for true concurrent kernel execution and algorithm step sizing. Properly used, this allows maximum occupancy and use of GPU and CPU.

Labels can be placed in attribute values (such as stream ids), module names, kernel names, Value names, kernel arguments, and Variable names–among other places. These labels are then expanded (at runtime) with numeric ranges or Indices. Numeric ranges and Value Indices can be sliced from Variables using prior Kappa Library features. This allows for automatic parallelism and sizing using runtime data in a natural (tensor) index component manner. These labels can be used to create parallel execution dependency streams, vary across GPU or CPU, select/split/slice datasets for parallelism, select kernels, perform data parallel combinatoric expansions, etc.

The label_limits are arguments that are either unsigned integers or Indices. The number of arguments must match the number of labels given. If an argument to Expand is an unsigned integer, label_limit, then the label $label is replaced with 0 .. label_limit. If the argument is a Value Indices, then it is replaced with each value in the Indices. In either case, there is a separate expansion of the subroutine for each label replacement except for cases where a label occurs in the Value for a Subroutine statement LOOP attribute.

In the case where a label occurs in the Value for a Subroutine statement LOOP attribute, the label substitution that occurred in the LOOP Value is also used for the same label in the subroutine.

If a subroutine with a 'labels' attribute is called with !Subroutine; as the top level call, then each label is replaced with a literal zero (there is always a default action for any subroutine with labels).

Possible attributes are:

LABELSET

A labelset attribute may be given when defining a subroutine and the corresponding LABELSET attribute should be used for the Expand statement. Only subroutine labels with the matching labelset are expanded using the arguments from the Expand statement—default expansions are done for other labelset labels. Using a labelset is a good idea for anything that might become part of a library. This prevents labels from conflicting between libraries and user code.





Example usage of the SQL and Expand keywords.

The following example shows the combined usage of the SQL and the Expand keywords to dynamically size and run in parallel a task to retrieve data from a SQL data source for processing by a GPU. This example assumes a database table in standard star format named STAR_TABLE that has a field, cat_pk_sid, that is usable for splitting the processing into parallel operations. This field would generally have a foreign key relationship to a master table that defines the permissible values for this field.

This example consists of two Kappa subroutines: sqlio and sqlprocess. The subroutine sqlio is unrolled within the sqlprocess subroutine using the Subroutine statement. The subroutine sqlprocess is expanded in the main Kappa scheduling script—it also expands the labels in the sqlio subroutine.

The SQL keyword read commands in the sqlio subroutine (and their corresponding select commands in the sqlprocess subroutine) are executed asynchronously. The CUDA/Kernel launches in the sqlio subroutine use the same stream identifier as the corresponding Variable creation statements in the sqlprocess subroutine and so they execute on the same CUDA streams as the Variables use for data transfer. Since the streams are expanded, the data transfers are overlapping with other data transfers and kernel launches and, if a suitable (GF100) GPU is being used, the kernel launches give concurrent kernel execution.

The SQL operations on the dbhandle_$a are expanded and so, if they have an ASYNC=true attribute, run asynchronously in parallel.

This example is able to execute the SQL operations in parallel and the CUDA kernels concurrently at very high speed on commodity multi-core CPU and GF100 hardware.



<kappa subroutine=sqlio labels='$a' labelset='sql'>

// The main IO loop

!SQL ASYNC=true FAST=true -> read@dbhandle_$a(OUT_$a, #chunk_size, #rows_read_$a);

!CUDA/Kernel STREAM=str_$a OUTPUT_WARNING=false -> sqltest@sqltest(OUT_$a, #rows_read_$a) [ = OUT_$a #rows_read_$a];

</kappa>



<kappa subroutine=sqlprocess labels='$a' labelset='sql'>

!SQL -> connect@dbhandle_$a('pgsql',{PGPARAMS});

!SQL ASYNC=true STAR=true -> select@dbhandle_$a('select pk_sid, dima, dimb, dimc, dimd, dime, dimf, measurea, measureb, measurec from star_table where cat_pk_sid= %u order by dima;', $a, Categories, '=%lu %u %u %u %u %u %u +%f %u %lf', #num_rows_$a, #num_cols_$a, #row_size_$a);



// Get the number of rows to process at once using an if evaluation.

!Value -> rows_allocate_$a = if ( ( #chunk_size < #num_rows_$a ) , #chunk_size , #num_rows_$a );

!Variable STREAM=str_$a VARIABLE_TYPE=%KAPPA{LocalToDevice} -> OUT_$a(#rows_allocate_$a, #row_size_$a);



// Calculate how many iterations based on the number of rows and

// how many rows to process at once.

!Value -> numloops_$a = ( #num_rows_$a / #chunk_size );



// Perform a synchronization so the #numloops_$a Value is ready

!Synchronize (#numloops_$a);

!Print ('number of loops: ', #numloops_$a, ' = ' , #num_rows_$a, ' / ' , #chunk_size );



!Subroutine LABELSET='sql' UNROLL=true LOOP=#numloops_$a -> sqlio;

!SQL -> disconnect@dbhandle_$a(); // disconnect dbhandle

</kappa>





<kappa>

// Setup the CUDA context and load the CUDA module

!Context -> context;

!CUDA/Module -> sqltest = 'sqltest/sqltest.cu';



//Set the size of the data to process at once

!Value -> chunk_size = 65536;



// Connect to the database and get the categories to use for splitting into parallel processes

!SQL -> connect@dbmain('pgsql',{PGPARAMS});

!SQL -> select@dbmain('select distinct cat_pk_sid from star_table;', '%u', #num_rows_cat, #num_cols_cat, #row_size_cat);

!Variable -> Categories(#num_rows_cat,#row_size_cat);

!SQL -> read@dbmain(Categories,#num_rows_cat,#rows_read_cat);

!SQL -> disconnect@dbmain();



!Value -> cat_indice = Categories;

!Print ( 'number of categories: ', #rows_read_cat, 'categories: ', #cat_indice);

// Synchronize the Value of how many categories so that Expand can use it as an argument

!Synchronize (#rows_read_cat);



// Expand and run the processing in parallel across the categories

!Expand LABELSET=sql -> sqlprocess(#rows_read_cat);



// Unload, cleanup, stop

!CUDA/ModuleUnload -> sqltest;

!ContextReset -> Context_reset;

!Stop;

!Finish;

</kappa>


The database schema for the previous example is shown in this image:

with a data structure similar to the following:

typedef struct {

unsigned dima;

unsigned dimb;

unsigned dimc;

unsigned dimd;

unsigned dime;

unsigned dimf;

float measurea;

unsigned measureb;

double measurec;

} TEST_STRUCT;

and with contents in the star_category table of the following:

cat_pk_sid | name | description

------------+------+-----------------

1 | CATA | First category

2 | CATB | Second category

3 | CATC | Third category

4 | CATD | Fourth category

Example program (Kappa Interpreter and "Compiler"): ikappa

Kappa comes with an example command line program to use for developing Kappa Process processes. It can be used interactively or with batch files for testing. Running it with the "-h" option makes it display its usage:

$ ikappa -h

Usage is: ikappa [options] [kappa_file.k ...]

The form of the options is: [-achilkns] [-d device# ] [-o output_directory] [-m module_file] [-f routine_function] [-e routine_name]

The options are:

-a Remember anonymous statements.

-c Process keyword config.

-d Use device number device#.

(This option may be given multiple times.)

-e Specifies the subroutine name to load and execute from a library.

(This option may be given multiple times.)

-f Specifies the function or subroutine name to load from a library.

(This option may be given multiple times.)

-i Register IOCallback.

-l Set CUDA address space to 64 bit. Assumes that '-m32' is not given to nvcc.

-k Add Keyword keyword.

-n Do not execute statements.

-s Do not read standard input if no file arguments are given.

-o Output routines to this output_directory.

-m Specifies the module library file for loading routines from.

-h Print this usage statement.

The statements may be run on multiple GPUs by specifying the "-d" option for each GPU. The order in which the "-d" options are given selects the order in which the GPUs are used.

The "-a", "-n", and "-o" options are useful to process the statements from the Kappa Process statement files into C++ source code files in a project directory. Please note that the "-o" option will overwrite any files of the corresponding names. Created in the project directory is a file named "compile.sh" which contains the commands necessary to configure and compile a libKappaRoutines shared library on platforms that support the autoconf tools.
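For example, the following (with a hypothetical project directory name) processes test.k into C++ source code files without executing the statements:

```
$ ikappa -a -n -o myproject test.k
```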

If no Kappa Process statement file is given, then it reads Kappa Process statements from the standard input. So, assuming there is a Kappa Process statement file named test.k, then the following two commands are different ways to execute that file:

$ ikappa test.k

$ ikappa < test.k

Multiple Kappa Process statement files may be given as long as only the last file contains the !Stop; and !Finish; statements.
