Distributed, massively parallel processing for PostgreSQL
Field Forge™ brings massively parallel processing (MPP) to PostgreSQL’s otherwise single-threaded sessions. Field Forge™ builds on the MPP power of the Kappa framework, which provides practical use of CUDA GPUs, OpenMP, and partitioned, data-flow-scheduled processing. Field Forge™ exposes the Kappa framework from Psi Lambda LLC as a new language for defining window and table functions. These functions allow processing to be specified using SQL and index component notation for MPP across GPUs and CPUs. Within each Field Forge™ node, the Kappa framework passes subsets of the data sets between processing kernels and into and out of data sets. Field Forge™ also uses the Kappa framework’s Apache Portable Runtime (APR) database driver SQL connections to retrieve data fields from any database source (including other Field Forge™ sessions and nodes), process them with the MPP capabilities of the Kappa framework, and return them as PostgreSQL table or window fields from table or window functions, respectively. Together, these features enable a Dataset Passing Interface (DPI) for distributed MPP that leverages an organization’s existing skills, protocols, connectivity, and infrastructure.
For data stored in a star schema or an OLTP schema, you can use the Kappa framework for high-speed, massively parallel processing of the data. The data is transferred in binary form using an extension of the Apache Portable Runtime parameter specification that also specifies the data structure layout for CUDA or OpenMP data structures. If your data is not in a database, you can use existing C, C++, Perl, or Python libraries within the Kappa framework to access it.
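The idea behind a binary transfer format that doubles as a data structure layout can be sketched with Python's standard `struct` module. This is only an illustration of binary field packing with a C-compatible layout, not the Kappa framework's actual APR parameter specification; the record fields are hypothetical:

```python
import struct

# Hypothetical record layout: a 4-byte int key, an 8-byte float measure,
# and a 4-byte int dimension code. "<" means little-endian with no
# implicit padding, so the resulting buffer has a fixed, predictable
# layout that a CUDA or OpenMP kernel could consume directly.
RECORD_FMT = "<idi"
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def pack_rows(rows):
    """Pack (key, measure, dim_code) tuples into one contiguous buffer."""
    return b"".join(struct.pack(RECORD_FMT, *row) for row in rows)

def unpack_rows(buf):
    """Recover the tuples from the binary buffer."""
    return [struct.unpack_from(RECORD_FMT, buf, off)
            for off in range(0, len(buf), RECORD_SIZE)]

rows = [(1, 3.5, 7), (2, -0.25, 9)]
buf = pack_rows(rows)
```

Because the layout is declared once and shared by both sides, rows can move between the database connection and the compute kernels without per-field conversion steps.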
The Kappa framework provides:
* transfer with as few processing steps as possible,
* optional primary key fields to specify record selection for reading and updating,
* support for dimension and measure fields,
* the ability to normalize nonnumeric (discrete) dimension fields, and
* the ability to use database fields to split a task for parallel transfer and computation.
Together, these capabilities provide scalable, efficient, high-speed transfer and processing, and full use of database server, bandwidth, and CPU and GPU capacity.
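As a sketch of the last point, splitting a task on a database field for parallel transfer might look like the following. This is plain Python with a hypothetical integer key range; the Kappa framework expresses such splits declaratively rather than by hand:

```python
def split_key_range(lo, hi, parts):
    """Split an inclusive key range [lo, hi] into `parts` near-equal
    sub-ranges, one per parallel worker or database connection."""
    total = hi - lo + 1
    base, extra = divmod(total, parts)
    ranges, start = [], lo
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# Each sub-range then becomes an independent query, e.g.
#   SELECT ... FROM some_table WHERE id BETWEEN %s AND %s
# so transfer and computation proceed in parallel.
chunks = split_key_range(1, 10, 4)
```

Splitting on an indexed key field like this keeps each parallel query cheap on the server side while saturating available bandwidth and compute capacity.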
The Kappa Library uses producer/consumer data flow scheduling, which maps well to database transactional processing. The scheduling can be indexed and scaled using data from SQL operations or from CPU or GPU calculations, and GPU kernel launches can be dynamically sized from SQL or other data sources. The scheduling is declarative, using SQL and index component (tensor) notation: it can be specified once and then automatically sized by the data contained in the database data set, and it can be compiled into shared libraries for distribution.
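A minimal illustration of producer/consumer data flow scheduling, where the work done at each step is sized dynamically by the data the producer emits, can be written with plain Python threads and a queue. The Kappa framework expresses this declaratively and targets GPU/CPU kernels, so this only sketches the scheduling idea:

```python
import queue
import threading

def producer(q, batches):
    """Emit data batches downstream; a sentinel (None) ends the stream."""
    for batch in batches:
        q.put(batch)
    q.put(None)

def consumer(q, results):
    """Process each batch as it arrives. The work per step is sized by
    the batch itself, much as a kernel launch could be sized from the
    results of a SQL operation."""
    while True:
        batch = q.get()
        if batch is None:
            break
        results.append(sum(batch))  # stand-in for a GPU/CPU kernel

q, results = queue.Queue(maxsize=2), []
batches = [[1, 2, 3], [10, 20], [5]]
t1 = threading.Thread(target=producer, args=(q, batches))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start(); t1.join(); t2.join()
```

The bounded queue gives backpressure: the producer blocks when the consumer falls behind, which is the same flow-control property that makes this model a good fit for transactional data sources.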
The Kappa Library currently has bindings for C/C++, SQL, Perl, Python, Ruby, Lua, and .Net. Java (and other-language) bindings are available but not yet tested. Languages with bindings can be mixed within a single processing task to implement different steps of the processing.
Currently, the KappaAPRDBD driver provides access to the Apache Portable Runtime database drivers. Because the APR drivers are accessible, community supported, and widely available, they can work with the data sources you wish to use. The APR pgsql DBD driver in particular is robust, performs well, and is available on most platforms.