
The C++ Abstraction Layer

Rahul Iyer edited this page Jul 1, 2015 · 10 revisions

Writing Code for the MADlib® C++ Abstraction Layer

Preamble

The MADlib C++ Abstraction Layer provides a means for writing platform-independent user-defined functions in C++. It abstracts the database backend completely, performs all necessary type checking, and uses the Eigen C++ library to provide an intuitive, clean interface to high-performance linear-algebra functions (LAPACK).

Example:

AnyValue student_t_cdf(AbstractDBInterface &db, AnyValue args) {
    AnyValue::iterator arg(args);

    // Arguments from SQL call
    const int64_t nu = *arg++;
    const double t = *arg;

    /* We want to ensure nu > 0 */
    if (nu <= 0)
        throw std::domain_error("Student-t distribution undefined for "
            "degree of freedom <= 0");

    return studentT_cdf(nu, t);
}

Features

  • Performs type checking of function arguments
    • Lossless conversion of pass-by-value is done implicitly (e.g., from uint32_t to uint64_t)
    • Implicit lossy conversion will throw an exception (e.g., from uint64_t to uint32_t)
  • Supports pass-by-reference whenever possible (performance!). However, if the user code asks for a mutable object but the database prohibits direct modification, a copy is automatically created.
    • Well-behaved/non-hacky user code cannot accidentally corrupt database data
  • The only supported means for user code to communicate with the DBMS backend is through the interface provided by AbstractDBInterface/AbstractAllocator (truly platform-independent!).
  • Integration of Armadillo for linear algebra operations (Armadillo itself is a C++ wrapper for LAPACK). This allows for intuitive math notation in C++ code.
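
The conversion rule in the first feature (lossless conversions pass silently, lossy ones throw) can be illustrated with a minimal sketch. The helper checked_convert below is hypothetical and is not part of the MADlib API; it only shows one way such a check could be written for integer types, assuming a round-trip test plus a sign test is an acceptable definition of "lossless".

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper for illustration only (not MADlib code).
// Converts between integer types and throws if the value would change.
template <typename To, typename From>
To checked_convert(From value) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "this sketch handles integer types only");

    const To converted = static_cast<To>(value);

    // Lossless means: converting back yields the same value, and the sign
    // did not flip (the latter catches e.g. int32_t(-1) -> uint32_t).
    const bool round_trips = static_cast<From>(converted) == value;
    const bool same_sign = (converted < To(0)) == (value < From(0));
    if (!round_trips || !same_sign)
        throw std::invalid_argument("lossy integer conversion");

    return converted;
}
```

Under these assumptions, widening a uint32_t to uint64_t always succeeds, while narrowing a uint64_t that holds 2^32 to uint32_t throws std::invalid_argument.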

Interesting Implementation Details

PostgreSQL port

  • Overloads the global throw/nothrow variants of operator new and operator delete to use palloc/pfree
  • All memory allocation is funneled through the PGAllocator class
  • All callbacks into the backend (in particular palloc/pfree) occur within PG_TRY/PG_CATCH blocks. This ensures that any PostgreSQL exception raised by ereport returns control to the calling C++ function. There we throw a C++ exception, which is caught just above the C/C++ boundary. From there the PostgreSQL exception is rethrown. This procedure ensures that the C++ stack is always unwound properly (otherwise the longjmp performed by ereport would lead to behavior that is undefined by the C++ standard).
  • Callbacks into the DBMS backend where no exceptions are permitted (operator new (std::nothrow)) deactivate interrupt processing for the duration of the callback (interrupts are, of course, still recorded while processing is disabled). This ensures that no database signals get lost. Otherwise, the caller of a failed operator new (std::nothrow) could not tell, for example, whether the NULL pointer is due to a SIGINT or to memory exhaustion. By disabling interrupt processing, the SIGINT is preserved and dealt with at an appropriate later point.
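
The exception translation described above can be sketched in portable C++ with a simulated backend. Every name here (fake_palloc, error_target, guarded_alloc, BackendError, call_across_boundary) is a hypothetical stand-in, not PostgreSQL or MADlib code; plain setjmp/longjmp takes the place of PG_TRY/PG_CATCH and ereport. The point it tries to show is that the longjmp is intercepted in the same frame that made the callback, before any C++ frame with destructors is skipped, and only then converted into a C++ exception.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>

// Where the simulated backend jumps to when it reports an error.
static std::jmp_buf *error_target = nullptr;

// Simulated C allocator: like palloc, it signals failure by longjmp
// instead of returning. The 1024-byte limit is arbitrary.
extern "C" void *fake_palloc(std::size_t size) {
    if (size > 1024)
        std::longjmp(*error_target, 1);
    return std::malloc(size);
}

struct BackendError : std::runtime_error {
    BackendError() : std::runtime_error("backend reported an error") {}
};

// Callback wrapper: catches the longjmp locally and turns it into a
// C++ exception, so no frame with destructors is ever jumped over.
void *guarded_alloc(std::size_t size) {
    std::jmp_buf here;
    std::jmp_buf *const outer = error_target;
    if (setjmp(here) == 0) {
        error_target = &here;
        void *result = fake_palloc(size);
        error_target = outer;
        return result;
    }
    // Reached only through longjmp from the simulated backend.
    error_target = outer;
    throw BackendError();
}

static int destroyed = 0;
struct Tracker { ~Tracker() { ++destroyed; } };

// "User code": owns an object whose destructor must run on every exit path.
void user_function(std::size_t size) {
    Tracker tracker;
    void *memory = guarded_alloc(size);
    std::free(memory);
}

// Stand-in for the C/C++ boundary. A real port would rethrow the
// backend error here; this sketch just reports success or failure.
bool call_across_boundary(std::size_t size) {
    try {
        user_function(size);
        return true;
    } catch (const BackendError &) {
        return false;
    }
}
```

Calling call_across_boundary with a small size succeeds, and with a size above the arbitrary limit it fails; in both cases the Tracker destructor runs, which is the property the PG_TRY/PG_CATCH wrapping is meant to guarantee.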