Return to Main Page


SQLite user-defined functions and Perl

SQLite is an embedded, compact SQL database engine, reliable and very fast. Its popularity is growing rapidly and at the moment it is supposed to be the most widely deployed SQL database engine in the world (source http://sqlite.org/mostdeployed.html). At the same time SQLite has a limited number of core functions available by default and does not support stored procedures.

To circumvent such limitations and enhance the RDBMS performance one can use DBD::SQLite driver private methods to write user-defined functions thus embedding Perl processing power into SQLite native SQL queries. Perl can provide SQLite with capabilities that this RDBMS would never have on stand-alone basis.

For illustration purpose we create an SQLite database data.db with a single table called employees that will contain a list of employees and their respective salaries:

CREATE TABLE IF NOT EXISTS employees (name TEXT, salary INT)
INSERT INTO employees VALUES('JANE DOE', 8500)
INSERT INTO employees VALUES('JOHN DOE', 3500)
INSERT INTO employees VALUES('RICHARD ROE', 2500)
INSERT INTO employees VALUES('JOHN SMITH', 10500)
INSERT INTO employees VALUES('WALTER JOHNSON', 10500)
INSERT INTO employees VALUES('ANNE SMITH', 7500)

DBD::SQLite supports two methods for registering new functions to handle string, numeric and aggregate input.

To define a regular non-aggregate function one needs to provide a name of the function as its first argument, total number of arguments the function will use (setting it to -1 permits any number of arguments), and a reference to the function implementation, usually a subroutine that will process field data supplied by SQL query.

The following statement creates a new SQLite function that will embed Perl's s///, a regular expression substitution operator, into an SQL query:

$dbh->func('s', 2, sub
{
eval '$_[0] =~ s'.$_[1] or die "incorrect regexp syntax!";
return $_[0];
},
'create_function');

The function named s takes two arguments: the first one is a string to substitute and the second one is the regular search and replace pattern. Being a scripting language Perl has eval function that significantly reduces code complexity and its size. The eval in our case concatenates two strings first and then executes the resulting substitution operator.

To show how it works we put our user-defined function into a Perl script (see below) that will update the employees table changing the upper case names that we have to the title case ones (e.g. WALTER JOHNSON will be replaced with Walter Johnson). Please also note proper use of backslashes before regular expression's special characters like \ and $.

#!/usr/bin/perl

use strict;
use DBI;

my ($dbh, $sth);

$dbh = DBI->connect("dbi:SQLite:dbname=./data.db","","");

$dbh->func('s', 2, sub
{
eval '$_[0] =~ s'.$_[1] or die "incorrect regexp syntax!";
return $_[0];
},
'create_function');

$dbh->do("UPDATE employees SET name = s(name, '/\\b(\\w)(\\w+)/uc(\$1).lc(\$2)/eg')");

undef $sth;
$dbh->disconnect();

exit(0);

Modifier e tells s/// to evaluate and execute the replacement string and substitute its results for the operator's match side. This feature is very handy when you need to do some computation, sometimes quite sophisticated, on a respective database field values. It is possible to put there a number of available Perl built-ins (e.g. Perl numeric and math functions that SQLite lacks up to now) to improve SQLite performance.

The script has undef $sth inserted to avoid annoying "closing dbh with active statement handles", a DBD:SQLite bug that is not yet fixed at the time of writing.

The similar approach may be used to define Perl's m// match operator that returns 1 on success and 0 otherwise to be effectively embedded into SQL WHERE clauses (see below).

$dbh->func('m', 2, sub {return eval '$_[0] =~ '.$_[1] or die "incorrect regexp syntax!";}, 'create_function');

Each time you run DBI->connect() the best way to make user-defined functions accessible by any Perl script is to put them into a separate module, a subclass of DBI. If Apache mod_perl is used, one may want to preload such module into memory during server startup via PerlModule or PerlRequire directives to save time during the first request. This also increases the chances of the memory sharing among the Apache child processes.

The DBI man page contains some basic information about the module subclassing. After a successful connect, DBI::db->connected() method is automatically called. It actually does nothing by default but provides an opportunity to perform necessary post-connection setup. If we put there our user-defined functions they will be automatically registered with SQL parser each time we make successful connection to a database (see below). To proceed further we place our MyDBI.pm file into the current directory.

package MyDBI;
use base 'DBI';

package MyDBI::st;
use base 'DBI::st';

package MyDBI::db;
use base 'DBI::db';

sub connected
{
my $dbh = shift;
$dbh->func('s', 2, sub
{
eval '$_[0] =~ s'.$_[1] or die "incorrect regexp syntax!";
return $_[0];
},
'create_function');
$dbh->func('m', 2, sub
{
return eval '$_[0] =~ '.$_[1] or die "incorrect regexp syntax!";
},
'create_function');
}

1;

To use DBI subclass module, we need to modify DNS string in SQLite connect method to look after MyDBI by default instead of DBI (MyDBI above is the name of the subclass module we have just created):

$dbh = DBI->connect("dbi:SQLite(RootClass=MyDBI):dbname=./data.db","","");

Now we can demonstrate the power of combining both Perl and SQLite by running a sample SELECT query on data.db. The following script shows floating point formatted salaries of employees that have exactly two subsequent case insensitive n's in their names:

#!/usr/bin/perl

use strict;
use DBI;

my ($dbh, $sth, @row);

$dbh = DBI->connect("dbi:SQLite(RootClass=MyDBI):dbname=./data.db","","");

$sth = $dbh->prepare("
SELECT s(salary, '/(\\d+)/sprintf(\"%012.4f\", \$1)/e'), name FROM employees
WHERE m(name, '/N{2}/i')");
$sth->execute();
print "@row\n" while @row = $sth->fetchrow_array();

undef $sth;
$dbh->disconnect();

exit(0);

The above script gives:

0007500.0000 Anne Smith

With SQLite parser it is also quite easy to register a new user-defined aggregate function. The only difference is that instead of function name reference we need to specify a package that contains respective methods to handle supplied data as well as to change the last argument to 'create_aggregate'. For demonstration purposes we put into our current directory a separate package called MyAggregate (see below) that calculates basic descriptive statistics for a provided numeric field. To facilitate the task we will use CPAN Statistics::Lite package that contains statsinfo() function returning the most popular statistics for multiple values.

package MyAggregate;

use Statistics::Lite qw(statsinfo);

sub new {bless [], shift;}

sub step {push @{$_[0]}, $_[1];}

sub finalize {return statsinfo(@{$_[0]});}

1;

Any SQLite aggregate user-defined function interface consists of three methods. The first one new() is called once to create an anonymous array reference that bless tags as belonging to MyAggregate and which should be used later to store and process the rows in a particular group. The step() method populates the array with each row data. This method is called as many times as many rows aggregate function supplies. And in the end finalize() returns our aggregate function's results.

To access MyAggregate from Perl scripts, we should add two additional lines of code to our MyDBI module. The final modified MyDBI is presented below:

package MyDBI;
use base 'DBI';

package MyDBI::st;
use base 'DBI::st';

package MyDBI::db;
use base 'DBI::db';
use MyAggregate;

sub connected
{
my $dbh = shift;
$dbh->func('s', 2, sub
{
eval '$_[0] =~ s'.$_[1] or die "incorrect regexp syntax!";
return $_[0];
},
'create_function');
$dbh->func('m', 2, sub
{
return eval '$_[0] =~ '.$_[1] or die "incorrect regexp syntax!";
},
'create_function');
$dbh->func('statsinfo', -1, MyAggregate, 'create_aggregate');
}

1;

Running following SELECT query on employees table:

$sth = $dbh->prepare("SELECT statsinfo(salary) FROM employees");

gives the following output:

min       = 2500
max       = 10500
range     = 8000
sum       = 43000
count     = 6
mean      = 7166.66666666667
median    = 8000
mode      = 10500
variance  = 11866666.6666667
stddev    = 3444.80284873702
variancep = 9888888.88888889
stddevp   = 3144.6603773522

As it was shown despite the fact that SQLite does not have an extensive built-in functions list, it provides very efficient interface allowing users to define their own functions, both regular and aggregate. Existing SQLite Perl API gives an excellent opportunity to exploit plethora of freely available Perl modules embedding them directly into SQLite queries having to add only few lines of code.

December 01, 2009