Documentation

Not logged in

LumoSQL Build and Benchmark System

LumoSQL implements a meta-build system for SQLite, using the Not-Forking tool to handle many of the complexities so we can build a matrix of combined codebases and versions without having a mess of code specific to particular source trees.

But once a binary is built, how can we know if our changes to SQLite make a difference, and what kind of difference? There was no standard way to compare software that implements the SQLite APIs, so we designed one.

The LumoSQL Build and Benchmark System is relevant to all SQLite users wishing to compare different configurations and versions of standard SQLite. In addition, the LumoSQL project includes code to combine any version of SQLite with any version of third party storeage backends, and to run tests on the combined code. This requires small modifications to SQLite itself, and some new code to interface each backend.

Questions The Build and Benchmark System Answers

A single command can now give universal, repeatable, definitive answers to the following seemingly-simple questions:

Having addressed the above questions, the following seemingly more-difficult questions now become very similar to the previous ones:

The rest of this document introduces and defines the benchmarking tool that makes answering these questions possible.

Build and benchmark problem statement

Motivation: LumoSQL has established that there is currently no way of comparing like-for-like SQLite-related databases.

Test matrix: LumoSQL consists of multiple source trees from multiple sources, assembled with the assistance of the not-forking tool. These trees represent a matrix with a very large number of dimensions. The dimensions include among other things: the combination of these source trees; their build process; their invocation parameters; their input data; and the running environment.

Example instances of these dimensions are:

Problem statement:

The LumoSQL Build and Benchmark system solves the problem of defining the dimensions of the test matrix in a formal machine-friendly manner, and presenting them to the user in a human-friendly manner. The user can then select some or all of these dimensions by human-readable name, and then cause them to be actioned. Every selection by the user will have multiple dependency actions.

To ensure repeatability of tests, each test will include the following information:

Where the user has requested average results, the tests may be run several times.

Build and benchmark options

Commandline parameters

(this section is included verbatim comments at the top of build.tcl, which is the master version. It helps to have it here for context for the rest of the documentation.)

Executive summary:

	# build.tcl OPERATION NOTFORK_CONFIG ARGUMENTS

	# OPERATION: options
	# ARGUMENTS: [OUTPUT_FILE]
	#            create a Makefile fragment so that "make" can accept
	#            command-line options corresponding to build options

	# OPERATION: database
	# ARGUMENTS: BUILD_DIR DATABASE_NAME
	#            create database

	# OPERATION: what
	# ARGUMENTS: BUILD_OPTIONS
	#            show what targets/options have been selected based on command-line

	# OPERATION: targets
	# ARGUMENTS: BUILD_OPTIONS
	#            same as "what", but the list of targets are all in one line,
	#            for easier copy-paste when trying to run the exact same list
	#            in multiple places

	# OPERATION: build
	# ARGUMENTS: BUILD_DIR BUILD_OPTIONS
	#            build LumoSQL, if necessary

	# OPERATION: cleanup
	# ARGUMENTS: BUILD_DIR BUILD_OPTIONS
	#            check cached builds, and deletes anything which is no longer
	#            up-to-date (would be rebuilt anyway)

	# OPERATION: benchmark
	# ARGUMENTS: BUILD_DIR DATABASE_NAME BUILD_OPTIONS
	#            run benchmarks (run all tests marked for benchmarking and save timings)

	# OPERATION: test
	# ARGUMENTS: BUILD_DIR DATABASE_NAME BUILD_OPTIONS
	#            run tests (run all tests without saving timings)

Build and Benchmark configuration

A special subdirectory benchmark in not-fork.d/NAME contain files to control the build and benchmark process for backend NAME (NAME can be sqlite3 to control the process for all backends and/or for the unmodified sqlite3). There must be at least one of the following files in each of those directories:

The remaining files in these directories specify build/benchmark options and code to run to produce benchmarks; the code will be documented in another section.

File names matching the pattern *.option specify options which are relevant to building and/or benchmarking NAME. Each file corresponds to a single option (the file name with the .option suffix removed must be the same as the option name). Each file contains lines of the form key=value (or key only) with the following keys defined at present:

For example, not-fork.d/sqlite3/options/datasize.option contains information about the datasize benchmark option:

build = no
syntax = [1-9]\d*(?:,[1-9]\d*)?
default = 1
requiv = (\d+),\1 \1

this means that the values are one or two positive numbers, and two identical values are equivalent to just a single one (e.g. 2,2 is the same as 2) for compatibility with previous versions of LumoSQL where the value was just a single positive integer.

Options which affect the build must be known to the lumo.build script and/or to the lumo.mk Makefile fragment to be effective; these files are installed by the not-forking configuration and control the build process.

Options which affect the benchmark must be implemented by one or more of the tests actually ran, for example by changing data sizes or using PRAGMA statements; the tool/build.tcl tries to know as little as possible about what is being done, to help using the framework for other systems.

Options which apply to all backends are usually found in not-fork.d/sqlite3/benchmark; options which apply to a single backend will be found in the corresponding directory not-fork.d/BACKEND/benchmark. no matter which backend they use. Options which affect the build could be in any directory; currently there is no mechanism to address the case of the same option is present in multiple directories, and it is undefined which one will take precedence.

Backends as of LumoSQL 0.4

At present the only backend provided (in addition to sqlite's own btree) is the lmdb backend; this was originally derived from the sqlightning sources but has been rewritten to work with more recent versions of lmdb and sqlite3; however to add new backends see Adding new backends below.

A third backend, based on Oracle's Berkeley DB is in progress; a special target of +bdb-VERSION (without a sqlite3 version) indicates to build the code provided directy by Oracle, without using the LumoSQL build mechanism.

Specifying build/benchmark options to "make"

The Makefile has a mechanism to recognise build/benchmark options as command-line option with the form OPTION=value where OPTION is the name of an option translated to be in all capital letters; the name can also be prefixes with the string OPTION_ in case there is a name clash with other existing Makefile options: for example, if datasize is defined as the above example, the following two commands are equivalent and will set this option to the value 2:

make benchmark DATASIZE=2
make benchmark OPTION_DATASIZE=2

Options which affect the build may cause a rebuild of the objects; options which only change the benchmark parameters can reuse an older build if available.

These options are in addition to the existing Makefile mechanism to generate a list of targets, using the (previously documented) variables USE_backend, backend_VERSIONS, etc:

Options which take a list of versions expect a space-separated list (this will need to be quoted from the shell); each element can be one of the following:

For example, the following sqlite version list:

all -3.35.0- -latest

corresponds, at the time of writing to the list:

3.35.1 3.35.2 3.35.3 3.35.4 3.35.5 3.36.0 3.37.0 3.37.1 3.37.2 3.38.0

that is, all versions except the ones until 3.35.0 included, and also excluding the latest (3.38.1 at the time of writing); this could also be specified equivalently as:

3.35.1+ -latest

Instead of specifying USE_backend=yes/no and various lists of versions, it's possible to specify an explicit list of targets to build or benchmark; this list can be used, for example, to run the same set at different times, when all and latest may have different meanings. This is done by using the option TARGETS and is explained in the next section.

Some options are provided to control the use of the not-forking tool:

To help debugging, some options provide a mechanism to copy intermediate files, as well as the SQL statement used:

The make target test is similar to benchmark, however it produces output in a different database (by default tests.sqlite) and can run some extra tests which are not useful as benchmarks; also, some code which helps produce precise timing is skipped in favour of speed of execution: the aim here is to check that a backend works, not how long it takes. The name of the tests.sqlite database can be changed using the option TEST_DATABASE_NAME=newname.

Encoding options in the target name

The target name is used internally by the benchmark system to determine if two benchmarks are for similar things and can be compared; in general, two benchmarks are comparable if they have the same build and benchmark options; to simplify this decision, the options are encoded in the target name using the syntax: sqlite3version+[backendname-backendversion]+option-value[+option-value]... the options are always listed in lexycographic order, and default options are omitted, so that if two string differ then the options differ. This is an internal representation, however it appears in the "target" field of the benchmark database, in the output of make what and make targets, and can be specified directly to make to repeat just a particular benchmark without specifying all the options separately.

The syntax is:

make build TARGETS='target1 target2 ...'
make benchmark TARGETS='target1 target2 ...'
make test TARGETS='target1 target2 ...'

As mentioned, the list of targets can be obtained in several ways; possibly the easiest is make targets which will provide a single line for easy copy and paste, for example:

$ make targets USE_BDB=no USE_SQLITE=no LMDB_VERSIONS=0.9.28+ SQLITE_FOR_LMDB=3.37.1+
BENCHMARK_RUNS=1
COPY_DATABASES=
COPY_SQL=
CPU_COMMENT=
DB_DIR=
DISK_COMMENT=
MAKE_COMMAND=make
NOTFORK_COMMAND=not-fork
NOTFORK_ONLINE=0
NOTFORK_UPDATE=0
SQLITE_VERSIONS=latest 3.36.0
USE_SQLITE=no
USE_BDB=no
SQLITE_FOR_BDB=
BDB_VERSIONS=
BDB_STANDALONE=18.1.32=3.18.2
USE_LMDB=yes
SQLITE_FOR_LMDB=3.37.1+
LMDB_VERSIONS=0.9.28+
LMDB_STANDALONE=
OPTION_DATASIZE=1
OPTION_DEBUG=off
OPTION_LMDB_DEBUG=off
OPTION_LMDB_FIXED_ROWID=off
OPTION_LMDB_TRANSACTION=optimistic
OPTION_ROWSUM=off
OPTION_ROWSUM_ALGORITHM=sha3_256
OPTION_SQLITE3_JOURNAL=default
BUILDS=
    3.37.2 3.37.1 3.37.1+lmdb-0.9.28 3.37.2+lmdb-0.9.28 3.37.1+lmdb-0.9.29 3.37.2+lmdb-0.9.29
TARGETS=
    3.37.1 3.37.1+lmdb-0.9.28 3.37.2 3.37.2+lmdb-0.9.28 3.37.1+lmdb-0.9.29 3.37.2+lmdb-0.9.29

so to run exactly the same benchmark one can say:

make benchmark TARGETS='3.37.1 3.37.1+lmdb-0.9.28 3.37.2 3.37.2+lmdb-0.9.28 3.37.1+lmdb-0.9.29 3.37.2+lmdb-0.9.29'

A subset of the normal syntax for lists of versions is recognised, with the "+" and spaces escaped with a backslash, so for example one could run benchmarks for all sqlite versions since 3.35.0 combined with all LMDB versions since 0.9.25, enabling LMDB debugging with:

make benchmark TARGETS='3.35.0\++lmdb-0.9.25\++lmdb_debug-on'

The list of benchmarks generated by this syntax obviously depends on what the current latest version is, however it can be converted to a fixed list with:

make targets TARGETS='3.35.0\++lmdb-0.9.25\++lmdb_debug-on'

Specifying build options to the build and benchmark tools

The various tools provided by previous versions of LumoSQL have been merged into a single tool, tool/build.tcl, which guarantees identical parsing of configuration and options in all stages of the process; the Makefile arranges to call this tool as appropriate, but it can be called manually using the syntax:

tclsh tool/build.tcl OPERATION NOTFORK_CONFIG ARGUMENTS

The NOTFORK_CONFIG is usually the not-fork.d directory provided with LumoSQL; the OPERATION specifies what to do, and the ARGUMENTS depend on the operation specified; the following OPERATIONs are defined:

Note that apart from the slightly different syntax, build/benchmark/test options are specified in the same way as standard Makefile arguments.

For example, to build two versions of plain sqlite3, two versions of sqlite3+LMDB and one version of BDB with its own sqlite3:

tclsh tool/build.tcl build not-fork.d /tmp/objects \
      SQLITE_VERSIONS='3.14.15 3.33.0' \
      USE_LMDB=yes LMDB_VERSIONS='0.9.9 0.9.27' SQLITE_FOR_LMDB=3.8.0 \
      USE_BDB=yes BDB_STANDALONE='18.1.32'

To do the same build as above but specifying the target strings directly:

tclsh tool/build.tcl build not-fork.d /tmp/objects \
      TARGETS='3.14.15 3.33.0 3.8.0+lmdb-0.9.9 3.8.0+lmdb-0.9.27 +bdb-18.1.32'

To add option debug=on to the build:

tclsh tool/build.tcl build not-fork.d /tmp/objects myresults.sqlite \
      SQLITE_VERSIONS='3.14.15 3.33.0' \
      USE_LMDB=yes LMDB_VERSIONS='0.9.9 0.9.27' SQLITE_FOR_LMDB=3.8.0 \
      USE_BDB=yes BDB_STANDALONE='18.1.32' \
      -DEBUG=on

or, with an explicit list of targets:

tclsh tool/build.tcl build not-fork.d /tmp/objects \
      TARGETS='3.14.15++debug-on 3.33.0++debug-on \
      3.8.0+lmdb-0.9.9+debug-on 3.8.0+lmdb-0.9.27+debug-on \
      +bdb-18.1.32+debug-on'

To run the benchmarks rather just building the targets, replace build with benchmark, and add the name of the output database, for example:

tclsh tool/build.tcl benchmark not-fork.d /tmp/objects myresults.sqlite \
      TARGETS='3.14.15++debug-on 3.33.0++debug-on \
      3.8.0+lmdb-0.9.9+debug-on 3.8.0+lmdb-0.9.27+debug-on \
      +bdb-18.1.32+debug-on'

The first version of sqlite3 provided (in this case 3.14.15) will be used to update the benchmark results database.

What tests will run

Each test is composed of three lists of SQL statements, the "before" list prepares the environment for the test, then the test itself runs and the time it takes is logged, finally the "after" list can do any necessary cleanup. Two special files in not-fork.d/sqlite3/benchmark can provide common "before" and "after" code which will be included in every test; these files must have names before-test and after-test respectively.

A backend can add some extra statements to these lists: the special file not-fork.d/BACKEND/benchmark/before, if present, runs just after the one in the sqlite3 directory; and similarly the special file not-fork.d/BACKEND/benchmark/after, if present, runs just before the one in the sqlite3 directory: the idea is that the backend's "before" file executes some extra initialisation after the generic one, and the backend's "after" file does some extra cleanup before the generic one.

Files matching the pattern *.test in directory not-fork.d/sqlite3/benchmark contain the individual tests: the benchmark will read these files in lexycographic order to decide which tests to run and in which order; for each test, the contents of before-test, the test itself, and after-test are concatenated and the result interpreted as TCL code; it is expected that this TCL code sets the variable name to contain the name of the text, and also appends SQL statements to three variables: before_sql, sql and after_sql: these SQL statements will then be executed in the order listed, but only the middle (sql) one is timed, so that setup and cleanup code does not count towards the benchmarking.

If a backend defined a file with the same name as one in the directory not-fork.d/sqlite3/benchmark, that file will be executed immediately after the generic one and can modify the list of statement as appropriate; for example in the current distribution the first test to run, not-fork.d/sqlite3/benchmark/0000.test, creates a database; the LMDB backend has not-fork.d/lmdb/benchmark/0000.test which adds a PRAGMA to specify some backend-specific runtime options to the database.

This TCL code can access a number of variables from the build.tcl script, in particular the array options contains the build and benchmark options; test; each file is a fragment of TCL expected to set two variables: name which is the name of the test, and sql which is the SQL to be executed; the fragment can access the array options to determine the build and benchmark options; examples are provided in the LumoSQL configuration to specify the default set of tests, we show here an example from one of the tests:

set d25000 [expr $options(DATASIZE) * 25000]
set name "$d25000 INSERTs in a transaction"

append sql "BEGIN;\n"
append sql "CREATE TABLE t2(a INTEGER, b INTEGER, c VARCHAR(100));\n"
for {set i 1} {$i<=$d25000} {incr i} {
  set r [expr {int(rand()*500000)}]
  append sql "INSERT INTO t2 VALUES($i,$r,'[number_name $r]');\n"
}
append sql "COMMIT;\n"

This corresponds to the old "25000 INSERTs in a transaction" except that it now multiplies the number of inserts by the DATASIZE option; so it first uses $options(DATASIZE) to calculate the number of inserts, then sets the test name accordinly and generates the SQL. (For simplicity of presentation, this is an older version of the test; a recent version copes with the DATASIZE option having two numbers, a read datasize and a write datasize; see the files actually included in the distribution for the latest examples).

When running the benchmark, the program will measure just the time required to run the appropriate version of sqlite3/backend on the sql generated by each test.

The code fragment can optionally append to two more variables: before_sql is executed at the start, but not included in the time measurement, and after_sql is likewise executed at the end and not included in the time measurement.

At present, tests must be specified in the sqlite3 directory and not a backend one: this is so that we run the same tests for unmodified sqlite3 as we do for the one modified by a backend, to guarantee a meaningful comparison. If a test appears in a backend directory, it is considered additional code to add to the generic test, as described above.

Some of the test files do not produce meaningful timings, but are useful to help checking correctness of backends: to inform the build system of this fact, they can set variable is_benchmark to 0 (by default it has value 1). These tests will then be skipped by make benchmark but still included by make test.

Benchmark run comments

When running benchmarks it's possible to add up to two free-form comments which will be saved in the database but otherwise ignored by the program; these are intended to contain information about the system, and are specified using the command-line options DISK_COMMENT and CPU_COMMENT with the obvious intended meaning, for example:

make benchmark DISK_COMMENT='fast NVME' CPU_COMMENT='AMD Ryzen 3700x'

Adding new backends

To add new backends, create a new directory inside not-fork.d (or inside the appropriate not-forking configuration repository) with the same name as the backend, and add information about how to obtain the sources etc. At a minimum the directory will contain the following files:

The build process requires the backend to provide the following two files (in directory .lumosql), which means that lumo-new-files.mod or some other file in the not-forking configuration must install them:

The LumoSQL build system modifies sqlite3 to replace some of its own files with a stub, which used the C preprocessor's #include directive to read the original file. It also sets the include search path so that it looks first in a subdirectory .lumosql/backend of the backend's sources, and if not found there in the original sqlite3 sources. To avoid file name collision, all such files will be prefixed with lumo_

Therefore, to replace one of these sqlite3 files with a new one the backend will need to have a line in lumo-new-files.mod to specify a new file with the appropriate name in .lumosql/backend, and also add this file in the files directory.

For example, to replace btree.c with a new one (probably something to call the new backend using its own API rather than the original btree.c from sqlite3), one would have the following:

File lumo-new-files.mod: ```

method = replace

files required by the LumoSQL build system

.lumosql/lumo.mk = files/lumo.mk .lumosql/lumo.build = files/lumo.build

files we replace

.lumosql/backend/lumo_btree.c = files/btree.c ```

Then file files/btree.c would contain the new version, and file files/lumo.mk would provide information on how to link the backend with sqlite3, for example:

TCC += -I$(LUMO_SOURCES)/$(LUMO_BACKEND)/include
TLIBS += -L$(LUMO_BUILD)/$(LUMO_BACKEND)
TLIBS += -lmy_backend

would add the include subdirectory in the backend's sources to the search path when building sqlite3 (probably because the replaced btree.c needs to include something from there), and also add the build directory in the backend's sources as library search path; finally it asks to link libmy_backend.so or libmy_backend.a into the sqlite3 executable, probably finding it in the build directory just added to the library search path.

files/lumo.build could be something like:

global backend_name
global backend_version
global build_options

puts "Configuring $backend_name $backend_version"
if {$build_options(DEBUG) eq "on"} {
    system ./configure --enable-debug
} else {
    system ./configure --disable-debug
}

puts "Building $backend_name $backend_version"
system make

# now move files of interest to lumo/build
global lumo_dir
set dest [file join $lumo_dir build]
if {! [file isdirectory $dest]} { file mkdir $dest }
file rename mybackend.h $dest
foreach fn [glob liblmybackend.*] {
    file rename $fn $dest
}

Sharing the Build/Bench Environment

It is often useful to run multiple benchmarking sessions at once on a cluster. Some but not all components of LumoSQL can be shared. The sharing status is as follows:

So, assuming you've set up:

You can create a file Makefile.local in the repository directory (please do not commit this file!) with:

BUILD_DIR = /mnt/build
CACHE_DIR = /mnt/cache
DB_DIR = /mnt/benchmarks
DATABASE_NAME := /mnt/results/lumosql-$(shell hostname)-$(shell date +%Y-%m-%d).sqlite

Then these options will be automatically added to each run. You may want to change the DATABASE_NAME with a filename which makes sense, as long as it is unique even when things are running at the same time.

It is also possible to use the not-fork command directly from a clone if the fossil repository, rather than installing it; and that clone could be in a shared volume; a simple shell script needs to call it with the right options; if for example the repository is at /mnt/repositories/not-forking, the script could contain:

#!/bin/sh
repo=/mnt/repositories/not-forking
perl -I"$repo/lib" "$repo/bin/not-fork" "$@"

If the script is not in $PATH, or if it has a name other than not-fork, add a line like the following to Makefile.local:

NOTFORK_COMMAND = /path/to/not-fork-script

A specific example of a shared cluster is the Kubernetes example files