Lumosql

Documentation
Login

Documentation

Welcome to LumoSQL v0.82

LumoSQL is a modification of the SQLite embedded data storage library, made using the not-forking software reproducibility tool. SQLite is among most-deployed software which is why LumoSQL goes to great lengths not to fork it. LumoSQL adds features to SQLite by combining multiple source trees as they change over time, notably the LMDB library, besides other bespoke features available if users select them. Not all features involve an incompatible file format. This preview release is for curious technical developers wondering why anyone would want to touch trillion-scale software. Some reasons include:

Production use is not recommended - assume this pre-release LumoSQL will eat your data. We welcome all reports of success and failure.

LumoSQL exists today because of the efforts of many people since late 2019, as the minimum glue between famously well-tested code found on devices and computers everywhere: SQLite and LMDB. LumoSQL is an unusual feat of engineering, but if you're interested in the computer science theory behind it (especially the idea of intention when making code changes) please do get in touch because we have very active work in that area.

LumoSQL was supported by the NLNet Foundation 2020-2022, and we thank them very much.

Things you can do with LumoSQL:

LumoSQL source is at lumosql.org (Fossil) and Codeberg (git).

LumoSQL stays as close as possible to the SQLite way of software development, recognising that there is almost no room for change without becoming a fork. The tooling uses Tcl, we produce an SQLite-compatible amalgamation, and we use the MIT licence which is very close in spirit to the SQLite licensing situation, and our integration tool is literally called not-forking. It is also why Fossil was an early target to support, because Fossil and SQLite are symbiotic projects. If LumoSQL can work with Fossil, there's a good chance it can work with millions of other applications.

Technical detail:

About LMDB and mmap

The reason LMDB and SQLite native btree are extremely different relates to architecture. SQLite is a traditional userspace database program, with a pager, a page cache, a WAL (Write Ahead Log) to make transactions reliable and so on. LMDB takes an entirely different approach, by using the operating system's mmap system calls to do all memory management work for the database including managing buffers, COW (Copy on Write) and reliable persistence to disk. Operating system VM (Virtual Memory) systems are highly optimised for the particular hardware, so LMDB is outsourcing one of the most difficult and performance-sensitive parts of a database. The benchmarking system does not yet measure CPU and RAM usage, but we can reasonably expect they are considerably lower under LMDB because the operating system VM has a global view whereas an ordinary userspace database application only knows about its own processes.

SQLite can also use mmap for databases, in three different contexts, but never as a page management mechanism for the BTree. There are many applications and some databases that use mmap, but only LMDB has tried to become the thinnest layer possible over the operating system's VM. This does come with some limitations, including that there is some hardware so limited that it doesn't really have an operating system at all, and certainly not a memory manager. LMDB cannot run on such systems.

What doesn't work

There are plenty of details to fix. For example: LumoSQL has not been tested on anything other than Linux/Unix yet; the key handling for the encryption is rudimentary, and many other things listed in the LumoSQL TODO. We do not know of any crashbugs left, but as prerelease software there are doubtless some lurking.

Dependencies

LumoSQL needs all of the following to build the default matrix of binaries, consisting of SQLite native + LMDB 0.9.x + LMDBv1.0 without encryption:

If you ever plan to enable encryption:

So, assuming you have the above dependencies:

On Debian/Ubuntu:

sudo apt install build-essential tcl tclx tcl-dev fossil \
    libtext-glob-perl libsodium-dev \
    zlib1g-dev libreadline-dev libncurses-dev

On Fedora/RHEL:

sudo dnf install make gcc tcl8 tcl8-devel tclx fossil \
    perl-Text-Glob perl-ExtUtils-MakeMaker libsodium-devel \
    zlib-devel readline-devel ncurses-devel

make doctor checks every required dependency.

Containers and CI without a /etc/passwd entry for the effective UID need export USER=XXSOMESTRING before make otherwise fossil complains.

If you are using a large shared machine (eg on a cluster) you may have outdated packages and no permissions to install software. Install locally, for example for Tcl or no Tclx, build them into $HOME/.local and prepend that prefix to PATH, LD_LIBRARY_PATH, LIBRARY_PATH, C_INCLUDE_PATH, PKG_CONFIG_PATH, and set TCLLIBPATH=$HOME/.local/lib. The benchmark script benchmark/lumosql-matrix-on-big-machine.sh shows one such environment.

Build and benchmark

make what lists the resolved option set. make targets resolves latest against upstream and lists the targets the matrix would build. make build builds them. make benchmark runs the speedtest1-derived tests against each built binary and appends to benchmarks.sqlite. make test-sql runs the LumoSQL-specific tests in test/sql/ and appends to test-sql.sqlite. See doc/lumo-build-benchmark.md for the detail.

A single-binary benchmark on a small machine takes a few minutes; the full matrix on a big machine takes hours. You will see that by default the build system wants to make a number of binaries with different combinations. If you are just starting with LumoSQL, you probably want just two binaries: unmodified SQLite of a given version (we will go with 3.53.2 for this example) and the same version backed by LMDBv1.0, with no encryption or other options to start with. Restrict the versions to just these by editing the file:

In sqlite3/benchmark/versions

3.53.2

In lmdbv1/benchmark/versions

=3.53.2
1.0

In lmdb/benchmark/versions

# comment everything out

Now make what should say something like:

BUILDS=
    3.53.2
    3.53.2+lmdbv1-1.0

Build options are passed as OPTION_X=value or as a member of the target tuple. Examples switching on encryption and row-level checksums:

make build SQLITE_VERSIONS=3.53.1 LMDB_VERSIONS=0.9.35 \
    SQLITE_FOR_LMDB=3.53.1
make build SQLITE_VERSIONS=3.53.1 USE_LMDB=no USE_LMDBV1=yes \
    LMDBV1_VERSIONS=1.0 SQLITE_FOR_LMDBV1=3.53.1 \
    ROWSUM=on OPTION_LMDBV1_ENCRYPT=on

Build encryption with OPTION_LMDBV1_ENCRYPT=on, then supply the passphrase via the URI parameter at open time:

'file:/path/to/db?lumo_key=passphrase'
The first open generates a <db>-salt sibling file; subsequent opens reuse it. URI-key is the standard way SQLite-based products pass keys today (SQLCipher, SEE, and similar follow this convention), so adopting it here is intentional. See TODO-CRYPTO.md for the planned key-management work (interactive prompt, --key-fd, LUMO_KEY env, rekey, etc.).

If you just want to use the SQLite-compatible LumoSQL library to link against your own application, stop here.

If you want to do benchmarking, meplace 'build' with 'benchmark' in the make commands above to run benchmarking on the binaries you just built (if they aren't built, the benchmarking target will build them first anyway.)

Benchmarking results are in SQLite databases queryable via tool/benchmark-filter.tcl and tool/test-sql-filter.tcl. Here are some commands to try on the file benchmark/benchmarks.sqlite:

# Compare LMDB-backed SQLite against the same SQLite with the native btree:
tclsh tool/benchmark-filter.tcl -db benchmarks.sqlite \
    -compare -A -backend lmdb -B -no-backend

and

# List the 20 most recent benchmark runs with target, date and total duration:
tclsh tool/benchmark-filter.tcl -db benchmarks.sqlite \
    -list -fields TARGET,DATE,DURATION,BACKEND_NAME

There's a lot more benchmarking to play with, including the hardware tests were run on, basic statistics, and exporting as TSV to statistical and graphing programs.

Other details

Target names encode the option dimensions: 3.53.1+lmdbv1-1.0+lmdbv1_encrypt-on+rowsum-on is one binary at build/3.53.1+lmdbv1-1.0+lmdbv1_encrypt-on+rowsum-on/sqlite3. Built binaries are shell wrappers that set LD_LIBRARY_PATH to find the backend .so they were linked against.

The not-fork cache lives under $CACHE_DIR (default ~/.cache/LumoSQL/not-fork) and is content-addressed, so re-running make build after editing a .mod fragment only rebuilds affected targets.