Welcome to LumoSQL v0.82

LumoSQL is a modification of the SQLite embedded data storage library, made using the not-forking software reproducibility tool. SQLite is among most-deployed software which is why LumoSQL goes to great lengths not to fork it. LumoSQL adds features to SQLite by combining multiple source trees as they change over time, notably the LMDB library, besides other bespoke features available if users select them. Not all features involve an incompatible file format. This preview release is for curious technical developers wondering why anyone would want to touch trillion-scale software. Some reasons include:

You would like to run an SQLite application faster than it currently does. Speed is a tricky subject and we measure it extensively
You would like to explore different approaches to SQLite application robustness and crash recovery
You would like an easy, mainstream way to encrypt your SQLite database
You would like additional features in SQLite stored compatibly with the sqlite3 file format readable by every standard SQLite install

Production use is not recommended - assume this pre-release LumoSQL will eat your data. We welcome all reports of success and failure.

LumoSQL exists today because of the efforts of many people since late 2019, as the minimum glue between famously well-tested code found on devices and computers everywhere: SQLite and LMDB. LumoSQL is an unusual feat of engineering, but if you're interested in the computer science theory behind it (especially the idea of intention when making code changes) please do get in touch because we have very active work in that area.

LumoSQL was supported by the NLNet Foundation 2020-2022, and we thank them very much.

Things you can do with LumoSQL:

Compile against applications that use SQLite. Mostly we have tested against Fossil and libfossil because these are demanding applications developed closely against SQLite, but we welcome bug reports regarding anything else. We have also tested some web applications.
Replace the SQLite Btree storage engine with either of two LMDB storage engines, transparently to both SQLite and applications. Like SQLite, LMDB is a vastly deployed embedded library, and LumoSQL just ties these two codebases together with minimum fuss. You can mix-and-match versions of each across the last decade or so. LMDB works on a different principle to SQLite btree, and one or the other may be better for particular use cases.
The user functionality supports SQLite usage fully, except where a backend does not have an SQLite concept due to architectural differences. All SQL works unmodified, and nearly all pragmas.
The third backend LumoSQL ships is the standard SQLite btree backend, but with the ability to add features to an ordinary sqlite3 database (the one specified as a US Library of Congress Standard, and we are compatible with that.) The first feature we added was storing a checksum for every row, and providing SQL functions to read and calculate the checksum. This is useful for detecting changes in rows without doing comparisons of BLOBs, and also for detecting corruption or unexpected modification on a per-row basis. Any SQLite binary that can read an sqlite3 database can read and write to a rowsum enabled database without noticing, or at least our testing shows that so far.
Features that come with LMDBv1.0 include encryption and page-based checksums. Both of these can be compiled in to LumoSQL. Encryption is via libsodium, and we provide one example cipher with basic key management via an environment variable or commandline. There are other ways of achieving both encryption and checksums, as we document, but these come for free, often even in the sense of performance.
Benchmark SQLite with results in an SQLite database. SQLite famously ships with maybe quarter of a million tests, but these do not as far as we know compare different compilation options with each other on the same hardware, or across architectures, or across different users. LumoSQL benchmarking takes note of the hardware and software environment, and provides a basic statistical analysis tool. Benchmark databases can be merged, so people can send in their results. The benchmarking could easily be extended to other SQL databases but we haven't tried to do that.
Run an SQL test suite that tries to break the storage backend across multiple binary configurations, including cross-binary interoperability tests. Like benchmarking, the results are queryable from an SQLite database via tool/test-sql-filter.tcl. The system supports role-based testing, corruption detection by corrupting a test database, and encryption round-trips. The Fossil test suite is separate and drives the LMDB-backed Fossil through correctness, speed, and stress checks.
Mix and match LumoSQL features across a decade of SQLite and LMDB versions. If you build everything you have a combinatorial explosion: of the order of a thousand binaries and a good few hours even on a fast machine. Or you can pair an older LMDB with a newer SQLite because those versions matter for your embedded application, and choose to enable rowsums, encryption, checksummed pages or more.

LumoSQL source is at lumosql.org (Fossil) and Codeberg (git).

LumoSQL stays as close as possible to the SQLite way of software development, recognising that there is almost no room for change without becoming a fork. The tooling uses Tcl, we produce an SQLite-compatible amalgamation, and we use the MIT licence which is very close in spirit to the SQLite licensing situation, and our integration tool is literally called not-forking. It is also why Fossil was an early target to support, because Fossil and SQLite are symbiotic projects. If LumoSQL can work with Fossil, there's a good chance it can work with millions of other applications.

Technical detail:

the default SQLite Btree storage backend, optionally with rowsums, and the infrastructure to add other features over time.
LMDB 0.9.x, very stable and probably the most widely-used key-value store (after SQLite's native btree key-value store). This is now in maintenance mode by the LMDB project.
LMDBv1.0, which is a superset of the stable LMDB 0.9.x API with very different internals. LMDBv1.0 comes with optional page-based encryption, page-based checksums, and incremental backup. LMDB-backed SQLite inherits all these. LMDBv1.0 is still pre-release.
LumoSQL has been known to run on x86, ARM-32 and RISC-V architectures, and many Linux and BSD OSs. Our benchmark results are stored in an sqlite file with a tool to merge and compare, so do please try this at home and share.
Other potential backends are discussed in our knowledgebase of relevant technologies document (there aren't very many.)
LumoSQL reproducibly integrates 9 years of software infrastructure. The SQLite-side glue is about 2,000 lines under not-fork.d/sqlite3/. The LMDB and LMDBv1 backends are larger because each is a substantive re-implementation of SQLite's btree against an LMDB environment; the two backends share a great deal of code.

About LMDB and mmap

The reason LMDB and SQLite native btree are extremely different relates to architecture. SQLite is a traditional userspace database program, with a pager, a page cache, a WAL (Write Ahead Log) to make transactions reliable and so on. LMDB takes an entirely different approach, by using the operating system's mmap system calls to do all memory management work for the database including managing buffers, COW (Copy on Write) and reliable persistence to disk. Operating system VM (Virtual Memory) systems are highly optimised for the particular hardware, so LMDB is outsourcing one of the most difficult and performance-sensitive parts of a database. The benchmarking system does not yet measure CPU and RAM usage, but we can reasonably expect they are considerably lower under LMDB because the operating system VM has a global view whereas an ordinary userspace database application only knows about its own processes.

SQLite can also use mmap for databases, in three different contexts, but never as a page management mechanism for the BTree. There are many applications and some databases that use mmap, but only LMDB has tried to become the thinnest layer possible over the operating system's VM. This does come with some limitations, including that there is some hardware so limited that it doesn't really have an operating system at all, and certainly not a memory manager. LMDB cannot run on such systems.

What doesn't work

There are plenty of details to fix. For example: LumoSQL has not been tested on anything other than Linux/Unix yet; the key handling for the encryption is rudimentary, and many other things listed in the LumoSQL TODO. We do not know of any crashbugs left, but as prerelease software there are doubtless some lurking.

Dependencies

LumoSQL needs all of the following to build the default matrix of binaries, consisting of SQLite native + LMDB 0.9.x + LMDBv1.0 without encryption:

a C compiler and GNU make
Tcl 8.6 with Tclx
Perl core plus the Text::Glob module
Fossil
the not-forking tool at >= 0.7
SQLite's own build-deps on Linux: zlib, readline, ncurses

If you ever plan to enable encryption:

libsodium — required for OPTION_LMDBV1_ENCRYPT=on (encrypted LMDBv1)

So, assuming you have the above dependencies:

On Debian/Ubuntu:

sudo apt install build-essential tcl tclx tcl-dev fossil \
    libtext-glob-perl libsodium-dev \
    zlib1g-dev libreadline-dev libncurses-dev

On Fedora/RHEL:

sudo dnf install make gcc tcl8 tcl8-devel tclx fossil \
    perl-Text-Glob perl-ExtUtils-MakeMaker libsodium-devel \
    zlib-devel readline-devel ncurses-devel

make doctor checks every required dependency.

Containers and CI without a /etc/passwd entry for the effective UID need export USER=XXSOMESTRING before make otherwise fossil complains.

If you are using a large shared machine (eg on a cluster) you may have outdated packages and no permissions to install software. Install locally, for example for Tcl or no Tclx, build them into $HOME/.local and prepend that prefix to PATH, LD_LIBRARY_PATH, LIBRARY_PATH, C_INCLUDE_PATH, PKG_CONFIG_PATH, and set TCLLIBPATH=$HOME/.local/lib. The benchmark script benchmark/lumosql-matrix-on-big-machine.sh shows one such environment.

Build and benchmark

make what lists the resolved option set. make targets resolves latest against upstream and lists the targets the matrix would build. make build builds them. make benchmark runs the speedtest1-derived tests against each built binary and appends to benchmarks.sqlite. make test-sql runs the LumoSQL-specific tests in test/sql/ and appends to test-sql.sqlite. See doc/lumo-build-benchmark.md for the detail.

A single-binary benchmark on a small machine takes a few minutes; the full matrix on a big machine takes hours. You will see that by default the build system wants to make a number of binaries with different combinations. If you are just starting with LumoSQL, you probably want just two binaries: unmodified SQLite of a given version (we will go with 3.53.2 for this example) and the same version backed by LMDBv1.0, with no encryption or other options to start with. Restrict the versions to just these by editing the file:

In sqlite3/benchmark/versions

3.53.2

In lmdbv1/benchmark/versions

=3.53.2
1.0

In lmdb/benchmark/versions

# comment everything out

Now make what should say something like:

BUILDS=
    3.53.2
    3.53.2+lmdbv1-1.0

Build options are passed as OPTION_X=value or as a member of the target tuple. Examples switching on encryption and row-level checksums:

make build SQLITE_VERSIONS=3.53.1 LMDB_VERSIONS=0.9.35 \
    SQLITE_FOR_LMDB=3.53.1
make build SQLITE_VERSIONS=3.53.1 USE_LMDB=no USE_LMDBV1=yes \
    LMDBV1_VERSIONS=1.0 SQLITE_FOR_LMDBV1=3.53.1 \
    ROWSUM=on OPTION_LMDBV1_ENCRYPT=on

Build encryption with OPTION_LMDBV1_ENCRYPT=on, then supply the passphrase via the URI parameter at open time:

'file:/path/to/db?lumo_key=passphrase'

The first open generates a <db>-salt sibling file; subsequent opens reuse it. URI-key is the standard way SQLite-based products pass keys today (SQLCipher, SEE, and similar follow this convention), so adopting it here is intentional. See TODO-CRYPTO.md for the planned key-management work (interactive prompt, --key-fd, LUMO_KEY env, rekey, etc.).

If you just want to use the SQLite-compatible LumoSQL library to link against your own application, stop here.

If you want to do benchmarking, meplace 'build' with 'benchmark' in the make commands above to run benchmarking on the binaries you just built (if they aren't built, the benchmarking target will build them first anyway.)

Benchmarking results are in SQLite databases queryable via tool/benchmark-filter.tcl and tool/test-sql-filter.tcl. Here are some commands to try on the file benchmark/benchmarks.sqlite:

# Compare LMDB-backed SQLite against the same SQLite with the native btree:
tclsh tool/benchmark-filter.tcl -db benchmarks.sqlite \
    -compare -A -backend lmdb -B -no-backend

and

# List the 20 most recent benchmark runs with target, date and total duration:
tclsh tool/benchmark-filter.tcl -db benchmarks.sqlite \
    -list -fields TARGET,DATE,DURATION,BACKEND_NAME

There's a lot more benchmarking to play with, including the hardware tests were run on, basic statistics, and exporting as TSV to statistical and graphing programs.

Other details

Target names encode the option dimensions: 3.53.1+lmdbv1-1.0+lmdbv1_encrypt-on+rowsum-on is one binary at build/3.53.1+lmdbv1-1.0+lmdbv1_encrypt-on+rowsum-on/sqlite3. Built binaries are shell wrappers that set LD_LIBRARY_PATH to find the backend .so they were linked against.

The not-fork cache lives under $CACHE_DIR (default ~/.cache/LumoSQL/not-fork) and is content-addressed, so re-running make build after editing a .mod fragment only rebuilds affected targets.