Welcome to the LumoSQL v0.8 preview
LumoSQL is a modification (not a fork) of the SQLite embedded data storage library, which is among the most widely deployed software in existence. LumoSQL provides a way to add features to SQLite by combining multiple source trees as they change over time. LumoSQL has had an eventful history and exists today because of the efforts of many people.
Source-level LumoSQL is a drop-in replacement for SQLite, preserving the SQLite C ABI. With the stock SQLite btree backend, a LumoSQL database file is also
binary-compatible with a normal SQLite file. The LMDB backends store the database as a directory (data.mdb + lock.mdb) and are not binary-compatible with
single-file SQLite databases.
LumoSQL is intended for experts to explore. Source is at lumosql.org (Fossil) and Codeberg (git).
LumoSQL stays as close as possible to the SQLite way of software development, recognising that, for what is probably the only trillion-scale software in
existence, there is little room for change without becoming a fork. That is why the tooling uses Tcl, why we produce an SQLite-compatible amalgamation, why
we use the MIT licence (the closest in spirit to the SQLite licensing situation), and why our integration tool is literally called not-forking. It is
also why Fossil was an early target to support, because Fossil and SQLite are symbiotic projects.
The following LumoSQL features are available for testing:
- Build SQLite with an LMDB database storage engine replacing SQLite's native btree. Here's some comparison info
- Encryption for SQLite via LMDBv1.0's page-based encryption, libsodium-backed, not closed source, not SQLCipher. Key management is deliberately a small demo: see TODO-CRYPTO.md for what still needs to land before this can be used on systems where the URI-key form is unsafe.
- The concept of row-based checksums (SQLite has a page-based checksum VFS. Separately, LumoSQL inherits LMDBv1.0's page-based checksums)
- Fossil and libfossil can be built against LumoSQL: libfossil directly via its --lumosql configure flag, Fossil via a small documented patch. The LumoSQL test suite exercises the same vfile/commit paths Fossil hits. Production use is not recommended. We welcome all reports of success and failure! See fossil/FOSSIL-ON-LUMOSQL.md.
- Benchmarking SQLite (with or without LumoSQL changes, we have a cool benchmarking suite)
- Avoiding forking SQLite and yet carrying major patches: we have gone to great lengths to respect the SQLite stability guarantees.
In LumoSQL 0.8 there are three LumoSQL backends:
- the default SQLite Btree storage system
- LMDB 0.9.x series, the latest being 0.9.35. This is very stable and probably the most widely-used key-value store (after SQLite's native btree key-value store). It is now in maintenance mode.
- LMDBv1.0, which is a superset of the stable LMDB 0.9.x API with very different internals. LMDBv1.0 comes with optional page-based encryption, page-based checksums, and incremental backup. LMDB-backed SQLite inherits all these.
Other things:
- LumoSQL has been known to run on x86, ARM-32 and RISC-V architectures, and on many Linux and BSD OSs. Our benchmark results are stored in an SQLite file, with a tool to merge and compare, so do please try this at home and share.
- Other potential backends are discussed in our knowledgebase of relevant technologies document (there aren't very many.)
- LumoSQL reproducibly integrates 9 years of software infrastructure. The SQLite-side glue is about 2,000 lines under not-fork.d/sqlite3/. The LMDB and LMDBv1 backends are larger because each is a substantive re-implementation of SQLite's btree against an LMDB environment; the two backends share a great deal of code.
- LumoSQL was supported by the NLNet Foundation 2020-2022, and we thank them very much.
Dependencies
LumoSQL needs all of the following to build and run the default matrix (SQLite native + LMDB 0.9.x + LMDBv1.0 without encryption):
- a C compiler and GNU make
- Tcl 8.6 with Tclx
- Perl core plus the Text::Glob module
- Fossil
- the not-forking tool, version 0.7 or later
- SQLite's own build-deps on Linux: zlib, readline, ncurses
If you ever plan to enable encryption:
- libsodium, required for OPTION_LMDBV1_ENCRYPT=on (encrypted LMDBv1)
On Debian/Ubuntu:
sudo apt install build-essential tcl tclx tcl-dev fossil \
libtext-glob-perl libsodium-dev \
zlib1g-dev libreadline-dev libncurses-dev
On Fedora/RHEL:
sudo dnf install make gcc tcl-devel tclx-devel fossil \
perl-Text-Glob libsodium-devel \
zlib-devel readline-devel ncurses-devel
make doctor checks every required dependency.
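If you want a rough idea of what is missing before the source tree is even checked out, something like the following sketch can help. It is not what make doctor does internally; the tool names (cc, make, tclsh, perl, fossil) are assumptions about how the dependencies listed above are usually installed, and the not-forking tool and the library dependencies are not covered.

```shell
# missing_tools TOOL...: print the names of the tools not found on PATH.
missing_tools() {
    not_found=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || not_found="$not_found $tool"
    done
    echo "${not_found# }"
}

# Assumed command names for the dependencies listed above.
missing=$(missing_tools cc make tclsh perl fossil)
[ -z "$missing" ] || echo "missing tools: $missing" >&2

# The Perl module and the Tcl extension are checked by trying to load them.
perl -MText::Glob -e 1 2>/dev/null \
    || echo "Perl module Text::Glob not found" >&2
echo 'if {[catch {package require Tclx}]} {exit 1}' | tclsh >/dev/null 2>&1 \
    || echo "Tcl package Tclx not found" >&2
```

Once the tree is available, make doctor remains the authoritative check.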
Containers and CI environments without an /etc/passwd entry for the effective UID need export USER=XXSOMESTRING before make; otherwise Fossil complains.
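A minimal sketch of that workaround for a CI script follows. It only sets USER when it is empty, and the fallback value uidNNNN is our own choice for illustration; we are assuming that any non-empty string is acceptable.

```shell
# Give Fossil a user name when the effective UID has no /etc/passwd entry.
if [ -z "${USER:-}" ]; then
    if name=$(id -un 2>/dev/null); then
        USER=$name              # a passwd entry exists after all
    else
        USER="uid$(id -u)"      # hypothetical placeholder, e.g. uid1001
    fi
    export USER
fi
```

Run this before the first make invocation in the job.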
If your shared cluster has stale Tcl or no Tclx, build them into $HOME/.local and prepend that prefix to PATH, LD_LIBRARY_PATH, LIBRARY_PATH,
C_INCLUDE_PATH, PKG_CONFIG_PATH, and set TCLLIBPATH=$HOME/.local/lib. The two matrix-runner scripts under loft/large-matrix-runs/ show one such
environment.
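As a sketch of such an environment, assuming Tcl and Tclx were installed with $HOME/.local as their prefix so that the usual bin, lib, include and lib/pkgconfig subdirectories exist (the subdirectory layout is our assumption, and prepend_path is a helper invented for this example):

```shell
prefix="$HOME/.local"

# prepend_path VAR DIR: put DIR at the front of the colon-separated list VAR.
prepend_path() {
    eval "current=\${$1:-}"
    if [ -n "$current" ]; then
        eval "$1=\"\$2:\$current\""
    else
        eval "$1=\"\$2\""
    fi
    export "$1"
}

prepend_path PATH            "$prefix/bin"
prepend_path LD_LIBRARY_PATH "$prefix/lib"
prepend_path LIBRARY_PATH    "$prefix/lib"
prepend_path C_INCLUDE_PATH  "$prefix/include"
prepend_path PKG_CONFIG_PATH "$prefix/lib/pkgconfig"
export TCLLIBPATH="$prefix/lib"
```

Source this file in the shell that runs make rather than executing it, so the variables survive.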
Build and benchmark
make what lists the resolved option set. make targets resolves latest against upstream and lists the targets the matrix would build. make build builds
them. make benchmark runs the speedtest1-derived tests against each built binary and appends to benchmarks.sqlite. make test-sql runs the
LumoSQL-specific tests in test/sql/ and appends to test-sql.sqlite. See doc/lumo-build-benchmark.md for the detail.
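The targets above can be chained into one run. The sketch below is only a convenience wrapper (lumo_cycle is a name made up for this example); it runs the five targets in the order described and stops at the first failure.

```shell
# Run the usual inspect / build / benchmark / test sequence.
# Call it from the top of the LumoSQL source tree.
lumo_cycle() {
    for step in what targets build benchmark test-sql; do
        make "$step" || {
            echo "make $step failed" >&2
            return 1
        }
    done
}
```

Calling lumo_cycle with no arguments uses the default matrix. Build options are not forwarded in this sketch.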
A single-binary benchmark on a small machine takes a few minutes; the full matrix on loft/large-matrix-runs/ takes hours. Results are in SQLite databases
queryable via tool/benchmark-filter.tcl and tool/test-sql-filter.tcl. Here are some commands to try on the file loft/large-matrix-runs/benchmarks.sqlite:
# Compare LMDB-backed SQLite against the same SQLite with the native btree:
tclsh tool/benchmark-filter.tcl -db benchmarks.sqlite \
-compare -A -backend lmdb -B -no-backend
and
# List the 20 most recent benchmark runs with target, date and total duration:
tclsh tool/benchmark-filter.tcl -db benchmarks.sqlite \
-list -fields TARGET,DATE,DURATION,BACKEND_NAME
There's a lot more to play with, including the hardware the tests were run on, basic statistics, and exporting as TSV for statistical and graphing programs.
Build options are passed as OPTION_X=value or as a member of the target tuple. Examples:
make build SQLITE_VERSIONS=3.53.1 LMDB_VERSIONS=0.9.35 \
SQLITE_FOR_LMDB=3.53.1
make build SQLITE_VERSIONS=3.53.1 USE_LMDB=no USE_LMDBV1=yes \
LMDBV1_VERSIONS=1.0 SQLITE_FOR_LMDBV1=3.53.1 \
ROWSUM=on OPTION_LMDBV1_ENCRYPT=on
Target names encode the option dimensions: 3.53.1+lmdbv1-1.0+lmdbv1_encrypt-on+rowsum-on is one binary at
build/3.53.1+lmdbv1-1.0+lmdbv1_encrypt-on+rowsum-on/sqlite3. Built binaries are shell wrappers that set LD_LIBRARY_PATH to find the backend .so they were
linked against.
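To illustrate how a target name decomposes, here is a small sketch that splits one on "+". It assumes, from the example above rather than from any specification, that the first component is the SQLite version and that every later component is name-value with no hyphen in the name. describe_target is a helper invented for this example.

```shell
# describe_target NAME: print one "dimension=value" line per component.
# The body runs in a subshell so the IFS and globbing changes stay local.
describe_target() (
    [ -n "${1:-}" ] || exit 1
    IFS='+'
    set -f                      # no filename expansion while splitting
    set -- $1                   # split the target name on "+"
    echo "sqlite=$1"
    shift
    for part in "$@"; do
        echo "${part%%-*}=${part#*-}"
    done
)

target='3.53.1+lmdbv1-1.0+lmdbv1_encrypt-on+rowsum-on'
describe_target "$target"
echo "binary=build/$target/sqlite3"
# prints:
#   sqlite=3.53.1
#   lmdbv1=1.0
#   lmdbv1_encrypt=on
#   rowsum=on
#   binary=build/3.53.1+lmdbv1-1.0+lmdbv1_encrypt-on+rowsum-on/sqlite3
```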
The not-fork cache lives under $CACHE_DIR (default ~/.cache/LumoSQL/not-fork) and is content-addressed, so re-running make build after editing a .mod
fragment only rebuilds affected targets.
Encryption is a demonstration in v0.8: build with
OPTION_LMDBV1_ENCRYPT=on, then supply the passphrase via the URI
parameter at open time:
sqlite3 'file:/path/to/db?lumo_key=passphrase'
The first open generates <db-dir>/lumo.salt; subsequent opens reuse
it. URI-key is the standard way SQLite-based products pass keys today
(SQLCipher, SEE, and similar follow this convention), so adopting it
in v0.8 is intentional. It is fine for small embedded devices but not
for general systems: passphrases passed this way leak to argv, shell
history, and /proc/PID/cmdline. See TODO-CRYPTO.md
for the planned key-management work (interactive prompt, --key-fd,
LUMO_KEY env, rekey, etc.).