View Ticket

Not logged in
Ticket Hash: 7539889d70b49a79a6e1e03059abe069d1bf5911
Title: Reserve bytes in SQLite magic.txt
Status: Open Type: Documentation
Severity: Minor Priority: Immediate
Subsystem: Resolution: Open
Last Modified: 2022-10-24 21:16:43
Version Found In:
User Comments:
danshearer added on 2021-02-25 10:39:53:

Problem Statement

LumoSQL needs to store a number in permanent metadata in standard SQLite data files, such that no existing SQLite binaries will see the file as corrupt.

In addition, it is correct and polite behaviour to identify databases as created by LumoSQL.

What is SQLite Magic

The first 100 bytes of an SQLite database is a metadata header. This is page 1 of the first BTree in the database.

At offset 68 in the database header is the Application ID, set by PRAGMA application_id.

Application IDs can be registered at sqlite.org in the file magic.txt which is used by file(1), forensics/repair tools and SQLite library user applications. Very few applications have registered.

All applications to date use upper bits = application, lower 16 = type of database by the application, e.g. fossil is 0xf0551__ and Bentley systems is 0x426554__ (or "BENx" if interpreted as ASCII)

Initial Requirement

LumoSQL has added the concept of Lumo Columns, transparent columns much like ROWID that are added to a standard SQLite database and which contain information in a blob in every row, for example "use checksum algorithm X to verify the contents of this row, and here is the checksum last calculated for this row".

This works fine for table btrees. Unfortunately, for reasons documented fully eleswhere, it is not so fine for index btrees. One of the reasons is that SQLite assumes that ROWID is always the last column. Even worse, a COVERING INDEX does not even reference the table it is indexing, so there is no possibility of seeing the checksums stored there.

Therefore, LumoSQL needs to store the extra metadata elsewhere. One possibility is in a third type of BTree, for which a root page number can be stored in the upper bits of the Application ID. It could be other things including a filename.

Potential Problem

Using Application ID means LumoSQL is viewing itself as an application or library user, whereas it is in fact the library itself. This ambiguity will cause corruption in the case where an end-user application wants to use the Application ID, and therefore overwrites the LumoSQL data. This will result in loss of any LumoSQL data stored in the database (because there will no longer be a pointer to it) but it will not result in a corrupted database when viewed from any binary. LumoSQL could rebuild the database and re-overwrite the Application ID.

This could potentially result in an edit war, where LumoSQL and the application keep rewriting the Application ID on every open.


danshearer added on 2021-02-25 11:20:09:

The Special Case of Fossil

One of the very few applications already in magic.txt is Fossil. Fossil is also one of the very few applications that does not need rowsums since its whole purpose is around using hashes of content.

Given that Fossil should always be built using a current SQLite we don't need to be worried about an old system SQLite for Fossil. At least if external sqlite is deprecated in Fossil.

Therefore, we shouldn't be too worried about a situation arising where Fossil is somehow running on a database using LumoSQL and it overwrites the Application ID and therefore LumoSQL loses its metadata. There are other reasons that Lumo Columns might be used, for example per-row ctime/mtime/atime has been suggested, but it doesn't seem that these features are going to be vital to Fossil any time soon.

So magic(5) clashes are not a major worry for LumoSQL


danshearer added on 2022-10-24 20:49:22:

Various things have changed in this ticket. One of them is that we have committed changes to Fossil so that external libsqlite3 is more-or-less deprecated. The problem was with Linux/BSD distributions, who much prefer to default to using the distribution library. But Fossil and SQLite are symbiotic and Fossil should always be built with either a current SQLite or the latest available.

So that makes a magic clash in the special case of Fossil even less likely and I think we can forget about it until we have fixed every other relevant problem to do with magic.


danshearer added on 2022-10-24 21:13:17:

There are millions of applications that use SQLite. Apart from Fossil, only one of these millions has registered with magic(5).

LumoSQL could ask sqlite.org if it is time to redefine the use of magic. Perhaps, with the honourable exceptions of Bentley Systems and Fossil, sqlite magic numbers should be for library capabilities refelected in a database rather than application identification.

LumoSQL could have ids for each of the types of data it hides inside an sqlite database.

If Bentley Systems or Fossil want to use lumosql then we have a problem. As previously discussed Fossil is a very small potential problem, and perhaps LumoSQL can come to an arrangement with Bentley Systems, if Bentley Systems even exists anymore.