The Not-forking Software Reproducibility Tool

Not-forking is a programming tool that helps you apply changes to a source code tree even when those changes go beyond what a normal patch can describe, such as renamed files, replaced files, or text substitutions across many places. The effect is that of patch/sed/diff/cp/mv rolled into one, controlled by a single description file of what is to be done. While Not-forking will work on any source tree, it was designed for the common case where you want to use someone else's code and apply changes to it without forking the original.
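To make the "patch/sed/diff/cp/mv rolled into one" idea concrete, here is a rough sketch of the kind of one-off shell script that a Not-forking recipe is meant to replace. It is not Not-forking's configuration syntax, and every directory, file and identifier in it (upstream, combined, api.h, upstream_init and so on) is made up for illustration; a patch step is left out for brevity.

```shell
#!/bin/sh
# Hypothetical illustration: manual steps that a Not-forking recipe
# would describe declaratively instead. All names are invented.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A stand-in for the Upstream source tree.
mkdir -p "$work/upstream"
printf 'int upstream_init(void);\n' > "$work/upstream/api.h"
printf 'Call upstream_init() first.\n' > "$work/upstream/README"

# cp: take a private copy rather than editing Upstream itself.
cp -R "$work/upstream" "$work/combined"

# mv: rename a file to fit the Combined Project layout.
mv "$work/combined/api.h" "$work/combined/upstream_api.h"

# sed: the same text substitution in every file of the copy.
# (Writing to a temp file avoids the non-portable "sed -i".)
for f in "$work/combined"/*; do
    sed 's/upstream_init/combined_init/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

# Replace one file wholesale with a local version.
printf 'See the Combined Project documentation.\n' > "$work/combined/README"

# diff: the original tree is untouched; only the copy differs.
diff -r "$work/upstream" "$work/combined" || true
```

A script like this is tied to one particular Upstream layout and has to be rewritten whenever Upstream changes; Not-forking's approach is to describe the same kinds of operations in a configuration file instead.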

Note: If you are reading this on Codeberg, you are looking at a read-only mirror rather than the main Not-forking repository. The Codeberg mirror will work fine.

Not-forking is a well-specified, machine-readable way of answering this question:

What is the minimum difference between multiple source trees, and how can this difference be applied as versions change over time?

The use of the word 'difference' is important. Most of the world's software infrastructure is assembled, checked and published by continuous integration (CI) and build tools that use patch(1)-type tools and algorithms internally, often via git. These are applied mechanically, with little connection to the intentions of the human software developer. Not-forking works at a higher level of abstraction: it uses recipes, each describing a collection of operations (likely including patching) that together combine codebases in ways no merge command can achieve. Not-forking recipes capture more of the programmer's intent behind a code merge than lower-level tools can. The Not-forking TODO describes plans for future development.

Getting started with Not-forking

Download Not-forking using one of:

Here you will find the tool and its libraries, the full documentation, and an example configuration (in the directory doc/examples) that can be used for testing. The directory lib/NotFork/not-fork.d contains an example configuration for the Not-forking tool itself, which is used to find a specific version of Not-forking even if it differs from the one installed.

Pre-requisites

Not-forking will inform you of any missing dependencies. Systems often already have everything Not-forking needs except the Perl module Text::Glob. To install it, type one of the following, using sudo or as root:

apt install libtext-glob-perl      # on Debian or Ubuntu, or
dnf install perl-Text-Glob         # on Fedora or Red Hat, or
emerge --ask dev-perl/Text-Glob    # on Gentoo, or
pkg install p5-Text-Glob           # on FreeBSD
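If you want to know in advance whether Text::Glob is already present, the following sketch simply asks Perl to load it. It relies only on the standard perl -M option; the function name check_text_glob is hypothetical, and not-fork will report the module as missing in any case.

```shell
#!/bin/sh
# Hypothetical helper: report whether the Perl module Text::Glob loads.
# "perl -MText::Glob -e 1" exits with status 0 only if the module loads.
check_text_glob() {
    if perl -MText::Glob -e 1 2>/dev/null; then
        echo "Text::Glob: installed"
    else
        echo "Text::Glob: missing"
    fi
}

check_text_glob
```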

Installation

Once you have downloaded the Not-forking source as described above, you can install it using:

perl Makefile.PL
make
sudo make install
not-fork -V   # should give the current version, e.g. "Not-fork 0.7"

The not-fork command is now installed on your system. Once you have satisfied the build dependencies, you can check its runtime dependencies by running:

not-fork --check-recommend

Then install any package reported as missing or too old.

Runtime dependencies are covered in the full documentation, but in brief, not-fork knows what each of many different scenarios needs and explains clearly when a particular tool is missing. That means you don't need to anticipate, at install time, everything not-fork might be used for. Currently not-fork can use access methods including Git, Fossil, wget/tar and ftp, and more modules, for example for Mercurial, are likely to be added. You won't be asked to install anything you don't need.
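For a quick, informal look at which of those access tools a machine already has, the sketch below checks PATH for each one. The function name report_tools is hypothetical and the tool list just mirrors the access methods named above; not-fork --check-recommend remains the authoritative check, because only not-fork knows which tools a given configuration really needs.

```shell
#!/bin/sh
# Hypothetical helper: list which access-method tools are on PATH.
# This does not replace "not-fork --check-recommend".
report_tools() {
    for tool in git fossil wget tar ftp; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: not found"
        fi
    done
}

report_tools
```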

It is also possible to run Not-forking from the directory you downloaded it into, without installing it on the system, as follows:

perl Makefile.PL
make
perl -Iblib/lib bin/not-fork [options] ...

To try the tools using the included example configuration, use:

perl -Iblib/lib bin/not-fork -idoc/examples [other_options] ...

The reason for Not-forking

Here is a picture of the simplest Not-forking use case. External software, here called Upstream, forms part of a new project called Combined Project. Upstream is not a library provided by your system, because if it were you could simply link against libupstream. Instead, Upstream is source code that you copy into the Combined Project directory tree, like this:

Diagram 1: Not-forking with one Upstream

There are some obvious questions raised by this example:

If you are maintaining a codebase or configuration files that are mostly also maintained elsewhere, Not-forking could be the answer for you. Not-forking was designed to be part of a build tool, and can remove a lot of build system complexity.

Not-forking is strictly about unintentional or reluctant whole-project forks.

Not-forking produces a buildable tree from inputs that would otherwise need manual merging, or an algorithm so specific that it would become a project of its own. Rather than adding intelligence to a diff tool, Not-forking gets trees into a state where diff will work. To do that, it needs some guidance from a configuration file. Of course, at times there will be a merge conflict that requires human intervention; because Not-forking uses the ordinary VCS and diff tools, resolving it is a normal merge resolution process. Not-forking understands and can compare many different human-readable styles of version numbering, and can monitor upstream sources and pull code from them over many kinds of protocol.
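Comparing version numbers is harder than it looks, because plain text sorting puts "0.10" before "0.7". The sketch below shows the difference using the -V (version sort) option of GNU sort. It only illustrates the problem; it is not Not-forking's own comparison logic.

```shell
#!/bin/sh
# Three made-up release numbers, one per line.
versions='0.10
0.7
0.9'

# Byte-wise lexical order: 0.10 0.7 0.9 (wrong for versions)
echo "Lexical order:"
printf '%s\n' "$versions" | LC_ALL=C sort

# Version order, comparing the numeric parts as numbers: 0.7 0.9 0.10
echo "Version order:"
printf '%s\n' "$versions" | sort -V
```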

The full documentation goes into much more detail than this overview.

Referring to Diagram 1, the developer now has good reasons to separate the Upstream project code from its repository and maintain it within the Combined Project tree, because in the short term that is simply easier. But doing so creates the very big problem of the Reluctant Project Fork. A Reluctant Project Fork, or "vendoring" as the Debian Project calls it, is where Combined Project's version of Upstream starts to drift from the original Upstream. Nobody wants to maintain code that its original authors are still maintaining, but avoiding that can become complicated. Not-forking makes this a much easier problem to solve.

Not-forking also addresses more complicated scenarios, such as when two unrelated projects are upstream of Combined Project:

Diagram 2: Not-forking With Two Upstreams

In more detail, the problem of project forking includes these cases:

The following diagram shows how Not-forking manages even more complex scenarios. Any of the version control systems could be swapped for any other, and production use of Not-forking today handles up to 50 versions of three upstreams with ease.

Diagram 3: Not-forking with Multiple Versions and Multiple Upstreams

Why Not Just Use Git/Fossil/Other VCS?

Git rebase cannot solve the problems Not-forking addresses. Neither can Git submodules, Fossil's merge, or the quilt approach to combining patches.

A VCS cannot address the Not-forking class of problems because the decisions required are typically made by humans doing a port or reimplementation where multiple upstreams need to be combined. A patch stream can't describe what needs to be done, so automating this requires a tangle of fragile one-off code. Not-forking makes it possible to write a build system without these code tangles.

Examples of the sorts of actions Not-forking can take:

Disambiguation of "Fork"

The term "fork" has several meanings. Not-forking is addressing only one meaning: when source code maintained by other people elsewhere is modified by you locally. This creates the problem of how to maintain your modifications without also maintaining the entire original codebase.

Not-forking is not intended for permanent whole-project forks. These tend to be large and rare events, such as when LibreOffice split off from OpenOffice.org, or MariaDB from MySQL. These were expected, planned and managed project forks.

Nor is Not-forking intended for extreme vendoring, such as the case Debian decided on in January 2021, where the upstream is giant and well-funded and guarantees that it will maintain all of its own upstreams.

Here are some other meanings of the word "fork" that have nothing to do with Not-forking: