Reworking the NetBSD™ build system
This is a proposed reimplementation of NetBSD’s build system, to allow it to implement minimal, hermetic and globally correct builds.
Last Significant Update: 2025-01-20
Status: Draft
Comments to:
-
Builds under build systems such as Bazel and Buck are hermetic: the output of each build step depends solely on the step’s inputs, and is completely independent of the build environment.
-
The builds are minimal: unnecessary build steps are not executed. For example, if the output of a build step is unchanged from a prior build, then the downstream dependencies are not rebuilt.
-
The builds are correct: a changed dependency is never “overlooked”.
-
The build process relies on file checksums instead of (unreliable) timestamps.
-
The build process itself is queryable. This facility is very useful in understanding how a build is constructed.
NetBSD’s build process, on the other hand, is built over make. The build is coordinated by a shell script at the root of the source tree called build.sh, and is implemented by making make recurse into the subdirectories that make up the source tree.[1]
NetBSD’s build is believed to be reproducible (see ‘NetBSD fully reproducible builds’, Christos Zoulas, 2017). However:
-
The build system has not been designed for hermeticity: for example, the make rules in src/share/mk/*.mk do not track changes to the build rules themselves.
-
The make binary determines if a build target is out-of-date by looking at timestamps, and not by comparing checksums of file content. This could cause the build to be incorrect, for example, if an input file changes once again before a build step that uses the file completes.
-
The build itself is not minimal, for at least two reasons:
-
make needs to spawn sub-makes to traverse the /usr/src hierarchy even if no work needs to be done.
-
Build step outputs that are otherwise unchanged can nevertheless trigger unnecessary build steps downstream due to their changed file modification timestamps.
-
The correctness of the build is not guaranteed if make is invoked ‘in-tree’. For example, a change to the source code for the in-tree (cross-) compilers needs build.sh tools to be run manually to bring these up to date; such tools would not be automatically rebuilt otherwise.
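The difference between timestamp-based and checksum-based staleness checks can be sketched as follows. This is only an illustration in Python, not how make or any proposed tool is implemented; the helper names (sha256_of, stale_by_mtime, stale_by_checksum) and the file names are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's content."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def stale_by_mtime(source, target):
    """make-style check: the target is stale if the source is newer."""
    return os.path.getmtime(source) > os.path.getmtime(target)


def stale_by_checksum(source, recorded_digest):
    """Checksum-style check: stale only if the source content differs
    from the digest recorded when the target was last built."""
    return sha256_of(source) != recorded_digest


def demo():
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "input.c")   # hypothetical source file
        obj = os.path.join(d, "input.o")   # hypothetical build output
        with open(src, "w") as f:
            f.write("int x;\n")
        with open(obj, "w") as f:
            f.write("object code\n")
        recorded = sha256_of(src)
        # Simulate a 'touch': move the source's mtime past the target's
        # without changing the source's content.
        t = os.path.getmtime(obj) + 10
        os.utime(src, (t, t))
        return stale_by_mtime(src, obj), stale_by_checksum(src, recorded)
```

Here demo() reports the target as stale under the timestamp check but up to date under the checksum check, which is the kind of unnecessary downstream rebuild described above.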
-
The amount of build configuration in the form of Makefiles is large.
$ cd /usr/src
$ find . -name 'Makefile*' | wc -l
9439
$ find . -name 'Makefile*' | grep -v 'external/' | wc -l
4779
$ wc -l $(find . -name 'Makefile*') | tail -1
1318750 total
This implies that any conversion needs to be done incrementally, with both Makefiles and Bazel/Buck’s BUILD files co-existing during the conversion process.
-
The NetBSD build framework is itself bootstrapped from source code during the NetBSD build. Unfortunately, Bazel requires a Java runtime, while Buck (in its current Buck2 incarnation) is written in Rust. Needing a Java runtime or a Rust toolchain would greatly complicate the build bootstrap process.
This implies that any build tool would need to be written in the programming languages supported by the bootstrap process (i.e. in C or C++).
-
The NetBSD build system today works on machines with limited amounts of RAM. It is not clear if the current implementations of Bazel and Buck would work well in such environments.
This implies that any replacement system would need to be implemented with the same care as current NetBSD make.
Proposal
Split NetBSD make's functionality into independent components, re-using code where feasible.
-
The first component analyzes Bazel/Buck-style BUILD files or regular Makefiles and prepares a graph of build steps. This ‘front end’ isolates the syntax used to specify build configuration, i.e., whether the build rules use Makefile/<*.mk> syntax or the Starlark language.[2]
The build graph is then stored on disk, to allow low-memory machines to handle large build graphs, and to allow the build to be easily restarted.
-
A second component ‘executes’ this build graph, issuing the actual commands for building the tree. This component updates the build graph on disk as the build progresses.
-
A third component queries the build graph for build dependencies (the equivalent of bazel query). This component could also be used for querying the current progress of the build.
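As a rough illustration of an on-disk, queryable build graph, the sketch below stores dependency edges in SQLite and answers a reverse-dependency question with a recursive query. The choice of SQLite, the single-table schema, and all function and target names are assumptions made for this example; the proposal does not prescribe a storage format.

```python
import sqlite3

# Hypothetical schema: one row per "target depends on dep" edge.
SCHEMA = """
CREATE TABLE IF NOT EXISTS edges (
    target TEXT NOT NULL,
    dep    TEXT NOT NULL,
    PRIMARY KEY (target, dep)
);
"""


def open_graph(path=":memory:"):
    """Open (or create) a graph database; pass a file name to persist it."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_dep(db, target, dep):
    """Record that `target` is built from `dep`."""
    db.execute("INSERT OR IGNORE INTO edges VALUES (?, ?)", (target, dep))


def rdeps(db, node):
    """Return every target that depends on `node`, directly or transitively."""
    rows = db.execute(
        """
        WITH RECURSIVE affected(name) AS (
            SELECT target FROM edges WHERE dep = ?
            UNION
            SELECT e.target FROM edges e JOIN affected a ON e.dep = a.name
        )
        SELECT name FROM affected ORDER BY name
        """,
        (node,),
    )
    return [r[0] for r in rows]


def demo():
    db = open_graph()
    add_dep(db, "strlen.o", "strlen.c")
    add_dep(db, "libc.a", "strlen.o")
    add_dep(db, "ls", "libc.a")
    add_dep(db, "ls", "ls.o")
    return rdeps(db, "strlen.c")
```

Because the database does the traversal, the querying process never needs to hold the whole graph in memory, which is the property the proposal wants for small-memory machines.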
Notes:
-
In the Bazel/Buck model, the build works with a dependency graph that spans the entire source tree. Such a graph would be larger than that which each (recursive) invocation of make would need to handle. Externalizing the dependency graph to disk could permit the new build system to work on low end, small memory machines.
-
Since the NetBSD build is monadic (to use the terminology from the ‘Build systems à la carte’ paper), the build graph itself may need to change as the build proceeds.
-
The ‘execution component’ would ensure hermeticity of a build step, where the build graph indicates that a particular build step can be done hermetically.
-
The build graph would be complete in that it would track changes to build configuration information (i.e., to Makefiles, BUILD files, etc.) in addition to other forms of ‘source code’. A change to build configuration (say a change to a cross-compiler option) would trigger a regeneration of the build graph, and hence a rebuild of all affected dependencies.
-
Since the build graph tracks all dependencies, there would be no need for a separate tools build phase when building the OS. It should be possible to issue a make command from within a subdirectory, …
$ cd /usr/src/some/module/inside/the/source/tree
$ make-ng -m risc-v all # Cross-compilation request.
… and have the build system seamlessly (re)build the cross-compilation toolchain and any necessary pre-requisite tools and libraries, prior to compiling the current module itself.
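One way to make build configuration part of the tracked inputs, as the notes above suggest, is to fingerprint each build step over its command line, its declared environment, and the checksums of its inputs. The sketch below is a hypothetical illustration: the action_key function and its parameters are not part of any existing tool.

```python
import hashlib


def digest(*parts):
    """Hash a sequence of strings/bytes, length-prefixing each part so
    that ("ab", "c") and ("a", "bc") produce different digests."""
    h = hashlib.sha256()
    for p in parts:
        data = p.encode() if isinstance(p, str) else p
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def action_key(command, input_digests, env):
    """Fingerprint of one build step (hypothetical helper).

    command       -- the full command line, including compiler options
    input_digests -- mapping of input file name to content checksum
    env           -- the environment variables the step is allowed to see
    """
    env_part = "\0".join(f"{k}={v}" for k, v in sorted(env.items()))
    inputs_part = "\0".join(
        f"{name}:{d}" for name, d in sorted(input_digests.items())
    )
    return digest(command, env_part, inputs_part)
```

With such a key, changing a cross-compiler option changes the command string and therefore the key, so the step and its dependents are considered out of date; an unchanged key means the earlier output can be reused.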
A Possible Roadmap
- Milestone 1
-
The graph of build steps for /usr/src is written to a (graph) database on disk.
This milestone could be reached in multiple ways:
-
By parsing Makefiles independently of make, to determine the build graph.
-
By parsing the output from make -d […] (make's debug output). This may be hard to do reliably.
-
By modifying make to directly write its graph of dependencies and actions to a suitable graph database.
-
By leveraging make's “meta” mode and building the dependency graph from the information in the “meta” files. Unfortunately, meta-mode seems somewhat under-documented at present.
-
Option (3) above (modifying make to write its dependency graph directly) may be the least tedious to implement.
- Milestone 2
-
The ‘executor component’ is ready.
This component picks up tasks from the build graph and executes them (possibly remotely using a remote execution protocol).
- Milestone 3
-
A translator from Starlark syntax to the build graph is ready.
This translator would allow the source tree to start using BUILD files alongside Makefiles. The tool would need to allow dependencies on targets being built by as yet unconverted Makefiles, although such dependencies could be coarse-grained (in the style of NetBSD make's dirdeps.mk).
The process of manually converting Makefiles to BUILD files can start once this milestone is reached.
- Milestone 3a
-
(Optional) A tool to convert ‘declarative’ Makefiles to Starlark syntax is ready.
This tool could help speed up conversions from make syntax to Starlark.
- Milestone 4
-
The existing Makefiles under /usr/src have been converted to Starlark or equivalent notation, enabling all build steps to be fully hermetic.
By Milestone 4, the build of /usr/src should be correct, minimal and hermetic, in addition to being reproducible.
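To make the ‘executor component’ of Milestone 2 and the ‘minimal’ property more concrete, here is a small sketch of an executor that walks a build graph in dependency order and skips a step when the checksums of its inputs match those of the previous build. It is a toy under stated assumptions: the graph and cache are held in memory rather than on disk, build steps are plain Python functions, and every name (run_build, compile_, link, the targets) is hypothetical.

```python
import hashlib
from graphlib import TopologicalSorter  # Python 3.9+


def run_build(graph, actions, sources, cache):
    """Run the build and return the targets whose steps were executed.

    graph   -- target -> list of dependencies
    actions -- target -> function(list of dependency contents) -> bytes
    sources -- source file name -> content (bytes)
    cache   -- target -> (input key, output); kept between builds
    """
    values = dict(sources)
    executed = []
    for node in TopologicalSorter(graph).static_order():
        if node in sources:
            continue
        deps = graph[node]
        # Key the step on the checksums of its inputs, not on timestamps.
        key = hashlib.sha256(
            b"\0".join(hashlib.sha256(values[d]).digest() for d in deps)
        ).hexdigest()
        cached = cache.get(node)
        if cached and cached[0] == key:
            values[node] = cached[1]      # inputs unchanged: reuse output
            continue
        values[node] = actions[node]([values[d] for d in deps])
        cache[node] = (key, values[node])
        executed.append(node)
    return executed


def compile_(inputs):
    """Stand-in 'compiler': strips // comments from its single input."""
    src = inputs[0].decode()
    lines = (line.split("//")[0].rstrip() for line in src.splitlines())
    return "\n".join(lines).encode()


def link(inputs):
    """Stand-in 'linker': concatenates its inputs."""
    return b"ELF:" + b"".join(inputs)
```

In this toy, editing only a comment in a source file re-runs the ‘compile’ step, but since its output is byte-identical to the previous one, the downstream ‘link’ step is skipped.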
Related Work
- Kleaf - Building Android Kernels with Bazel
-
Instructions for building the Android Linux kernel using Bazel.
- reproducible-builds.org
-
Information about reproducible builds in general.
- Reproducible Builds for NetBSD
-
Describes some of the changes needed to make the NetBSD build reproducible.
- Building NetBSD in meta mode
-
Some information about ‘meta’ mode support in NetBSD make.
- Building with dirdeps.mk
-
Describes a make rule set to specify cross-directory dependencies.
[1] Recursive make invocation is considered an anti-pattern for build systems, see ‘Recursive Make Considered Harmful’, Peter Miller, 1998.