Last Significant Update:

2024-10-02

Status:

Draft

Comments to:

jkoshy@NetBSD.org

This is a way to run non-native executables seamlessly on NetBSD, allowing such non-native processes to be ‘first class’ entities in the OS.

Such non-native processes:

  • would be visible in the process tree,

  • would be able to send and receive signals to and from other processes,

  • would be able to read and write files and sockets in the filesystem,

  • could be part of a Unix “pipe”, and so on.

In other words, non-native executables would behave very similarly to native executables, modulo the overhead of ‘interpreting’ instructions instead of directly executing them.[1]


The Basic Idea

The idea here is that when asked to execute a non-native binary, instead of failing the invocation[2] of execve() with an ENOEXEC, the kernel would hand the binary over to a specially configured instruction set emulator.
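As a rough user-space illustration of that decision (not the kernel change itself), the sketch below reads the e_machine field of an ELF header and picks either native execution, a registered emulator, or failure. The function names, the emulator registry, and its paths are all hypothetical; only the ELF header layout and the EM_* constants come from the ELF specification.

```python
import struct

# A few e_machine values from the ELF specification.
EM_386, EM_MIPS, EM_PPC, EM_ARM, EM_X86_64, EM_AARCH64 = 3, 8, 20, 40, 62, 183


def elf_machine(header):
    """Return the e_machine field of an ELF header, or None if not ELF."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return None
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        return None
    # e_machine is the 16-bit field at offset 18, after e_ident and e_type.
    return struct.unpack_from(fmt, header, 18)[0]


def choose_handler(header, native_machine, emulators):
    """Return 'native', an emulator path, or None (the ENOEXEC case)."""
    machine = elf_machine(header)
    if machine is None:
        return None
    if machine == native_machine:
        return "native"
    return emulators.get(machine)


def fake_header(machine, big_endian=False):
    """Build the first 20 bytes of an ELF header, for demonstration only."""
    ident = b"\x7fELF" + bytes([1, 2 if big_endian else 1, 1]) + bytes(9)
    order = ">" if big_endian else "<"
    return ident + struct.pack(order + "HH", 2, machine)  # e_type = ET_EXEC


# Hypothetical emulator registry; these paths are made up.
EMULATORS = {
    EM_MIPS: "/usr/libexec/emul-mips",
    EM_ARM: "/usr/libexec/emul-arm",
}
```

On an amd64 host, choose_handler(fake_header(EM_ARM), EM_X86_64, EMULATORS) would select the ARM emulator, while an image for an architecture with no registered emulator would still fall through to the ENOEXEC case.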

This instruction set emulator would then map in the non-native executable image and proceed to emulate the instructions in it, either instruction-by-instruction or by translating the non-native instructions in the program image to native instructions.[3]
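To make the instruction-by-instruction approach concrete, here is a minimal fetch-decode-execute loop for an invented toy instruction set. The opcodes, the four-register machine, and the tuple encoding are assumptions made purely for illustration; a real emulator would decode the binary encoding of an actual architecture.

```python
def run(program, max_steps=10_000):
    """Interpret a toy program and return the register file at 'halt'."""
    regs = [0] * 4
    pc = 0
    for _ in range(max_steps):
        op, a, b = program[pc]          # fetch and decode
        pc += 1
        if op == "li":                  # load immediate: regs[a] = b
            regs[a] = b
        elif op == "add":               # register add: regs[a] += regs[b]
            regs[a] += regs[b]
        elif op == "addi":              # add immediate: regs[a] += b
            regs[a] += b
        elif op == "bnez":              # branch to index b if regs[a] != 0
            if regs[a] != 0:
                pc = b
        elif op == "halt":
            return regs
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    raise RuntimeError("step limit exceeded")


# Sum 5 + 4 + 3 + 2 + 1 into r0, using r1 as a countdown counter.
PROGRAM = [
    ("li", 0, 0),
    ("li", 1, 5),
    ("add", 0, 1),      # index 2: top of the loop
    ("addi", 1, -1),
    ("bnez", 1, 2),
    ("halt", 0, 0),
]

print(run(PROGRAM)[0])  # prints 15
```

A translating emulator would replace the per-instruction dispatch with a step that converts blocks of guest instructions into native code once and then reuses the result.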

Why Bother Though?

Being able to execute non-native binaries in this fashion offers a number of advantages:

  • When cross-compiling software for a non-native platform, we could directly run a non-native toolchain on more powerful hardware.  This could help speed up package builds for older architectures in spite of the interpretation overhead.

  • We could transparently run software compiled for legacy architectures.  This could be of interest for organisations that need to run binaries built for old hardware that is no longer in production.

  • We may be able to transparently run software written for non-POSIX operating systems too, provided that we can translate NetBSD’s behavior to that expected by the software under emulation and vice-versa.

  • We could offer a selectively bug-compatible execution environment for old programs on newer systems, for example, propagating only security fixes while keeping other behavior unchanged.  This would allow older software to be run in an otherwise up-to-date and secure system.

Once this is implemented, we would progress from ‘Running NetBSD on a toaster’ to ‘Running NetBSD on NetBSD on a toaster’.

Speed-ups

  • Whenever the emulated non-native process issues a system call, the call would be translated to a native system call, which would then run at native speed inside the kernel.

  • For dynamically-linked non-native executables, we could look at making function calls into shared objects that are available natively (like libc) to switch into “native mode”, allowing even more of the emulated program’s execution to run at native speed.[4]
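The system call translation described above can be sketched as a dispatch table that maps guest system call numbers to handlers, each of which copies its arguments out of emulated memory and then calls the host natively. The handler names, the table, and the use of a bytearray as guest memory are assumptions for illustration; the numbers are illustrative too, and a real emulator would take them from the guest ABI.

```python
import errno
import os

# Illustrative guest system call numbers (assumed, not from a specific ABI).
GUEST_SYS_WRITE = 4
GUEST_SYS_GETPID = 20


def sys_write(memory, fd, addr, length):
    """Copy a buffer out of emulated memory, then call the host's write."""
    return os.write(fd, bytes(memory[addr:addr + length]))


def sys_getpid(memory):
    """No arguments to translate; the host call runs as-is."""
    return os.getpid()


SYSCALLS = {
    GUEST_SYS_WRITE: sys_write,
    GUEST_SYS_GETPID: sys_getpid,
}


def dispatch(memory, number, *args):
    """Translate one guest system call; unknown numbers yield -ENOSYS."""
    handler = SYSCALLS.get(number)
    if handler is None:
        return -errno.ENOSYS
    return handler(memory, *args)
```

For example, with guest memory bytearray(b"....hello"), the call dispatch(memory, GUEST_SYS_WRITE, fd, 4, 5) writes the five bytes starting at guest address 4 to the host descriptor fd. Only the argument copying is emulator work; the write itself runs at native speed.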

Why Take Up This Project?

{ For GSoC applicants }

The project’s implementors will gain a deep understanding of how both modern and legacy systems actually work.  They would gain insights:

  • Into instruction set architectures, both old and new.

  • Into the process of linking & loading in diverse operating systems.

  • Into what ‘POSIX compatibility’ actually entails.

  • Into the kernel’s VM and process handling code.

  • Into the ELF file format.

  • And into other areas in computer science and engineering.

Dyninst

LGPL-licensed tools for binary instrumentation and modification.

GXEmul

A BSD-licensed architecture emulator.  The instruction emulation code in this project could possibly be reused (https://gavare.se/gxemul/).

QEMU

A GPL-licensed machine emulator that additionally offers “user-mode” emulation (https://www.qemu.org/).

History

NOTE: This project idea was originally posted on FreeBSD’s Summer of Code page on Apr 22, 2006, titled The Arbitrary Interpreter Hack:

The Arbitrary Interpreter Hack
Allow FreeBSD™ binaries for non-native CPU architectures to be emulated by having these 'interpreted' by an instruction set emulator.  This work would allow FreeBSD/{ARM,MIPS,PowerPC} executables to run on a FreeBSD/i386 or FreeBSD/amd64 host.  This project requires an in-depth study of machine ABIs and of Unix semantics.  It has a small kernel component and also involves effort in making the instruction set emulator robust.

1. And modulo other corner-cases, such as using ptrace(2) on such processes.
2. See the POSIX specification for the exec family of APIs.
3. QEMU calls this mode of operation User-Mode Emulation.
4. Implementing stack backtraces for such mixed-mode execution would be interesting, to say the least.