Last Significant Update: 2024-10-02
Status: Early draft
Comments to: jkoshy@NetBSD.org


Many current embedded and mobile devices feature a mix of ‘performant’ and ‘energy-efficient’ CPUs.  These CPUs can differ in their cache types and sizes, in their support for floating-point operations, in their support for instruction-level parallelism, in their proximity to specialized hardware within the chip package, and so on.

In theory, a device with such a mix of CPU types could let software run non-time-critical tasks on the less powerful but more energy-efficient CPUs, while still running time-sensitive and computationally intensive tasks on the powerful CPUs.  In practice, current programming abstractions and operating system APIs get in the way of extracting the full potential of asymmetric hardware.

With the current set of abstractions, the platform’s power management layer and the kernel’s scheduler have little visibility into an application’s immediate needs: for example, whether the application needs to finish a task before a deadline, or whether a particular thread in the application is best scheduled on a CPU that supports a particular kind of hardware acceleration.

This project offers a way to structure software for systems whose constituent CPUs differ not only in their compute capability, but also along other dimensions (e.g., some CPUs may be ‘closer’ to specific hardware accelerators than others):

  • By generalizing the ‘upcalls’ described in Anderson et al. (1991), while retaining backward compatibility with existing executables.

  • By offering user-space APIs that fully utilize the new functionality, for example in the Elftoolchain project’s libtask.

(TBD)

Caveats

Prior attempts to incorporate the notion of an ‘upcall’ into FreeBSD and NetBSD were eventually removed, for various reasons.  It remains to be investigated whether this was due to (perhaps correctable) implementation choices, or to a fundamental mismatch between the design of traditional UNIX-like kernels and upcall-based techniques.

  • Scheduler activations: effective kernel support for the user-level management of parallelism; Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska, and Henry M. Levy; SOSP '91.

    This paper describes the correctness and performance problems that arise when user-level threads are built on top of kernel threads, and proposes an “upcall” mechanism called “scheduler activations” to address them.

    The paper does not examine the additional complexity that an asymmetric system design would introduce into thread scheduling (e.g., a thread may need fast floating-point support for part of a computation, but prefer low-power operation otherwise).

    Open-source projects have had limited success incorporating the ideas of this paper into their kernels: NetBSD removed its implementation of scheduler activations in NetBSD 5.0 (see “Thread scheduling and related interfaces in NetBSD 5.0”, Mindaugas Rasiukevicius, 2009), while FreeBSD removed its own implementation of the idea, called “Kernel Scheduled Entities” (KSE), by FreeBSD 8.

  • The Problem with Threads; Edward A. Lee; Tech. Report No. UCB/EECS-2006-1, University of California, Berkeley, 2006.

    This paper argues that threads are a poor way to exploit concurrency, because the non-determinism introduced by multiple threads of control makes program behavior intractable to reason about.  The paper does not examine the additional complexity that asymmetric hardware introduces into thread-based designs.

  • Composition and Decomposition of Task Systems; Lucian Radu Teodorescu; Overload, 29(162):12-16, April 2021.

    The author argues that threads and locks do not compose well, and that task-based systems offer better composability without sacrificing performance or correctness.

    Although this paper does not examine the additional complexity that asymmetric hardware designs introduce, a task-centric approach to structuring computation should help preserve composability, performance and correctness even on asymmetric hardware.