By: The Depressed Milkman
Last Updated: 2021-07-01
Tags: #os
This is an idea, still in development, for an operating system where all programs are safe to execute without any kernelspace-userspace barriers. This is achieved by requiring programs to be written a hardware-independent, safe-by-construction bytecode language. Programs written in this language are incapable of performing unsafe actions such as using pointers to memory not owned by the program. The key benefits of such a system are increased system simplicity, possible performance boosts, and better portability across architectures.
I've developed these ideas over the past year or so, as I've used Linux and experimented with writing operating systems. If they interest you, or if you have any questions or comments, please feel free to contact me (see the website root).
Memory paging, and page access control, is a common and necessary feature in traditional operating systems in order to prevent programs from accessing each other's memory, or that of the kernel. But since all programs are guaranteed to only access the memory they're allowed to, memory paging is not necessary. This should result in a performance boost across the entire system as the CPU no longer has to jump through page tables to convert (uncached) virtual addresses to physical ones.
CPU ring-based isolation also becomes redundant. The bytecode language does not allow programs to perform unsafe actions such as accessing I/O ports or manipulating special registers. There is thus no need for userspace applications to work on a different CPU ring than the kernel. This causes a large performance boost because there is no longer any need to jump between CPU rings, an operation which requires reading and writing almost all CPU state.
The userspace and kernelspace can also share a single pool of memory, and in fact a single memory allocator. Instead of having two (or more) levels of memory management, where the kernel keeps track of regions of memory handed off to applications, and applications keep track of exactly how each byte of that memory is used, the system can have a single global memory allocation system, which directly manages how all applications use memory. On top of this, some security features like memory quotas may still be necessary, but are trivial to implement.
The most important safety aspect of the bytecode language is preventing access to memory not available to the program. This is achieved by using reference types, which are safe pointers which are guaranteed to point to valid memory. Memory can only be accessed through references, and references cannot be converted to or from integers. References can only be created from other references or by referencing an object that is guaranteed to exist. An array type is also provided, which is essentially a safe pointer-length pair. All array accesses are bounds-checked at runtime (unless an access can be guaranteed to succeed).
Since all references have to be statically typed, composing references and arrays together must be typed as well. The language thus needs to be fully statically typed, with structures, enumerations, etc. all provided. Because it is bytecode, typing data is provided in a separate section to the code. When bytecode is loaded into memory, the data for every type is stored at a unique memory location, which can be used to identify it, even at runtime. This forms the basis for run-time type information, thereby allowing programs to handle dynamically typed data safely.
In order to remain hardware-independent, the bytecode only has a stack available. An implicit heap is used. New objects can be created on the stack, which can grow as necessary, and arrays, which are the only available dynamically-sized objects, are built-in to the language and so implicitly use the heap for memory.
Although the language features described above should provide strong safety, further research is necessary to determine if critical hardware vulnerabilities like Spectre and Meltdown are still accessible to programs. In particular, is it possible for programs written in this bytecode to read memory that does not belong to them using speculative execution and cache effects?
Using bytecode naturally causes a large slowdown, as the system resorts to interpretation, which is time-consuming. A much better solution is to use a JIT-based compiler, which can transform the hardware-independent bytecode into standard machine code, which can be run directly. The program remains safe to use because unsafe actions were never coded into the bytecode, and so they will not be present in the machine code. Compiling can also be time-consuming, which is why compiled results can be cached, even across reboots (although this may necessitate a verification system to ensure that malicious code has not been inserted, perhaps via cryptographic signatures).
However, compiling programs to machine code may enhance the possibility of bypassing restrictions via hardware vulnerabilities like Meltdown and Spectre. The program may never access memory it's not supposed to, but if it can do so speculatively, it may still be able to perform unsafe actions. Further research is needed here.
Because programs must be written in bytecode, the features provided by the bytecode become features available to all programs, and consequently the entire system.
Note that Agner Fog's ForwardCom system design can provide some very interesting ideas for the bytecode, such as vector-based work.
=> Agner Fog's ForwardCom system website
One large overhaul of how programs work that can only be implemented because of this is complete colorblind asynchrony, both in userspace and kernelspace. Every function can be called synchronously, in which case the caller only continues execution after the callee has completed its task, or asynchronously, where the caller continues executing other code, and explicitly waits upon the callee when it requires the callee to finish. The callee may or may not execute simultaneously to the caller when it is called asynchronously (essentially depending upon the availability of system resources).
Asynchrony can simplify the work the entire system does, due to the growing use of 'green threads', which is essentially asynchrony in individual applications. Instead of having the system consist of programs which consist of OS threads (program threads which are registered to the operating system) which execute any number of green threads (asynchronous tasks local to the application), the entire system only consists of a large number of small, asynchronous tasks.
Kernel tasks can also integrate into this system, thereby becoming essentially indistinguishable from asynchronous tasks from userspace. System calls then simply become calls to kernel tasks, which can be executed synchronously or asynchronously in order to achieve things like asynchronous I/O. The only additional requirement is some form of priority control to ensure that an overload of userspace tasks does not prevent kernel tasks from executing.
It may be possible to provide additional information to the system scheduler, so that it can better select which tasks to run, by providing task-level priorities. For example, a task A asynchronously calls a task B with priority 1/4 and a task C with priority 2/4. The scheduler then splits up the time it would have given to A into 4 parts, dedicating 1 part to A, 1 part to B, and 2 parts to C. If A then waits upon tasks B and C, then the scheduler distributes the time dedicated to A to tasks B and C, so that 1/3 of the time is give to B and 2/3 of the time is given to C (note that the ratio of time given to B and C remains the same). Although different priority measures are possible (e.g. using integers), the idea remains the same. Other measures, such as CPU time quotas, can also be used.
The system can implement asynchrony support by using the concept of frames. A frame represents a currently executing task, and includes all the data necessary to execute it (such as where its stack is located). If a task asynchronously calls another one, a new frame is created, and the caller's frame is linked to the callee's. If a task synchronously calls another one, there is no need to create a new frame; the function can be directly executed within the same frame, reusing the same data. When a task ends, its frame holds the return value; once the caller waits upon the task, the return value is extracted from the frame, and the frame is removed. The system runs by repeatedly choosing which frames to execute (using the scheduler) and executing them for a short period of time. Because there can be many, many tasks on a system (in the magnitude of millions), frames must be very lightweight.
Although userspace programs can't perform any unsafe functions, the system still has to, in order to interact with external systems such as USB, PCI, timers, etc. As such, (most) device drivers cannot be written in the bytecode language. Instead, they will have to be written in traditional languages like C. They are allowed to access all features of the hardware, and so have to be written carefully and they must be verified manually.
The system is organized as a microkernel supplemented by device driver functions. The kernel provides the API that drivers obey, so that programs (and other drivers) can use them.
Drivers may be structured in the form of asynchronous kernel tasks, as described above. This would require a special C API be provided which they use for acting as tasks. Programs that are compiled to machine code may also use this interface.
Due to the safe nature of programs, they are unable to perform graphical actions like drawing on a screen without some kernel-provided API to do so. This gives a one-time chance to the system designer to introduce accessibility into user interfaces from the very beginning, rather than having it as an add-on afterthought. Accessibility is important not just for the deaf and/or blind, but for non-power users who want a simple interface.
An accessible interface is one which acts in a conversational (request-response) manner. All programs are required to structure their interfaces this way, and this must be enforced somehow. On top of the conversational interface, additional layers can be added, such as textual and/or graphical user interfaces. The system may be able to derive these automatically.
The system must also be internationalized. As many languages as possible must be supported. At the moment, Unicode is the best way to do this (although that does not disqualify developing new alternatives). The system should provide common utility functions for dealing with Unicode. It may also be beneficial to include Unicode text as a special type within the bytecode language, so that it becomes easier to work with it (instead of just assuming ASCII).
Text editors may be the hardest interface to correctly display and to provide accessibly. It must be an interactive widget, in that the user can request the data around the cursor in the text. It must be possible and easy to navigate all levels of content - from individual characters, to words, to lines or paragraphs, etc. However, the available levels depends upon the format of the text being edited - for example, a gemtext document does not have the concept of lines (only paragraphs), and Markdown documents support nested lists. Thus, the text editor interface depends upon the format of the content being edited. It must be possible to dynamically change the format as well.
Certain programs (e.g. games or programs like 'cmatrix') are not entirely accessible, and this is necessary and by design. They will require a different system through which to represent their interface - perhaps some sort of raw graphics or terminal access. It is necessary to forbid (or at least strongly disincentivize) normal programs (those that can be made accessible) from incorrectly using these systems to avoid doing accessibility.
The system may be able to generate a textual and graphical interface from the core accessible interface, using defined structures like menus to lay out the application. This would lead to a very generic and plain interface for every application, which can be offset by using things like generating random color themes (and possibly icons) which fit with the system (this is inspired by the Lagrange Gemini browser).
Linux has slowly progressed towards using file descriptors for a very wide variety of objects - from files, to sockets, FIFOs, timers, event monitors, pollers, etc. File descriptors are used because they are generic handles to objects owned by the kernel. But if programs can hold pointers directly to kernel objects, which is safe when immutable reference types are used by the bytecode language, then there is no longer any need for this abstraction layer. Programs can directly hold references to kernel objects, and call the appropriate kernel functions directly to manipulate them. Syscalls become extremely simple, needing only some input verification (and even then, inputs like references and arrays are guaranteed to be valid, simplifying the process even more). Proper typing of the inputs can make verification almost entirely unnecessary.
Properly typing kernel objects allows programs to safely and easily use system resources. But some degree of polymorphism is also necessary - for example, both files and sockets can be treated as streams. Such common interfaces may be typed in a manner similar to Rust's traits, which can be used to ensure safety in their use even at compile-time. Once again, they can also be dynamically used thanks to run-time type information.
There is a distinct danger that programs may be written as device drivers to work around the lack of functionality available. For example, a game may need to directly interact with a GPU using proprietary APIs, or a program may not be able to achieve the desired performance due bytecode, even with the JIT compiler. Multimedia applications (audio and video players), cryptographic applications, and compilers all have high performance needs which the use of bytecode may interfere with.
2021-07-01: Completed and published first version.
-- (C) CC-BY-NC-SA 4.0 2021-07-01 The Depressed Milkman
text/gemini
This content has been proxied by September (ba2dc).