SBC bootcamp - installing OpenBSD on the Rock PI-4a.

Introduction

If you're feeling frustrated by the steady stream of flaws and defects being found in x86 chips over the last few years, you may have been tempted to have a look at the mysterious, `other hardware architectures', page of your chosen operating system.

All three of the main BSD systems, as well as Linux, support various platforms other than x86, and have done for some time. In many cases, the support is solid and reliable enough to be just as suitable for use in production systems as x86, and if you choose the right hardware, even more so.

"SOUNDS EASY!"

Unfortunately, if you've done more than simply glance at the list out of curiosity, you'll have found that the barrier to entry is quite high. Despite having years or even decades of IT experience, suddenly you are in a world full of terms you've either never heard of before, or at best know only by name.

Reading further, you'll find that the boot process is different, the hardware is different, everything seems different.

In this paper, we'll try to de-mystify things a bit. We'll provide some practical examples using hardware that is fairly easily obtainable and cheap enough not to be a major investment.

Barriers to entry

If you've tried searching for more details on hardware platforms such as SPARC, MIPS, ARM, and VAX to name a few, you'll have noticed two things. Either the hardware is obsolete, or expensive. Or both.

"Non-X86 hardware is often expensive, obsolete... Or both."

Yes, a used SPARC workstation from the late 1990s will often command a higher resale price than new x86 hardware that will outperform it in terms of raw processing power. Of course, raw processing power isn't the only metric by which hardware should be judged. Build quality and reliability of today's mass-market consumer-orientated x86 kit is quite likely not going to be on a par with a dedicated workstation from 20 years ago. Just looking at the quality of the construction of the metal casing tells a lot about the product you're buying.

Nevertheless, it may come as a surprise that if you want processing power anywhere near that which you have in the x86 machine on your desk right now from another hardware platform, it's going to come at a price. As a rough ballpark figure, expect to pay four or five times the amount for comparable new hardware.

Of course, this is a major reason why getting a UNIX-like operating system running on cheap consumer grade hardware was such a big deal back in the 1990s. If you didn't appreciate the efforts of the early Linux kernel developers and modern BSD system developers before now, well maybe you should do.

The first point to take away, then, is that for raw compute power, you probably want x86. If you're doing CPU-based ray-tracing, rendering, video compression, bulk compiling, or other CPU intensive tasks on a budget, then x86 is still the way to go. On the other hand, if you're not on a limited budget, then there is plenty of non-x86 hardware that will do these tasks just fine.

However, if what you really want is to get some practical experience with other platforms ahead of more affordable hardware hopefully coming to the market in the near future, then until relatively recently you'll have been somewhat stuck for choice.

The second main barrier to entry into this exciting new world is that, whilst most of the fundamentals of computing are obviously the same as the common PC-compatible x86 platform, there are also a lot of differences in some of the fine details. You might indeed be an `elite hacker' with guru-level UNIX experience, but if you've worked exclusively with consumer hardware or even enterprise x86 kit, then at times you'll feel like a novice again, having to look up lots of specifics for things that you assumed you already knew.

Some of this information can be hard to find, if you don't know where to look, which is one of the things that this paper aims to simplify a bit.

But there is a way!

Practical solutions

In recent years, a number of relatively low-cost `Single Board Computers', (SBCs), have become available. These are basically an off-shoot of what we used to call `development boards', except that they are more enthusiast-orientated in their design and, thanks to mass-production, typically offered at a much lower price-point.

The majority of these SBCs are based around ARM processors, many of them the 64-bit variants. These CPU cores are certainly powerful enough to do useful work on a modern BSD or Linux system, and as a result, you can now gain entry to this field without the limitations that we described above.

However...

Not all SBCs are the same! Some are more suited to this task than others, and it might not be immediately obvious to the uninitiated which boards are even likely to be compatible with your chosen operating system.

Unfortunately, the current trend in marketing seems to be to offer these SBCs as a way to run pre-supplied OS images, and use pre-written modules to achieve your end goal, whether that's controlling other connected hardware, or something simpler such as making a music player.

Obviously, doing this teaches you virtually nothing about how the underlying hardware works at a low level, and is rather useless as an introduction to non-x86 computing, but apparently it's what a lot of people want, and it's what sells SBCs. Just be aware that installing a Linux distribution on an SBC from a binary image that you downloaded from the SBC reseller does not make you a `1337 h@x0r' by any stretch of the imagination.

Even more unfortunately, though, this seems to have led to an explosion of SBCs that are packed full of gimmicky extra hardware, such as GPUs, WIFI and bluetooth controllers, which have little or no documentation and rely on closed source binaries to work, whilst the core functionality of the SBC is somewhat lacking.

But don't worry!

There are SBCs suited for making a general purpose workstation out there, and the one we've chosen for the example walk-through in this paper is the Rock-PI 4 from Radxa, which we'll look at in the next section.

Choosing an SBC - the Rock-PI 4 from Radxa

Several things stand out about this board to make it particularly good for our purposes:

Firstly, it's based around the RK3399 SoC from Rockchip. This is a nice, capable SoC which enjoys good support in all of the BSDs and Linux. Rockchip make plenty of documentation available for it, which is always welcome.

Secondly, the board is available with 4 Gb of LPDDR-4 RAM, whereas many if not most SBCs at the time of writing this paper are still using slower DDR-3 RAM. Many other SBCs are also not available with 4 Gb, limiting you to 2 Gb or less. If you intend to compile large amounts of source code on your new system, you'll appreciate plenty of fast memory.

Thirdly, the board supports a removable eMMC module up to 128 Gb, which may be all the local storage you actually need. Alternatively, the board could be used without local storage on eMMC at all for an ultra-low-cost diskless workstation booted over the LAN, (bootloader on the internal SPI flash), but this is definitely not a recommended idea for the beginner.

A number of other details we like about the Rock-PI 4 are USB 3 ports, a built-in realtime clock, tiny physical size, and the availability of a model without wi-fi and bluetooth, the Rock-PI 4 model A.

At the time of writing the board itself is available in the 4 Gb RAM configuration for about $65 or 55 euros, although you will need some accessories which we will discuss in another section.

If you're wondering about performance, and to which generation of x86 CPU it compares, that's quite a difficult question to answer as it obviously depends heavily on the type of workload. As a very rough guide though, we found that with all six cores in use, CPU computing power would typically outperform our Thinkpad X210. Perfectly usable as a light-weight workstation, especially considering its tiny footprint and low power consumption.

To be fair, there are three small things we would have liked to have seen on this board to improve it even further. Firstly, power supply is via a dedicated USB-C connector for power only, which is fine, but a barrel connector would have been nice from a durability viewpoint. Secondly, the on-board firmware has a recovery mode that can load an image into memory via a Rockchip specific USB protocol. It would have been nice to have the option of, say, x-modem upload via the serial port, too. Lastly, it would be much more practical if the eMMC socket was on the opposite side of the PCB to the CPU, to improve its accessibility when a large heatsink is installed.

But we're splitting hairs here, really. This SBC is a great all-round choice for building a general purpose workstation to run a BSD system on non-x86 hardware.

IMPORTANT NOTE:

The newer Rock-PI 4a+, (note the plus), differs from the Rock-PI 4a shown here, in that the eMMC is soldered in place rather than being socketed.

Although the Rock-PI 4a+ hardware will likely run OpenBSD just fine, the installation method detailed here requires using a model with socketed eMMC, such as the original Rock-PI 4a, or other models such as the Rock-PI 4b or Rock-PI 4c.

Choosing an operating system - OpenBSD

For this demonstration, we'll be installing OpenBSD 6.9 on the Rock-PI 4a SBC.

OpenBSD has several features which make it a convenient and useful choice for testing an unfamiliar hardware architecture. A complete base installation can be performed in about 15 minutes on the Rock-PI 4a, and easily fits onto a 64 Gb eMMC with plenty of space to spare for user storage.

A lot of useful functionality is available with the base system alone, so you can use the Rock-PI 4a as a convenient extra machine to test out different kernel configurations, test patches, try different configurations of the native webserver httpd, set up remote X11 display to your desktop, run sndio over the lan, configure a DNS resolver, or synchronise the RTC with other local machines using ntpd, to name a few examples.

Basically, once it's installed, it will feel almost exactly like an X86 based OpenBSD machine, except that it's much smaller and considerably less tedious to re-install, because you can access the console via a serial link from your host machine and don't need to connect a local keyboard and monitor.

Building software from the OpenBSD ports tree is also straightforward, and the compilation can be run natively on the Rock-PI 4a itself. This is not an embedded platform that requires a bloated development environment to be set up on a faster host machine for cross-compilation. If you want to do your own software development, the base system already includes a C compiler for the arm64 architecture.

Finally, if you really want to know how the hardware itself works at a lower level, the OpenBSD manual pages, and ultimately the kernel source code, are excellent sources of information.

If you hope to follow these notes as a guide to performing your own installation, be aware that some experience of OpenBSD on x86, (or any other hardware platform), is assumed.

If you are not familiar with OpenBSD, but have experience with another BSD or UNIX-like OS, you'll probably be able to cope. Otherwise, you will probably want to find somebody with such experience to guide you through the process.

Hardware and software we will be using - a summary

SBC HARDWARE CHECKLIST

On the SBC hardware side of things, we'll be using the following:

The model 4b is almost identical for our purposes

Included with the above heatsink

Should support 115200 bps, preferably 1500000 bps

Should be able to supply a clean 20 volts at about 2 amps

HOST WORKSTATION REQUIREMENTS

We also have a few requirements for the host system:

Important note

As the main focus of this walk-through is to teach some of the fundamental concepts of working with an arm64 based machine, we will, as far as possible, not be relying on magic downloads of random bits of software that `just make things work'. Most of what we need is in the OpenBSD base system, and the few tools we need to build the bootloader for the arm64 platform are available in the OpenBSD ports tree.

Once again, the information on this page is intended as a slower-paced educational experience, rather than a `put the disk in and it works' experience. If you want the latter, pre-built Linux images are probably available for download elsewhere, and this paper may not be for you.

A little theory before we get started

Once the OpenBSD installation is complete, your new ARM-based workstation will feel very familiar for normal user-mode tasks.

The main differences that you'll encounter in the beginning are to do with the bootstrap process and, by extension, the installation procedure.

The BIOS-like functionality of other hardware architectures is way more advanced and useful than what you are probably accustomed to seeing.

The BIOS in PC-compatible x86 machines, especially cheap consumer equipment, is usually intended to be used with a directly attached keyboard and monitor. The first-stage bootloaders in the MBR and PBR inherit this limitation. This arrangement is usually known as a glass console, since traditionally locally connected monitors had a glass CRT screen. It may or may not come as something of a surprise that the equivalent of the low-level BIOS functions on non-x86 boards can often be accessed remotely from another machine, usually via a serial link. Furthermore, the BIOS-like functionality of non-x86 machines is often way more advanced than what you are used to seeing in consumer hardware, with options to interact with filesystems on local disks, ping network addresses, boot over the network, and more.

This also means that we can usually perform an entire installation of OpenBSD, or any other operating system, comfortably from a terminal program on another workstation, where we have all the facilities to scroll-back to see previous output, copy and paste things like IP addresses, pipe a copy of everything that we're seeing to a file, and more. We could even access the host running the terminal program remotely over the internet, and from there perform a complete OS installation on the machine connected directly to its serial port. All from the other side of the planet.

Some of this may sound familiar to those of you who have experience with virtual servers, where this kind of functionality is simulated to a degree, and you can interact with the virtual system over a virtual serial line. With a typical non-x86 machine, though, the functionality is real, not virtual, and you can interact with the bare hardware over the serial link. No glass console required.

This might seem like a trivial issue, but it becomes very convenient when you have a number of servers to manage, and don't want to either dedicate a keyboard and monitor to each one, or fiddle with KVM switches every time you need access to the BIOS.

The bootstrap process

From this point on, we'll concentrate our examples on the Rock-PI 4a, but most if not all of this will be applicable to just about any RK3399-based SBC, and much of it will be broadly similar in principle for those based on other ARM-based SoCs.

The actual bootstrap process on the RK3399 is really nothing like a typical x86 BIOS.

First of all, we should really clarify exactly what we mean by `bootstrap'.

It's become common to refer to the whole startup procedure of a computer, from power-on to when the OS has finished loading, as `booting'. This is unfortunate, as the term starts to become ambiguous when talking about a machine that has multiple stages of bootloader, and also in a hypervisor environment where you may be `booting' several different kernels after loading the hypervisor.

The RK3399 has a built-in masked ROM, with a tiny hardwired and non-rewritable program that tries to load and run another program from locally attached storage. If it fails to find any code to run, it resorts to listening on the USB port for a binary image to write to the SPI flash, (which would usually be some kind of bootloader).

To those familiar with the PC-BIOS, this description probably sounds rather like the MBR code. However, there is one very important difference: at this point, the masked ROM has basically done absolutely no hardware initialisation whatsoever. It hasn't even configured the RAM, it doesn't know how much RAM is installed, and it hasn't configured the memory timings. The first-stage boot code to which it transfers control inherits the hardware in its raw power-on state, whatever that might be. Effectively, nothing that you're used to adjusting in the BIOS of an x86 machine has been set up by the ROM code.

Which, of course, is excellent. Whereas in a typical consumer-orientated x86 board, the BIOS is a proprietary piece of code, that may be full of bugs, lacking features that you want, and that you have no easy way to modify, by contrast, the first stage bootloader that the RK3399 loads is read from a standard local storage device, and you are free to write whatever you like to those bytes. If you want to write a program to initialise the serial port, and transmit, 'hello world!', repeatedly until the machine is reset, then you are free to do exactly that.

Of course, a more useful thing to do would be to configure the RAM, and do other low-level tasks, which is, of course, precisely what the typical first stage bootloader does.

But wait!

You might be wondering exactly how the RK3399 does this initial program loading, (IPL), if the RAM isn't yet initialised or configured.

In fact, the RK3399 has its own dedicated 200Kb of SRAM onboard exactly for this purpose. The code in the internal masked ROM loads the first stage bootloader from a fixed address on an external storage medium, such as SPI flash or eMMC, into the internal SRAM and executes it. The first stage bootloader is intended to initialise the main system RAM, and then jump back to the masked ROM where it finds code that loads the second stage bootloader from external storage in a similar way. The second stage bootloader loads two more fragments of code from external storage, namely the arm trusted firmware and the third stage bootloader. The third stage bootloader will then load the actual operating system, but generally also provides other functionality useful for system maintenance, debugging, and troubleshooting.

The fixed addresses at which these pieces of boot code are located on the external storage are specific to Rockchip SoCs. This is one detail that will be different for SoCs from other manufacturers.

From there on, control passes to the operating system. In our case this is OpenBSD, and the boot process continues just as it would on an X86 machine. The kernel boot is usually quite fast, partly because there is less reliance on discovering internally connected hardware by probing. External devices such as USB storage are discovered by probing, but on-board devices such as serial ports, and USB host controllers, are specified in what is known as a `device tree'. This is basically a description of the internally connected hardware which the kernel can simply read and parse directly.

Constructing the SBC - preamble

We won't connect the serial adaptor until after the SBC is mounted on the heatsink and ready to be put in the case, as the cable needs to be threaded through a hole in the side. However, it's probably useful to familiarise yourself with where the connectors are just in case you want to do some early testing before mounting it.

SERIAL PORT CONNECTIONS ON THE SBC HEADER PINS

=> gemini://gemini.exoticsilicon.com/images/sbc_2.jpg

From left to right in the image linked above, the connections are RXD on pin 10, TXD on pin 8, and GND on pin 6, data direction being from the point of view of the SBC itself. There is no hardware handshaking, so no RTS, CTS, DSR, and DTR lines to connect. The fourth cable from this particular USB serial adaptor carries +5 volts, and is not required to be connected to the SBC. This should be obvious, but in case you're wondering, no, it couldn't supply anywhere near enough power to be used to power the Rock-PI 4a.

There is actually another UART available on pins 21, (RXD), and 19, (TXD), however this is not configured for use as a console by the default devicetree files, and also cannot be enabled at the same time as the SPI flash. If you want to connect to other serial devices, though, it should be perfectly usable, as long as the lack of hardware handshaking and TTL voltage levels are not an issue.

The first step of the process then, is to connect the blank eMMC chip to the USB reader/writer so that we can write the OpenBSD miniroot image to it.

The eMMC chip is very small and quite fiddly, but once the pins are correctly aligned it will press into the socket with moderate finger pressure, and give a tangible click as it does so.

We're now ready to program the eMMC chip from the host machine.

Testing the eMMC on the host

The USB eMMC reader/writer presents itself to the host as a standard USB mass storage device, so as long as you have support for the `umass' driver compiled into the kernel, it should be automatically detected as an `sd' device upon connection:

 umass0 at uhub2 port 2 configuration 1 interface 0 "Genesys UFD" rev 3.20/18.55 addr 6
 umass0: using SCSI over Bulk-Only
 scsibus3 at umass0: 2 targets, initiator 0
 sd3 at scsibus3 targ 1 lun 0:  removable serial.XXXXXXXXXXXXXXXXXXXX
 sd3: 59000MB, 512 bytes/sector, 120832000 sectors

Use the correct device!

If you are following these examples yourself, be absolutely sure that you check the correct device file for your own system, and substitute it in place of sd3 in the commands that you enter.
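
One way to reduce the risk of picking the wrong disk is to compare the sector count that the kernel reported for the device with what you expect for your eMMC module. The following is only a sketch: the helper name sd_sectors is our own invention, and it assumes a size line in the same format as the dmesg output shown above.

```shell
# sd_sectors: read a dmesg "sdN: ..." size line on stdin and print the
# sector count that it reports. (Hypothetical helper, not a system tool.)
sd_sectors() {
    sed -n 's/^sd[0-9]*: .* \([0-9][0-9]*\) sectors$/\1/p'
}

# Possible usage on the host, where sd3 is only the name from our example:
#   dmesg | grep '^sd3: ' | tail -1 | sd_sectors

# Demonstration using the size line quoted earlier:
echo 'sd3: 59000MB, 512 bytes/sector, 120832000 sectors' | sd_sectors
# prints 120832000
```

If the number printed doesn't correspond to the capacity of your eMMC, stop and re-check the device name before writing anything.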

When listed by usbdevs -v, the eMMC reader/writer will report itself something like this:

 05e3:0756 Genesys, UFD
 super speed, power 224 mA, config 1, rev 18.55, iSerial XXXXXXXXXXXXXXXX
 driver: umass0

A quick test of read and write performance shows that the eMMC is quite fast compared to a lot of cheap memory cards:

 # dd if=/dev/rsd3c of=/dev/null bs=1m count=512
 512+0 records in
 512+0 records out
 536870912 bytes transferred in 3.959 secs (135583609 bytes/sec)
 
 # dd if=/dev/zero of=/dev/rsd3c bs=1m count=512
 512+0 records in
 512+0 records out
 536870912 bytes transferred in 4.234 secs (126795302 bytes/sec)
 
 # dd if=/dev/random of=/dev/rsd3c bs=1m count=512 
 512+0 records in
 512+0 records out
 536870912 bytes transferred in 6.393 secs (83971093 bytes/sec)

Overwriting the whole of this 64 Gb eMMC with random data took about 25 minutes, with an average speed of about 38 Mb/second.

Writing a miniroot image to the eMMC

Since the OpenBSD installer runs from a ramdisk, we can boot the installer from the eMMC itself and then overwrite it as we perform the actual installation. The required files for the base installation can then be fetched from a local webserver running on the host.

The base system binaries for the arm64 architecture should be available from your local mirror site in /pub/OpenBSD/6.9/arm64/ and obviously the source archives in /pub/OpenBSD/6.9/ are the same ones that you would use for any architecture.

You can and should check the integrity of the downloaded arm64 binaries in the normal way using signify on the host.

 # signify -C -p /etc/signify/openbsd-69-base.pub -x SHA256.sig
 Signature Verified
 BOOTAA64.EFI: OK
 BUILDINFO: OK
 INSTALL.arm64: OK
 base69.tgz: OK
 bsd: OK
 bsd.mp: OK
 bsd.rd: OK
 comp69.tgz: OK
 game69.tgz: OK
 install69.img: OK
 man69.tgz: OK
 miniroot69.img: OK
 xbase69.tgz: OK
 xfont69.tgz: OK
 xserv69.tgz: OK
 xshare69.tgz: OK

Note that we don't actually need the file install69.img for this installation.

For now, the only file we need from this directory is miniroot69.img, which we write directly to the eMMC:

 # dd if=miniroot69.img of=/dev/rsd3c bs=1m

In case you're wondering, the miniroot image is exactly 43 Mb, i.e. 45088768 bytes, so we don't need to use conv=sync to pad the final block when writing to the raw device.
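
The arithmetic is easy to confirm with the shell. This sketch uses the byte count quoted above; the function name is_multiple_of is hypothetical, and on your own system you could obtain the size of the downloaded file with wc -c instead of hard-coding it.

```shell
# is_multiple_of SIZE BLOCK: succeed if SIZE is an exact multiple of BLOCK.
is_multiple_of() {
    [ $(( $1 % $2 )) -eq 0 ]
}

size=45088768    # bytes in miniroot69.img, as quoted in the text
# On a real system, something like: size=$(( $(wc -c < miniroot69.img) ))

echo $(( size / 1048576 ))    # size in units of 1048576 bytes, prints 43
is_multiple_of "$size" 1048576 && echo "whole number of bs=1m blocks"
is_multiple_of "$size" 512 && echo "whole number of 512-byte sectors"
```

Both checks succeed for this image, which is why the plain dd invocation above is sufficient.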

Customising the miniroot image

Although we've written the miniroot image to the eMMC, it's not ready to boot yet.

Next, we need to write the correct device tree blob, (DTB), file and the correct bootloader code, because as supplied, the miniroot image doesn't contain the DTB or bootloader code that is specific to this SBC.

Whilst the bootloader code is written to known fixed blocks of the storage device, the DTB file needs to be written to a FAT filesystem. If you enjoy drawing analogies with legacy X86 systems, the use of this FAT partition is very broadly comparable with the use of CMOS NVRAM, in that it's storing configuration data required for the boot process.

The correct DTB file can be found in /usr/local/share/dtb/arm64/rockchip/rk3399-rock-pi-4a.dtb on the host, once you have built and installed the sysutils/dtb package from the ports tree.

Although it's possible to install sysutils/dtb from a binary package, compiling it from source allows us to make any desired local changes to the DTB file. Prior to OpenBSD 6.9, this was particularly important as the default baud rate for the serial console was 1500000 and our USB serial adaptors didn't communicate reliably with the Rock-PI 4a at this speed. Changing it to 115200 baud completely solved this problem. However, about half way through the OpenBSD 6.9 development cycle, a patch to do exactly this was committed to the ports tree, so now the binary package for sysutils/dtb already contains a DTB that configures the serial console for operation at 115200 baud.

The upshot of all this is that you probably no longer actually need to compile your own sysutils/dtb package from source to get the SBC up and running. Doing so will give you the flexibility to make adjustments later on such as setting the serial console back to 1500000 baud, over or under clocking, or changing the various supply voltages. Unlike many X86 systems, the hardware doesn't really try to stop you from doing these things, but the potential benefits of overclocking the Rock-PI 4a seem very limited and in our opinion not worth the risk of damage to the hardware. Underclocking for lower power consumption and heat generation might be more useful.

If you don't already have dpb set up and configured on your workstation, now would be a good time to read part one of Exotic Silicon's "Reckless guide to OpenBSD", where we explain in detail how to set up dpb on an OpenBSD system.

=> Part one of our "Reckless guide", covering dpb

In any case, assuming that you do have dpb set up and configured, building sysutils/dtb requires just a single command:

 # dpb sysutils/dtb

Installing the package you've just built can be done with a single invocation of pkg_add. Alternatively, if you decided not to build sysutils/dtb from source, but instead downloaded a pre-built binary package, then the same command will install that for you.

 # pkg_add dtb

Now we can copy /usr/local/share/dtb/arm64/rockchip/rk3399-rock-pi-4a.dtb to the correct location on the eMMC:

 # mount /dev/sd3i /mnt
 # mkdir /mnt/rockchip 
 # cp /usr/local/share/dtb/arm64/rockchip/rk3399-rock-pi-4a.dtb /mnt/rockchip/
 # umount /mnt

The next step is to write the bootcode at the correct locations. By far the most popular opensource bootloader for these systems is `Das U-Boot', which is available in the OpenBSD ports tree as sysutils/u-boot. We can build and install this port on the host in much the same way as we built the device tree blobs, although note the use of a comma to specify the required package flavour.

 # dpb sysutils/u-boot,aarch64
 # pkg_add u-boot-aarch64

We specify the aarch64 package flavour in the dpb invocation, as we don't need to build the bootloaders for the 32-bit arm architecture.

Just as the port for the DTBs contains a patch to change the default baud rate from 1500000 to 115200, the OpenBSD port of Das U-Boot also contains a similar patch, so if you want or need to change the baud rate, you will want to make sure that the values configured in each place match.

Yes, the serial port does indeed have to be configured separately in two places.
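
If you want to see which speed a particular DTB file asks for, one rough approach is to look for the console string inside the binary blob. This is a sketch under assumptions: we assume that the console is described by a property value of the form serial2:115200n8, as is common in device trees for Rockchip boards, and the function name dtb_console_speed is our own. If nothing is printed, the DTB may simply describe its console differently.

```shell
# dtb_console_speed FILE: print any console speed found in a string of
# the form "serialN:SPEEDn8" inside a compiled DTB file.
# (Hypothetical helper; the string format is an assumption.)
dtb_console_speed() {
    # Turn every non-printable byte into a newline, then pick out the speed.
    LC_ALL=C tr -c '[:print:]' '\n' < "$1" |
        sed -n 's/.*serial[0-9]:\([0-9][0-9]*\)n8.*/\1/p'
}

# Possible usage on the host, with the sysutils/dtb package installed:
#   dtb_console_speed /usr/local/share/dtb/arm64/rockchip/rk3399-rock-pi-4a.dtb
```

This only inspects the DTB. The speed compiled into the bootloader is a separate setting and has to be checked on its own.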

With the sysutils/u-boot-aarch64 package installed, we can now write the bootcode to the magic locations on the eMMC:

 # dd if=/usr/local/share/u-boot/rock-pi-4-rk3399/idbloader.img of=/dev/sd3c seek=64
 # dd if=/usr/local/share/u-boot/rock-pi-4-rk3399/u-boot.itb of=/dev/sd3c seek=16384

Note that we are using the block device sd3c this time, instead of the raw device rsd3c. Neither file is a multiple of the 512-byte blocksize, so if we were to write them using the raw device the last partial block would not be written.
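
Since these writes go to fixed offsets with no filesystem to protect us from mistakes, it may be worth reading the data back to confirm that it landed where we intended. The following sketch is not part of the original procedure: verify_at is a hypothetical helper of our own, and the demonstration runs against a scratch file rather than a real eMMC.

```shell
# verify_at IMAGE SECTOR TARGET
# Succeed if the bytes of IMAGE are found in TARGET, starting at the
# given 512-byte sector. TARGET may be a scratch file or a disk device.
verify_at() {
    size=$(( $(wc -c < "$1") ))           # image size in bytes
    blocks=$(( (size + 511) / 512 ))      # sectors needed to cover it
    dd if="$3" bs=512 skip="$2" count="$blocks" 2>/dev/null |
        dd bs=1 count="$size" 2>/dev/null |   # trim the final partial sector
        cmp -s - "$1"
}

# Rehearsal on a scratch file, so that no real device is touched:
img=$(mktemp)
payload=$(mktemp)
printf 'pretend bootcode' > "$payload"
dd if=/dev/zero of="$img" bs=512 count=128 2>/dev/null
dd if="$payload" of="$img" bs=512 seek=64 conv=notrunc 2>/dev/null
verify_at "$payload" 64 "$img" && echo "match at sector 64"
rm -f "$img" "$payload"
```

On the real device, something like verify_at /usr/local/share/u-boot/rock-pi-4-rk3399/idbloader.img 64 /dev/rsd3c should behave in the same way, although the byte-at-a-time second dd makes it slow for larger files.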

The idbloader.img file contains the first and second stages of the bootcode, and is written at an offset of 64 sectors. The u-boot.itb file contains the arm trusted firmware and the main u-boot code, and is written at an offset of 16384 sectors.
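
Expressed in bytes rather than sectors, these offsets are easier to relate to the usual powers of two. A small sketch, assuming 512-byte sectors as reported for our eMMC earlier; sector_to_bytes is just an illustrative name:

```shell
# sector_to_bytes SECTORS: print the byte offset of a 512-byte sector.
sector_to_bytes() {
    echo $(( $1 * 512 ))
}

sector_to_bytes 64       # idbloader.img: prints 32768, i.e. 32 x 1024 bytes
sector_to_bytes 16384    # u-boot.itb: prints 8388608, i.e. 8 x 1048576 bytes

# Space between the two offsets, which idbloader.img must fit within:
echo $(( (16384 - 64) * 512 ))    # prints 8355840
```

As long as idbloader.img is smaller than that last figure, the two dd commands above cannot overwrite each other's data.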

The eMMC chip is now ready to boot the Rock-PI 4a into the OpenBSD installation ramdisk kernel. We can disconnect the USB reader/writer from the host, and return our attention to the SBC hardware.

Constructing the SBC - back to the hardware...

Now that we've prepared the OpenBSD 6.9 miniroot image with the correct bootstrap code for the Rock-PI 4a and written it to the eMMC, we can get back to the hardware installation proper.

The next step is to remove the eMMC chip from the reader/writer, and insert it into its socket on the SBC. The socket is near to one corner of the board, on the same side as the SoC. If your SBC seems bricked when you first turn it on, check for a bad connection here.

Unfortunately, due to its location on the board, if you want or need to remove the eMMC chip in the future it will probably be necessary to remove the heatsink as well. This may in turn require the application of fresh thermal paste. However, it's perfectly possible to update the installation of OpenBSD and even the boot code from the Rock-PI 4a itself without removing the eMMC, as long as the system is capable of booting.

With the eMMC in place, we can move on to mounting the SBC in its case, the top of which also acts as a heatsink for the CPU.

This particular case comes with everything that you need to put it together, including a small screwdriver, screws, standoffs, adhesive feet, and a generous amount of thermal paste.

Looking carefully at the board and heatsink in profile, you can see that the metal of the heatsink is raised slightly in just the right place in order to make good contact with the SoC. The RAM chips do not appear to contact the heatsink, and most likely wouldn't require extensive cooling anyway.

In use, we've generally observed temperatures reported by the CPU between about 30 degrees C when idling, and about 60 degrees C under load in an air-conditioned office.

When running under load for an extended period of time, the heatsink can become uncomfortably hot to touch. There are four threaded screw holes in a rectangular arrangement over the vanes of the heatsink which appear at first sight to be intended for mounting a fan. However, the spacing of the holes doesn't seem to correspond with any standard size of fan. We also note that there is no easy way of obtaining a 12 volt power supply directly from the SBC itself, so if this is indeed intended as a mounting place for a fan, making use of it might be somewhat challenging.

The four brass standoffs screw into corresponding holes in the top of the case. The larger standoffs are on the left in this picture. The SBC itself will be placed on top and screws inserted into the threads of the standoffs, but first we need to ensure good thermal contact with the heatsink by using the supplied thermal paste.

Check that both surfaces are clean and free of debris before applying the thermal paste. Once it's been applied, place the SBC on the standoffs that we just screwed into the heatsink, and fix it in place with the set of screws supplied for this purpose.

At this point we can thread the cable for the serial adaptor through the hole on the side of the case, and connect it to the relevant three pins on the SBC. I prefer to leave a separate serial adaptor permanently attached to each SBC for convenient access to the console, but if you prefer not to do this, it would probably be trivial to bring the serial lines and ground out to a chassis-mounted TRS socket, e.g. a standard 3.5 mm headphone jack.

Note that in this example, we have opted not to connect a cell battery to power the internal realtime clock. The Rock-PI 4a has an RTC onboard, but it requires an external power source, such as a CR2032 cell, to maintain the time and date when the board is otherwise powered off. For our purposes, an RTC is probably superfluous as we can simply synchronise the clock with another machine on the LAN at boot time.

However, if you do want to use the on-board RTC, now would be a good time to connect a suitable cell to the two-pin RTC header.

Screw the heatsink to the base of the case, using the other set of four screws supplied.

All that is left to do now is to remove the protective film from the plastic lid, and our SBC hardware is ready to go!

First power-on

Once our installation of OpenBSD 6.9 on the Rock-PI 4a is complete, we will obviously be able to access it via a network connection. However, the actual interactive installation process requires access to the system console, which is provided via the USB serial adaptor.

The USB serial adaptor used in this example is detected as a `uftdi' device, which in turn provides a `ucom' device to the host system. This can be used in essentially the same way as any other serial port.

Curiosity

For reference, the apparently identical USB serial adaptor devices that we tested while preparing this write-up reported themselves differently depending on their own firmware version:

 uftdi0 at uhub1 port 1 configuration 1 interface 0 "NXP DK4 Controller Board" rev 2.00/6.00 addr 3
 ucom0 at uftdi0 portno 1
 uftdi1 at uhub1 port 2 configuration 1 interface 0 "FTDI USB <-> Serial" rev 1.10/4.00 addr 4
 ucom1 at uftdi1 portno 1

The relevant output from usbdevs -v being:

 0403:6001 NXP, DK4 Controller Board
 full speed, power 90 mA, config 1, rev 6.00, iSerial XXXXXXXX
 driver: uftdi0
 0403:6001 FTDI, USB <-> Serial
 full speed, self powered, config 1, rev 4.00
 driver: uftdi1

OpenBSD includes a serial terminal emulator in the base installation, so on the host we can start communication between the USB serial adaptor and the Rock-PI 4a at 115200 baud with a command such as:

 # cu -s 115200 -l cuaU0

At this point, if you power on the Rock-PI 4a by connecting a power supply to the USB-C connector, you should be greeted by a lot of output on the serial terminal, the last of which will be the familiar OpenBSD installer prompt.

If you get no output whatsoever, check that the board is actually receiving power, that the eMMC chip is firmly in its socket, and that the serial link is connected to the correct pins on the SBC. You can check the operation of the USB serial adaptor separately by connecting its TXD and RXD lines together, which should give you a local echo of anything you type on the serial terminal. If the bootcode was not written to the eMMC correctly in the earlier steps, this will also most likely result in no output on the serial terminal.

If you get output from the SBC, but your input is ignored, check that you haven't mistakenly inverted the TXD and RXD lines.

If you get garbled output from the SBC, suspect a mis-configured baudrate or another serial line problem.

If you get sporadic output consisting mostly of ÿ characters (0xFF), check that the baudrate isn't set to 1500000 on the SBC and 115200 on the USB serial adaptor.

The serial line rests at the `mark' state, which also encodes a binary `1'. When the speed is mismatched in this way, the receiver may interpret any line activity at the higher speed as a start bit, then continue to read the rest of the data bits long after the line returns to its resting mark state, thereby reading them all as binary `1', and producing our favourite character ÿ.
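The effect is easy to reproduce with a small simulation. The sketch below is written in Python purely for illustration, and makes simplifying assumptions that are not taken from any particular UART: 8N1 framing, a single isolated character on an otherwise idle line, and a receiver that starts timing from the falling edge of the start bit and samples each data bit at its midpoint. The function names are our own.

```python
FAST_BAUD = 1_500_000   # speed assumed for the sender
SLOW_BAUD = 115_200     # speed assumed for the receiver

def frame_bits(byte):
    """Build an 8N1 frame: start bit (0), eight data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def line_level(bits, baud, t):
    """Level of the line t seconds after the start bit begins; idle is mark (1)."""
    index = int(t * baud)
    return bits[index] if 0 <= index < len(bits) else 1

def receive(bits, tx_baud, rx_baud):
    """Sample the eight data bits at the midpoints expected by the receiver."""
    value = 0
    for i in range(8):
        t = (1.5 + i) / rx_baud          # skip the start bit, then sample mid-bit
        value |= line_level(bits, tx_baud, t) << i
    return value

for ch in b"OpenBSD":
    matched = receive(frame_bits(ch), SLOW_BAUD, SLOW_BAUD)
    mismatched = receive(frame_bits(ch), FAST_BAUD, SLOW_BAUD)
    print(f"sent {ch:#04x}, matched {matched:#04x}, mismatched {mismatched:#04x}")
```

With matched speeds each character is recovered intact. With the sender at 1500000 baud, the whole ten-bit frame is over before the receiver takes its first data sample, so under these assumptions every character is read as 0xff.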

None of this should happen if you are using the same hardware and software versions that we used when producing this paper. However, if you are using later or earlier versions of OpenBSD, different versions of the bootcode, or a different SBC, it's possible that the default configuration might not match what we describe here.

Understanding the boot process

"You're probably wondering what everything that has just flown off of the top of the terminal at 115,200 bps actually means..."

Assuming that you did indeed get the expected output on the serial terminal, you're probably now sitting and wondering what it all means. Especially everything that scrolled past before you got a chance to read it.

Let's try to de-mystify it a bit. The first thing output is from the first stage bootloader:

 U-Boot TPL 2021.01 (May 12 2021 - 13:10:43)
 Channel 0: LPDDR4, 50MHz
 BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
 Channel 1: LPDDR4, 50MHz
 BW=32 Col=10 Bk=8 CS0 Row=16/15 CS=1 Die BW=16 Size=2048MB
 256B stride
 lpddr4_set_rate: change freq to 400000000 mhz 0, 1
 lpddr4_set_rate: change freq to 800000000 mhz 1, 0
 Trying to boot from BOOTROM
 Returning to boot ROM...

The first stage bootloader initialises the system RAM, then jumps back to the code in the masked ROM.

At power-on the system RAM is initially running at a mere 50 MHz, but the type, size and configuration are correctly detected and the clock speed is changed to the expected 800 MHz in two steps.

BUG!

Obviously there is a bug here, as the reported speed of 800000000 MHz is somewhat impossible with current technology. The figure is evidently in Hz, not MHz. For reference, 800 THz would be in the frequency range of ultraviolet light.
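The arithmetic is easy to check. In the short Python sketch below, which is only an illustration and not part of U-Boot, the number printed by the driver is read first as hertz and then as megahertz, as the message claims. The variable names are our own.

```python
SPEED_OF_LIGHT = 299_792_458        # metres per second

reported = 800_000_000              # the number printed by lpddr4_set_rate

# Read as hertz, the value is the expected DRAM clock.
as_mhz = reported / 1_000_000

# Read as megahertz, as the message claims, it is a million times higher.
claimed_hz = reported * 1_000_000
claimed_thz = claimed_hz / 1e12

# Wavelength of an electromagnetic wave at the claimed frequency, in nanometres.
wavelength_nm = SPEED_OF_LIGHT / claimed_hz * 1e9

print(f"{as_mhz:.0f} MHz as hertz, {claimed_thz:.0f} THz as megahertz")
print(f"wavelength at the claimed frequency: {wavelength_nm:.1f} nm")
```

The resulting wavelength of a little under 375 nm lies just beyond the violet end of the visible spectrum.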

This is a fairly trivial bug, but it's worth noting that unlike bugs in proprietary X86 BIOS code, we can easily fix this one ourselves. The file that would need to be changed is u-boot-2021.01/drivers/ram/rockchip/sdram_rk3399.c in the U-Boot source archive, and the bug is on line 2555.

Next, we see the output from the second stage bootloader:

 U-Boot SPL 2021.01 (May 12 2021 - 13:10:43 -0300)
 Trying to boot from MMC2
 NOTICE:  BL31: v2.4(debug):2.4
 NOTICE:  BL31: Built : 13:00:59, May 12 2021
 INFO:    GICv3 with legacy support detected.
 INFO:    ARM GICv3 driver initialized in EL3
 INFO:    plat_rockchip_pmu_init(1624): pd status 3e
 INFO:    BL31: Initializing runtime services
 INFO:    BL31: cortex_a53: CPU workaround for 855873 was applied
 WARNING: BL31: cortex_a53: CPU workaround for 1530924 was missing!
 INFO:    BL31: Preparing for EL3 exit to normal world
 INFO:    Entry point address = 0x200000
 INFO:    SPSR = 0x3c9

The second stage bootloader basically initialises the cpu and interrupt controllers.

The bootcode that has run so far is what is contained in the idbloader.img file that we wrote to the eMMC at sector 64. Control now passes to the main stage of the bootloader, where we get the first opportunity to actually interact with the SBC via the serial console.

Some diagnostic messages will appear here that might seem to be suggesting that something is wrong, but in fact they are to be expected.

Firstly, our Rock-PI 4a is reported as a model 4b. In older versions of the bootloader, it was simply reported as a model 4 with no letter suffix. The model 4a and model 4b are almost identical in terms of hardware, so this mis-reporting is probably just an oversight. The output of the line beginning `Reset cause' will be POR for a power-on reset, or RST for a warm re-boot. The warning of a bad CRC when loading the environment from the MMC is also to be expected, as this hasn't been set up yet.

After a few lines of messages, you will briefly see a prompt giving you two seconds to interrupt the boot process by pressing any key. If you do so, you'll be dropped into a debugging and maintenance type of shell which allows you to do all sorts of low-level tasks with the hardware, including configuring the wired network connection and pinging hosts, writing to the SPI flash memory, and much more. Just to be absolutely clear, this is not the OpenBSD bootloader. The chance to interact with that will come next.

To perform the OpenBSD installation, we don't actually need to do anything special with U-Boot, and can simply allow it to continue the autoboot of the OpenBSD ramdisk kernel.

You will see more output about cards not responding to voltage select, disks not ready, unrecognised filesystems and lack of an EFI system partition. Don't worry, these are to be expected. This will be followed by the sign-on message from the OpenBSD bootloader, which will also pause briefly waiting for interactive commands before booting the ramdisk kernel, just like it does on an X86 machine.

From here on, the output should look very familiar indeed if you're accustomed to OpenBSD on any other architecture, except for some unfamiliar devices in the dmesg output.

Installation of OpenBSD 6.9

Although the OpenBSD installation process is essentially the same across different platforms, there are a few caveats to be aware of if you are new to the arm64 architecture.

Since we only have the bootcode and ramdisk kernel on the eMMC, we'll supply the binary packages for the base system install from a webserver running on the host. This is trivial to set up using the native webserver httpd, included in the base installation.

For this example, we'll set the host to fd00::1, and the Rock-PI 4a to fd00::2, although if you prefer to use other addresses or set up autoconfiguration using `rad' on the host, that will obviously work too. The ethernet device in the Rock-PI 4a is `dwge0', and we will use `if1' as the example device in the host, which will obviously need to be substituted with the real device.

Note that we are using IPv6 addresses in this example.

Whilst we could just as easily assign IPv4 addresses, IPv6 has existed since 1998 and been an internet standard since 2017. Here at Exotic Silicon we consider IPv4 to be an obsolete, legacy protocol.

Good industry practice requires the use of IPv6 for new deployments, and these sorts of internal, non-connected exercises are an ideal opportunity for users without IPv6 experience or even without IPv6 internet connectivity to learn the new standard.
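For anyone new to IPv6, the following Python sketch uses the standard ipaddress module to confirm two properties of the addresses chosen for this example: that both fall within the fc00::/7 unique local range, and that with a prefix length of 64 they share a single network, so no router is needed between them. It is only an illustration, and is not needed for the installation itself.

```python
import ipaddress

host = ipaddress.ip_interface("fd00::1/64")   # the spare interface in the host
sbc = ipaddress.ip_interface("fd00::2/64")    # dwge0 on the Rock-PI 4a

unique_local = ipaddress.ip_network("fc00::/7")

for iface in (host, sbc):
    print(iface.ip, "unique local:", iface.ip in unique_local)

# Two interfaces can talk directly when they sit in the same network.
same_link = host.network == sbc.network
print("same network:", same_link, host.network)
```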

We need to copy the distribution files from /pub/OpenBSD/6.9/ on a local OpenBSD mirror site into /var/www/htdocs/6.9/, and edit /etc/httpd.conf to serve them over the LAN, with a section similar to the following:

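 server "host.lan" {
     # answer plain HTTP requests on the address assigned to the host
     listen on fd00::1 port 80
     # produce a listing for directories that have no index file
     directory auto index
 }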

Next we configure the IP address on the spare network card in the host using ifconfig, and start httpd:

 # ifconfig if1 inet6 fd00::1
 # /etc/rc.d/httpd -f restart

Firewall rulesets

The default firewall ruleset supplied with OpenBSD 6.9 will allow inbound access to a webserver on port 80, but if you have added any rules blocking such access you will need to adjust them.

At the console of the Rock-PI 4a, we should be at the first prompt from the OpenBSD installer:

 Welcome to the OpenBSD/arm64 6.9 installation program.
 (I)nstall, (U)pgrade, (A)utoinstall or (S)hell?

The first part of the installation is fairly standard, as we are just configuring the hostname, network interface and ssh access in the usual way:

 Welcome to the OpenBSD/arm64 6.9 installation program.
 (I)nstall, (U)pgrade, (A)utoinstall or (S)hell? i
 At any prompt except password prompts you can escape to a shell by
 typing '!'. Default answers are shown in []'s and are selected by
 pressing RETURN.  You can exit this program at any time by pressing
 Control-C, but this can leave your system in an inconsistent state.
  
 Terminal type? [vt220] 
 System hostname? (short form, e.g. 'foo') sbc
  
 Available network interfaces are: dwge0 vlan0.
 Which network interface do you wish to configure? (or 'done') [dwge0]
 IPv4 address for dwge0? (or 'dhcp' or 'none') [dhcp] none
 IPv6 address for dwge0? (or 'autoconf' or 'none') [none] fd00::2
 IPv6 prefix length for dwge0? [64] 
 Available network interfaces are: dwge0 vlan0.
 Which network interface do you wish to configure? (or 'done') [done] 
 1) none
 IPv6 default router? (list #, IPv6 address or 'none') none
 DNS domain name? (e.g. 'example.com') [my.domain] lan
 DNS nameservers? (IP address list or 'none') [none]
  
 Password for root account? (will not echo)
 Password for root account? (again)
 Start sshd(8) by default? [yes]
 Setup a user? (enter a lower-case loginname, or 'no') [no] no
 Since no user was setup, root logins via sshd(8) might be useful.
 WARNING: root is targeted by password guessing attacks, pubkeys are safer.
 Allow root ssh login? (yes, no, prohibit-password) [no] yes

Next we are prompted to partition the root disk...

Caveat! Booting from a softraid encrypted volume

If we wanted to use full disk encryption on this installation, we could drop to the shell here and configure a softraid volume that would then be detected when we returned to the installer.

However, be aware that unlike on X86, where the bootloader allows booting from a RAID volume on any disklabel partition, on arm64 the RAID volume must be on the `a' partition to be reliably bootable.

The installer will allow you to install onto partitions in a softraid volume contained on, for example, sd0d, and the installation will appear to complete successfully. However, you will not be able to boot into the new system, but will instead see an error similar to the following:

 Booting /efi\boot\bootaa64.efi
 disks: sd0* sr0
 >> OpenBSD/arm64 BOOTAA64 1.4
 Passphrase: 
 open(sr0a:/etc/boot.conf): can't read disk label
 boot> 
 cannot open sr0a:/etc/random.seed: can't read disk label
 booting sr0a:/bsd: open sr0a:/bsd: can't read disk label
  failed(100). will try /bsd
 boot> ls sr0a:/
 stat(sr0a:/): can't read disk label

So if you do want to install on to a softraid crypto volume, ensure that the RAID partition is created as partition 'a' to avoid this problem.

Since this is only a test installation anyway, we'll install onto a regular unencrypted device.

Partitioning the eMMC with fdisk and disklabel

 Available disks are: sd0.
 Which disk is the root disk? ('?' for details) [sd0] 
 Disk: sd0       geometry: 7521/255/63 [120832000 Sectors]
 Offset: 0       Signature: 0xAA55
             Starting         Ending         LBA Info:
  #: id      C   H   S -      C   H   S [       start:        size ]
 -------------------------------------------------------------------------------
 *0: 0C      2  10   9 -      3  15  12 [       32768:       16384 ] FAT32L      
  1: 00      0   0   0 -      0   0   0 [           0:           0 ] unused      
  2: 00      0   0   0 -      0   0   0 [           0:           0 ] unused      
  3: A6      3  15  13 -      5 122  53 [       49152:       38912 ] OpenBSD     
 Use (W)hole disk or (E)dit the MBR? [whole] 
 Creating a msdos partition and an OpenBSD partition for rest of sd0...done.
 /dev/rsd0i: 32668 sectors in 8167 FAT16 clusters (2048 bytes/cluster)
 bps=512 spc=4 res=1 nft=2 rde=512 mid=0xf8 spf=32 spt=63 hds=255 hid=32768 bsec=32768
 The auto-allocated layout for sd0 is:
 #                size           offset  fstype [fsize bsize   cpg]
   a:          1024.0M            65536  4.2BSD   2048 16384     1 # /
   b:          4155.1M          2162688    swap                    
   c:         59000.0M                0  unused                    
   d:          3974.9M         10672416  4.2BSD   2048 16384     1 # /tmp
   e:          6344.2M         18812928  4.2BSD   2048 16384     1 # /var
   f:          6144.0M         31805792  4.2BSD   2048 16384     1 # /usr
   g:          1024.0M         44388704  4.2BSD   2048 16384     1 # /usr/X11R6
   h:          8251.9M         46485856  4.2BSD   2048 16384     1 # /usr/local
   i:            16.0M            32768   MSDOS                    
   j:          2048.0M         63385728  4.2BSD   2048 16384     1 # /usr/src
   k:          6144.0M         67580032  4.2BSD   2048 16384     1 # /usr/obj
   l:         19857.9M         80162944  4.2BSD   2048 16384     1 # /home
 Use (A)uto layout, (E)dit auto layout, or create (C)ustom layout? [a]

Here are the first subtle differences compared to a typical X86 installation. The whole disk option in fdisk doesn't just create a single native OpenBSD partition of type A6, but also creates a small FAT partition to hold the device tree blob, and other code needed at boot time. Note that the partition information displayed is being read from the miniroot image that we wrote to the eMMC, which explains why the native OpenBSD partition is only 38912 sectors of 512 bytes, or 19 MB, in size.

Selecting the whole disk option has written a new partition table to the eMMC, and at this point if we reboot the Rock-PI 4a before fully completing the installation, it will likely hang during the boot process, unable to reload the ramdisk kernel. If this happens, you will need to re-write the miniroot image to the eMMC, following the same steps as before.
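The sizes are easily checked from the sector counts in the fdisk output. This Python sketch assumes 512-byte sectors, as reported by the bps=512 field in the newfs output above. The function name is our own.

```python
SECTOR_BYTES = 512   # matches bps=512 in the newfs output

def sectors_to_mib(sectors):
    """Convert a count of 512-byte sectors to mebibytes."""
    return sectors * SECTOR_BYTES / (1024 * 1024)

# The two partitions in the MBR of the miniroot image, and the whole eMMC.
for name, sectors in (("FAT32L", 16384), ("OpenBSD", 38912), ("sd0", 120832000)):
    print(f"{name:8} {sectors:>10} sectors = {sectors_to_mib(sectors):8.1f} MiB")
```

The figure for the whole device agrees with the 59000.0M shown for the `c' partition in the disklabel.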

As the FAT partition has to be represented in the disklabel as well, we lose one of the available partition slots here, too. This is slightly inconvenient, but probably not a serious problem for most use cases.

Since the eMMC has a relatively small capacity, it would be wise to spend a moment thinking about your own particular partitioning needs, rather than simply accepting the auto-allocated layout. If you want to set up the ports build system in a chroot, for example, it's useful to create two partitions such as /portschroot and /portschroot/usr/ports/distfiles. We also tend to create a small partition of about 1.5 GB to hold a local copy of the distribution sources, and any errata patches.

Caveat! Swap partitions - you really do want one!

One very important thing to note here is that removing the `b' swap partition completely will cause you a lot of problems later on. It's tempting, as the Rock-PI 4a is available with 4 GB of physical RAM, which seems like plenty for many applications. Furthermore, we have been running most of our X86 machines without swap for many, many years without any problems whatsoever. However, at least as of OpenBSD 6.9, doing this on the arm64 architecture will lead to system instability. The symptoms are very similar to those of unreliable hardware: seemingly random and non-deterministic segmentation faults when performing multi-threaded kernel compiles, for example. Adding even a tiny amount of swap, such as 32 MB, will completely solve this problem, even in cases where the swap space is never touched because there is sufficient real memory.

Be aware, too, that the RK3399 SoC has six cpu cores. If you intend to do bulk ports builds on this machine using all six cores, then 4 GB of physical RAM without swap might well be insufficient. Memory requirements tend to be higher when doing any particular computing task with a large number of slower cores, compared with one or two faster cores.

After deciding on our disklabel layout, the partitions will be created and formatted as normal. We then move on to the actual installation:

 Let's install the sets!
 Location of sets? (disk http nfs or 'done') [http] 
 HTTP proxy URL? (e.g. 'http://proxy:8080', or 'none') [none] 
 (Unable to get list from ftp.openbsd.org, but that is OK)
 HTTP Server? (hostname or 'done') [fd00::1]
 Server directory? [pub/OpenBSD/6.9/arm64] /6.9/arm64
 Unable to connect using https. Use http instead? [no] yes
  
 Select sets by entering a set name, a file name pattern or 'all'. De-select
 sets by prepending a '-', e.g.: '-game*'. Selected sets are labelled '[X]'.
     [X] bsd           [X] base69.tgz    [X] game69.tgz    [X] xfont69.tgz
     [X] bsd.mp        [X] comp69.tgz    [X] xbase69.tgz   [X] xserv69.tgz
     [X] bsd.rd        [X] man69.tgz     [X] xshare69.tgz
 Set name(s)? (or 'abort' or 'done') [done] -game69.tgz
     [X] bsd           [X] base69.tgz    [ ] game69.tgz    [X] xfont69.tgz
     [X] bsd.mp        [X] comp69.tgz    [X] xbase69.tgz   [X] xserv69.tgz
     [X] bsd.rd        [X] man69.tgz     [X] xshare69.tgz
 Set name(s)? (or 'abort' or 'done') [done] 

The installation of the base sets over the LAN to the eMMC is surprisingly fast, especially considering that the CPU is not even running anywhere near full speed:

 Get/Verify SHA256.sig   100% |**************************|  1544       00:00
 
 Signature Verified
 
 Get/Verify bsd          100% |**************************| 13333 KB    00:01
 Get/Verify bsd.mp       100% |**************************| 13407 KB    00:01
 Get/Verify bsd.rd       100% |**************************| 16958 KB    00:01
 Get/Verify base69.tgz   100% |**************************|   219 MB    00:24
 Get/Verify comp69.tgz   100% |**************************| 64340 KB    00:06
 Get/Verify man69.tgz    100% |**************************|  7561 KB    00:00
 Get/Verify xbase69.tgz  100% |**************************| 25544 KB    00:02
 Get/Verify xshare69.tgz 100% |**************************|  4503 KB    00:00
 Get/Verify xfont69.tgz  100% |**************************| 39345 KB    00:04
 Get/Verify xserv69.tgz  100% |**************************| 10943 KB    00:01
 Installing bsd          100% |**************************| 13333 KB    00:00
 Installing bsd.mp       100% |**************************| 13407 KB    00:00
 Installing bsd.rd       100% |**************************| 16958 KB    00:00
 Installing base69.tgz   100% |**************************|   219 MB    00:45
 Extracting etc.tgz      100% |**************************|   254 KB    00:00
 Installing comp69.tgz   100% |**************************| 64340 KB    00:20
 Installing man69.tgz    100% |**************************|  7561 KB    00:04
 Installing xbase69.tgz  100% |**************************| 25544 KB    00:07
 Extracting xetc.tgz     100% |**************************|  7103       00:00
 Installing xshare69.tgz 100% |**************************|  4503 KB    00:04
 Installing xfont69.tgz  100% |**************************| 39345 KB    00:10
 Installing xserv69.tgz  100% |**************************| 10943 KB    00:02
 Location of sets? (disk http nfs or 'done') [done]

The kernel relinking may take slightly longer than you are used to, at about 90 seconds, or perhaps up to two minutes if you're using full disk encryption.

 What timezone are you in? ('?' for list) [Canada/Mountain]
 Saving configuration files... done.
 Making all device nodes... done.
 Multiprocessor machine; using bsd.mp instead of bsd.
 Relinking to create unique kernel... done.
  
 CONGRATULATIONS! Your OpenBSD install has been successfully completed!
  
 When you login to your new system the first time, please read your mail
 using the 'mail' command.
  
 Exit to (S)hell, (H)alt or (R)eboot? [reboot] 
 syncing disks... done
 rebooting...

Congratulations indeed. Our Rock-PI 4a is now ready for use, starting with some post-installation tweaks.

Post-installation pointers and surprises

The first boot into the newly installed system proceeds much as you might expect. The initial key generation is quite fast, between about four and twelve seconds for the ssh keys, and only about one to four seconds for the isakmpd/iked keys.

Syspatch will throw an error when it's invoked from the rc.firsttime script, as our local mirror doesn't include any syspatch-related files. Although we only configured a point to point network link between the Rock-PI 4a and the host machine during the installation, and deliberately didn't configure a route out to the internet, it would still be a good idea to apply any errata patches that are available.

At this point we can switch from using the serial link to accessing the SBC via ssh, which is somewhat more convenient and allows for multiple login sessions.

Since we configured ssh to allow root login with a password, a sensible first task would be to set up authentication via ssh keys and disable password logins altogether. The relevant public key can be transferred from the host using sftp, and /etc/ssh/sshd_config edited as required to achieve this.

The output of sysctl hw.sensors shows us that we have CPU and GPU temperature monitoring available.

 # sysctl hw.sensors
 hw.sensors.rktemp0.temp0=38.12 degC (CPU)
 hw.sensors.rktemp0.temp1=35.00 degC (GPU)

If you're curious about the performance of the CPU, you might have already tried running md5 -t, with mixed results:

 # md5 -t
 MD5 time trial.  Processing 10000 10000-byte blocks...
 Digest = 52e5f9c9e6f656f3e1800dfa5579d089
 Time   = 1.430000 seconds
 Speed  = 69930069.930070 bytes/second
 
 # md5 -t
 MD5 time trial.  Processing 10000 10000-byte blocks...
 Digest = 52e5f9c9e6f656f3e1800dfa5579d089
 Time   = 1.440000 seconds
 Speed  = 69444444.444444 bytes/second
 
 # md5 -t
 MD5 time trial.  Processing 10000 10000-byte blocks...
 Digest = 52e5f9c9e6f656f3e1800dfa5579d089
 Time   = 1.190000 seconds
 Speed  = 84033613.445378 bytes/second
 
 # md5 -t
 MD5 time trial.  Processing 10000 10000-byte blocks...
 Digest = 52e5f9c9e6f656f3e1800dfa5579d089
 Time   = 1.170000 seconds
 Speed  = 85470085.470085 bytes/second
 
 # md5 -t
 MD5 time trial.  Processing 10000 10000-byte blocks...
 Digest = 52e5f9c9e6f656f3e1800dfa5579d089
 Time   = 1.430000 seconds
 Speed  = 69930069.930070 bytes/second

Here we can notice two things: firstly, that the performance is lower than we probably expected, and secondly, that it's quite variable.

The overall low performance is caused by the fact that as of OpenBSD 6.9, there is not yet any real support for automatic clock speed management on this platform. The kernel inherits whatever configuration the bootloader set up and doesn't automatically change it. Looking at hw.setperf and apm shows us that we are running at a noticeably reduced clock rate:

 # sysctl hw.setperf
 hw.setperf=20
 
 # apm
 Battery state: unknown, 0% remaining, unknown life estimate
 A/C adapter state: not known
 Performance adjustment mode: manual (600 MHz)

Setting hw.setperf to 100 solves this problem:

 # sysctl hw.setperf=100
 hw.setperf: 20 -> 100
 
 # md5 -t
 MD5 time trial.  Processing 10000 10000-byte blocks...
 Digest = 52e5f9c9e6f656f3e1800dfa5579d089
 Time   = 0.610000 seconds
 Speed  = 163934426.229508 bytes/second
 
 # md5 -t
 MD5 time trial.  Processing 10000 10000-byte blocks...
 Digest = 52e5f9c9e6f656f3e1800dfa5579d089
 Time   = 0.390000 seconds
 Speed  = 256410256.410256 bytes/second

However, the variation between invocations is still noticeable, and this is due to the design of the RK3399 SoC that the Rock-PI 4a is based on.

Although the RK3399 contains six cpu cores, they are not all identical. This can be seen at the beginning of the dmesg output where the cpu cores are enumerated:

 cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4
 cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
 cpu0: 512KB 64b/line 16-way L2 cache
 cpu0: CRC32,SHA2,SHA1,AES+PMULL,ASID16
 cpu1 at mainbus0 mpidr 1: ARM Cortex-A53 r0p4
 cpu1: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
 cpu1: 512KB 64b/line 16-way L2 cache
 cpu1: CRC32,SHA2,SHA1,AES+PMULL,ASID16
 cpu2 at mainbus0 mpidr 2: ARM Cortex-A53 r0p4
 cpu2: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
 cpu2: 512KB 64b/line 16-way L2 cache
 cpu2: CRC32,SHA2,SHA1,AES+PMULL,ASID16
 cpu3 at mainbus0 mpidr 3: ARM Cortex-A53 r0p4
 cpu3: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
 cpu3: 512KB 64b/line 16-way L2 cache
 cpu3: CRC32,SHA2,SHA1,AES+PMULL,ASID16
 
 
 cpu4 at mainbus0 mpidr 100: ARM Cortex-A72 r0p2
 cpu4: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
 cpu4: 1024KB 64b/line 16-way L2 cache
 cpu4: CRC32,SHA2,SHA1,AES+PMULL,ASID16
 cpu5 at mainbus0 mpidr 101: ARM Cortex-A72 r0p2
 cpu5: 48KB 64b/line 3-way L1 PIPT I-cache, 32KB 64b/line 2-way L1 D-cache
 cpu5: 1024KB 64b/line 16-way L2 cache
 cpu5: CRC32,SHA2,SHA1,AES+PMULL,ASID16

The first four cpus, cpu0 through cpu3, are slower ARM Cortex-A53 cores, whereas cpu4 and cpu5 are faster ARM Cortex-A72 cores.

Currently, OpenBSD has no particular support for this concept of clusters of cpus with unequal performance characteristics, and will schedule processes to run on any of these six cpus just as it would for a multicore X86 machine.

This can easily be observed in the output of top when run with a short time delay between updates, say top -s 0.1, and with multiple subsequent md5 -t processes run on another terminal. You can clearly see that the higher performance comes when the md5 process is scheduled on one of the fast cores.
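The speed figure printed by md5 -t is simply the amount of data hashed divided by the elapsed time. The following Python sketch reproduces it from the two timings recorded earlier at hw.setperf=100, and gives the ratio between them. Attributing the slower run to an A53 core and the faster run to an A72 core is our reading of the results, and is not something that md5 itself reports.

```python
TOTAL_BYTES = 10000 * 10000   # 10000 blocks of 10000 bytes each

def throughput(seconds):
    """Bytes per second for one md5 -t run that took the given time."""
    return TOTAL_BYTES / seconds

slow = throughput(0.61)   # slower of the two runs at hw.setperf=100
fast = throughput(0.39)   # faster of the two runs at hw.setperf=100

print(f"slow: {slow / 1e6:.1f} MB/s")
print(f"fast: {fast / 1e6:.1f} MB/s")
print(f"ratio: {fast / slow:.2f}")
```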

Implementing support for all this in the kernel process scheduler might sound straightforward, and for synthetic benchmarks such as md5 -t it's not too difficult to get an improvement. However, after a whole morning tinkering with /usr/src/sys/kern/kern_sched.c, we only achieved at best about a 5% performance increase with real workloads, so it likely requires considerably more attention than the quick workarounds we tested.

The actual clock speed reported by apm is that of the first cpu, cpu0:

 # sysctl hw.setperf
 hw.setperf=100
 
 # apm
 Battery state: unknown, 0% remaining, unknown life estimate
 A/C adapter state: not known
 Performance adjustment mode: manual (1416 MHz)

Changing the value of hw.setperf selects one of several profiles for clock speed and voltage, and those profiles are different for the two cpu clusters.

The profiles used come from the device tree blob that is parsed at boot time, which we copied to the FAT partition on the eMMC from /usr/local/share/dtb/arm64/rockchip/rk3399-rock-pi-4a.dtb. This file comes from the sysutils/dtb package, and is built from the source code defining the device tree. This source code is actually from the Linux kernel source, which is why if you build the sysutils/dtb port on OpenBSD, it downloads the Linux kernel source code.

DO NOT ADJUST THE VOLTAGE AND CLOCK SETTINGS IN THE DEVICE TREE BLOB UNLESS YOU FULLY UNDERSTAND THE RISK OF DAMAGE TO THE HARDWARE

The profiles are defined in linux-5.11/arch/arm64/boot/dts/rockchip/rk3399-opp.dtsi. There are six configurations for the first cluster containing the ARM Cortex-A53 cores, and eight configurations for the second cluster containing the ARM Cortex-A72 cores. Setting hw.setperf to 0 will select the first, usually lowest, profile for each cluster, and setting hw.setperf to 100 will select the last, and by default highest, profile for each cluster. Setting an intermediate value will, as expected, select one of the other profiles. The output of apm will always report the clock speed selected for the first cluster, but the second cluster will be configured in the expected way.

This also means that some possible pairs of consecutive settings for hw.setperf will report identical clock speeds, as the first cluster is using the same profile, but performance of the second cluster will change. Compare, for example, sysctl hw.setperf=72 and sysctl hw.setperf=73.
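To see how this can happen, it helps to model the selection. The Python sketch below is not the kernel's code. It assumes that the requested percentage is interpolated linearly between the lowest and highest frequency of each cluster, and that the fastest profile not exceeding the result is chosen. The frequency tables are our reading of rk3399-opp.dtsi and should be checked against the copy you build from. The function name is our own.

```python
# Operating points in MHz, assumed from rk3399-opp.dtsi (verify against your copy).
A53_MHZ = [408, 600, 816, 1008, 1200, 1416]               # cpu0 to cpu3
A72_MHZ = [408, 600, 816, 1008, 1200, 1416, 1608, 1800]   # cpu4 and cpu5

def select_profile(level, profiles):
    """Return the fastest profile at or below the interpolated target.

    Assumed model: level (0 to 100) is mapped linearly onto the range
    between the slowest and fastest profile of the cluster.
    """
    low, high = min(profiles), max(profiles)
    target = low + level * (high - low) / 100
    return max(mhz for mhz in profiles if mhz <= target)

for level in (0, 20, 72, 73, 100):
    a53 = select_profile(level, A53_MHZ)
    a72 = select_profile(level, A72_MHZ)
    print(f"hw.setperf={level:3}  A53 {a53:4} MHz  A72 {a72:4} MHz")
```

Under these assumptions the model reproduces the 600 MHz reported by apm at hw.setperf=20 and the 1416 MHz reported at hw.setperf=100. It also shows the first cluster staying on the same profile at both 72 and 73 while the second cluster moves up by one.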

Interestingly, the installer was running on just one of the slower ARM Cortex-A53 cores, at a clock speed of only 600 MHz, and yet the decompression of the base packages was still respectably fast.

On the subject of clocks, the real-time clock is most certainly completely wrong at this point. The OpenBSD installer warned about being unable to read a valid time from the system RTC when we booted the ramdisk kernel:

 WARNING: bad clock chip time
 WARNING: CHECK AND RESET THE DATE!

Since we haven't yet taken this advice to check and reset the date, it will have been set to the filesystem time on the miniroot image as a last resort, and simply continued from there. If the SBC has been powered off, time will have effectively been frozen until it was powered on again, as the next boot will have relied on reading the filesystem timestamp as it was written at the last shutdown.

If you've always used x86 machines with a functioning battery-backed RTC, and either set the RTC in the BIOS or from within the operating system, you might be unfamiliar with the filesystem timestamp and the various sanity checks that the OpenBSD kernel does to ensure accurate timekeeping. The kernel source code is the ultimate reference here, but the basic concept is that when the machine is halted or rebooted, it writes the current timestamp to the root filesystem.

Upon the next boot, if the time supplied from a hardware RTC differs significantly from this last recorded filesystem timestamp, various actions can be taken. This hopefully ensures that at a minimum, the clock doesn't roll backwards due to a failed RTC battery. It also hopefully serves to keep the system time accurate enough across reboots in a system that doesn't have a supported hardware RTC, so that a correctly configured ntpd can drift the clock back to the correct time reasonably quickly, and avoid a sudden jump. The code for these boot-time sanity checks is in /usr/src/sys/kern/kern_time.c.
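
The general shape of such a check can be sketched in a few lines of Python. This is a simplified model, not a transcription of the kernel code. The function name boot_time and both limits are hypothetical choices made for illustration. Only the two warning strings are taken from the installer output shown earlier.

```python
SECONDS_PER_DAY = 86400

# Both limits are illustrative assumptions, not values checked against the kernel.
MAX_DISAGREEMENT = 2 * SECONDS_PER_DAY  # tolerated gap between RTC and filesystem
EARLIEST_PLAUSIBLE = 1577836800         # 2020-01-01 00:00:00 UTC


def boot_time(fs_time, rtc_time=None):
    """Return (chosen_time, warnings) for a simplified boot-time check.

    fs_time is the timestamp last written to the root filesystem.
    rtc_time is the hardware clock reading, or None if no usable RTC exists.
    """
    warnings = []
    if rtc_time is None or rtc_time < EARLIEST_PLAUSIBLE:
        # No trustworthy RTC: fall back to the filesystem timestamp.
        warnings.append("bad clock chip time")
        warnings.append("CHECK AND RESET THE DATE!")
        return fs_time, warnings
    if abs(rtc_time - fs_time) >= MAX_DISAGREEMENT:
        # The RTC looks plausible but is far from the last recorded time.
        warnings.append("CHECK AND RESET THE DATE!")
    return rtc_time, warnings
```

With no readable RTC, as on our board, this model always falls back to the filesystem timestamp and raises both warnings, which matches what we saw at boot.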

Our own solution to accurate timekeeping on the Rock-PI 4a is simple. Since we have an NTP server on the LAN, we just configured ntpd on the Rock-PI 4a to use it as its time source, and also added a call to /usr/sbin/rdate in /etc/rc.local to set the time immediately on boot.

Closing remarks and conclusions

So there you have it!

A fun and convenient introduction to running OpenBSD on something other than the x86 architecture. From here on, the Rock-PI 4a should basically behave just like any other OpenBSD machine from a userland point of view. Whilst it might not have the same processing power as your main desktop, it's a fairly capable SBC, and an especially convenient way of getting another machine on-line for testing various networking setups and configurations.

Copyright 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Exotic Silicon. All rights reserved.