This page permanently redirects to gemini://complete.org/isolating-data-from-your-own-processes-with-linux-namespaces/.

Isolating Data From Your Own Processes with Linux Namespaces

Back in my 2019 article "The Desktop Security Nightmare"[1], I noted that on most of our desktops, we don't have good control of what data a program can access and when.

=> 1: https://changelog.complete.org/archives/10006-the-desktop-security-nightmare

I noted that we have things like AppArmor, which is something, but not the entire picture. SELinux is so extremely complicated that even Ted Ts'o had a comment about never getting some of his life back.

I don't like complexity, especially when it comes to security.

One of my goals is what I'm going to call context-sensitive security. For instance, I would like the PDF of my taxes to be unavailable to all software... except when I'm working on my taxes. So, the okular PDF viewer shouldn't be able to access my tax files, except when I explicitly say it's OK.

One way to accomplish this, of course, would be to just mount the filesystem containing my taxes when I'm working on taxes, and leave it unmounted the rest of the time. However, besides the obvious convenience drawback, this has another one: either the files are inaccessible entirely, or they're accessible to the 5000 programs I have in /usr/bin, the untold number of npm packages a person may have installed, and so forth.

What I really want is to be able to say: "make this directory tree available only to this process and its children." And that's what I'm going to lay out in this article.

Background on Linux namespaces

You are probably already familiar with containers in the sense that they're behind Docker and LXC. A container uses a bunch of Linux[2] namespaces[3] to give the illusion of a separate machine. The namespace types include cgroup, IPC, network, mount, PID, time, user, and UTS. So if, for instance, a process has a separate PID namespace, then the process IDs within it may not show the entire system's PID table, may map to other "real" PIDs, etc. Likewise, with a distinct mount namespace, it may have different filesystems mounted.

=> 2: /linux/ | 3: https://en.wikipedia.org/wiki/Linux_namespaces
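
To make this a little more concrete: the kernel exposes each process's namespaces as symbolic links under /proc/PID/ns, and two processes share a namespace of a given type when the corresponding links point at the same target. Here is a minimal sketch for comparing them. The function names ns_id and same_ns are made up for this illustration.

```shell
#!/bin/bash
# Print the identifier of one namespace of a process, in the form
# "mnt:[NUMBER]". Types include mnt, user, pid, net, ipc, uts.
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# Succeed if two PIDs share the given namespace type.
# Usage: same_ns TYPE PID1 PID2
# The body is a subshell ( ... ) so a and b stay private to the function.
same_ns() (
    a=$(ns_id "$2" "$1") && b=$(ns_id "$3" "$1") && [ "$a" = "$b" ]
)

# Example: this shell trivially shares a mount namespace with itself.
if same_ns mnt "$$" "$$"; then
    echo "same mount namespace: $(ns_id "$$" mnt)"
fi
```

Running ns_id "$$" mnt in two different terminals is a quick way to see whether they are in the same mount namespace.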

The trick I'm going to use here is this: you don't have to use all of these as separate namespaces. You can just use a couple, and achieve some nice separation without having a fully-isolated container! And it can be done entirely without root permissions.
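
Whether this works without root depends on the system allowing unprivileged user namespaces, which some distributions restrict. As a rough check, the sketch below simply tries to create a throwaway user and mount namespace with the same unshare flags used later in this article. The function name can_unshare is hypothetical, and a failure can also just mean that the installed unshare is too old to know these flags.

```shell
#!/bin/bash
# Return 0 if this user can create a user + mount namespace without root,
# using the same flags as the examples in this article.
can_unshare() {
    command -v unshare >/dev/null 2>&1 || return 1
    unshare --keep-caps -Umc true 2>/dev/null
}

if can_unshare; then
    echo "unprivileged user and mount namespaces work here"
else
    echo "could not create an unprivileged namespace here"
fi
```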

A first demonstration

For this demonstration, I'm going to use gocryptfs[4]. It is an encrypted filesystem in FUSE[5], which means no root is necessary. You could use anything, though, from a traditional filesystem to other FUSEs, or even bind mounts.

=> 4: https://nuetzlich.net/gocryptfs/ | 5: https://en.wikipedia.org/wiki/Filesystem_in_Userspace

I should note, however, that the in-kernel keyring (used by fscrypt and e4crypt) is not separated out by namespaces, so you can't just unlock a certain tree with e4crypt and expect it to be only unlocked in one namespace.

First, we're going to enter a different namespace. The unshare command will create a separate user namespace (-U, necessary for the mount namespace), a separate mount namespace with -m, and populate the user namespace with our current user with -c. Since I don't give it an explicit command to run, it will run a shell. Here we go:

$ echo $$
873411
$ unshare --keep-caps -Umc
$ echo $$
887896

So you can see we're running under a different PID, at least. Now let's set up gocryptfs:

$ mkdir crypt plain
$ gocryptfs -init crypt
Choose a password for protecting your files.
Password:
Repeat:
...
The gocryptfs filesystem has been created successfully.
You can now mount it using: gocryptfs crypt MOUNTPOINT

OK. We've made two directories, crypt which holds the encrypted data, and plain which holds the plaintext (decrypted) view. We also initialized crypt. Now let's mount it -- remember, we're still in the new namespace:

$ gocryptfs crypt plain
Password:
Decrypting master key
Filesystem mounted and ready.

OK! Now how about creating a file in plain:

$ echo Testing > plain/test

Now, we can see that there's an encrypted file representing it in crypt:

$ ls -l crypt
total 6
-rw-r--r-- 1 jgoerzen jgoerzen  58 Dec 10 06:24 C1kX7S1Lq423tp7QVwdNfA
-r-------- 1 jgoerzen jgoerzen 385 Dec 10 06:24 gocryptfs.conf
-r--r----- 1 jgoerzen jgoerzen  16 Dec 10 06:24 gocryptfs.diriv

And in plain, we have the file:

$ ls -l plain
total 1
-rw-r--r-- 1 jgoerzen jgoerzen 8 Dec 10 06:24 test
$ cat plain/test
Testing

Now, keep this terminal open. Open another one (but not by starting it from this shell). From the other terminal, you can see:

$ ls -l plain
total 0

Yes! The plain directory was completely empty here, because it was mounted only in the other namespace!
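
If you want to confirm this without listing the directory, you can ask whether plain is a mount point as seen from the current shell. A small sketch using mountpoint from util-linux follows; the helper name mounted_here is made up.

```shell
#!/bin/bash
# Report whether a directory is a mount point as seen from the
# mount namespace of the calling shell. Returns 1 if it is not.
mounted_here() {
    if mountpoint -q "$1"; then
        echo "$1: mounted in this namespace"
    else
        echo "$1: not mounted in this namespace"
        return 1
    fi
}

# Inside the unshare'd shell this should report "mounted";
# from any other terminal it should report "not mounted".
mounted_here plain || true
```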

Now, back in the namespace, let's clean up:

$ fusermount -u plain
$ exit

It's important to unmount plain before exiting the namespace. If you don't, you can't directly umount it from the parent namespace. You would have to kill the gocryptfs process.

A simple script

Let's create a script called nsrun to make this easier.

#!/bin/bash

# Pass the command to run in the namespace,
# and any parameters, on the command-line.

if [ -z "$1" ]; then
   echo "Syntax: $0 command [args]"
   exit 5
fi

gocryptfs crypt plain || exit "$?"

"$@"

RETVAL="$?"

fusermount -u plain

exit "$RETVAL"

Now, run it:

$ chmod a+x nsrun
$ unshare --keep-caps -Umc ./nsrun ls -l plain
Password:
Decrypting master key
Filesystem mounted and ready.
total 1
-rw-r--r-- 1 jgoerzen jgoerzen 8 Dec 10 06:24 test

Excellent! And our script makes sure to unmount the plaintext view before exiting. So now, I could type unshare --keep-caps -Umc ./nsrun okular plain/taxes.pdf or something to view a file that's otherwise unavailable - and it will be only available to the okular process started this way (and any of its child processes)! No other process on the system can see it.
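
One limitation of nsrun as written is that the unmount only happens if the script reaches its last lines. A possible hardening, sketched here, is to register the unmount with trap so that it also runs when the shell exits early. This is a variation on the script above rather than a replacement for it. The function name run_mounted is made up, and crypt and plain are the same directories as before.

```shell
#!/bin/bash
# Mount crypt on plain, run a command, and unmount when done.
# The body runs in a subshell ( ... ) so the EXIT trap fires when the
# function finishes, not when the calling script ends. The subshell's
# exit status is that of the command, since the trap does not call exit.
run_mounted() (
    gocryptfs crypt plain || exit "$?"
    trap 'fusermount -u plain' EXIT
    "$@"
)

# When saved as a script, pass the command on the command line,
# just like nsrun. With no arguments, nothing is mounted or run.
if [ "$#" -gt 0 ]; then
    run_mounted "$@"
fi
```

Saved under a name of your choosing, it would be started the same way as nsrun: through unshare --keep-caps -Umc, followed by the script and the command to run.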

Simultaneous access

What if we want multiple programs to have access to the data at the same time? Note that most filesystems, including gocryptfs, don't really like to have the same data mounted multiple times. There are a couple of options.

We could run something like unshare --keep-caps -Umc ./nsrun bash and launch them all from that shell.

Or, we can enter the same namespace multiple times simultaneously, using a second script that I'll call nsrunenter:

#!/bin/bash

# Pass the command to run in the namespace,
# and any parameters, on the command-line.

if [ -z "$1" ]; then
   echo "Syntax: $0 command [args]"
   exit 5
fi

IDENTIFIER="BLOGDEMO"

until TARGETPID=$(pgrep -u "$(id -u)" -n -f "^/usr/bin/gocryptfs.* -fsname $IDENTIFIER "); do
    echo "$IDENTIFIER not mounted; mounting."
    unshare --keep-caps -Umc /usr/bin/gocryptfs -fsname "$IDENTIFIER" crypt plain
done

echo "Entering namespace at PID $TARGETPID"

# gocryptfs likes to see at least one read before it permits writes, so do that here.
nsenter --preserve-credentials -U -m -t "$TARGETPID" ls "$(pwd)/plain" > /dev/null
exec nsenter --preserve-credentials -U -m -t "$TARGETPID" /usr/bin/env "--chdir=$(pwd)" "$@"

So this is working a bit differently. It's going to first mount the filesystem in its own namespace, then just let it hang there.

Then, we figure out the PID of the gocryptfs command, using a (presumably-unique) identifier to differentiate it from other potential gocryptfs instances. Now, by using nsenter, we can launch a new command in the namespace we created earlier, which is the only way we can access the files.

In this case, we keep reusing the existing mount until we're done with it. Note that it will be necessary to kill the gocryptfs process in the end when we're done, since nothing here is going to unmount it.

Watch how it works:

$ ./nsrunenter bash
BLOGDEMO not mounted; mounting.
Password:
Decrypting master key
Filesystem mounted and ready.
Entering namespace at PID 919929
$ cat plain/test
Testing
$ exit
exit
$ cat plain/test
cat: plain/test: No such file or directory
$ ./nsrunenter bash
Entering namespace at PID 919929
$ cat plain/test
Testing
$ exit
exit
$ cat plain/test
cat: plain/test: No such file or directory

So here, the first time we called our new script, it mounted the gocryptfs filesystem, and then ran bash inside the namespace we created for it. After exiting from that namespace, of course we couldn't see our test file.

The second time we called the script, it detected the existing namespace and joined it. Again, the command worked.
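
As noted earlier, nothing in nsrunenter unmounts the filesystem, so the gocryptfs process has to be stopped by hand when you're done. A sketch of a companion helper is below. It locates the process with the same pgrep pattern the script uses and sends it SIGTERM. The function name stop_shared_mount is made up, and it is an assumption here that gocryptfs unmounts cleanly when it receives that signal.

```shell
#!/bin/bash
# Stop the gocryptfs instance started by nsrunenter, found the same way
# the script finds it. The argument must match IDENTIFIER in that script.
stop_shared_mount() {
    local pid
    pid=$(pgrep -u "$(id -u)" -n -f "^/usr/bin/gocryptfs.* -fsname $1 ") || {
        echo "$1 is not mounted"
        return 1
    }
    echo "Stopping gocryptfs for $1 (PID $pid)"
    # Assumption: gocryptfs unmounts cleanly on SIGTERM.
    kill -TERM "$pid"
}

# Call it explicitly when finished, for example:
#   stop_shared_mount BLOGDEMO
```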

A word on security

You might be thinking, "well, if I can just nsenter the namespace, what good is this?" One of the principles of Computer Security[6] is defense in depth[7]; that is, multiple lines of defense.

=> 6: /computer-security/ | 7: https://en.wikipedia.org/wiki/Defense_in_depth_(computing)

The premise of this whole post is to add protections in case malicious code is executed in your account. That is, one of your lines of defense has already failed. Here's what we're adding:

Protection of data at rest via encryption. Or, if the underlying filesystem was already encrypted, a second key is introduced such that an attacker would have to know both to decrypt the data at rest.

Having the relevant data only mounted at times when it's needed.

Keeping it invisible from other processes, unless those processes specifically know about the scheme in use and which process to nsenter.

You could bolster this further by running the unshare and nsenter under sudo, so that the local user wouldn't be able to enter the namespace without authenticating. This has some tradeoffs (greater complexity for sure), but it raises the bar: an attacker would have to fool the user into authenticating to sudo.

So, while this approach isn't absolutely perfect, it is another line of defense alongside the others you should already have.

More things you can do

You can set up the namespaces with a different user (though note that just sudo unshare will expose all of root's files - probably not what you want! Do this carefully!)
You can have a graphical password prompt, for instance with gocryptfs -extpass ssh-askpass
