Using SCGI to serve dynamic content over the Gemini protocol

Introduction

In this post I want to explain how you can serve dynamic content to Geminispace using the Simple Common Gateway Interface (SCGI) protocol. I'll start by giving a brief explanation of what SCGI is, then we'll talk about how to set up your Gemini server to work with SCGI applications. Finally, we'll look at how you can actually write an SCGI application in Python.

For the purposes of this post I'll be assuming some basic familiarity with the Gemini protocol, Python 3 and the Linux operating system, but not too much (indeed, my own knowledge of these concepts doesn't run all that deep).

Much of the experience I am sharing here I gained while writing Remini, a Gemini proxy for browsing Reddit. When a client requests a page from Remini (using the Gemini protocol), it fetches the corresponding page from Reddit (over HTTP), converts the content to gemtext and displays it to the user. That's just to give you an idea of the kind of thing you can do using SCGI.

What is dynamic content?

Most pages in Geminispace are just text files, sitting in a directory somewhere. When you send a request to a Gemini server which specifies a path that corresponds to the text file's location, the server will send the contents of the text file to your client. That's what we call static content, because it doesn't change from one request to the next (unless of course the underlying text file changes).

Dynamic content, on the other hand, is not simply stored in a text file and displayed faithfully to the user; rather, it is generated by a computer program based on the content of the user's request and various other inputs or state that the program can access. For example, the program might search a database or query a web service and serve content based on what it finds. The program can also do other things in response to the user's request, such as store new information in a database.

Done well, this allows a programmer to create all sorts of interesting and useful services. Done poorly, it can result in a sluggish, unstable and generally frustrating user experience, and can even expose the server and the system it runs on to security risks.

On the web, dynamic content is ubiquitous. Social media, comments sections on websites, browser-based games, etc., are all obvious examples of dynamic content. Even web pages that seem like they should be static, such as blogs or news articles, are often dynamically serving you ads and other content based on what they can figure out about you (and trying to find out more about you, so they can target you even better in the future).

Geminispace is, of course, supposed to be a refuge from that kind of thing, and I think it's fair to say that humble static content is expected to occupy a much greater proportion of Geminispace than it does of the web. Still, the fact that most of the main Gemini servers in development support some form of dynamic content suggests that it does have some role to play in the development of Geminispace. Responsible dynamic content is nothing to be afraid of.

What is SCGI?

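Before we talk about SCGI, we should talk a little about the Common Gateway Interface (CGI). You may have heard of CGI, or if you have been browsing the web or Geminispace you may at least have come across URLs with "cgi-bin" in them. Simply put, the way CGI works is as follows:

* A client sends a request to the server.
* The server sees that the request is for a CGI application rather than a static file, and runs that application as a new process, passing it the details of the request (mostly through environment variables).
* The application writes its response to its standard output and exits.
* The server sends that response back to the client.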

The CGI protocol defines how the server and the CGI application talk to each other. Most CGI-enabled servers follow a convention that files within a specific directory (usually, but not necessarily, "/cgi-bin/") will be treated as CGI applications and not static files, so a request for a file in that directory will trigger the process I set out above.
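To make the process concrete, here is a minimal sketch of what a CGI application for Gemini might look like in Python. It assumes the server passes request details in the usual CGI environment variables (PATH_INFO, QUERY_STRING) and expects the script to write a complete Gemini response, header included, to standard output; check your server's documentation, as the details vary. The function name build_response is just for illustration.

```python
#!/usr/bin/env python3
import os
import sys

def build_response(environ):
    """Build a Gemini response (header plus body) from CGI variables."""
    path = environ.get('PATH_INFO', '')
    query = environ.get('QUERY_STRING', '')
    # Gemini response header: status code, MIME type, CRLF.
    header = '20 text/gemini\r\n'
    body = f'You asked for path "{path}" with query "{query}".\n'
    return header + body

if __name__ == '__main__':
    # The server starts a fresh process like this one for every request.
    sys.stdout.write(build_response(os.environ))
```

Note that everything here, including starting the Python interpreter, happens again for each request.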

CGI is a well-known and widely supported standard. In the earlier days of the web, it was how dynamic content was commonly served there, though it has now largely been replaced by more "sophisticated" solutions. Anecdotally, it seems to be the main way that dynamic content is served in Geminispace, though I don't have any hard evidence for that.

However, CGI is not without its problems. The main one is that a new CGI application must be initialised from scratch for every request. If you're getting a lot of requests, your server is running on low-power hardware (such as a Raspberry Pi) and/or your application takes a while to start up, this can use up a lot of your system's resources and lead to long response times. It can be a particular problem if your application is written in an interpreted language like Python, because a new instance of the interpreter needs to be fired up for every request, which is quite inefficient.

Enter SCGI.

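Briefly, the way SCGI works is:

* The SCGI application is started once and keeps running in the background, listening on a socket.
* When the server receives a request for the application, it connects to the socket and sends the details of the request in the format defined by the SCGI protocol.
* The application sends its response back over the same connection, and the server relays it to the client.
* The application stays running, ready for the next request.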

The benefit is clear: the SCGI application only needs to be initialised once, and that process can then handle all requests. And by the time a request gets to the SCGI application, all the work involved in initialising the application has already been done, so it can respond much more quickly. The main drawback is that you (as the developer) now have to deal with this thing called a "socket", but that's not so bad, particularly if you're using a language like Python which provides a decent interface to them as part of its standard library.
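For the curious, the following sketch shows roughly what arrives on the socket. Per the SCGI specification, a request is a netstring (length, colon, payload, comma) containing NUL-separated header names and values, followed by the request body. The function parse_scgi_request is a hypothetical helper for illustration only; in practice a library handles this for you.

```python
def parse_scgi_request(data: bytes):
    """Split a raw SCGI request into a dict of headers and a body."""
    # The headers are wrapped in a netstring: b"<length>:<payload>,"
    length_str, _, rest = data.partition(b':')
    length = int(length_str)
    payload = rest[:length]
    if rest[length:length + 1] != b',':
        raise ValueError('malformed netstring: missing trailing comma')
    # The payload is name NUL value NUL name NUL value NUL ...
    fields = payload.split(b'\x00')[:-1]  # drop the empty item after the final NUL
    headers = {
        name.decode('latin-1'): value.decode('latin-1')
        for name, value in zip(fields[0::2], fields[1::2])
    }
    # Whatever follows the comma is the body, CONTENT_LENGTH bytes long.
    body_length = int(headers.get('CONTENT_LENGTH', '0'))
    body = rest[length + 1:length + 1 + body_length]
    return headers, body
```

For a Gemini request the body is empty, so the headers carry everything the application needs to know.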

Selecting and configuring your server

Not all servers support dynamic content; an example of a popular Gemini server that only serves static content is Agate. Of those that do support dynamic content, most support CGI, but not all support SCGI. So if you want to create an SCGI application, you will need to choose the right server to host it.

Fortunately, there are (at least) two relatively mature Gemini servers that support SCGI. Molly Brown, the one we'll be looking at, is written by Solderpunk, the same person who created the Gemini protocol. GLV-1.12556, the first Gemini server ever written, also has support for SCGI.

Introducing Molly Brown

NOTE: I'll be assuming in this section that you will be running Molly Brown as a single user, from your own domain. If your capsule is on a shared hosting service such as tilde.club or rawtext.club, you should ask the admins if and how you can run an SCGI application on their server.

The README file for Molly Brown gives a pretty good introduction to how to install and set it up:

=> https://tildegit.org/solderpunk/molly-brown/src/branch/master/README.md

Installing and configuring it is pretty easy if you follow those instructions (the README calls it a "pretty clunky manual process", which I think makes it sound a lot worse than it is). One prerequisite is having Go installed. Depending on your operating system, you may be able to install Go easily using your package manager (search for "go" or "golang"). If not, follow these instructions:

=> https://golang.org/doc/install

This article won't go into the general set-up of Molly Brown. I suggest you read through the README and ensure that you can get Molly to serve simple static content before proceeding.

Configuring Molly

If you've read through the README, you've probably already seen the section about configuring Molly to serve dynamic content via SCGI. You'll need to decide on two things: where to put your socket file, and the base URL for your app.

On Linux systems, a Unix domain socket is a special type of file. The file can be placed anywhere (within reason); a common place to store socket files is "/var/run", but you can choose to place it somewhere else on your system if you like. The main consideration is that the file should be readable and writeable by the user that Molly will run as, and, if different, the user your SCGI application will run as. It should not be readable or writeable by anyone else. For the purposes of this example let's assume your socket file will be stored at "/var/run/scgi-example.sock" with the appropriate permissions.
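One way to get those permissions right is to set them from the application itself when it creates the socket. The sketch below is one possible approach, assuming Molly and your application run as the same user (otherwise you would need a shared group and a mode such as 0o660 instead); the helper name make_private_socket is made up for this example.

```python
import os
import socket

def make_private_socket(path):
    """Bind a Unix domain socket at path, accessible only by its owner."""
    if os.path.exists(path):
        os.remove(path)  # clear a stale socket file from a previous run
    # Mask out group and other permissions while the file is being created.
    old_umask = os.umask(0o077)
    try:
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.bind(path)
    finally:
        os.umask(old_umask)
    # Owner may read and write; nobody else has access.
    os.chmod(path, 0o600)
    return s
```

The returned socket is bound but not yet listening, so it can be handed to whatever server code you use.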

Let's also assume that you have set up Molly to serve a Gemini capsule at "gemini://gemini.example.org" and that you want requests to your SCGI application to begin with "gemini://gemini.example.org/scgi-example/" (note the trailing slash). In that case, you just need to include the following in your molly.conf file:

[SCGIPaths]
"/scgi-example/" = "/var/run/scgi-example.sock"

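With that configuration, the following URLs (for example) should be routed to your SCGI app:

* gemini://gemini.example.org/scgi-example/
* gemini://gemini.example.org/scgi-example/test_path
* gemini://gemini.example.org/scgi-example/test_path?foo=ham&bar=spam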

Be aware that "gemini://gemini.example.org/scgi-example" (no trailing slash) will NOT be routed to your app, unless you redirect it.
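The effect of the trailing slash can be pictured as a simple prefix test on the path of the URL. The snippet below only illustrates the behaviour described here; it is not Molly's actual implementation, and is_routed_to_app is a made-up name.

```python
from urllib.parse import urlsplit

def is_routed_to_app(url, prefix='/scgi-example/'):
    """Return True if the URL's path begins with the configured prefix."""
    return urlsplit(url).path.startswith(prefix)

for url in (
    'gemini://gemini.example.org/scgi-example/',
    'gemini://gemini.example.org/scgi-example/test_path?foo=ham',
    'gemini://gemini.example.org/scgi-example',  # no trailing slash
):
    print(url, is_routed_to_app(url))
```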

Once you've updated your molly.conf and restarted the server, send a request to "gemini://gemini.example.org/scgi-example/". (Of course, when trying out the examples in this post, you should replace "gemini.example.org" with your own domain.) You should get a "42" response with a message like "Error connecting to SCGI service!" (42 is the Gemini status code for a CGI or other dynamic content error). Now you just need to write an SCGI app for it to connect to.
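If you'd rather test from the command line than from a browser, a few lines of Python will do. This is a rough sketch of a Gemini request, not a full client: it skips certificate verification entirely (many capsules use self-signed certificates, and a real client would pin them instead) and only returns the response header. The function names are made up for this example.

```python
import socket
import ssl
from urllib.parse import urlsplit

def parse_header(line: bytes):
    """Split a Gemini response header into (status, meta)."""
    status, _, meta = line.decode('utf-8').rstrip('\r\n').partition(' ')
    return int(status), meta

def fetch_header(url, port=1965, timeout=10):
    """Send a Gemini request for url and return its (status, meta)."""
    host = urlsplit(url).hostname
    context = ssl.create_default_context()
    # ASSUMPTION: for a quick test only, accept any certificate.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            # A Gemini request is just the URL followed by CRLF.
            tls.sendall(url.encode('utf-8') + b'\r\n')
            with tls.makefile('rb') as response:
                return parse_header(response.readline())

# Example (requires a running server):
# print(fetch_header('gemini://gemini.example.org/scgi-example/'))
```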

NOTE: For security purposes I suggest disabling the SCGI routing (by commenting out the above lines in your molly.conf and restarting your server again) unless you have an app ready to listen on the socket.

Writing a Python SCGI application

We need our application to be able to communicate with Molly, over the socket file, in a way that complies with the SCGI protocol. As is often the case with Python, there is a library to do the heavy lifting for us. Install the "scgi" library as follows:

pip3 install scgi

Now we can write some code. Our simple example looks like this:

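#!/usr/bin/env python3

import os
import socket
from scgi.scgi_server import SCGIServer, SCGIHandler

# Path of the Unix domain socket; must match the path in molly.conf.
SOCK = '/var/run/scgi-example.sock'

# Remove any stale socket file left by a previous run, otherwise bind() fails.
if os.path.exists(SOCK):
    os.remove(SOCK)

class ExampleHandler(SCGIHandler):

    def produce(self, env, bodysize, input, output):
        # Called for each request. "env" is a dict of the SCGI headers sent by
        # the server; "output" is a binary file object for the response.
        print('"produce" method of ExampleHandler called with the following environment:')
        print(env)
        # Gemini response header: status code, MIME type, CRLF.
        output.write(b'20 text/gemini\r\n')
        # Response body.
        output.write(b'Thanks for your request!\n')

# Create the Unix domain socket and bind it to the socket file.
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.bind(SOCK)

# Listen on the socket and hand each incoming request to ExampleHandler.
server = SCGIServer(handler_class=ExampleHandler)
server.serve_on_socket(s)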

Let's run through what this code does.

Run this script from the command line so you can see its standard output. While the script is running, open your favourite Gemini browser and go to

gemini://gemini.example.org/scgi-example/test_path?foo=ham&bar=spam

If all goes well, you should receive the "Thanks" message in your browser. In the terminal where you're running your SCGI app, you should see something like the following:

{'SCRIPT_PATH': '/scgi-example/', 'REQUEST_METHOD': '', 'SERVER_PORT': '1965', 'SERVER_PROTOCOL': 'GEMINI', 'SERVER_SOFTWARE': 'MOLLY_BROWN', 'REMOTE_ADDR': '[the IP address the request came from]', 'QUERY_STRING': 'foo=ham&bar=spam', 'SERVER_NAME': 'gemini.example.org', 'SCGI': '1', 'CONTENT_LENGTH': '0', 'PATH_INFO': 'test_path'}

The PATH_INFO and QUERY_STRING values are particularly interesting, as they can allow you to figure out exactly what the client has requested.
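As a rough illustration of how you might use those two values, the sketch below picks a response based on PATH_INFO and decodes QUERY_STRING with the standard library's urllib.parse. The route names ("hello", "echo") and the function respond are invented for this example; it returns bytes that a produce method could pass to output.write.

```python
from urllib.parse import parse_qs

def respond(env):
    """Return a complete Gemini response (bytes) for an SCGI environment."""
    path = env.get('PATH_INFO', '').strip('/')
    query = env.get('QUERY_STRING', '')
    if path == 'hello':
        body = 'Hello, Geminispace!\n'
    elif path == 'echo':
        # parse_qs maps each key to a list of percent-decoded values.
        params = parse_qs(query)
        lines = [f'* {key} = {", ".join(values)}'
                 for key, values in sorted(params.items())]
        body = '\n'.join(lines) + '\n'
    else:
        # 51 is the Gemini status code for "not found".
        return b'51 Not found\r\n'
    return b'20 text/gemini\r\n' + body.encode('utf-8')
```

Keeping this logic in a plain function, separate from the handler class, also makes it easy to test without a running server.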

If all DOESN'T go well, then you'll have to do some debugging, I'm afraid. A common problem is that you don't have the required permissions to work with your socket file. If that is the problem, either fix the permissions on your file or choose a different location for it. If your Python script is crashing, check the traceback to try to diagnose the problem. If your Python script works fine but you're not getting the right message in your browser, check Molly's access and error logs.

There are a few points to note about serving SCGI requests in this way:

This was a very basic example, but I hope it gives you a sense of the many things you can achieve with SCGI and Python, and a good base of knowledge from which to build more complex applications.

Daemonising your application

You'll probably want to run your SCGI application in the background, and you'll probably want to restart it automatically if it stops for whatever reason. The best way to do this depends on your operating system. If you are using systemd, you can run it as a systemd service. The following is an example of a service file that you could save to "/etc/systemd/system/scgi-example.service":

[Unit]
Description=SCGI example application

[Service]
Type=simple
Restart=always
User=user_name
ExecStart=/path/to/python/script.py --with --any --arguments

[Install]
WantedBy=multi-user.target

You can then enable and start the service as follows (from your terminal, as root or using sudo):

systemctl daemon-reload
systemctl enable scgi-example.service
systemctl start scgi-example.service

Now your application will be automatically started on boot and will be restarted if it stops. As you won't be able to monitor your script directly, be sure your script keeps good logs that you can consult if things don't work as intended.
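A simple way to do that is Python's standard logging module. The sketch below writes timestamped messages to standard error, which systemd normally captures in the journal (viewable with "journalctl -u scgi-example.service"); the logger name, message format and helper names are arbitrary choices for this example.

```python
import logging
import sys

def make_logger(name='scgi-example', stream=None):
    """Create a logger that writes timestamped lines to a stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers
        target = stream if stream is not None else sys.stderr
        handler = logging.StreamHandler(target)
        handler.setFormatter(
            logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
        logger.addHandler(handler)
    return logger

def log_request(logger, env):
    """Record the path and query of a request, but not the client's IP."""
    logger.info('path=%r query=%r',
                env.get('PATH_INFO', ''), env.get('QUERY_STRING', ''))
```

A produce method could call log_request at the start of each request, and logger.exception inside an except block to record tracebacks.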

Be responsible

Remember that when you run an SCGI application like this, you are executing code on your machine based on input you cannot predict, from a user you do not know and cannot trust. This increases your "attack surface", giving potential attackers another way to try and compromise your system. System security is way beyond the scope of this post, but remember that it is up to you to keep your system secure from attackers. A few things to keep in mind:
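As one small illustration of treating requests as untrusted, the sketch below checks PATH_INFO against a strict pattern before doing anything with it, and rejects everything else. The allowed pattern, the limit on the number of segments and the name safe_path are all assumptions made for this example; what counts as valid input depends entirely on your application.

```python
import re

# ASSUMPTION: this app only accepts short path segments made of
# lowercase letters, digits, "_" and "-".
SAFE_SEGMENT = re.compile(r'[a-z0-9_-]{1,64}')

def safe_path(path_info, max_segments=8):
    """Return the path as a list of validated segments, or None if invalid."""
    segments = [seg for seg in path_info.split('/') if seg]
    if len(segments) > max_segments:
        return None
    for seg in segments:
        if not SAFE_SEGMENT.fullmatch(seg):
            return None  # rejects "..", spaces, percent-escapes and so on
    return segments
```

A handler could respond with status 59 (bad request) whenever safe_path returns None.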

Finally, while your app should not be trusting, it should be trustworthy. On the web, dynamic content is often used in ways that could be considered anti-user. For example, it can be used to track users unnecessarily, or to serve bloated, distracting or even malicious content. The Gemini protocol is designed to minimise the potential for such harmful applications, but let's not push the envelope. Just because your app can access the client's IP address, it doesn't mean you have to use it!

Further reading

Thank you for reading! I hope you found this post helpful. Below are some links which will give you more information about some of the concepts we discussed.

=> Remini, my Reddit proxy service that uses SCGI | The Gemini documentation, including the protocol specification, FAQs and more

Wikipedia articles:

=> Common Gateway Interface | Simple Common Gateway Interface | Unix domain sockets

Gemini servers:

=> Agate (static content only) | Molly Brown | GLV-1.12556

SCGI:

=> Specification | Python library

System configuration:

=> Systemd | Permissions

Using SCGI to serve dynamic content over the Gemini protocol was published on 2021-04-07

