Thoughts on Privacy Exploits in Gemini

2022-12-28

For the purpose of this post, I'm going to ignore the possible ways companies could add their own "extensions" to the Gemini protocol to get around privacy and surveillance limitations. I'll only look at ways existing standards and practices could be abused.

Yesterday Sean responded^ to an older post by Ainent about potential security issues in Gemini. Specifically, big tech could use marketing and PR campaigns to push their own browsers: browsers that auto-generate client certificates and attach them to every request they make. Sean pointed out that such a practice is more accurately considered a privacy issue, and that there are other, simpler ways to track users, such as analyzing IP logs and page requests.

Sean also mentioned that such tracking would be limited to one server. I began to wonder what an underhanded company or companies might do to break that limitation.

One thought that jumped out at me immediately was a proprietary database of client certificates. Imagine a Gemini browser that auto-generated a certificate for each user, then automatically uploaded that certificate to a company-controlled database. That company could then sell access to analytics companies and partner with B2C sites, which would query the database any time a user connected to their servers with a given certificate. Of course, this assumes every page on those servers requires a certificate, but that could be touted via PR as a convenience for consumers ("You can stay logged in all the time! No need to remember passwords or check e-mails for login links!").

One wouldn't even need a special client to do this. Suppose the database company offered discounted access in exchange for B2C servers uploading every client certificate they encountered. An end user would then connect to a server with their cert, and suddenly their unique fingerprint would be logged in a tracking database.
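To make the cross-server part concrete, here's a rough Python sketch of such a shared database. Everything in it is hypothetical: the class, the hostnames, and the assumption that partner servers report every certificate they see. The placeholder bytes stand in for the DER-encoded certificate a TLS library hands a server; a SHA-256 hash of those bytes is a stable fingerprint, so the same certificate produces the same key no matter which server reports it.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timezone


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a certificate's DER encoding."""
    return hashlib.sha256(cert_der).hexdigest()


class CertTrackingDB:
    """Hypothetical shared database that partner servers report into."""

    def __init__(self) -> None:
        # fingerprint -> list of (server hostname, path, time of sighting)
        self._sightings: dict[str, list[tuple[str, str, datetime]]] = defaultdict(list)

    def report(self, server: str, path: str, cert_der: bytes) -> None:
        """Called by a partner server for every client certificate it sees."""
        self._sightings[fingerprint(cert_der)].append(
            (server, path, datetime.now(timezone.utc))
        )

    def servers_seen(self, cert_der: bytes) -> set[str]:
        """Every partner server this certificate has been presented to."""
        sightings = self._sightings.get(fingerprint(cert_der), [])
        return {server for server, _, _ in sightings}


# Usage: placeholder bytes stand in for a real DER-encoded certificate.
db = CertTrackingDB()
alice_cert = b"placeholder DER bytes for Alice's certificate"
db.report("shop.example", "/cart", alice_cert)
db.report("news.example", "/politics", alice_cert)
print(sorted(db.servers_seen(alice_cert)))  # ['news.example', 'shop.example']
```

Neither server needs to know about the other; the shared fingerprint is what links the two visits.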

As both Ainent and Sean mentioned, not requiring a certificate on every page effectively nullifies most of this approach. They also suggest an active countermeasure: send an undocumented "63" return code whenever a user presents a certificate on a page that doesn't require one. This approach can work, but given how today's clients handle certificates, I think it would create a lot of headaches and even break services.
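As I understand the proposal, a server following it would pick its response header roughly like this. This is my own sketch, not code from either post: the set of cert-required paths and the meta text are made up, and 63 is the undocumented code they suggest rather than anything in the current spec. A Gemini response header is the status, a space, the meta string, then CRLF.

```python
def response_header(path: str, has_client_cert: bool, cert_required: set[str]) -> str:
    """Choose a Gemini response header under a strict 'no unneeded certs' policy.

    cert_required is a hypothetical set of paths that genuinely need a certificate.
    """
    if path in cert_required:
        if not has_client_cert:
            return "60 Client certificate required\r\n"
        return "20 text/gemini\r\n"
    if has_client_cert:
        # The page doesn't need a certificate, so refuse it; the idea is to
        # push clients into not sending certificates where they aren't needed.
        return "63 Certificate not accepted on this page\r\n"
    return "20 text/gemini\r\n"


needs_cert = {"/todo"}
print(response_header("/about", True, needs_cert))
```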

As I discovered with my own todo.txt service, clients often handle certificates in one of two ways: if the cert is enabled at the root, it is used site-wide, but if it is enabled for a particular document, it is only used for that document. Enforcing a 63 return code on every page that doesn't require a certificate would cripple this paradigm. Site-wide enabling would break any page that doesn't require certs, and single-page enabling would break any CGI that automatically redirects to cert-required pages.
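A simplified model of those two client behaviours shows where it breaks. The class and the todo.example hostname are hypothetical, and real clients differ in the details; the only point is which certificate, if any, ends up attached to each request.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


class ClientCertScopes:
    """Hypothetical model of how a client picks a certificate for a request."""

    def __init__(self) -> None:
        self.site_wide: Dict[str, str] = {}     # host -> cert enabled at the site root
        self.per_document: Dict[str, str] = {}  # full URL -> cert enabled for that page only

    def enable_at_root(self, host: str, cert: str) -> None:
        self.site_wide[host] = cert

    def enable_for_document(self, url: str, cert: str) -> None:
        self.per_document[url] = cert

    def cert_for(self, url: str) -> Optional[str]:
        # A page-specific certificate wins; otherwise fall back to a site-wide one.
        if url in self.per_document:
            return self.per_document[url]
        return self.site_wide.get(urlsplit(url).hostname or "")


# Site-wide: the cert rides along to pages that don't need it.
a = ClientCertScopes()
a.enable_at_root("todo.example", "todo-cert")
print(a.cert_for("gemini://todo.example/about"))  # todo-cert

# Per-document: a CGI redirect to another cert-required page arrives with no cert.
b = ClientCertScopes()
b.enable_for_document("gemini://todo.example/login", "todo-cert")
print(b.cert_for("gemini://todo.example/list"))  # None
```

Under a strict 63 policy, the first client gets refused on /about, and the second gets bounced back to a 60 prompt on the page the CGI redirected it to.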

My recommendation to fight cert-based tracking would be to use ephemeral certificates for everything. Kristall allows for the creation of transient session certificates, valid for only a few minutes or hours. A browser or plugin could generate a new cert for every request: even if a commercial site required a certificate to access every page, they would see a different user every time a page was requested. This would severely hamper their efforts to track by certificate.
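The idea fits in a few lines. In this sketch, certificates are modelled as random byte strings so the example runs without a crypto library; a real browser or plugin would generate a fresh key pair and a self-signed X.509 certificate instead. The class name and TTL values are my own assumptions, not how Kristall implements its transient certificates.

```python
import secrets
import time
from typing import Callable, Optional


class TransientCertStore:
    """Hands out throwaway client 'certificates' that expire after ttl seconds.

    Certificates are modelled as random byte strings (an assumption for the
    sketch). ttl=0 means a brand-new certificate for every request.
    """

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._cert: Optional[bytes] = None
        self._expires = 0.0

    def cert_for_request(self) -> bytes:
        now = self.clock()
        if self._cert is None or now >= self._expires:
            self._cert = secrets.token_bytes(32)  # placeholder for a new certificate
            self._expires = now + self.ttl
        return self._cert


per_request = TransientCertStore(ttl=0)
print(len({per_request.cert_for_request() for _ in range(5)}))  # 5 different "users"

per_session = TransientCertStore(ttl=3600)
print(len({per_session.cert_for_request() for _ in range(5)}))  # 1
```

Per-request certificates give a tracker nothing to join on; a session TTL trades some of that away for staying logged in a little longer.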

=> ^ It's not a "security hole," it's a "privacy hole" and I don't think it's anything to worry about

A big debate over the Gemini specification focused on how clients handle a response from a server. The question was simple: should a client wait for the server to close the connection before handling a response, or should it be allowed to handle the response while still receiving data? Allowing preemptive handling enables "streaming" in the protocol. A well-known example is Gemini Chat^, which uses an input loop and a long-running TCP connection to let Gemini users chat with each other in real time. This effectively creates two-way communication between a client and a server. Preemptive response handling was originally ambiguous in the spec, but it is now explicitly allowed.
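Client-side, the difference is small. A Gemini response is a header line (status, space, meta, CRLF) followed by the body. A client that handles responses preemptively can parse the header and pass each body line on as soon as it arrives, instead of waiting for the connection to close. Here's a rough sketch that leaves out opening the TLS connection; the function name and callback are hypothetical.

```python
import io
from typing import BinaryIO, Callable


def read_response_streaming(stream: BinaryIO, on_line: Callable[[str], None]) -> str:
    """Parse a Gemini response and hand each body line to on_line as it arrives.

    stream is any binary file-like object, e.g. sock.makefile("rb") on a TLS socket.
    Returns the response header (status and meta) without the trailing CRLF.
    """
    header = stream.readline().decode("utf-8").rstrip("\r\n")
    if not header.startswith("2"):
        return header  # only 2x (success) responses carry a body
    # readline() returns as soon as one full line is available, so the client
    # can act on each line while the server keeps the connection open.
    for raw in iter(stream.readline, b""):
        on_line(raw.decode("utf-8").rstrip("\r\n"))
    return header


# Usage with an in-memory stand-in for the connection:
read_response_streaming(io.BytesIO(b"20 text/gemini\r\nhello\r\nworld\r\n"), print)
```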

It seems to me that this paradigm can easily be abused. We must assume that companies and monied interests would create their own Gemini browsers that do some dirty tricks behind the scenes, and the most basic of such tricks would be to establish other connections to the server without informing the user.

Imagine a browser that, upon connecting to a site, opens two other connections to the same site: one an input loop to an ephemeral feed page, and a second connection that constantly reads from that feed page. The client can send and receive information from the server in real time while browsing the site, happily passing along everything from a browser fingerprint to a list of pages visited to what client certificates were enabled for each page. This would easily open up all the tracking we currently see on the Web, and perhaps even more.
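The input loop wouldn't need anything exotic. Gemini input travels in the query string of an ordinary request, and request URLs are capped at 1024 bytes, so a hidden script would just chop its data into several requests. Below is a sketch of that chopping; the feed URL and the payload fields are hypothetical, and the feed URL is assumed to be plain ASCII.

```python
import json
from urllib.parse import quote

MAX_REQUEST_URL_BYTES = 1024  # Gemini's limit on the length of a request URL


def beacon_urls(feed_url: str, payload: dict) -> list[str]:
    """Split tracking data into input-loop requests that each fit in one Gemini URL.

    feed_url is a hypothetical hidden endpoint such as gemini://example.com/feed.
    """
    encoded = quote(json.dumps(payload, separators=(",", ":")), safe="")
    room = MAX_REQUEST_URL_BYTES - len(feed_url) - 1  # one byte for the "?"
    urls = []
    start = 0
    while start < len(encoded):
        end = min(start + room, len(encoded))
        if end < len(encoded):
            # Don't cut a %XX escape in half across two requests.
            if encoded[end - 1] == "%":
                end -= 1
            elif encoded[end - 2] == "%":
                end -= 2
        urls.append(f"{feed_url}?{encoded[start:end]}")
        start = end
    return urls


payload = {"fingerprint": "abc123", "visited": ["/", "/log/", "/about.gmi"]}
for url in beacon_urls("gemini://example.com/feed", payload):
    print(url)
```

The matching feed connection would then carry the server's replies back, and the user would see none of it.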

Two-way communication also allows for multiplexing several requests into a single connection. A server could send a response through the ephemeral feed, the browser could parse that response and send a request into the input loop, the server could send a second response, and so on. If the ephemeral feed were hosted on a CDN-style edge network that proxied responses from multiple locations, the browser could even communicate with several other servers through that one connection.
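Nothing in Gemini defines framing for this, so a browser and server doing it would have to agree on their own. Here's one hypothetical scheme, made up purely for illustration: tag every line on the shared connection with a request ID and an upstream target, and let the browser sort the interleaved replies back out.

```python
from collections import defaultdict


def encode_frame(request_id: int, target: str, body: str) -> str:
    """One line on the shared connection: request ID, upstream target, payload.

    This tab-separated framing is hypothetical; body must not contain newlines.
    """
    return f"{request_id}\t{target}\t{body}\n"


def demultiplex(frames: list[str]) -> dict[tuple[int, str], str]:
    """Regroup interleaved frames from one feed into per-request bodies."""
    responses: dict[tuple[int, str], list[str]] = defaultdict(list)
    for line in frames:
        request_id, target, body = line.rstrip("\n").split("\t", 2)
        responses[(int(request_id), target)].append(body)
    return {key: "\n".join(parts) for key, parts in responses.items()}


feed = [
    encode_frame(1, "ads.example", "banner-a"),
    encode_frame(2, "metrics.example", "ok"),
    encode_frame(1, "ads.example", "banner-b"),
]
print(demultiplex(feed))
```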

Insidious companies could even use this to restrict access to a certain subset of browsers. If a certificate is required to access a page, the server could check whether that certificate has been logged in a currently running ephemeral feed, and if not, send an error saying that the user's browser is incompatible with the site.
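Here's a single-process sketch of the check, which sidesteps the real-time, multi-server backend that would be the hard part. The feed connection records when it last reported each certificate fingerprint, and cert-required pages refuse fingerprints that haven't been reported recently. The class, the 30-second window, and the meta text are made up; 61 is the spec's "certificate not authorised" status.

```python
import time
from typing import Callable, Dict

HEARTBEAT_WINDOW = 30.0  # hypothetical: seconds a feed may stay silent before it counts as gone


class FeedGate:
    """Hypothetical server check: only serve certificates seen on a live hidden feed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.last_heartbeat: Dict[str, float] = {}  # cert fingerprint -> last time seen on the feed

    def heartbeat(self, fingerprint: str) -> None:
        """Called whenever the hidden feed connection reports this certificate."""
        self.last_heartbeat[fingerprint] = self.clock()

    def response_header(self, fingerprint: str) -> str:
        """Header for a cert-required page requested with this certificate."""
        seen = self.last_heartbeat.get(fingerprint)
        if seen is None or self.clock() - seen > HEARTBEAT_WINDOW:
            return "61 Your browser is not compatible with this site\r\n"
        return "20 text/gemini\r\n"


gate = FeedGate()
print(gate.response_header("abc123"))  # refused: no feed has reported this cert
```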

I don't know how much effort it would take to build a backend that does such checks in real time, but the possibility is there. And as long as the possibility is there, it isn't crazy to assume that one day, someone will abuse it.

The easiest solution to this threat would be to disallow preemptive response handling in the spec. Even though it is currently allowed, I'm not aware of many services that use it, so I don't think it would have much impact in the current community. Some possibilities with the protocol would be removed that way, but if user privacy is a paramount goal, the sacrifice might be worth it down the road.
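For comparison, a client that doesn't handle responses preemptively would look something like this: it reads until the server closes the connection, and only then parses the header and body. The function is a hypothetical sketch. It also shows the cost, since against a long-running connection like Gemini Chat's, `read()` would simply never return.

```python
import io
from typing import BinaryIO, Tuple


def read_response_buffered(stream: BinaryIO) -> Tuple[str, bytes]:
    """Handle a Gemini response only after the server has closed the connection.

    stream is any binary file-like object, e.g. sock.makefile("rb") on a TLS socket.
    read() with no size argument returns only at end of stream, so nothing is
    acted on until the server is done sending.
    """
    data = stream.read()
    header, sep, body = data.partition(b"\r\n")
    if not sep:
        raise ValueError("response header is not terminated by CRLF")
    return header.decode("utf-8"), body


header, body = read_response_buffered(io.BytesIO(b"20 text/gemini\r\n# Hello\r\n"))
print(header, body)
```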

=> ^ Gemini Chat

[Last updated: 2024-10-06]
