Generating calendar events from emails

I am not the most innately organised person, so I am a fan of using a (digital) calendar to keep track of my personal life: when I am traveling, when I am meeting up with friends or going to the cinema, etc. It helps ensure I don't miss appointments or double-book myself.

But if you have a lot of bookings--for example, if you are planning a vacation and booking multiple flights, trains and hotels--it can be tedious to manually create calendar events for each booking with the correct times, location and other details. The problem is compounded when you are dealing with different timezones. Some service providers give you the option to add bookings to your calendar, but often they don't, and even when they do, it doesn't always work properly (for example, British Airways give you the option to add flights to a calendar, but in my experience they don't get the timezones correct).

For a long time, I was looking for an open source solution that would accomplish this, and considered implementing something myself. I started by writing a Python script that would parse a Ryanair flight confirmation email. It was surprisingly easy, as Ryanair's HTML confirmation emails contain tags with information conforming to schema.org's FlightReservation schema (and a number of other related schemas). However, some other airlines' confirmation emails aren't so easily parsed.
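
As a rough illustration of why that kind of email is easy to parse, here is a minimal sketch using only the Python standard library. It assumes the schema.org data is embedded as JSON-LD inside `<script type="application/ld+json">` tags; that is an assumption on my part, and emails that use microdata attributes instead would need a different approach. The class and function names are made up for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.items: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            parsed = json.loads("".join(self._buffer))
            # A block may hold a single object or a list of objects
            candidates = parsed if isinstance(parsed, list) else [parsed]
            self.items.extend(c for c in candidates if isinstance(c, dict))


def find_flight_reservations(html: str) -> list[dict]:
    """Return every JSON-LD object in `html` whose @type is FlightReservation."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [i for i in parser.items if i.get("@type") == "FlightReservation"]
```

You would pass it the HTML part of the email as a string. It does no validation beyond checking the `@type` field.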

KItinerary

Eventually I found out that a number of KDE developers have been working on a solution to this problem, in the form of software called KDE Itinerary. KDE Itinerary is a "digital travel assistant" that aims to handle many aspects of travel, including navigation and boarding pass management. One of its key features is extracting data about itinerary items from various formats, including emails and PDFs. The data extraction engine is maintained separately as a C++ library called KItinerary, which can be integrated into third-party software or used from the command line via an executable called kitinerary-extractor, which is available as a flatpak. Support for over 250 different service providers (airlines, train companies, booking websites, etc.) is included in KItinerary, and it is relatively easy to add new parsing functionality with a bit of JavaScript.

=> https://invent.kde.org/pim/kitinerary

KItinerary is a powerful piece of software, though I don't love its KDE dependencies and I would prefer something that can be run natively on non-KDE systems. Installing kitinerary-extractor via flatpak is easy but takes up a lot of disk space if you don't already have KDE dependencies installed (apparently about 1.6 GB including all dependencies on my system). That is a lot, but disk space is cheap these days and if you have it to spare then kitinerary-extractor gives you an easy way to leverage the excellent parsing and post-processing work done by the KItinerary devs.

kitinerary-extractor reads in an email file and, by default, outputs parsed data as JSON-LD. The data structure conforms to the schema.org ontology. If you provide the -o iCal argument, instead of JSON-LD, it outputs parsed data as an iCalendar file.
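
To get a feel for the JSON-LD output, a small sketch like the following can summarise it. It assumes the output is a JSON array of objects that each carry an `@type` field, and that reservations carry schema.org's `reservationNumber` property. The sample data is invented and heavily trimmed; real output has many more fields.

```python
import json


def summarise_items(jsonld_text: str) -> list[str]:
    """Return one short line per extracted item: its schema.org type and,
    if present, its reservation number."""
    data = json.loads(jsonld_text)
    items = data if isinstance(data, list) else [data]
    lines = []
    for item in items:
        kind = item.get("@type", "(unknown type)")
        ref = item.get("reservationNumber")
        lines.append(f"{kind}: {ref}" if ref else kind)
    return lines


# Hypothetical sample, not real extractor output
sample = (
    '[{"@context": "http://schema.org", "@type": "FlightReservation", '
    '"reservationNumber": "ABC123"}, {"@type": "LodgingReservation"}]'
)
print("\n".join(summarise_items(sample)))
```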

=> https://json-ld.org/
=> https://schema.org/docs/schemas.html
=> https://en.wikipedia.org/wiki/ICalendar

Building a workflow

KItinerary apparently integrates with KMail as well as Nextcloud Mail, so if you use either of those applications for email, you can use it quite easily. Personally I don't, so I set out to roll my own solution. Ideally, what I want is something that periodically checks my inbox for new emails and, if it finds emails that contain information about a booking or event, adds the booking or event to my calendar. Over the course of a long weekend I hacked something together that uses mbsync to fetch emails, KItinerary to parse them into iCalendar files and Python to do some post-processing and deliver the iCalendar files via email.

Fetching email with mbsync

mbsync is a tried and trusted tool for syncing two mailboxes, and can be used to download emails from a remote IMAP mailbox to local storage. Confusingly, the project is called isync but the executable itself is called mbsync. The isync package is available in the repos of most major Linux distributions.

Below is a rough outline of a configuration file (usually stored at ~/.mbsyncrc) that can be used to fetch new emails from an IMAP mailbox.

# Define a local mailbox, in the Maildir format
MaildirStore mailbox-local
Path ~/Mail/
Inbox ~/Mail/

# Define a remote IMAP mailbox
IMAPStore mailbox-remote
Host 
User 
PassCmd 

# Fetch new messages for processing
Channel mailbox-fetch
Master :mailbox-remote:
Slave :mailbox-local:
Sync Pull New ReNew

For more information on how mbsync configuration works, consult its man page. A couple of points to note:

You then simply run mbsync like so (the argument corresponds to the name of the channel we defined in the config file):

mbsync mailbox-fetch

This will download new messages from your IMAP server and store them under ~/Mail. Three new subdirectories will be created: cur/, new/ and tmp/. new/ stores messages marked as unread and cur/ stores messages marked as read.

mbsync will keep track of what it has downloaded, so it will not download the same email multiple times, even if the local version is deleted.
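
If you want to check what a sync actually delivered, Python's `mailbox` module can read the same directory. The sketch below (the function name is my own) counts messages by the subdirectory they sit in. It relies on `Maildir` expanding `~` in the path it is given.

```python
from collections import Counter
from mailbox import Maildir


def count_by_subdir(path: str) -> Counter:
    """Count the messages in the Maildir at `path` by subdirectory
    ("new" or "cur")."""
    # create=False: raise an error rather than silently creating an empty mailbox
    mb = Maildir(path, create=False)
    return Counter(msg.get_subdir() for msg in mb)
```

Calling `count_by_subdir("~/Mail")` after running mbsync should then give a quick picture of how many unread and read messages were fetched.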

=> https://isync.sourceforge.io/
=> https://en.wikipedia.org/wiki/Maildir
=> https://www.passwordstore.org/

Extracting data with KItinerary

Now that you have downloaded your emails, you can feed them to kitinerary-extractor to extract the details of the events (if any) they describe.

If you have downloaded kitinerary-extractor via flatpak, then the correct command to run it is:

flatpak run org.kde.kitinerary-extractor 

That looks a bit ugly, so let's wrap it in a simple shell script, which we will call kitinerary-extractor:

#!/bin/sh

exec flatpak run org.kde.kitinerary-extractor "$@"

Then calling it is simply a matter of:

kitinerary-extractor

kitinerary-extractor takes an optional `--output` (or `-o`) argument, which can be "JsonLd" (the default) or "iCal". Specifying iCal output will cause kitinerary-extractor to print out an iCalendar (.ics) file with information about the event described in the email:

kitinerary-extractor -o iCal my_email_file.eml

The output is rather lengthy so I won't reproduce it here, but I suggest you experiment on some emails of your own.

### Post-processing

If you call the above command on an email that doesn't contain any information that kitinerary-extractor knows how to extract, it will output an empty iCalendar file (i.e., one with a root VCALENDAR object but without any VEVENT objects). There are some types of email that kitinerary-extractor will extract *some* information from, but which do not correspond to calendar events. For example, it seems to do this for emails from eBay or LinkedIn. In these cases, it will (rather unhelpfully) output an iCalendar file which contains an event (VEVENT) object, but no start or end time.

Therefore, if you are calling kitinerary-extractor on every email you receive, you will need to check the resulting iCalendar file to ensure that it contains at least one VEVENT object that has start time (DTSTART) and end time (DTEND) properties.

We can do this using Python and the popular `icalendar` library:

from icalendar import Calendar

def has_real_event(cal: Calendar) -> bool:
    """Return True if `cal` has at least one event with a start and end time."""
    for evt in cal.walk("VEVENT"):
        if ("DTSTART" in evt) and ("DTEND" in evt):
            return True
    return False

If you are sending the event by email, the recipient email address should be listed as an attendee. Otherwise, when (for example) I click to accept the invitation in Thunderbird, I get a dialog telling me I'm not on the guest list. It still lets me add it to my calendar, but it's annoying.

from icalendar import Calendar, vCalAddress, vText

def add_attendee(cal: Calendar, email_addr: str) -> Calendar:
    """Add `email_addr` as an attendee to each event in `cal`. Modifies `cal`
    in-place.
    """
    for evt in cal.walk("VEVENT"):
        a = vCalAddress(f"MAILTO:{email_addr}")
        a.params["ROLE"] = vText("REQ-PARTICIPANT")
        # `a` is already an iCalendar value object, so skip re-encoding
        evt.add("attendee", a, encode=0)
    return cal

kitinerary-extractor outputs one iCalendar file per email that it parses. If you are processing multiple emails, it may be more convenient to get one iCalendar file with multiple events rather than multiple files. You can merge a number of VCALENDAR objects into a single VCALENDAR, like so:

from typing import Collection

from icalendar import Calendar

def merge_calendars(cals: Collection[Calendar]) -> Calendar:
    """Merge a number of calendars into one, which has the timezone and event
    info from all calendars.
    """
    # Keep track of the timezones we've already added
    added_tzids = set()
    new_cal = Calendar()
    for c in cals:
        # Add timezone definitions to new calendar (avoiding duplication)
        for tz in c.walk("VTIMEZONE"):
            tzid = tz["TZID"]
            if tzid not in added_tzids:
                new_cal.add_component(tz)
                added_tzids.add(tzid)
        # Add events to new calendar
        for evt in c.walk("VEVENT"):
            new_cal.add_component(evt)
    return new_cal

To tie this all together, we use Python's `mailbox` module (part of the standard library) to iterate through the new emails we fetched with mbsync, process them one by one and merge the resulting calendars into a single calendar:

import subprocess
from email.message import Message
from mailbox import Maildir
from typing import Collection, Optional

from icalendar import Calendar, vCalAddress, vText

# Run the flatpak-packaged extractor, asking for iCalendar output
CMD = ["/usr/bin/flatpak", "run", "org.kde.kitinerary-extractor", "-o", "iCal"]

def process_email(email: Message) -> Optional[Calendar]:
    """Process `email`, returning a calendar if it contains a relevant event
    and None otherwise.
    """
    # The raw message is passed to the extractor on standard input
    output = subprocess.run(CMD, input=email.as_bytes(), capture_output=True)
    if output.returncode:
        # kitinerary-extractor returned an error
        return None
    cal = Calendar.from_ical(output.stdout.decode())
    if has_real_event(cal):
        return cal
    return None

def process_mailbox(
    mb: Maildir,
    email_addr: Optional[str] = None
) -> Optional[tuple[Calendar, list[str]]]:
    """Process each email in `mb`, returning a calendar containing all
    parsed events (or None if no events were found). Also return a list of
    details of emails that had events. If given, `email_addr` will be added
    as an attendee to each event.
    """
    cals = []
    emails = []
    for msg in mb:
        c = process_email(msg)
        if c is not None:
            cals.append(c)
            # Fall back to "" so that a missing header does not break join()
            emails.append(" ".join((
                msg.get("Date", ""),
                msg.get("From", ""),
                msg.get("Subject", "")
            )))
    if cals:
        c = merge_calendars(cals)
        if email_addr:
            add_attendee(c, email_addr)
        return c, emails
    return None

The above function takes, as its first argument, a `mailbox.Maildir` object representing a Maildir directory. In the example mbsync configuration we looked at above, the Maildir directory is `~/Mail/`. You can initialise the object to pass to `process_mailbox` like so:

from mailbox import Maildir

mb = Maildir("~/Mail")

As well as returning a Calendar object containing the relevant events, the `process_mailbox` function returns a list of strings containing some basic information about the emails that were found to contain information about events.

Neither the above function nor mbsync will automatically remove fetched emails once you are finished with them, so you should do this manually to avoid repeatedly parsing the same emails every time. 
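
One way to do that clean-up from Python is sketched below. The helper name is my own, and it assumes you have collected the keys of the messages you are finished with, for example by iterating over `mb.iteritems()` (which yields key and message pairs) instead of over `mb` directly. Note that it permanently deletes the local copies.

```python
from mailbox import Maildir


def remove_processed(mb: Maildir, keys: list[str]) -> int:
    """Delete the messages with the given keys from `mb` and return how many
    were actually removed. Unknown keys are skipped."""
    removed = 0
    for key in keys:
        if key in mb:
            mb.remove(key)
            removed += 1
    return removed
```

If you simply want to empty the whole mailbox after each run, `mb.clear()` does that in one call.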

=> https://icalendar.readthedocs.io
=> https://docs.python.org/3/library/mailbox.html

### Delivery

Now that you have an iCalendar file, you need a way to actually get it into your calendar. If your calendar supports the CalDAV protocol, you may be able to do this directly using a CalDAV client. Here, we will just use Python to send the calendar as an email attachment via SMTP.
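
For comparison, here is a minimal sketch of what the CalDAV route might look like using only the standard library. It only builds the HTTP request and does not send it. The collection URL is hypothetical, authentication is left out entirely, and, as I understand RFC 4791, a CalDAV server expects each stored resource to contain a single event (one UID), so a merged calendar would need to be split up and uploaded one event at a time.

```python
import urllib.parse
import urllib.request


def build_caldav_put(
    collection_url: str, uid: str, ics_data: bytes
) -> urllib.request.Request:
    """Build (but do not send) a PUT request storing one event in a CalDAV
    calendar collection."""
    # Name the resource after the event's UID, percent-encoding unsafe characters
    resource = urllib.parse.quote(uid, safe="") + ".ics"
    return urllib.request.Request(
        collection_url.rstrip("/") + "/" + resource,
        data=ics_data,
        method="PUT",
        headers={
            "Content-Type": "text/calendar; charset=utf-8",
            # Ask the server to refuse the request if the resource already exists
            "If-None-Match": "*",
        },
    )
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` once credentials are set up, but a dedicated CalDAV client library is probably the more robust choice.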

import smtplib
from email.message import EmailMessage
from typing import Iterable

from icalendar import Calendar

def email_calendar(
    to: str,
    subject: str,
    event_details: Iterable[str],
    cal: Calendar,
    sender: str,
    passwd: str,
    smtp_server: str,
    smtp_port: int = 587,
) -> None:
    """Send `cal` as an .ics attachment, with `event_details` as the body."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = to
    msg.set_content('\n'.join(event_details))
    # to_ical() returns bytes, so the MIME type must be given explicitly
    msg.add_attachment(
        cal.to_ical(),
        maintype="text",
        subtype="calendar",
        filename="events.ics"
    )
    # Upgrade the connection with STARTTLS before sending credentials
    with smtplib.SMTP(smtp_server, smtp_port) as server:
        server.starttls()
        server.login(sender, passwd)
        server.send_message(msg)
