I am not the most innately organised person, so I am a fan of using a (digital) calendar to keep track of my personal life: when I am traveling, when I am meeting up with friends or going to the cinema, etc. It helps ensure I don't miss appointments or double-book myself.
But if you have a lot of bookings--for example, if you are planning a vacation and booking multiple flights, trains and hotels--it can be tedious to manually create calendar events for each booking with the correct times, location and other details. The problem is compounded when you are dealing with different timezones. Some service providers give you the option to add bookings to your calendar, but often they don't, and even when they do, it doesn't always work properly (for example, British Airways give you the option to add flights to a calendar, but in my experience they don't get the timezones correct).
For a long time, I was looking for an open source solution that would accomplish this, and considered implementing something myself. I started by writing a Python script that would parse a Ryanair flight confirmation email. It was surprisingly easy, as Ryanair's HTML confirmation emails contain tags with information conforming to schema.org's FlightReservation schema (and a number of other related schemas). However, some other airlines' confirmation emails aren't so easily parsed.
Eventually I found out that a number of KDE developers have been working on a solution to this problem, in the form of software called KDE Itinerary. KDE Itinerary is a "digital travel assistant" that aims to handle many aspects of travel, including navigation and boarding pass management. One of its key features is extracting data about itinerary items from various formats, including emails and PDFs. The data extraction engine is maintained separately as a C++ library called KItinerary, which can be integrated in third-party software or used from the command line via an executable called kitinerary-extractor, which is available as a flatpak. Support for over 250 different service providers (airlines, train companies, booking websites, etc) is included in KItinerary and it is relatively easy to add new parsing functionality with a bit of JavaScript.
=> https://invent.kde.org/pim/kitinerary
KItinerary is a powerful piece of software, though I don't love its KDE dependencies and I would prefer something that can be run natively on non-KDE systems. Installing kitinerary-extractor via flatpak is easy but takes up a lot of disk space if you don't already have KDE dependencies installed (apparently about 1.6 GB including all dependencies on my system). That is a lot, but disk space is cheap these days and if you have it to spare then kitinerary-extractor gives you an easy way to leverage the excellent parsing and post-processing work done by the KItinerary devs.
kitinerary-extractor reads in an email file and, by default, outputs parsed data as JSON-LD. The data structure conforms to the schema.org ontology. If you provide the `-o iCal` argument, it outputs parsed data as an iCalendar file instead of JSON-LD.
=> https://json-ld.org/
=> https://schema.org/docs/schemas.html
=> https://en.wikipedia.org/wiki/ICalendar
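To give a feel for working with the JSON-LD output, here is a minimal sketch that summarises a list of schema.org objects using only the standard library. The sample data is hand-written by me in the general shape of a schema.org FlightReservation; it is not real kitinerary-extractor output, and the function name `summarise_reservations` is made up for illustration. I am also assuming the output is either a single JSON object or a JSON array of objects, so check this against what you actually get.

```python
import json

# Hand-written sample loosely following schema.org's FlightReservation.
# This is NOT real kitinerary-extractor output; all values are made up.
SAMPLE_JSON_LD = """
[
  {
    "@context": "http://schema.org",
    "@type": "FlightReservation",
    "reservationNumber": "ABC123",
    "reservationFor": {
      "@type": "Flight",
      "flightNumber": "1234",
      "departureAirport": {"@type": "Airport", "iataCode": "DUB"},
      "arrivalAirport": {"@type": "Airport", "iataCode": "STN"},
      "departureTime": "2024-05-01T06:30:00+01:00"
    }
  }
]
"""


def summarise_reservations(json_ld: str) -> list[str]:
    """Return a one-line summary for each top-level object in `json_ld`."""
    data = json.loads(json_ld)
    # Accept either a single object or a list of objects
    items = data if isinstance(data, list) else [data]
    summaries = []
    for item in items:
        kind = item.get("@type", "Unknown")
        number = item.get("reservationNumber", "?")
        summaries.append(f"{kind} {number}")
    return summaries


print(summarise_reservations(SAMPLE_JSON_LD))
```

If all you want is calendar events, the iCalendar output is more convenient, but the JSON-LD route gives you access to details (such as reservation numbers) that you might want to handle yourself.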
KItinerary apparently integrates with KMail as well as Nextcloud Mail, so if you use either of those applications for email, you can use it quite easily. Personally I don't, so I set out to roll my own solution. Ideally, what I want is something that periodically checks my inbox for new emails and, if it finds emails that contain information about a booking or event, adds the booking or event to my calendar. Over the course of a long weekend I hacked something together that uses mbsync to fetch emails, KItinerary to parse them into iCalendar files and Python to do some post-processing and deliver the iCalendar files via email.
mbsync is a tried and trusted tool for syncing two mailboxes, and can be used to download emails from a remote IMAP mailbox to local storage. Confusingly, the project is called isync but the executable itself is called mbsync. The isync package is available in the repos of most major Linux distributions.
Below is a rough outline of a configuration file (usually stored at ~/.mbsyncrc) that can be used to fetch new emails from an IMAP mailbox.
# Define a local mailbox, in the Maildir format
MaildirStore mailbox-local
Path ~/Mail/
Inbox ~/Mail/

# Define a remote IMAP mailbox
IMAPStore mailbox-remote
Host
User
PassCmd

# Fetch new messages for processing
Channel mailbox-fetch
Master :mailbox-remote:
Slave :mailbox-local:
Sync Pull New ReNew
For more information on how mbsync configuration works, consult its man page. A couple of points to note:
The `PassCmd` value should be a shell command that can be used to get the password to your IMAP server. A common choice is to use `pass`, a popular command line password manager. Instead of `PassCmd`, you could include your password directly in the config file using the `Pass` directive, but of course this has security implications. Alternatively, you can provide neither directive, and mbsync will prompt you for a password at runtime.
`Sync Pull New ReNew` tells mbsync to pull (download) new messages since it was last run, including messages that were found on a previous run but not downloaded for some reason. mbsync is capable of two-way sync, but here we just want to pull new messages from the IMAP server and not send anything the other way.
You then simply run mbsync like so (the argument corresponds to the name of the channel we defined in the config file):
mbsync mailbox-fetch
This will download new messages from your IMAP server and store them under ~/Mail. Three new subdirectories will be created: `cur/`, `new/` and `tmp/`. `new/` stores messages marked as unread and `cur/` stores messages marked as read.
mbsync will keep track of what it has downloaded, so it will not download the same email multiple times, even if the local version is deleted.
=> https://isync.sourceforge.io/
=> https://en.wikipedia.org/wiki/Maildir
=> https://www.passwordstore.org/
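If you want to see the Maildir layout for yourself without touching your real mail, the following sketch builds a throwaway Maildir with Python's standard `mailbox` module, adds one message and counts the files in each subdirectory. It does not involve mbsync at all; the function name `demo_maildir_layout` and the test message are made up for illustration.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def demo_maildir_layout() -> dict[str, int]:
    """Create a throwaway Maildir, add one message and count the files in
    each of its three subdirectories.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Mail")
        # create=True makes the cur/, new/ and tmp/ subdirectories
        mb = mailbox.Maildir(path, create=True)

        msg = EmailMessage()
        msg["Subject"] = "Test booking"
        msg.set_content("Hello")
        # A plain message added this way is stored in new/ by default
        mb.add(msg)

        return {
            sub: len(os.listdir(os.path.join(path, sub)))
            for sub in ("cur", "new", "tmp")
        }


print(demo_maildir_layout())
```

The newly added message ends up in `new/`, which is also where you would expect to find freshly fetched, unread messages.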
Now that you have downloaded your emails, you can feed them to kitinerary-extractor to extract the details of the events (if any) they describe.
If you have downloaded kitinerary-extractor via flatpak, then the correct command to run it is:
flatpak run org.kde.kitinerary-extractor
That looks a bit ugly, so let's wrap it in a simple shell script, which we will call kitinerary-extractor:
exec flatpak run org.kde.kitinerary-extractor "$@"
Then calling it is simply a matter of:
kitinerary-extractor
kitinerary-extractor takes an optional `--output` argument, which can be "JsonLd" (the default) or "iCal". Specifying iCal output will cause kitinerary-extractor to print out an iCalendar (.ics) file with information about the event described in the email:
kitinerary-extractor -o iCal my_email_file.eml
The output is rather lengthy so I won't reproduce it here, but I suggest you experiment on some emails of your own.
### Post-processing
If you call the above command on an email that doesn't contain any information that kitinerary-extractor knows how to extract, it will output an empty iCalendar file (i.e., one with a root VCALENDAR object but without any VEVENT objects).
There are some types of email that kitinerary-extractor will extract *some* information from, but which do not correspond to calendar events. For example, it seems to do this on emails from eBay or LinkedIn. In these cases, it will (rather unhelpfully) output an iCalendar file which contains an event (VEVENT) object, but no start or end time.
Therefore, if you are calling kitinerary-extractor on every email you receive, you will need to check the resulting iCalendar file to ensure that it contains at least one VEVENT object that has start time (DTSTART) and end time (DTEND) properties. We can do this using Python and the popular `icalendar` library:
from icalendar import Calendar

def has_real_event(cal: Calendar) -> bool:
    # A "real" event is a VEVENT with both a start and an end time
    for evt in cal.walk("VEVENT"):
        if ("DTSTART" in evt) and ("DTEND" in evt):
            return True
    return False
If you are sending the event by email, the recipient email address should be listed as an attendee. Otherwise, when (for example) I click to accept the invitation in Thunderbird, I get a dialog telling me I'm not on the guest list. It still lets me add it to my calendar, but it's annoying.
from icalendar import Calendar, vCalAddress, vText

def add_attendee(cal: Calendar, email_addr: str) -> Calendar:
    """Add `email_addr` as an attendee to each event in `cal`. Modifies `cal`
    in-place.
    """
    for evt in cal.walk("VEVENT"):
        a = vCalAddress(f"MAILTO:{email_addr}")
        a.params["ROLE"] = vText("REQ-PARTICIPANT")
        evt.add("attendee", a, encode=0)
    return cal
kitinerary-extractor outputs one iCalendar file per email that it parses. If you are processing multiple emails, it may be more convenient to get one iCalendar file with multiple events rather than multiple files. You can merge a number of VCALENDAR objects into a single VCALENDAR, like so:
from collections.abc import Collection

def merge_calendars(cals: Collection[Calendar]) -> Calendar:
    """Merge a number of calendars into one, which has the timezone and event
    info from all calendars.
    """
    # Keep track of the timezones we've already added
    added_tzids = set()
    new_cal = Calendar()
    for c in cals:
        # Add timezone definitions to new calendar (avoiding duplication)
        for tz in c.walk("VTIMEZONE"):
            tzid = tz["TZID"]
            if tzid not in added_tzids:
                new_cal.add_component(tz)
                added_tzids.add(tzid)
        # Add events to new calendar
        for evt in c.walk("VEVENT"):
            new_cal.add_component(evt)
    return new_cal
To tie this all together, we use Python's `mailbox` module (part of the standard library) to iterate through the new emails we fetched with mbsync, process them one by one and merge the resulting calendars into a single calendar:
import subprocess
from datetime import datetime
from mailbox import Maildir
from email.message import Message
from typing import Optional

from icalendar import Calendar, Event, vCalAddress, vText

CMD = ["/usr/bin/flatpak", "run", "org.kde.kitinerary-extractor", "-o", "iCal"]

def process_email(email: Message) -> Optional[Calendar]:
    """Run kitinerary-extractor on `email` and return the resulting calendar
    if it contains a relevant event, or None otherwise.
    """
    output = subprocess.run(CMD, input=email.as_bytes(), capture_output=True)
    if output.returncode:
        # kitinerary-extractor returned an error
        return None
    cal = Calendar.from_ical(output.stdout.decode())
    if has_real_event(cal):
        return cal
    return None

def process_mailbox(
    mb: Maildir,
    email_addr: Optional[str] = None
) -> Optional[tuple[Calendar, list[str]]]:
    """Process each email in `mb`, returning a calendar containing all
    parsed events (or None if no events were found). Also return a list of
    details of emails that had events. `email_addr` will be added as an
    attendee to each event.
    """
    cals = []
    emails = []
    for msg in mb:
        c = process_email(msg)
        if c is not None:
            cals.append(c)
            # Fall back to an empty string if a header is missing
            emails.append(" ".join((
                msg.get("Date", ""),
                msg.get("From", ""),
                msg.get("Subject", "")
            )))
    if cals:
        c = merge_calendars(cals)
        if email_addr:
            add_attendee(c, email_addr)
        return c, emails
    return None
The above function takes, as its first argument, a `mailbox.Maildir` object representing a Maildir directory. In the example mbsync configuration we looked at above, the Maildir directory is `~/Mail/`. You can initialise the object to pass to `process_mailbox` like so:
import os
from mailbox import Maildir

# Maildir does not expand "~" itself, so expand it explicitly
mb = Maildir(os.path.expanduser("~/Mail"))
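One thing to watch out for: iterating over a `Maildir` directly (as `process_mailbox` does) yields messages but not their keys, and you need a key to delete a message. Below is a minimal sketch of a key-based loop, assuming you are happy to delete a message as soon as some handler function has dealt with it. The names `process_and_remove`, `handler` and `demo` are made up for illustration, and the demo runs against a throwaway Maildir rather than your real mail.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage
from typing import Callable


def process_and_remove(
    mb: mailbox.Maildir,
    handler: Callable[[mailbox.MaildirMessage], bool],
) -> int:
    """Call `handler` on each message in `mb`, deleting every message for
    which it returns True. Return the number of messages deleted.
    """
    removed = 0
    # Take a snapshot of the keys, since we modify the mailbox as we go
    for key in list(mb.keys()):
        msg = mb[key]
        if handler(msg):
            mb.remove(key)
            removed += 1
    return removed


def demo() -> tuple[int, int]:
    """Add two messages to a temporary Maildir, remove the one whose subject
    mentions a booking, and return (number removed, number remaining).
    """
    with tempfile.TemporaryDirectory() as tmp:
        mb = mailbox.Maildir(os.path.join(tmp, "Mail"), create=True)
        for subject in ("Your booking", "Newsletter"):
            msg = EmailMessage()
            msg["Subject"] = subject
            msg.set_content("Hello")
            mb.add(msg)
        removed = process_and_remove(
            mb, lambda m: "booking" in (m["Subject"] or "")
        )
        return removed, len(mb)


print(demo())
```

In practice the handler would do the parsing and delivery, and return True once it is safe to forget about the message.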
As well as returning a Calendar object containing the relevant events, the `process_mailbox` function returns a list of strings containing some basic information about the emails that were found to contain information about events.
Neither the above function nor mbsync will automatically remove fetched emails once you are finished with them, so you should do this manually to avoid repeatedly parsing the same emails every time.
=> https://icalendar.readthedocs.io
=> https://docs.python.org/3/library/mailbox.html
### Delivery
Now that you have an iCalendar file, you need a way to actually get it into your calendar. If your calendar supports the CalDAV protocol, you may be able to do this directly using a CalDAV client. Here, we will just use Python to send the calendar as an email attachment via SMTP.
import smtplib
from email.message import Message, EmailMessage
from typing import Iterable

from icalendar import Calendar

def email_calendar(
    to: str,
    subject: str,
    event_details: Iterable[str],
    cal: Calendar,
    sender: str,
    passwd: str,
    smtp_server: str,
    smtp_port: int = 587,
):
    """Send an email with attachment."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = to
    msg.set_content('\n'.join(event_details))
    # Attach the serialised calendar as a text/calendar file
    msg.add_attachment(
        cal.to_ical(),
        maintype="text",
        subtype="calendar",
        filename="events.ics"
    )
    with smtplib.SMTP(smtp_server, smtp_port) as server:
        server.starttls()
        server.login(sender, passwd)
        server.send_message(msg)
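Before pointing this at a real SMTP server, it can be handy to build the message and inspect it locally. The sketch below does just that using only the standard library. The helper name `build_calendar_message`, the example.org addresses and the tiny hand-written calendar (standing in for the bytes returned by `cal.to_ical()`) are all made up for illustration.

```python
from email.message import EmailMessage

# A tiny hand-written calendar, standing in for the output of cal.to_ical()
SAMPLE_ICS = (
    b"BEGIN:VCALENDAR\r\n"
    b"VERSION:2.0\r\n"
    b"PRODID:-//example//sketch//EN\r\n"
    b"BEGIN:VEVENT\r\n"
    b"UID:example-1\r\n"
    b"DTSTAMP:20240101T000000Z\r\n"
    b"DTSTART:20240501T063000Z\r\n"
    b"DTEND:20240501T080000Z\r\n"
    b"SUMMARY:Example flight\r\n"
    b"END:VEVENT\r\n"
    b"END:VCALENDAR\r\n"
)


def build_calendar_message(
    to: str, sender: str, subject: str, body: str, ics_bytes: bytes
) -> EmailMessage:
    """Build (but do not send) an email with a text/calendar attachment."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = to
    msg.set_content(body)
    msg.add_attachment(
        ics_bytes,
        maintype="text",
        subtype="calendar",
        filename="events.ics",
    )
    return msg


message = build_calendar_message(
    to="me@example.org",
    sender="robot@example.org",
    subject="New calendar events",
    body="1 event found",
    ics_bytes=SAMPLE_ICS,
)

# List the attachments to confirm the calendar is there
for part in message.iter_attachments():
    print(part.get_content_type(), part.get_filename())
```

If the attachment shows up with the expected content type and filename, the same message object can be handed to `send_message` as in the function above.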