ParseMail
Article written by a human: Mike Cardwell
I have a project that I first started building over 10 years ago and it occurred
to me recently that I have never blogged about it. It is a website called
ParseMail. I have been running email systems
personally, and at different points during my career, for going on 25 years now.
I built ParseMail because I wanted to be able to view email content
differently. Email clients hide a lot of information, and viewing emails in mail
queues using tools like cat, vim and less isn't that useful when you're
staring at a blob of quoted-printable encoded HTML.
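To illustrate the problem, here is a minimal sketch that uses Python's standard quopri module to turn such a blob back into readable HTML. The sample bytes are made up for the example and this is not ParseMail's own code.

```python
import quopri

# A made-up fragment of quoted-printable encoded HTML, as it might appear
# in a raw message on disk. "=3D" encodes "=", "=C3=A9" is the UTF-8 byte
# pair for "é", and a trailing "=" is a soft line break that joins the
# line to the next one.
raw = (
    b'<a href=3D"https://example.com/?id=3D1">Caf=C3=A9 menu, now with a ver=\r\n'
    b'y long line</a>'
)

# Decode the transfer encoding first, then the character set.
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```

The decoded string is ordinary HTML again, with the soft line break removed and the escaped characters restored.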
Before I describe what ParseMail does, one of the things I'm proud of is how
easy it is to run a local copy. You don't need to trust my website with your
data at all. Just run docker run --rm -p 8000:8000 grepular/parsemail and
you'll find a full local copy of the website running (minus language
translation) at http://127.0.0.1:8000. It is released under the GPL-3.0 and
you'll find the source at https://gitlab.com/grepular/parsemail.
If you don't trust my Docker image build, then clone the repo, read the source
code (it's a simple Python Django app), and build the image yourself:
docker build -t parsemail ./
If you want to enable language translation of email content, you'll just need
to pass in an environment variable specifying where it should store downloaded
language models inside the container, e.g. by adding
-e TRANSLATE_MODELS_DIR=/tmp/translation-models to the docker run command. It
will then download and cache different models from Argos Translate on demand,
for entirely local language translation.
So why does ParseMail exist and what does it do? Typically, you view an email
in one of two ways. Either you use an email client, which shows you the
rendered text/html part or the text/plain part, along with a small sample of
useful headers like "Subject" and "From", or you look at the raw source
from a mail queue or a file on disk. With ParseMail, you paste the full raw
source of an email into a text area on the front page, and then it shows you the
following:
- A list of all of the IP addresses mentioned in the email headers and bodies, along with the country that MaxMind's GeoIP database thinks each IP is located in.
- A list of all of the hostnames mentioned in the email headers and bodies.
- A list of all of the email addresses mentioned in the email headers and bodies.
- A list of all of the URLs mentioned in the email headers and bodies.
- A tree representing the MIME structure of the email.
- The main email headers, and the headers of each MIME part, displayed with each identified IP, hostname, email address and URL highlighted, along with country flags for the IPs inline.
- Each MIME part body displayed in a suitable default fashion. Attached images are displayed as images. HTML parts are rendered in a real browser and then converted to PDF, with clickable links, and there are options to view the raw parts as text, the rendered HTML as a PNG, etc. Images that are directly attached to an email, given a Content-ID and referenced from the HTML part via CID URLs, are correctly inlined in the rendered PDFs and PNGs.
- Language detection, and a button to translate text and HTML parts from one language to another. It will even generate a new PDF or PNG of the translated HTML part for you. A lot of spam tends to be in a foreign language, and it's nice to be able to translate it to your local language at the click of a button.
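To give a feel for the first few items in that list, here is a rough sketch using only Python's standard library. It is not ParseMail's actual code: the sample message, the function names and the deliberately simple regular expressions are all assumptions made for illustration, and it skips hostnames, IPv6 and the GeoIP lookup entirely.

```python
import ipaddress
import re
from email import message_from_string, policy

# A made-up two-part message using documentation-only addresses.
RAW = """\
From: Alice <alice@example.org>
To: bob@example.net
Subject: Hello
Received: from mail.example.org (mail.example.org [192.0.2.10])
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="utf-8"

See https://example.org/offer for details.
--b1
Content-Type: text/html; charset="utf-8"

<p>See <a href="https://example.org/offer">the offer</a>.</p>
--b1--
"""

# Intentionally loose patterns; real-world extraction needs more care.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def mime_tree(part, depth=0):
    """Yield one indented line per MIME part, parents before children."""
    yield "  " * depth + part.get_content_type()
    if part.is_multipart():
        for child in part.iter_parts():
            yield from mime_tree(child, depth + 1)


def all_text(msg):
    """Yield every header value and every decoded text body in the message."""
    for part in msg.walk():
        for _name, value in part.items():
            yield str(value)
        if not part.is_multipart() and part.get_content_maintype() == "text":
            yield part.get_content()


def find_indicators(msg):
    """Collect IPv4 addresses, email addresses and URLs from headers and bodies."""
    found = {"ips": set(), "emails": set(), "urls": set()}
    for text in all_text(msg):
        for candidate in IPV4_RE.findall(text):
            try:
                # Reject things that only look like an IP, e.g. 999.1.1.1.
                found["ips"].add(str(ipaddress.ip_address(candidate)))
            except ValueError:
                pass
        found["emails"].update(EMAIL_RE.findall(text))
        found["urls"].update(URL_RE.findall(text))
    return found


msg = message_from_string(RAW, policy=policy.default)
tree = list(mime_tree(msg))
found = find_indicators(msg)
print("\n".join(tree))
print(found)
```

For the sample message the tree is multipart/alternative with text/plain and text/html children, and the URL that appears in both bodies is reported once because the results are collected in sets.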
There is no database behind ParseMail. The data is stored in flat files, and is deleted on a schedule defined by the person uploading the content. At the point you paste in an email, you get to choose how long it should be kept before being deleted, and also whether or not remote content should be fetched when rendering HTML email.
There are no cookies and no tracking, and JavaScript is optional and only used to add a few basic UI effects. You'll see a pretty well locked down Content-Security-Policy, which was possible because there are no cross-origin requests and no third parties involved. The only outgoing connections from the application are:
- A request to fetch the MaxMind GeoIP database at startup, if you haven't mounted one into the container.
- A request to fetch the Public Suffix List at startup, so we know what TLDs exist when doing our parsing.
- Requests to download language models from Argos Translate as and when they are needed to translate emails (if you have turned on that feature). Translation is done entirely locally. Nobody will see your email content.
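As an illustration of why a suffix list helps with parsing, the sketch below checks whether a dotted token ends in a known public suffix, which is one way to tell a hostname like www.example.com from a filename like index.html. This is an assumption about how such a list can be used, not a description of ParseMail's code; the tiny SUFFIXES set and both function names are hypothetical stand-ins.

```python
# A tiny, hand-picked stand-in for the Public Suffix List. The real list
# is far longer and also has wildcard and exception rules, which this
# sketch ignores.
SUFFIXES = {"com", "org", "uk", "co.uk"}


def public_suffix(hostname):
    """Return the longest known suffix of hostname, or None if there is none."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try the longest candidate first so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            return candidate
    return None


def looks_like_hostname(token):
    """True if token has at least one label in front of a known suffix."""
    suffix = public_suffix(token)
    return suffix is not None and token.lower().rstrip(".") != suffix


for token in ["www.example.com", "mail.example.co.uk", "index.html", "co.uk"]:
    print(token, public_suffix(token), looks_like_hostname(token))
```

A bare suffix such as co.uk is rejected on purpose, since it names a registry zone rather than a host someone registered.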
So yeah. Please feel free to use the service. Feel free to run your own copy of it and use that instead. Feel free to request new features, or submit PRs to my GitLab project.
If you do find ParseMail useful, please click the "Thumbs Up" icon at the bottom of this article, or send me a quick message. I don't think I've ever got any feedback for it, so it would be nice to find out if people are finding it useful.