# ParseMail


I have a project that I first started building over 10 years ago and it occurred
to me recently that I have never blogged about it. It is a website called
[ParseMail](https://www.parsemail.org). I have been running email systems
personally, and at different points during my career, for going on 25 years now.
I built ParseMail because I wanted to be able to view email content
differently. Email clients hide a lot of information, and viewing emails in mail
queues using tools like `cat` and `vim` and `less` isn't that useful when you're
staring at a blob of
[quoted-printable](https://wikipedia.org/wiki/Quoted-printable) encoded HTML.
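
To give a feel for what that blob looks like, here is a small sketch using Python's standard `quopri` module. The HTML fragment is made up for illustration; it is not taken from a real email or from ParseMail itself.

```python
import quopri

# A made-up fragment of quoted-printable encoded HTML, as it might appear in a
# raw email. "=3D" encodes "=", "=C3=A9" is the UTF-8 encoding of "é", and a
# "=" at the end of a line is a soft line break that joins the two lines.
raw = (
    b'<a href=3D"https://example.com/?id=3D42">Caf=C3=A9 menu, now with a much lon=\n'
    b'ger name</a>'
)

# Decode the transfer encoding first, then the character set.
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```

The decoded output is ordinary HTML, with the link intact and the word split by the soft line break joined back together.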

Before I describe what ParseMail does, one of the things I'm proud of is how
easy it is to run a local copy. You don't need to trust my website with your
data at all. Just run `docker run --rm -p 8000:8000 grepular/parsemail` and
you'll find a full local copy of the website running (minus language
translation) at [http://127.0.0.1:8000](http://127.0.0.1:8000). It is released
under the
[GPL-3.0](https://gitlab.com/grepular/parsemail/-/raw/master/COPYING.txt) and
you'll find the source at
[https://gitlab.com/grepular/parsemail](https://gitlab.com/grepular/parsemail).
If you don't trust [my docker image
build](https://hub.docker.com/r/grepular/parsemail), then clone the repo, read
the source code (it's a simple [Python](https://www.python.org/)
[Django](https://www.djangoproject.com/) app), and then run `docker build -t
parsemail ./`. If you want to enable language translation of email content,
you'll just need to pass in an environment variable specifying where it should
store downloaded language models inside the container, e.g. `-e
TRANSLATE_MODELS_DIR=/tmp/translation-models`. It will download and cache
different models from [Argos Translate](https://www.argosopentech.com) on
demand, for entirely local language translation.

So why does ParseMail exist and what does it do? Typically, when viewing an
email, you'll either be doing it in an email client, which shows you the
rendered `text/html` part or the `text/plain` part, along with a small sample of
useful headers like "Subject" and "From", or you'll be viewing the raw source
from a mail queue or a file on disk. With ParseMail, you paste the full raw
source of an email into a text area on the front page, and then it shows you the
following:

1. A list of all of the IP addresses mentioned in the email headers and bodies,
   along with the country that [MaxMind's GeoIP Database](https://www.maxmind.com/en/geoip-databases)
   thinks that IP is located in

2. A list of all of the hostnames mentioned in the email headers and bodies

3. A list of all of the email addresses mentioned in the email headers and
   bodies

4. A list of all of the URLs mentioned in the email headers and bodies

5. A tree representing the MIME structure of the email

6. The main email headers, and the headers of each MIME part displayed with each
   identified IP, Hostname, Email and URL highlighted, along with country flags
   for the IPs inline.

7. Each MIME part body displayed in a suitable default fashion. Attached images
   are displayed as images, and HTML parts are rendered in a real browser and
   then converted to PDF with clickable links. There are also options to view
   the raw parts as text, the rendered HTML as a PNG, etc. Images that are
   directly attached to an email, given a Content-ID and referenced from the
   HTML part via CID URLs, are correctly inlined in the rendered PDFs and PNGs.

8. Language detection, and a button to translate text and HTML parts from one
   language to another. It will even generate a new PDF or PNG of the translated
   HTML part for you. A lot of spam tends to be in a foreign language, and it's
   nice to be able to translate it to your local language at the click of
   a button.
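
To make the first few items more concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not ParseMail's actual code: the sample message, the helper names (`mime_tree`, `searchable_text`, `find_ips`, `find_all`) and the deliberately loose regular expressions are all assumptions made for illustration.

```python
import ipaddress
import re
from email import message_from_string, policy

# A made-up message, used purely for illustration.
RAW = """\
Received: from mail.example.net (mail.example.net [192.0.2.10])
\tby mx.example.org; Mon, 1 Jan 2024 10:00:00 +0000
From: Alice <alice@example.net>
To: bob@example.org
Subject: Hello
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset="utf-8"

See https://example.com/offer for details.
--b1
Content-Type: text/html; charset="utf-8"

<p>See <a href="https://example.com/offer">the offer</a>.</p>
--b1--
"""

# Deliberately loose patterns; real-world extraction needs far more care.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def mime_tree(part, depth=0):
    """Return one indented line per MIME part, depth-first."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.iter_parts():
            lines.extend(mime_tree(child, depth + 1))
    return lines


def searchable_text(msg):
    """Join the headers of every part with the decoded text bodies."""
    chunks = []
    for part in msg.walk():
        chunks.extend(f"{name}: {value}" for name, value in part.items())
        if part.get_content_maintype() == "text":
            # get_content() undoes the transfer encoding and the charset.
            chunks.append(part.get_content())
    return "\n".join(chunks)


def find_ips(text):
    """Return the sorted, de-duplicated candidates that are valid IPv4 addresses."""
    found = set()
    for candidate in IPV4_RE.findall(text):
        try:
            found.add(str(ipaddress.ip_address(candidate)))
        except ValueError:
            pass  # e.g. "999.1.1.1" looks like an IP but is not one
    return sorted(found)


def find_all(pattern, text):
    """Return the sorted, de-duplicated matches of a pattern."""
    return sorted(set(pattern.findall(text)))


msg = message_from_string(RAW, policy=policy.default)
text = searchable_text(msg)

print("\n".join(mime_tree(msg)))
print(find_ips(text))
print(find_all(EMAIL_RE, text))
print(find_all(URL_RE, text))
```

Searching the decoded bodies rather than the raw source matters, because an address or URL inside a quoted-printable or base64 part would otherwise be missed or mangled.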

There is no database behind ParseMail. The data is stored in flat files, which
are deleted on a schedule defined by the person uploading the content. At the
point you paste in an email you get to choose how long it should be kept for
before being deleted, and also whether or not remote content should be fetched
when rendering HTML email. There are no cookies and no tracking, and JavaScript
is optional and only used to add a few basic UI effects. You'll see a pretty
well locked down
[Content-Security-Policy](https://developer.mozilla.org/docs/Web/HTTP/Guides/CSP),
which I was able to set because there are no cross-origin requests and no third
parties involved. The only outgoing connections from the application are:

1. A request to fetch the MaxMind GeoIP database at startup, if you haven't
   mounted one into the container.

2. A request to fetch the [Public Suffix List](https://publicsuffix.org/list/effective_tld_names.dat)
   at startup, so we know what TLDs exist when doing our parsing.

3. Requests to download language models from [Argos Translate](https://www.argosopentech.com)
   as and when they are needed to translate emails (if you have turned on that
   feature). Translation is done entirely locally. Nobody will see your
   email content.
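
As an illustration of why the Public Suffix List is useful when picking hostnames out of text, here is a rough sketch of a suffix lookup. It is not ParseMail's implementation: the `SAMPLE_PSL` excerpt is hand-written rather than the real list, and the helper names are hypothetical. It also deliberately skips the list's default `*` rule, so that a hostname with an unknown TLD can be rejected.

```python
# A tiny hand-written excerpt in the Public Suffix List format (not the real list).
SAMPLE_PSL = """\
// Comment lines start with two slashes
com
uk
co.uk
*.ck
!www.ck
"""


def load_rules(text):
    """Parse PSL-formatted text into a set of rules, skipping comments and blanks."""
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("//"):
            # Each rule runs up to the first whitespace on the line.
            rules.add(line.split()[0].lower())
    return rules


def public_suffix(hostname, rules):
    """Return the public suffix of hostname, or None if no rule matches."""
    labels = hostname.lower().rstrip(".").split(".")
    best = None
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        candidate = labels[i:]
        exact = ".".join(candidate)
        wildcard = ".".join(["*"] + candidate[1:])
        if "!" + exact in rules:
            # Exception rules win: the suffix is the rule minus its leftmost label.
            return ".".join(candidate[1:])
        if best is None and (exact in rules or wildcard in rules):
            best = exact  # the first match found is the longest one
    return best


rules = load_rules(SAMPLE_PSL)
for host in ["www.example.co.uk", "mail.example.com", "www.ck", "shop.foo.ck", "host.invalidtld"]:
    print(host, "->", public_suffix(host, rules))
```

A parser can use a lookup like this to discard strings that merely look like hostnames, such as filenames ending in an extension that is not a real TLD.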

So yeah. Please feel free to use the service. Feel free to run your own copy of
it and use that instead. Feel free to request new features, or submit PRs to
[my GitLab project](https://gitlab.com/grepular/parsemail).

If you do find ParseMail useful, please click the "Thumbs Up" icon at the bottom
of this article, or [send me a quick message](/#contact). I don't think I've ever
got any feedback for it, so it would be nice to find out if people are finding
it useful.