EmailReplyParser is a PHP library for parsing plain text email content,
based on GitHub’s email_reply_parser
library written in Ruby.
The recommended way to install EmailReplyParser is through
Composer:
composer require willdurand/email-reply-parser
Instantiate an EmailParser
object and parse your email:
<?php
use EmailReplyParser\Parser\EmailParser;
$email = (new EmailParser())->parse($emailContent);
You get an Email object that contains a set of Fragment objects. The Email
class exposes two methods:
getFragments(): returns all fragments;
getVisibleText(): returns a string which represents the content considered visible.
The Fragment class represents a part of the full email content, and has the
following API:
<?php
$fragment = current($email->getFragments());
$fragment->getContent();
$fragment->isSignature();
$fragment->isQuoted();
$fragment->isHidden();
$fragment->isEmpty();
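As an illustration of the Fragment API above, the following sketch joins the content of every fragment that is not quoted, not a signature and not hidden. joinReplyFragments() is a hypothetical helper, not part of the library, and its output is not guaranteed to match getVisibleText(); it only calls the methods listed above.

```php
<?php
// Hypothetical helper (not part of EmailReplyParser): keeps only the
// fragments that look like the author's own reply text.
function joinReplyFragments(iterable $fragments): string
{
    $parts = [];
    foreach ($fragments as $fragment) {
        // Skip quoted text, signatures and anything flagged as hidden.
        if ($fragment->isQuoted() || $fragment->isSignature() || $fragment->isHidden()) {
            continue;
        }
        // Drop trailing whitespace so the joined text has no blank gaps.
        $parts[] = rtrim($fragment->getContent());
    }

    return implode("\n", $parts);
}

// Usage, assuming $email comes from EmailParser::parse() as shown earlier:
// echo joinReplyFragments($email->getFragments());
```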
Alternatively, you can rely on the EmailReplyParser class to either parse an
email or get its visible content in a single line of code:
$email = \EmailReplyParser\EmailReplyParser::read($emailContent);
$visibleText = \EmailReplyParser\EmailReplyParser::parseReply($emailContent);
Quoted headers aren’t picked up if there’s an extra line break:
On <date>, <author> wrote:

> blah
Also, they’re not picked up if the email client breaks the header up into
multiple lines. Gmail, for example, wraps any line over 80 characters.
On <date>, <author>
wrote:
> blah
The On ... wrote: header above can be removed with the following regex:
$fragment_without_date_author = preg_replace(
'/\nOn(.*?)wrote:(.*?)$/si',
"",
$fragment->getContent()
);
Note though that we’re searching for “on” and “wrote”. Therefore, it won’t work
with other languages.
Possible solution: Remove “[email protected]” lines…
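To see the cleanup regex at work on a header that the mail client has wrapped, here is a small self-contained sketch. stripQuoteHeader() and the sample message are made up for illustration; only the pattern comes from the text above. Because the pattern is case-insensitive and not anchored to a date, it will also cut at any earlier line that happens to start with "on" and is later followed by "wrote:", so treat it as a rough cleanup rather than a reliable parser.

```php
<?php
// Hypothetical helper (not part of EmailReplyParser): removes an
// "On <date>, <author> wrote:" header and everything after it.
function stripQuoteHeader(string $content): string
{
    // "s" lets "." match line breaks, so a header wrapped over two lines
    // still matches; "i" makes the match case-insensitive.
    $result = preg_replace('/\nOn(.*?)wrote:(.*?)$/si', '', $content);

    // preg_replace() returns null on a regex error; fall back to the input.
    return $result ?? $content;
}

$content = "Thanks, that works!\nOn Mon, Mar 14, 2011, Bob\nwrote:\n> blah";
echo stripQuoteHeader($content), "\n";
```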
Lines starting with -
or _
sometimes mark the beginning of
signatures:
Hello
--
Rick
Not everyone follows this convention:
Hello
Mr Rick Olson
Galactic President Superstar Mc Awesomeville
GitHub
**********************DISCLAIMER***********************************
* Note: blah blah blah *
**********************DISCLAIMER***********************************
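To make the limitation concrete, here is a minimal sketch of a delimiter-based check applied to the two messages above. hasSignatureDelimiter() is a hypothetical helper written for this illustration and is not how the library detects signatures. It finds the delimiter in the first message and finds nothing in the second, even though that message clearly ends with a signature.

```php
<?php
// Hypothetical helper (not part of EmailReplyParser): reports whether any
// line of the text starts with "-" or "_".
function hasSignatureDelimiter(string $text): bool
{
    // "m" makes "^" match at the start of every line, not just the first.
    return preg_match('/^[-_]/m', $text) === 1;
}

$conventional = "Hello\n--\nRick";
$unconventional = "Hello\nMr Rick Olson\nGalactic President Superstar Mc Awesomeville\nGitHub";

var_dump(hasSignatureDelimiter($conventional));   // bool(true)
var_dump(hasSignatureDelimiter($unconventional)); // bool(false)
```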
Apparently, prefixing lines with >
isn’t universal either:
Hello
--
Rick
________________________________________
From: Bob [[email protected]]
Sent: Monday, March 14, 2011 6:16 PM
To: Rick
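One conceivable workaround for messages like the one above is to cut the text at a long underscore rule that is directly followed by a From: line. This is only a sketch of that heuristic: cutAtForwardedHeader() is a hypothetical helper, the 10-underscore threshold is an arbitrary assumption, and the library itself may handle such messages differently.

```php
<?php
// Hypothetical helper (not part of EmailReplyParser): keeps only the text
// that precedes an underscore rule followed by a "From:" line.
function cutAtForwardedHeader(string $text): string
{
    // Split once at a line of 10 or more underscores whose next line
    // starts with "From:". "m" makes "^" match at every line start.
    $parts = preg_split('/^_{10,}[ \t]*\r?\nFrom:/m', $text, 2);

    // preg_split() returns false on a regex error; fall back to the input.
    if ($parts === false) {
        return $text;
    }

    return rtrim($parts[0]);
}

$message = "Hello\n--\nRick\n" . str_repeat('_', 40) . "\nFrom: Bob\nSent: Monday, March 14, 2011 6:16 PM\nTo: Rick";
echo cutAtForwardedHeader($message), "\n";
```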
Set up the test suite using Composer:
$ composer install
Run it using PHPUnit:
$ ./vendor/bin/simple-phpunit
See the CONTRIBUTING file.
EmailReplyParser is released under the MIT License. See the bundled LICENSE
file for details.