How do I get the plain-text content from a multipart email?

    #!/usr/bin/php -q
    <?php
    $savefile = "savehere.txt";
    $sf = fopen($savefile, 'a') or die("can't open file");
    ob_start();

    // read from stdin
    $fd = fopen("php://stdin", "r");
    $email = "";
    while (!feof($fd)) {
        $email .= fread($fd, 1024);
    }
    fclose($fd);

    // handle email
    $lines = explode("\n", $email);

    // empty vars
    $from = "";
    $subject = "";
    $headers = "";
    $message = "";
    $splittingheaders = true;

    for ($i=0; $i < count($lines); $i++) {
        if ($splittingheaders) {
            // this is a header
            $headers .= $lines[$i]."\n";

            // look out for special headers
            if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
                $subject = $matches[1];
            }
            if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
                $from = $matches[1];
            }
            if (preg_match("/^To: (.*)/", $lines[$i], $matches)) {
                $to = $matches[1];
            }
        } else {
            // not a header, but message
            $message .= $lines[$i]."\n";
        }

        if (trim($lines[$i])=="") {
            // empty line, header section has ended
            $splittingheaders = false;
        }
    }

    /*$headers is ONLY included in the result at the last section of my question here*/
    fwrite($sf,"$message");
    ob_end_clean();
    fclose($sf);
    ?>

This is my attempt. The problem is that far too much ends up in the file. Here is what gets written to it (I just sent a bunch of junk, as you can see):

    From xxxxxxxxxxxxx Tue Sep 07 16:26:51 2010
    Received: from xxxxxxxxxxxxxxx ([xxxxxxxxxxx]:3184 helo=xxxxxxxxxxx) by xxxxxxxxxxxxx with esmtpa (Exim 4.69) (envelope-from <xxxxxxxxxxxxxxxx>) id 1Ot4kj-000115-SP for xxxxxxxxxxxxxxxxxxx; Tue, 07 Sep 2010 16:26:50 -0400
    Message-ID: <EE3B7E26298140BE8700D9AE77CB339D@xxxxxxxxxxx>
    From: "xxxxxxxxxxxxx" <xxxxxxxxxxxxxx>
    To: <xxxxxxxxxxxxxxxxxxxxx>
    Subject: stackoverflow is helping me
    Date: Tue, 7 Sep 2010 16:26:46 -0400
    MIME-Version: 1.0
    Content-Type: multipart/alternative; boundary="----=_NextPart_000_0169_01CB4EA9.773DF5E0"
    X-Priority: 3
    X-MSMail-Priority: Normal
    Importance: Normal
    X-Mailer: Microsoft Windows Live Mail 14.0.8089.726
    X-MIMEOLE: Produced By Microsoft MimeOLE V14.0.8089.726

    This is a multi-part message in MIME format.

    ------=_NextPart_000_0169_01CB4EA9.773DF5E0
    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    111
    222
    333
    444

    ------=_NextPart_000_0169_01CB4EA9.773DF5E0
    Content-Type: text/html; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <HTML><HEAD>
    <META content=3Dtext/html;charset=3Diso-8859-1 =
    http-equiv=3DContent-Type>
    <META name=3DGENERATOR content=3D"MSHTML 8.00.6001.18939"></HEAD>
    <BODY style=3D"PADDING-LEFT: 10px; PADDING-RIGHT: 10px; PADDING-TOP: =
    15px"=20
    id=3DMailContainerBody leftMargin=3D0 topMargin=3D0 =
    CanvasTabStop=3D"true"=20
    name=3D"Compose message area">
    <DIV><FONT face=3DCalibri>111</FONT></DIV>
    <DIV><FONT face=3DCalibri>222</FONT></DIV>
    <DIV><FONT face=3DCalibri>333</FONT></DIV>
    <DIV><FONT face=3DCalibri>444</FONT></DIV></BODY></HTML>

    ------=_NextPart_000_0169_01CB4EA9.773DF5E0--

I found this while searching, but I have no idea how to implement it, where to put it in my code, or whether it would even work.

    preg_match("/boundary=\".*?\"/i", $headers, $boundary);
    $boundaryfulltext = $boundary[0];

    if ($boundaryfulltext!="") {
        $find = array("/boundary=\"/i", "/\"/i");
        $boundarytext = preg_replace($find, "", $boundaryfulltext);
        $splitmessage = explode("--" . $boundarytext, $message);
        $fullmessage = ltrim($splitmessage[1]);
        preg_match('/\n\n(.*)/is', $fullmessage, $splitmore);

        if (substr(ltrim($splitmore[0]), 0, 2)=="--") {
            $actualmessage = $splitmore[0];
        } else {
            $actualmessage = ltrim($splitmore[0]);
        }
    } else {
        $actualmessage = ltrim($message);
    }

    $clean = array("/\n--.*/is", "/=3D\n.*/s");
    $cleanmessage = trim(preg_replace($clean, "", $actualmessage));

So, how can I get only the plain-text part of the email into my file, or into the script for further processing?

Thanks in advance. Stack Overflow is great!

You will need to go through four steps to isolate the plain-text part of your email body:

1. Get the MIME boundary string

We can use a regular expression to search your headers (let's assume they are in a separate variable, $headers):

    $matches = array();
    preg_match('#Content-Type: multipart\/[^;]+;\s*boundary="([^"]+)"#i', $headers, $matches);
    list(, $boundary) = $matches;

The regular expression looks for a Content-Type header that contains the boundary string and captures that string in the first capture group. We then copy that capture group into the variable $boundary.
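As a minimal sketch of the same step, the lookup can be wrapped in a function that returns null when no boundary is present, rather than leaving $boundary unset. The name extract_boundary is hypothetical, and accepting an unquoted boundary value (which MIME permits) is an addition that is not in the snippet above.

```php
<?php
// Hypothetical helper: returns the MIME boundary string, or null if none is found.
function extract_boundary(string $headers): ?string
{
    // Accept both boundary="abc" (group 1) and boundary=abc (group 2).
    // \s* also matches the line break of a folded header.
    $pattern = '#Content-Type:\s*multipart/[^;]+;\s*boundary=(?:"([^"]+)"|([^\s;]+))#i';

    if (preg_match($pattern, $headers, $m)) {
        // Group 1 is an empty string when only the unquoted form matched.
        return $m[1] !== '' ? $m[1] : $m[2];
    }

    return null;
}
```

A caller can then test `extract_boundary($headers) !== null` before attempting to split the body.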

2. Split the email body into segments

Once we have the boundary, we can split the body into its parts (in your message body, the boundary is prefixed with -- every time it appears). According to the MIME specification, everything before the first boundary should be ignored.

    $email_segments = explode('--' . $boundary, $message);
    array_shift($email_segments); // drop everything before the first boundary

This leaves us with an array containing all the segments, with everything before the first boundary discarded.
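One detail the split does not handle is the closing delimiter: the last boundary is followed by --, so the final array element is whatever comes after the end of the multipart body rather than a real part. A small sketch that also drops it, assuming the boundary string does not occur inside any part's content (split_parts is a hypothetical name):

```php
<?php
// Hypothetical helper: split a multipart body into its parts.
function split_parts(string $body, string $boundary): array
{
    $segments = explode('--' . $boundary, $body);
    array_shift($segments); // drop the preamble before the first boundary

    $parts = [];
    foreach ($segments as $segment) {
        // The closing delimiter is "--boundary--", so the text after it starts with "--".
        if (strncmp($segment, '--', 2) === 0) {
            break;
        }
        // Remove the line breaks that surround each part.
        $parts[] = trim($segment, "\r\n");
    }

    return $parts;
}
```

For the sample email in the question this should yield two parts, the text/plain one and the text/html one.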

3. Determine which segment is plain text

The plain-text segment will have a Content-Type header with the text/plain MIME type. We can now search the segments for the first one that has this header:

    foreach ($email_segments as $segment) {
        if (stristr($segment, "Content-Type: text/plain") !== false) {
            // We found the segment we're looking for!
        }
    }

Since what we are looking for is a constant string, we can use stristr (which finds the first occurrence of a substring in a string, case-insensitively) instead of a regular expression. If the Content-Type header is found, we have our segment.
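A plain substring search matches anywhere in the segment, so it would also hit a part whose content merely quotes the text "Content-Type: text/plain". A slightly stricter sketch looks only at each part's header block, taken here to be everything before the first blank line. The function name find_plain_text_segment is hypothetical.

```php
<?php
// Hypothetical helper: return the first segment whose own headers declare text/plain.
function find_plain_text_segment(array $segments): ?string
{
    foreach ($segments as $segment) {
        // Headers end at the first blank line (CRLF or bare LF line endings).
        $pieces = preg_split('/\r?\n\r?\n/', ltrim($segment, "\r\n"), 2);

        // Anchor the match to the start of a header line, case-insensitively.
        if (preg_match('#^Content-Type:\s*text/plain\b#im', $pieces[0])) {
            return $segment;
        }
    }

    return null; // no plain-text part found
}
```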

4. Remove any headers from the segment

Now we need to remove any headers from the segment we found, since we only want the actual message content. Four MIME headers can appear here: Content-Type, as we saw earlier, Content-ID, Content-Disposition, and Content-Transfer-Encoding. Each header line is terminated by \r\n, so we can use that to find where a header ends:

    $text = preg_replace('/Content-(Type|ID|Disposition|Transfer-Encoding):.*?\r\n/is', "", $segment);

The s modifier at the end of the regular expression makes the dot match newlines as well. .*? matches as few characters as possible (i.e., everything up to the \r\n); the ? makes .* lazy.

After this point, $text will contain the content of your email message.
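The header-stripping regular expression relies on \r\n line endings and on each header fitting on one line. Mail piped to a script on a Unix system may arrive with bare \n, and a folded header (for example, charset on a continuation line) would leave a fragment behind. An alternative sketch is to cut the part at its first blank line, which is where MIME part headers end, and then undo the transfer encoding that the sample message declares. part_body is a hypothetical name; quoted_printable_decode and base64_decode are standard PHP functions. The sketch assumes the part has at least one header.

```php
<?php
// Hypothetical helper: return the decoded body of a single MIME part.
function part_body(string $segment): string
{
    // Headers and body are separated by the first blank line (CRLF or bare LF).
    $pieces  = preg_split('/\r?\n\r?\n/', ltrim($segment, "\r\n"), 2);
    $headers = $pieces[0];
    $body    = $pieces[1] ?? '';

    // Undo the transfer encoding named in the part's own headers, if any.
    if (preg_match('/^Content-Transfer-Encoding:\s*quoted-printable/im', $headers)) {
        $body = quoted_printable_decode($body);
    } elseif (preg_match('/^Content-Transfer-Encoding:\s*base64/im', $headers)) {
        $body = base64_decode($body);
    }

    return trim($body);
}
```

With quoted-printable content, sequences such as =3D and =20 are turned back into = and a space, and soft line breaks (a trailing =) are removed.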

So, putting all of this together with your code:

    <?php
    // read from stdin
    $fd = fopen("php://stdin", "r");
    $email = "";
    while (!feof($fd)) {
        $email .= fread($fd, 1024);
    }
    fclose($fd);

    $matches = array();
    preg_match('#Content-Type: multipart\/[^;]+;\s*boundary="([^"]+)"#i', $email, $matches);
    list(, $boundary) = $matches;

    $text = "";
    if (isset($boundary) && !empty($boundary)) // did we find a boundary?
    {
        $email_segments = explode('--' . $boundary, $email);

        foreach ($email_segments as $segment) {
            if (stristr($segment, "Content-Type: text/plain") !== false) {
                $text = trim(preg_replace('/Content-(Type|ID|Disposition|Transfer-Encoding):.*?\r\n/is', "", $segment));
                break;
            }
        }
    }

    // At this point, $text will either contain your plain text body,
    // or be an empty string if a plain text body couldn't be found.

    $savefile = "savehere.txt";
    $sf = fopen($savefile, 'a') or die("can't open file");
    fwrite($sf, $text);
    fclose($sf);
    ?>

There is one answer here:

You only need to change these 2 lines:

    require_once('/path/to/class/rfc822_addresses.php');
    require_once('/path/to/class/mime_parser.php');